This script was written by UW Capstone student Linzheng Zhang in Spring 2025, specifically for the March-April 2023 event.

Some communication: 

Ryan: 
I'm curious if you could easily run your wildfire polygon identification algorithm for the time period March 25-April 5, 2023 (or point me to the relevant code you have and I'll touch it up to analyze that event)? The most usable output would be a geopandas dataframe that has columns: [datetime, geometry (of the identified polygon)]
I believe you have been working with the MODIS data, right? It might be good to include the number of data points that contribute to the polygon and some sort of median or average of their 'confidence' so we can know how reliable those data are
To be consistent with the other data, 15-minute time resolution would be excellent

Linzheng Zhang:
I’ll focus on identifying wildfire polygons for the specified time period and review the results. It appears the output already includes those columns.
Yes, this is MODIS data. (It might be helpful to include the number of data points contributing to each polygon, along with a median or average confidence level, to better assess data reliability.) That aligns with my thinking last quarter. I previously believed that a higher number of data points and greater confidence levels would strengthen the identification of true correlations.
Thank you for providing the 15-minute time window.
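
For reference, a 15-minute time resolution amounts to flooring each detection timestamp to the start of its quarter hour. The sketch below is an illustration using only the standard library; the helper name `floor_to_interval` is hypothetical and does not appear elsewhere in this notebook. On a pandas datetime column the same binning would be `.dt.floor('15min')`.

```python
from datetime import datetime, timedelta


def floor_to_interval(ts: datetime, minutes: int = 15) -> datetime:
    """Round ts down to the start of its `minutes`-wide bin (hypothetical helper)."""
    # Time elapsed since midnight of the same day
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = ts - midnight
    bin_width = timedelta(minutes=minutes)
    # Floor-dividing two timedeltas gives the number of whole bins elapsed
    return midnight + (elapsed // bin_width) * bin_width


# A detection at 14:37:12 falls in the bin that starts at 14:30:00
print(floor_to_interval(datetime(2023, 3, 25, 14, 37, 12)))
```

Passing `minutes=60` reproduces hourly bins, so the bin width can be changed in one place.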


In [None]:
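# Cluster MODIS detections within each time bin and add the total area of each
# wildfire cluster to result_rows, including clusters that are a single point.
# Assumes gdf_modis_wildfire_mar_apr_2023 was loaded in an earlier cell.
import numpy as np
import geopandas as gpd
from shapely.geometry import MultiPoint
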
grouped = gdf_modis_wildfire_mar_apr_2023.groupby(gdf_modis_wildfire_mar_apr_2023['datetime'].dt.floor('H'))
result_rows = []
for datetime_value, group in grouped:
    points = group['geometry'].tolist()
    clustered_points = []
    # Track visited points
    visited = set()
    for i, point in enumerate(points):
        if i in visited:
            continue
        # Start a cluster
        cluster = [point]
        cluster_indices = [i]  # Track indices of points in this cluster
        visited.add(i)
        for j, other_point in enumerate(points):
            # Distance is in degrees; an alternative threshold of 0.1 is left disabled
            if j not in visited and point.distance(other_point) <= 2:
                cluster.append(other_point)
                cluster_indices.append(j)
                visited.add(j)
        # Create a polygon from the clustered points
        polygon = MultiPoint(cluster).convex_hull
        # Calculate aggregate values for the cluster
        cluster_data = group.iloc[cluster_indices]
        # Calculate total area of the polygon
        if polygon.geom_type == 'Polygon':
            total_area = polygon.area  # Area of the polygon in square degrees (adjust units if needed)
        elif polygon.geom_type == 'Point':
            # If a single point, use a default area (e.g., area of a circle with a radius of 0.01 degrees)
            default_radius = 0.01
            total_area = np.pi * (default_radius ** 2)
        else:
            total_area = 0  # e.g. a LineString hull (two points, or collinear points) has zero area
        # Calculate statistics for numerical attributes
        sum_frp = cluster_data['FRP'].sum()
        mean_frp = cluster_data['FRP'].mean()
        max_frp = cluster_data['FRP'].max()
        mean_confidence = cluster_data['CONFIDENCE'].mean()
        max_confidence = cluster_data['CONFIDENCE'].max()
        # For TYPE, get the most common value (mode)
        # mode() drops NaN, so it can be empty even when the cluster has rows
        type_mode = cluster_data['TYPE'].mode()
        most_common_type = type_mode.iloc[0] if not type_mode.empty else None
        # Count points in cluster (fire size indicator)
        point_count = len(cluster)
        result_rows.append({
            'datetime': datetime_value,
            'geometry': polygon,
            'point_count': point_count,
            'total_area': total_area,
            'sum_FRP': sum_frp,
            'mean_FRP': mean_frp,
            'max_FRP': max_frp,
            'mean_CONFIDENCE': mean_confidence,
            'max_CONFIDENCE': max_confidence,
            'TYPE': most_common_type
        })
# Create a new GeoDataFrame from the result with all the added attributes
gdf_modis_wildfire_result = gpd.GeoDataFrame(
    result_rows,
    columns=['datetime', 'geometry', 'point_count', 'total_area', 'sum_FRP', 'mean_FRP', 'max_FRP',
             'mean_CONFIDENCE', 'max_CONFIDENCE', 
             'TYPE'],
    crs="EPSG:4326"
)
gdf_modis_wildfire_result
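
Because the clustering threshold and the areas above are expressed in degrees, it can help to see what a separation in degrees means on the ground. The sketch below is not part of the original workflow. It uses the standard haversine formula with a mean Earth radius of 6371 km, and the function name and sample coordinates are illustrative assumptions, not values taken from the MODIS data.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative coordinates only: a 2-degree step north-south and east-west
north_south = haversine_km(-120.0, 38.0, -120.0, 40.0)
east_west = haversine_km(-120.0, 40.0, -118.0, 40.0)
print(round(north_south), round(east_west))
```

Two degrees of latitude come out near 222 km, while two degrees of longitude at 40°N are closer to 170 km, so a fixed 2-degree threshold covers different ground distances in different directions.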

In [None]:
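# Match each wildfire cluster to power outages that overlap it in time and space,
# carrying total_area into the output.
# Assumes outage_data_gdf was loaded in an earlier cell.
import pandas as pd

radius = 0.2  # Example buffer radius for single-point fires, in degrees (adjust based on your CRS)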
candidates = []
for index_w, row_w in gdf_modis_wildfire_result.iterrows():
    # Identify overlap in time with the power grid disturbance
    wildfire_datetime = row_w.datetime
    filtered_outage_data_gdf = outage_data_gdf[
        (outage_data_gdf['outage_start_time'] >= wildfire_datetime - pd.Timedelta(hours=1)) &
        (outage_data_gdf['outage_start_time'] <= wildfire_datetime + pd.Timedelta(days=1))
    ]
    # Identify overlap in space with the power grid disturbance
    geom = row_w.geometry
    # Check if the geometry is a Point
    if geom.geom_type == 'Point':
        # Create a buffer (circular area) around the point
        buffered_wildfire_area = geom.buffer(radius)
        disp_geom = geom.coords[0]
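    # Check if the geometry is a Polygon
    elif geom.geom_type == 'Polygon':
        buffered_wildfire_area = geom
        disp_geom = geom.exterior.coords[0]
    else:
        # e.g. a LineString from a two-point cluster: use it unbuffered so that
        # buffered_wildfire_area is never left over from the previous row
        buffered_wildfire_area = geom
        disp_geom = geom.coords[0]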
    for f in range(len(filtered_outage_data_gdf)):
        if buffered_wildfire_area.intersects(filtered_outage_data_gdf['geometry'].iloc[f]):
            # Print information about the match
            print('candidate at {}! \n wildfire: \n \t {} \n outage: \n \t {} for {}'.format(
                wildfire_datetime,
                disp_geom,
                filtered_outage_data_gdf['outage_start_time'].iloc[f],
                filtered_outage_data_gdf['area affected'].iloc[f],
            ))
            # Save detailed information from both wildfire and outage
            candidates.append({
                # Original wildfire information
                'wildfire_datetime': wildfire_datetime,
                'wildfire_geometry': geom,
                # total_area
                'wildfire_total_area': row_w.total_area,
                # New wildfire metrics
                'wildfire_point_count': row_w.point_count,  # Size indicator
                'wildfire_sum_FRP': row_w.sum_FRP,          # Total fire radiative power
                'wildfire_mean_FRP': row_w.mean_FRP,        # Average intensity
                'wildfire_max_FRP': row_w.max_FRP,          # Peak intensity
                'wildfire_mean_CONFIDENCE': row_w.mean_CONFIDENCE,
                'wildfire_max_CONFIDENCE': row_w.max_CONFIDENCE,
                'wildfire_TYPE': row_w.TYPE,
                # Outage information
                'outage_start_time': filtered_outage_data_gdf['outage_start_time'].iloc[f],
                'outage_stop_time': filtered_outage_data_gdf['outage_stop_time'].iloc[f],
                'customers_affected': filtered_outage_data_gdf['customers_affected'].iloc[f],
                'outage_geometry': filtered_outage_data_gdf['geometry'].iloc[f],
                'area_affected': filtered_outage_data_gdf['area affected'].iloc[f],
            })
            break  # record only the first matching outage for this wildfire
# Convert candidates to a DataFrame for further analysis
candidates_df = pd.DataFrame(candidates)
candidates_df
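
The time filter in the cell above is asymmetric: an outage qualifies if it starts no earlier than one hour before the wildfire timestamp and no later than one day after it. A minimal sketch of that predicate on plain `datetime` values follows; the function name `in_match_window` and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def in_match_window(outage_start, wildfire_time,
                    before=timedelta(hours=1), after=timedelta(days=1)):
    """True if the outage starts within [wildfire_time - before, wildfire_time + after]."""
    return wildfire_time - before <= outage_start <= wildfire_time + after


# Illustrative timestamps only
fire = datetime(2023, 3, 30, 12, 0)
print(in_match_window(datetime(2023, 3, 30, 11, 30), fire))  # 30 minutes before the fire
print(in_match_window(datetime(2023, 3, 30, 10, 0), fire))   # 2 hours before the fire
print(in_match_window(datetime(2023, 3, 31, 11, 0), fire))   # 23 hours after the fire
```

Both bounds are inclusive, matching the `>=` and `<=` comparisons used in the filter.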

The following three records are the ones I selected according to your requirements. The method first groups wildfire detections by hour. Within each hourly group, a detection joins a cluster when point.distance(other_point) <= 2; the distance is in degrees because the coordinates are longitude/latitude. For isolated wildfire points, I use a circle with a default radius (0.01 degrees) to calculate the area.
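
The clustering rule can be shown on plain coordinate tuples. The sketch below mirrors the loop in the first code cell under simplifying assumptions: it uses `math.dist` on `(x, y)` tuples instead of shapely points, and the function name and sample coordinates are illustrative. Each unvisited point seeds a cluster and absorbs every unvisited point within the threshold of that seed.

```python
import math


def greedy_seed_clusters(points, max_dist):
    """Return clusters as lists of indices; membership is tested against the seed only."""
    visited = set()
    clusters = []
    for i, seed in enumerate(points):
        if i in visited:
            continue
        visited.add(i)
        members = [i]
        for j, other in enumerate(points):
            if j not in visited and math.dist(seed, other) <= max_dist:
                members.append(j)
                visited.add(j)
        clusters.append(members)
    return clusters


# Illustrative coordinates only (not taken from the MODIS data)
pts = [(0.0, 0.0), (1.5, 0.0), (3.0, 0.0), (10.0, 10.0)]
print(greedy_seed_clusters(pts, max_dist=2))  # [[0, 1], [2], [3]]
```

Note that the point at `(3.0, 0.0)` lies within 2 of `(1.5, 0.0)` but is not merged with it, because distance is measured only from the seed at `(0.0, 0.0)`. The result can therefore depend on the order of the points.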

In [None]:
from IPython.display import Image
Image("img/picture.png")

![title](img/picture.png)