#### Introduction

The purpose of this notebook, along with 01_data_setup_example.ipynb and 02_run_example.ipynb, is to provide a tutorial on how you might use the pop_exp package functions.

Please see 01_data_setup_example.ipynb and 02_run_example.ipynb before you work through this notebook!

This notebook explores the results returned by the PopExp functions that were run in the previous section of the tutorial.

In the previous section, we found the number of people affected by any US wildfire disaster in 2016, 2017, and 2018, as well as the number of people affected by any wildfire disaster by ZCTA and each wildfire disaster by ZCTA.

In this section we'll explore the results from each of these function runs.

The first function run helped us find the total number of people affected by any wildfire disaster in 2016, 2017, and 2018. To use these results, we'll first read in and plot the original wildfire disaster dataset, and then read in the results and calculate the total number of people affected by any wildfire disaster.

We'll start by loading libraries and reading in necessary data. 

In [None]:
import geopandas as gpd 
import pandas as pd
import pathlib
import sys
import matplotlib.pyplot as plt
import glob

import matplotlib.cm as cm
import matplotlib.colors as mcolors
from mpl_toolkits.axes_grid1.inset_locator import inset_axes
from matplotlib.patches import Circle

We'll read in ZCTA data, since the last three PopExp function runs involved ZCTAs, and plot the wildfire disaster data over the California ZCTAs. We ran these functions to calculate national numbers, but because this tutorial is demonstrating how the functions work, we'll plot the exposure and results for California only: the smaller area makes it easier to see what's going on.

In [None]:
# Define the base path and data directory
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data"

# Read the raw ZCTA data
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")

# Filter ZCTAs to California ZIP code prefixes (900xx to 961xx)
zctas_ca = zctas[zctas['GEOID20'].str[:3].astype(int).between(900, 961)]
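
To see what the prefix filter above keeps and drops, here is a minimal sketch on a toy table. The `toy_zctas` values are made up for illustration; only the `GEOID20` column name comes from the shapefile used above. Slicing the prefix from the string before converting to an integer means leading zeros (for example `01001`) do not shift the digits.

```python
import pandas as pd

# Toy stand-in for the ZCTA table (made-up rows); only GEOID20 matters here.
toy_zctas = pd.DataFrame({"GEOID20": ["01001", "90001", "94110", "96150", "97201"]})

# Take the 3-digit prefix as a string, convert it to int, keep prefixes 900..961.
prefix = toy_zctas["GEOID20"].str[:3].astype(int)
toy_ca = toy_zctas[prefix.between(900, 961)]

print(toy_ca["GEOID20"].tolist())
```

`between` is inclusive at both ends by default, so the `961` prefix is kept.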

In [None]:
# Read in raw wildfire dataset
fires = gpd.read_file(data_dir / "01_raw_data"/ "wildfires_conus.geojson")

# Filter to wildfires in California that occurred between 2016 and 2018 (inclusive)
fires_ca = fires[(fires['wildfire_states'].str.contains('CA')) & 
                 (fires['wildfire_year'] >= 2016) & 
                 (fires['wildfire_year'] <= 2018)]

# Reproject everything to California (Teale) Albers, EPSG:3310, which suits statewide plots
teale_albers_crs = "EPSG:3310"

zctas_ca = zctas_ca.to_crs(teale_albers_crs)
fires_ca = fires_ca.to_crs(teale_albers_crs)
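
The row filter above can be checked on a small made-up table. This is only a sketch: the `toy_fires` rows, and the idea that a multi-state fire is stored as a string like `"CA, NV"`, are assumptions for illustration, not the actual contents of `wildfires_conus.geojson`. It also passes `na=False`, so a row with a missing state counts as a non-match instead of putting a missing value in the mask.

```python
import pandas as pd

# Made-up rows; the column names follow the cell above.
toy_fires = pd.DataFrame({
    "wildfire_id": ["a", "b", "c", "d"],
    "wildfire_states": ["CA", "CA, NV", "OR", None],  # string format is an assumption
    "wildfire_year": [2016, 2019, 2017, 2018],
})

# Boolean masks: state string mentions CA, and year falls in 2016..2018 inclusive.
in_ca = toy_fires["wildfire_states"].str.contains("CA", na=False)
in_years = toy_fires["wildfire_year"].between(2016, 2018)

toy_fires_ca = toy_fires[in_ca & in_years]
print(toy_fires_ca["wildfire_id"].tolist())
```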

First, to get a sense of the exposure data we used in the first four function runs, we'll plot all the 2016-2018 wildfire disasters on one map, overlaid on the ZCTAs.

In [None]:
# Plot the fires overlaid on the ZCTA boundaries
# Plot the ZCTA boundaries first
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='black', zorder=1)

# Overlay the fire geometries with fill color
fires_ca.plot(ax=ax, color='red', alpha=0.5, edgecolor='red', zorder=2)

# Set plot title and labels
ax.set_title('Wildfire disaster boundaries in CA 2016-2018 on CA ZCTAs')
ax.set_axis_off()

output_path = data_dir / "03_results" / "wildfire_zcta_plot.pdf"
plt.savefig(output_path, format='pdf', bbox_inches='tight')

plt.show()

Now for the first run of find_num_people_affected. There, we wanted to know the total number of people residing within 10 km of any wildfire disaster in the US in each of the years 2016-2018. Let's read in the results.
A user who ran the function this way would most likely be interested in the total number of people affected by wildfire disasters in each year, so that's what we'll calculate, by summing over the hazard IDs. In these results the hazard IDs might not be unique, and the IDs of overlapping hazards might be concatenated, because this run did not count the people affected separately for each unique hazard.

In [None]:
# read output
tot_af_any_wf = pd.read_parquet(data_dir / "03_results" / "num_people_affected_by_wildfire.parquet")
tot_af_any_wf.head()

# group by year, and sum over number of people affected
tot_af_any_wf_grouped = tot_af_any_wf.groupby('year')['num_people_affected'].sum().reset_index()
# maybe we want to round the output
tot_af_any_wf_grouped['num_people_affected'] = tot_af_any_wf_grouped['num_people_affected'].round()
tot_af_any_wf_grouped.head()
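
As a small illustration of the yearly sum, here is a sketch on made-up numbers. The `f2_f3` value is a hypothetical stand-in for a concatenated ID of two overlapping hazards; the real ID format in the results file may differ. Since we only group by `year`, the sum does not depend on what the IDs look like.

```python
import pandas as pd

# Made-up results; "f2_f3" imitates a concatenated ID for overlapping hazards.
toy_results = pd.DataFrame({
    "ID_climate_hazard": ["f1", "f2_f3", "f4"],
    "year": [2016, 2016, 2017],
    "num_people_affected": [1200.4, 300.3, 50.6],
})

# Sum within each year, then round to whole people.
toy_totals = (
    toy_results.groupby("year")["num_people_affected"]
    .sum()
    .round()
    .reset_index()
)
print(toy_totals)
```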

That's it, we got the results we wanted for the first run. 

Moving on to the second run. 

In this run we calculated the total number of people residing within 10 km of each unique disaster in each year. Someone might run the function this way to identify the five wildfire disasters in each year that were close to the largest residential populations, so let's find those disasters and plot them.

In [None]:
# read in the output from the second run: the number of people
# affected by each unique wildfire disaster
tot_unique_wf = pd.read_parquet(data_dir / "03_results" / "num_aff_by_unique_wildfire.parquet")
tot_unique_wf.head()

In [None]:
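# within each year, keep the 5 wildfires with the most people affected
# (sorting and taking head(5) per group keeps the 'year' column intact across pandas versions)
high_impact_wfs = (
    tot_unique_wf.sort_values(['year', 'num_people_affected'], ascending=[True, False])
    .groupby('year')
    .head(5)
    .reset_index(drop=True)
)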
high_impact_wfs.head()
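
The top-N-per-year selection is easier to follow on a tiny table. The sketch below uses made-up rows and picks the top two per year instead of five; it is one way to do the selection, shown under those assumptions.

```python
import pandas as pd

# Made-up rows: three fires in 2016, two in 2017.
toy_unique = pd.DataFrame({
    "ID_climate_hazard": ["a", "b", "c", "d", "e"],
    "year": [2016, 2016, 2016, 2017, 2017],
    "num_people_affected": [10, 30, 20, 5, 7],
})

# Sort by year, then by people affected (largest first), and keep 2 rows per year.
top2 = (
    toy_unique.sort_values(["year", "num_people_affected"], ascending=[True, False])
    .groupby("year")
    .head(2)
    .reset_index(drop=True)
)
print(top2)
```

A quick sanity check on the real data is `high_impact_wfs.groupby('year').size()`, which should show at most five rows per year.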

In [None]:
# prepare the wildfire data to join the top 5 disasters to, so we can get the geographic locations
fires_ca = fires_ca.rename(columns={'wildfire_id': 'ID_climate_hazard'})
fires_ca = fires_ca[['ID_climate_hazard', 'geometry']].copy()
# and join to get the geographic locations
high_impact_wfs = high_impact_wfs.merge(fires_ca, on='ID_climate_hazard', how='left')
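
Because the results cover the whole US while `fires_ca` holds only California geometries, a left join can leave some of the top fires without a geometry. A minimal sketch of how to spot those rows with pandas' `indicator=True` option follows; the tables are made up, and plain strings stand in for real geometries.

```python
import pandas as pd

# Made-up top fires (national) and California-only geometries.
toy_top = pd.DataFrame({
    "ID_climate_hazard": ["a", "b", "c"],
    "num_people_affected": [100, 80, 60],
})
toy_ca_geoms = pd.DataFrame({
    "ID_climate_hazard": ["a", "c"],
    "geometry": ["poly_a", "poly_c"],  # placeholder strings, not real polygons
})

# indicator=True adds a "_merge" column saying where each row was found.
toy_merged = toy_top.merge(toy_ca_geoms, on="ID_climate_hazard", how="left", indicator=True)
unmatched = toy_merged.loc[toy_merged["_merge"] == "left_only", "ID_climate_hazard"].tolist()
print(unmatched)
```

Rows flagged `left_only` have no geometry and will not appear on a California map; whether to drop them or handle them separately depends on the question being asked.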


Now let's plot these high-impact disasters on top of the California ZCTAs, so we can see where they are. We'll add circles whose radius is proportional to the number of people who were residing within 10 km of each wildfire disaster.

In [None]:
# A pandas merge returns a plain DataFrame, so rebuild a GeoDataFrame in the fires' CRS
high_impact_wfs = gpd.GeoDataFrame(high_impact_wfs, geometry='geometry', crs=fires_ca.crs)
high_impact_wfs = high_impact_wfs.to_crs(epsg=3310)
# Centroid coordinates are projected meters in EPSG:3310 (y = northing, x = easting), not degrees
high_impact_wfs['latitude'] = high_impact_wfs['geometry'].centroid.y
high_impact_wfs['longitude'] = high_impact_wfs['geometry'].centroid.x

# Replace each fire polygon with its centroid point
gdf = gpd.GeoDataFrame(high_impact_wfs, geometry=gpd.points_from_xy(high_impact_wfs.longitude, high_impact_wfs.latitude), crs="EPSG:3310")
# Circle radius in map units (meters); dividing by 50 is an arbitrary scale chosen for display
gdf['radius'] = high_impact_wfs['num_people_affected'] / 50
gdf['year'] = gdf['year'].astype('category')
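
The cell above makes the circle radius proportional to the number of people, so the circle area grows with the square of that number. If you would rather have the area be proportional to the number of people, one option is sketched below. The names `radius_for_area` and `m2_per_person` are hypothetical, and the scale is arbitrary.

```python
import math

def radius_for_area(n_people, m2_per_person=1000.0):
    """Radius (in map units) of a circle whose area is n_people * m2_per_person."""
    return math.sqrt(n_people * m2_per_person / math.pi)

# Four times the people gives twice the radius, i.e. four times the area.
r_small = radius_for_area(100)
r_large = radius_for_area(400)
print(r_large / r_small)
```

With this scaling, a fire affecting four times as many people gets a circle with four times the area, which some readers find easier to compare by eye.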

In [None]:
colors = ['green', 'red', 'blue']
unique_years = gdf['year'].cat.categories
color_dict = {year: colors[i % len(colors)] for i, year in enumerate(unique_years)}

# Plot the ZCTA boundaries first
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='black', zorder=1)

# Plot the wildfire geometries colored by year
for idx, row in gdf.iterrows():
    color = color_dict[row['year']]
    ax.plot(row.geometry.x, row.geometry.y, 'o', color=color, markersize=5, zorder=2)
    circle = Circle((row.geometry.x, row.geometry.y), row.radius, color=color, fill=True, alpha=0.2, zorder=2)
    ax.add_patch(circle)

# Set plot title and labels
ax.set_title('Five fires with largest population residing within 10km of fire boundary, by year 2016-2018')
# set subtitle
ax.text(0.5, 0.99, 'Circle size proportional to number of people affected', horizontalalignment='center', verticalalignment='center', transform=ax.transAxes)
ax.set_axis_off()

# Add a legend for the discrete colormap
handles = [plt.Line2D([0], [0], marker='o', color='w', markerfacecolor=color_dict[year], markersize=10, label=str(year)) for year in unique_years]
ax.legend(handles=handles, title='Year', loc='upper right')

# Create an inset map for LA area with adjusted position
ax_inset = inset_axes(ax, width="30%", height="30%", loc='lower left', bbox_to_anchor=(-0.10, 0.05, 1, 1), bbox_transform=ax.transAxes, borderpad=2)
la_zctas = zctas_ca[zctas_ca['GEOID20'].astype(int).between(90001, 91699)]
zctas_ca.boundary.plot(ax=ax_inset, linewidth=0.5, edgecolor='black')

# Plot the wildfire points that fall in the LA area on the inset map, colored by year
la_union = la_zctas.unary_union  # dissolve the LA ZCTAs once, outside the loop
for idx, row in gdf.iterrows():
    if row.geometry.within(la_union):
        color = color_dict[row['year']]
        ax_inset.plot(row.geometry.x, row.geometry.y, 'o', color=color, markersize=5, zorder=2)
        circle = Circle((row.geometry.x, row.geometry.y), row.radius, color=color, fill=True, alpha=0.2, zorder=2)
        ax_inset.add_patch(circle)

# Set the extent of the inset map to the bounds of the LA ZCTAs
xmin, ymin, xmax, ymax = la_zctas.total_bounds
ax_inset.set_xlim(xmin, xmax)
ax_inset.set_ylim(ymin, ymax)

ax_inset.set_title('LA Area')
ax_inset.set_axis_off()

plt.show()

Ok - that's what we wanted from the second run.

Now, let's deal with the results of the third and fourth demonstrations we did, where we ran 'find_num_people_affected_by_geo'. In these cases, we found the number of people who resided within 10 km of any wildfire disaster by year and by ZCTA, and the number of people who resided within 10 km of each wildfire disaster by ZCTA. 

Why would someone want the number of people who resided within 10 km of any wildfire disaster by year and by ZCTA? A researcher assessing wildfire disaster exposure at the ZCTA level might want to know what proportion of each ZCTA's population lived within 10 km of any disaster boundary, and then consider a ZCTA exposed if a large enough share of its population was exposed. For that exposure assessment, we'd probably want to plot the proportion of people exposed to disasters by ZCTA, so let's use our results to do that. Note that this also requires the denominator data we produced at the end of the last section, where we found the ZCTA-level population.

Let's start by reading in that denominator data, and plotting the number of people who live in each ZCTA.

In [None]:
# read in the ZCTA-level population (denominator) file
num_residing_by_zcta = pd.read_parquet(data_dir / "03_results" / "num_people_residing_by_zcta.parquet")
num_residing_by_zcta.head()

In [None]:
# clean zctas for plotting
zctas_ca.rename(columns={"ZCTA5CE20": "ID_spatial_unit"}, inplace=True)
zctas_ca = zctas_ca[["ID_spatial_unit", "geometry"]]
zctas_ca.head()

In [None]:
# keep California ZCTAs (90000 to 96199, matching the 900-961 prefix filter used for zctas_ca);
# pd.to_numeric handles IDs stored as strings
num_residing_ca = num_residing_by_zcta[pd.to_numeric(num_residing_by_zcta['ID_spatial_unit']).between(90000, 96199)].copy()
num_residing_ca.head()

In [None]:
# select the ID_spatial_unit and num_people_affected columns
num_residing_ca = num_residing_ca[["ID_spatial_unit", "num_people_affected"]]

In [None]:
zctas_ca.head() 
num_residing_ca.head()

In [None]:
# merge to zctas_ca geometry for plotting
zctas_ca = zctas_ca.merge(num_residing_ca, on="ID_spatial_unit", how="left")
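
A merge like the one above only matches rows when the ID columns have the same type and formatting on both sides. Whether the parquet file stores `ID_spatial_unit` as strings or as numbers is not shown here, so the sketch below is an assumption-laden illustration: it uses made-up tables where one side holds integers, and normalizes them to five-character, zero-padded strings before merging.

```python
import pandas as pd

# Made-up tables: string IDs on the geometry side, integer IDs on the results side.
toy_geo = pd.DataFrame({"ID_spatial_unit": ["90001", "01001"]})
toy_pop = pd.DataFrame({
    "ID_spatial_unit": [90001, 1001],
    "num_people_affected": [10.0, 20.0],
})

# Convert integers to strings and restore any leading zeros lost in storage.
toy_pop["ID_spatial_unit"] = toy_pop["ID_spatial_unit"].astype(str).str.zfill(5)

toy_joined = toy_geo.merge(toy_pop, on="ID_spatial_unit", how="left")
print(toy_joined)
```

After the real merge, `zctas_ca['num_people_affected'].isna().sum()` is a quick way to count ZCTAs that found no match.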

In [None]:
la_zctas = zctas_ca[zctas_ca['ID_spatial_unit'].astype(int).between(90000, 91610)]
sf_zctas = zctas_ca[zctas_ca['ID_spatial_unit'].astype(int).between(94000, 94199)]

In [None]:
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.plot(column='num_people_affected', ax=ax, legend=True, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set plot title and labels
ax.set_title('Population by 2020 ZCTA according to GHSL 2020 100m\n resolution gridded population dataset')
ax.set_axis_off()

# Create an inset map for LA area with adjusted position
ax_inset_la = inset_axes(ax, width="30%", height="30%", loc='lower left', bbox_to_anchor=(-0.4, 0.05, 1, 1), bbox_transform=ax.transAxes, borderpad=2)
zctas_ca.plot(column='num_people_affected', ax=ax_inset_la, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set the extent of the inset map to the bounds of the LA ZCTAs
xmin, ymin, xmax, ymax = la_zctas.total_bounds
ax_inset_la.set_xlim(xmin, xmax)
ax_inset_la.set_ylim(ymin, ymax)

ax_inset_la.set_title('LA Area')
ax_inset_la.set_axis_off()

# Create an inset map for SF area with adjusted position
ax_inset_sf = inset_axes(ax, width="30%", height="30%", loc='lower left', bbox_to_anchor=(-0.4, 0.45, 1, 1), bbox_transform=ax.transAxes, borderpad=2)
zctas_ca.plot(column='num_people_affected', ax=ax_inset_sf, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set the extent of the inset map to the bounds of the SF ZCTAs
xmin, ymin, xmax, ymax = sf_zctas.total_bounds
ax_inset_sf.set_xlim(xmin, xmax)
ax_inset_sf.set_ylim(ymin, ymax)

ax_inset_sf.set_title('Bay Area')
ax_inset_sf.set_axis_off()

plt.show()

OK, so those are our denominators. Now let's get the number of people affected by any wildfire by ZCTA.

In [None]:
# finally, read in the number of people affected by any wildfire, by ZCTA
wf_by_zcta = pd.read_parquet(data_dir / "03_results" / "num_people_affected_by_wildfire_by_zcta.parquet")
wf_by_zcta.head()
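
With both the numerator (people affected by ZCTA) and the denominator (people residing by ZCTA) in hand, the proportion exposed is a merge followed by a division. The sketch below uses made-up tables and leaves out the year dimension for brevity. Because both files use the column name `num_people_affected`, the denominator is renamed first; `num_people_residing`, `prop_exposed`, and the 0.2 threshold are hypothetical choices for illustration.

```python
import pandas as pd

# Made-up numerator: people within 10 km of any wildfire, by ZCTA.
toy_affected = pd.DataFrame({
    "ID_spatial_unit": ["90001", "90002"],
    "num_people_affected": [250.0, 0.0],
})
# Made-up denominator: total residents by ZCTA (same column name in the source files).
toy_residing = pd.DataFrame({
    "ID_spatial_unit": ["90001", "90002", "90003"],
    "num_people_affected": [1000.0, 500.0, 0.0],
}).rename(columns={"num_people_affected": "num_people_residing"})

# Keep every ZCTA; those absent from the numerator had nobody affected.
toy_prop = toy_residing.merge(toy_affected, on="ID_spatial_unit", how="left")
toy_prop["num_people_affected"] = toy_prop["num_people_affected"].fillna(0)

# Proportion exposed; left missing where the ZCTA has no residents.
has_pop = toy_prop["num_people_residing"] > 0
toy_prop["prop_exposed"] = (
    toy_prop["num_people_affected"] / toy_prop["num_people_residing"]
).where(has_pop)

# Hypothetical rule: call a ZCTA exposed if at least 20% of residents were affected.
toy_prop["exposed"] = toy_prop["prop_exposed"] >= 0.2
print(toy_prop)
```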