#### Introduction

This notebook, together with 01_data_setup_example.ipynb and 02_run_example.ipynb, is a tutorial on how you might use the functions in the pop_exp package.

Please see 01_data_setup_example.ipynb and 02_run_example.ipynb before you work through this notebook!

This notebook explores the results returned by the PopExp functions that were run in the previous section of the tutorial.

In the previous section, we found the number of people affected by any US wildfire disaster in 2016, 2017, and 2018, as well as the number of people affected by any wildfire disaster by ZCTA and each wildfire disaster by ZCTA.

In this section we'll explore the results from each of these function runs.

The first function run helped us find the total number of people affected by any wildfire disaster in 2016, 2017, and 2018. To use these results, we'll first read in and plot the original wildfire disaster dataset, and then read in the results and calculate the total number of people affected by any wildfire disaster.

We'll start by loading libraries and reading in necessary data. 

In [None]:
import geopandas as gpd 
import pandas as pd
import pathlib
import sys
import matplotlib.pyplot as plt
import glob

We'll read in ZCTA data, since the last three PopExp function runs involved ZCTAs, and plot the wildfire disaster data over the California ZCTAs. We ran these functions to calculate national numbers, but because this tutorial is about demonstrating how the functions work, we'll plot the exposure and results for California only: it's a smaller area, so it's easier to see what's going on.

In [None]:
# Define the base path and data directory
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data"

# Read the raw ZCTA data
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")

# Filter ZCTAs for California ZIP codes (90xxx to 96xxx)
zctas_ca = zctas[zctas['GEOID20'].str[:3].astype(int).between(900, 961)]
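
The prefix filter above can be tried on a toy table first. The sketch below uses made-up ZCTA codes (they are not read from the shapefile) and shows why the codes are handled as strings: the three-character prefix is sliced from text, and a code with a leading zero would lose it in an integer column.

```python
import pandas as pd

# Hypothetical ZCTA codes, stored as strings like the GEOID20 column
toy = pd.DataFrame({"GEOID20": ["02139", "90001", "94110", "96150", "97201"]})

# Take the three-digit prefix and keep prefixes 900-961 (both bounds inclusive)
prefix = toy["GEOID20"].str[:3].astype(int)
toy_ca = toy[prefix.between(900, 961)]

print(toy_ca["GEOID20"].tolist())
```

Only the codes whose prefix falls in the 900-961 range are kept; the codes starting with 021 and 972 are dropped.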

In [None]:
# Read in raw wildfire dataset
fires = gpd.read_file(data_dir / "01_raw_data"/ "wildfires_conus.geojson")

# Filter to wildfires in California that occurred between 2016 and 2018 (inclusive)
fires_ca = fires[(fires['wildfire_states'].str.contains('CA')) & 
                 (fires['wildfire_year'] >= 2016) & 
                 (fires['wildfire_year'] <= 2018)]

# transform zctas to the fire crs
zctas_ca = zctas_ca.to_crs(fires_ca.crs)

First, to get an idea of the exposure we used in the first four function runs, we'll plot all the wildfire disasters from 2016-2018 on a single map, overlaid on ZCTAs.

In [None]:
# plot the fires overlaid on ZCTA boundaries
# Plot the ZCTA boundaries first
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='black', zorder=1)

# Overlay the fire geometries with fill color
fires_ca.plot(ax=ax, color='red', alpha=0.5, edgecolor='red', zorder=2)

# Set plot title and labels
ax.set_title('Wildfire disaster boundaries in CA 2016-2018 on CA ZCTAs')
ax.set_axis_off()
# No ax.legend() here: neither layer was given a label, so there is nothing to show in a legend

output_path = data_dir / "03_results" / "wildfire_zcta_plot.pdf"
plt.savefig(output_path, format='pdf', bbox_inches='tight')

plt.show()

Nice, ok. In the first run, we wanted to know the total number of people residing within 10 km of any wildfire disaster in the US in each of the years 2016-2018. Let's read in the results.
Someone who ran the function this way would most likely be interested in the total number of people affected by wildfire disasters in each year, so that's what we'll calculate for this run. We'll sum over the hazard IDs, which might be concatenated because we didn't calculate the number of people affected by each unique hazard.

In [None]:
# read output
tot_af_any_wf = pd.read_csv(data_dir / "03_results" / "num_people_affected_by_wildfire.csv")
tot_af_any_wf.head()

# group by year, and sum over number of people affected
tot_af_any_wf_grouped = tot_af_any_wf.groupby('year')['num_people_affected'].sum().reset_index()
# maybe we want to round the output
tot_af_any_wf_grouped['num_people_affected'] = tot_af_any_wf_grouped['num_people_affected'].round()
tot_af_any_wf_grouped.head()

That's it for the first run. 

Moving on to the second run. 

In this run we calculated the total number of people residing within 10 km of each unique disaster in each year. Someone might run the function this way if they wanted to identify the five disasters affecting the largest population in each year, so let's find those disasters and plot them.
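
One way to pick out the largest disasters per year is to sort within each year and keep the first few rows of each group. The sketch below runs on a toy table rather than the real output; the column names `ID_hazard`, `year`, and `num_people_affected` are assumptions about the results file and the values are made up.

```python
import pandas as pd

# Toy stand-in for the second run's output (hypothetical IDs and counts)
toy = pd.DataFrame({
    "ID_hazard": ["a", "b", "c", "d", "e", "f", "g"],
    "year": [2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "num_people_affected": [120.0, 3400.0, 15.0, 980.0, 42.0, 7600.0, 310.0],
})

n_top = 2  # the text above uses 5; 2 keeps the toy output short

# Sort by year, then by population (largest first), and keep the top rows per year
top_by_year = (
    toy.sort_values(["year", "num_people_affected"], ascending=[True, False])
       .groupby("year")
       .head(n_top)
       .reset_index(drop=True)
)
print(top_by_year)
```

With the real results, setting `n_top = 5` would give the five disasters per year described above, and the resulting hazard IDs could then be used to subset the wildfire geometries for plotting.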

In [None]:
# now we want to read in the output from the function that gave us our 
# denominator data


In [None]:
# want to read that file in
num_residing_by_zcta = pd.read_csv(data_dir / "03_results" / "num_people_residing_by_zcta.csv")
num_residing_by_zcta.head()
type(num_residing_by_zcta['ID_spatial_unit'][0])

In [None]:
zctas_ca.head()

In [None]:
# rename the ZCTA ID column to match the results and keep only the columns needed for plotting
zctas_ca.rename(columns={"ZCTA5CE20": "ID_spatial_unit"}, inplace=True)
zctas_ca = zctas_ca[["ID_spatial_unit", "geometry"]]
zctas_ca.head()

In [None]:
# keep IDs up to 96199 so this matches the 900-961 prefix filter applied to the ZCTA geometries
num_residing_ca = num_residing_by_zcta[num_residing_by_zcta['ID_spatial_unit'].between(90000, 96199)].copy()
num_residing_ca.head()

In [None]:
# convert id spatial unit to string
num_residing_ca['ID_spatial_unit'] = num_residing_ca['ID_spatial_unit'].astype(str)


In [None]:
# select cols ID spatial unit and num_people_affected
num_residing_ca = num_residing_ca[["ID_spatial_unit", "num_people_affected"]]

In [None]:
zctas_ca.head() 
num_residing_ca.head()

In [None]:
# merge to zctas_ca geometry for plotting
zctas_ca = zctas_ca.merge(num_residing_ca, on="ID_spatial_unit", how="left")
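
The string conversion before the merge matters because the join key needs the same type on both sides: the shapefile stores ZCTA IDs as text, while `read_csv` parses them as integers. The sketch below uses toy tables (geometry omitted, values made up) to show the type alignment and a way to check which ZCTAs found no match. The `zfill(5)` step is an extra precaution for IDs with leading zeros; California IDs do not need it.

```python
import pandas as pd

# Toy geometry table: IDs are strings, as in the ZCTA shapefile
toy_zctas = pd.DataFrame({"ID_spatial_unit": ["90001", "94110", "96150"]})

# Toy results table: IDs were parsed as integers when the CSV was read
toy_counts = pd.DataFrame({
    "ID_spatial_unit": [90001, 94110],
    "num_people_affected": [57000.0, 71000.0],
})

# Align key types; zfill(5) would restore leading zeros lost in the integer parse
toy_counts["ID_spatial_unit"] = toy_counts["ID_spatial_unit"].astype(str).str.zfill(5)

# indicator=True adds a _merge column that flags rows without a match
merged = toy_zctas.merge(toy_counts, on="ID_spatial_unit", how="left", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "ID_spatial_unit"].tolist()
print(unmatched)
```

ZCTAs listed in `unmatched` end up with a missing population value after the left merge, which is worth checking before drawing a choropleth.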

In [None]:
zctas_ca

In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.plot(column='num_people_affected', ax=ax, legend=True, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set plot title and labels
ax.set_title('Population by 2020 ZCTA according to GHSL 2020 100m resolution gridded population dataset')
ax.set_axis_off()

plt.show()

In [None]:
# ZCTA ranges used to set the extents of the LA-area and Bay-Area inset maps
la_zctas = zctas_ca[zctas_ca['ID_spatial_unit'].astype(int).between(90000, 91610)]
sf_zctas = zctas_ca[zctas_ca['ID_spatial_unit'].astype(int).between(94000, 94199)]

In [None]:
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.plot(column='num_people_affected', ax=ax, legend=True, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set plot title and labels
ax.set_title('Population by 2020 ZCTA according to GHSL 2020 100m resolution gridded population dataset')
ax.set_axis_off()

# Create an inset map for LA area with adjusted position
ax_inset_la = inset_axes(ax, width="30%", height="30%", loc='lower left', bbox_to_anchor=(-0.4, 0.05, 1, 1), bbox_transform=ax.transAxes, borderpad=2)
zctas_ca.plot(column='num_people_affected', ax=ax_inset_la, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set the extent of the inset map to the bounds of the LA ZCTAs
xmin, ymin, xmax, ymax = la_zctas.total_bounds
ax_inset_la.set_xlim(xmin, xmax)
ax_inset_la.set_ylim(ymin, ymax)

ax_inset_la.set_title('LA Area')
ax_inset_la.set_axis_off()

# Create an inset map for SF area with adjusted position
ax_inset_sf = inset_axes(ax, width="30%", height="30%", loc='lower left', bbox_to_anchor=(-0.4, 0.45, 1, 1), bbox_transform=ax.transAxes, borderpad=2)
zctas_ca.plot(column='num_people_affected', ax=ax_inset_sf, cmap='viridis', linewidth=0.1, edgecolor='black')

# Set the extent of the inset map to the bounds of the SF ZCTAs
xmin, ymin, xmax, ymax = sf_zctas.total_bounds
ax_inset_sf.set_xlim(xmin, xmax)
ax_inset_sf.set_ylim(ymin, ymax)

ax_inset_sf.set_title('Bay Area')
ax_inset_sf.set_axis_off()

plt.show()

In [None]:
# finally need to read number of people affected by wildfire 



First, we'll plot the wildfire disasters that did occur on top of ZCTA boundaries, then we'll plot the number of people who resided in each ZCTA, and finally we'll plot the number of people affected by a wildfire disaster by ZCTA.

In [None]:
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data" 

# read output
wf_by_zcta = pd.read_csv(data_dir / "03_results" / "num_people_affected_by_wildfire.csv")

wf_by_zcta.head()
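
Once both the affected counts and the resident counts (the denominator data) are available by ZCTA, one natural next step is the share of each ZCTA's population that was affected. The sketch below is only an illustration on toy tables: the values are made up, and the column name `num_people_residing` is a hypothetical rename chosen to keep the two counts apart.

```python
import pandas as pd

# Toy stand-ins for the two result files (hypothetical values)
affected = pd.DataFrame({
    "ID_spatial_unit": ["90001", "94110"],
    "num_people_affected": [500.0, 0.0],
})
residing = pd.DataFrame({
    "ID_spatial_unit": ["90001", "94110", "96150"],
    "num_people_residing": [50000.0, 70000.0, 0.0],
})

# Keep every ZCTA; ZCTAs missing from the affected table count as zero affected
share = residing.merge(affected, on="ID_spatial_unit", how="left")
share["num_people_affected"] = share["num_people_affected"].fillna(0)

# Proportion affected; ZCTAs with no residents get NaN rather than a division by zero
denom = share["num_people_residing"].where(share["num_people_residing"] > 0)
share["prop_affected"] = share["num_people_affected"] / denom
print(share)
```

Treating a missing affected count as zero is an assumption; if the results file already lists every ZCTA, the `fillna` step changes nothing.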


In [None]:
# read in raw wildfire dataset
fires = gpd.read_file(data_dir / "01_raw_data"/ "wildfires_conus.geojson")
# filter to wildfires whose states include 'CA' and that occurred in 2016-2018,
# the years covered by the function runs
fires_ca = fires[(fires['wildfire_states'].str.contains('CA')) & 
                 (fires['wildfire_year'] >= 2016) & 
                 (fires['wildfire_year'] <= 2018)]

fires_ca.plot()

In [None]:
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")
zctas.head()

In [None]:
# rename the ZCTA ID column to match the results and keep only the columns needed for plotting
zctas.rename(columns={"ZCTA5CE20": "ID_spatial_unit"}, inplace=True)
zctas = zctas[["ID_spatial_unit", "geometry"]]
zctas.head()



In [None]:
zctas_ca = zctas[zctas['ID_spatial_unit'].str[:3].astype(int).between(900, 961)]

In [None]:
zctas_ca.plot()
zctas_ca.boundary.plot()

In [None]:
# Define the base path and data directory
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data"

# Read the raw ZCTA data
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")

# Filter ZCTAs for California ZIP codes (90xxx to 96xxx)
zctas_ca = zctas[zctas['GEOID20'].str[:3].astype(int).between(900, 961)]

# Read in raw wildfire dataset
fires = gpd.read_file(data_dir / "01_raw_data"/ "wildfires_conus.geojson")

# Filter to wildfires in California that occurred between 2016 and 2018 (inclusive),
# matching the plot title below
fires_ca = fires[(fires['wildfire_states'].str.contains('CA')) & 
                 (fires['wildfire_year'] >= 2016) & 
                 (fires['wildfire_year'] <= 2018)]

zctas_ca = zctas_ca.to_crs(fires_ca.crs)

# Plot the ZCTA boundaries first
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='black', zorder=1)

# Overlay the fire geometries with fill color
fires_ca.plot(ax=ax, color='red', alpha=0.5, edgecolor='red', zorder=2)

# Set plot title and labels
ax.set_title('Wildfire disaster boundaries in CA 2016-2018 on CA ZCTAs')
ax.set_axis_off()
# No ax.legend() here: neither layer was given a label, so there is nothing to show in a legend

output_path = data_dir / "03_results" / "wildfire_zcta_plot.pdf"
plt.savefig(output_path, format='pdf', bbox_inches='tight')

plt.show()

In [None]:
# want to read that file in
num_residing_by_zcta = pd.read_csv(data_dir / "03_results" / "num_people_residing_by_zcta.csv")
num_residing_by_zcta.head()
type(num_residing_by_zcta['ID_spatial_unit'][0])

# # select cols ID spatial unit and num_people_affected
# num_residing_ca = num_residing_ca[["ID_spatial_unit", "num_people_affected"]]

# # merge to zctas_ca
# zctas_ca = zctas_ca.merge(num_residing_ca, on="ID_spatial_unit", how="left")





In [None]:
zctas_ca.head()

In [None]:
# .copy() avoids a SettingWithCopyWarning when the ID column is modified below
num_residing_ca = num_residing_by_zcta[num_residing_by_zcta['ID_spatial_unit'].between(90000, 96199)].copy()
num_residing_ca.head()
# convert id spatial unit to string
num_residing_ca['ID_spatial_unit'] = num_residing_ca['ID_spatial_unit'].astype(str)

# select cols ID spatial unit and num_people_affected
num_residing_ca = num_residing_ca[["ID_spatial_unit", "num_people_affected"]]

# the freshly read ZCTAs still use the ZCTA5CE20 column name, so rename it before merging
zctas_ca = zctas_ca.rename(columns={"ZCTA5CE20": "ID_spatial_unit"})
zctas_ca = zctas_ca.merge(num_residing_ca, on="ID_spatial_unit", how="left")


In [None]:
print(num_residing_by_zcta)

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np  # needed for np.empty in the reprojection cells
import pathlib
import rasterio
from rasterio.plot import show
from rasterio.mask import mask
from rasterio.warp import calculate_default_transform, reproject, Resampling


In [None]:
# Define the base path and data directory
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data"

# Read the raw ZCTA data
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")

# Filter ZCTAs for California ZIP codes (90xxx to 96xxx)
zctas_ca = zctas[zctas['GEOID20'].str[:3].astype(int).between(900, 961)]


In [None]:
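from rasterio.transform import array_bounds

# Path to the GHSL population raster
raster_path = data_dir / "01_raw_data" / "GHS_POP_E2020_GLOBE_R2023A_54009_100_V1_0.tif"

with rasterio.open(raster_path) as src:
    raster_crs = src.crs  # Get the raster CRS
    # The mask geometries must be in the raster CRS
    zctas_ca = zctas_ca.to_crs(raster_crs)
    out_image, out_transform = mask(src, zctas_ca.geometry, crop=True)
    out_meta = src.meta.copy()

# Calculate the bounds of the masked raster as (left, bottom, right, top)
left, bottom, right, top = array_bounds(out_image.shape[1], out_image.shape[2], out_transform)

# Reproject the masked raster to the Albers Equal Area CRS
# (calculate_default_transform takes the source width and height in pixels, then the bounds)
albers_crs = "EPSG:5070"
transform, width, height = calculate_default_transform(
    raster_crs, albers_crs, out_image.shape[2], out_image.shape[1], left, bottom, right, top)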
kwargs = out_meta.copy()
kwargs.update({
    'crs': albers_crs,
    'transform': transform,
    'width': width,
    'height': height
})

reprojected_image = np.empty((out_image.shape[0], height, width), dtype=out_image.dtype)

for i in range(out_image.shape[0]):
    reproject(
        source=out_image[i],
        destination=reprojected_image[i],
        src_transform=out_transform,
        src_crs=raster_crs,
        dst_transform=transform,
        dst_crs=albers_crs,
        resampling=Resampling.nearest
    )

# Plot the reprojected masked raster values within each ZCTA
fig, ax = plt.subplots(figsize=(10, 10))
show(reprojected_image, transform=transform, ax=ax, cmap='viridis')
zctas_ca.to_crs(albers_crs).boundary.plot(ax=ax, linewidth=0.5, edgecolor='blue')

# Set plot title and labels
ax.set_title('Masked Raster Values within California ZCTAs (Albers Equal Area)')
ax.set_axis_off()

plt.show()

In [None]:
type(out_meta)

In [None]:
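# Reproject the masked raster to the Albers Equal Area CRS
from rasterio.transform import array_bounds

albers_crs = "EPSG:5070"
# An Affine transform has no .bounds attribute, so compute (left, bottom, right, top) from the array shape
src_bounds = array_bounds(out_image.shape[1], out_image.shape[2], out_transform)
transform, width, height = calculate_default_transform(
    raster_crs, albers_crs, out_image.shape[2], out_image.shape[1], *src_bounds)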
kwargs = out_meta.copy()
kwargs.update({
    'crs': albers_crs,
    'transform': transform,
    'width': width,
    'height': height
})
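
`calculate_default_transform` expects the source bounds as left, bottom, right, top. For a north-up raster with no rotation, these follow directly from the array shape and the affine coefficients. The sketch below works through that arithmetic in plain Python on a hypothetical 100 m grid (the numbers are made up, and `bounds_from_transform` is an illustrative helper, not a rasterio function); `rasterio.transform.array_bounds(height, width, transform)` returns the same four values for a real transform.

```python
def bounds_from_transform(height, width, a, c, e, f):
    """Return (left, bottom, right, top) for a north-up raster without rotation.

    a: pixel width, e: pixel height (negative for north-up rasters),
    c, f: x and y coordinates of the upper-left corner of the upper-left pixel.
    """
    left, top = c, f
    right = c + a * width      # move `width` pixels to the east
    bottom = f + e * height    # move `height` pixels to the south (e is negative)
    return left, bottom, right, top


# Hypothetical 100 m grid with 300 rows and 200 columns
toy_bounds = bounds_from_transform(height=300, width=200, a=100.0, c=-1000.0, e=-100.0, f=5000.0)
print(toy_bounds)  # (-1000.0, -25000.0, 19000.0, 5000.0)
```

The array shape from `mask` is ordered (bands, rows, columns), so the height is `out_image.shape[1]` and the width is `out_image.shape[2]`.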

In [None]:
print(zctas_ca.crs)
print(raster_crs)

In [None]:
# Mask the raster with the ZCTAs
with rasterio.open(raster_path) as src:
    out_image, out_transform = mask(src, zctas_ca.geometry, crop=True)
    out_meta = src.meta.copy()

# Update the metadata with the new dimensions, transform, and CRS
out_meta.update({
    "driver": "GTiff",
    "height": out_image.shape[1],
    "width": out_image.shape[2],
    "transform": out_transform,
    "crs": raster_crs
})

# Plot the masked raster values within each ZCTA
fig, ax = plt.subplots(figsize=(10, 10))
show(out_image, transform=out_transform, ax=ax, cmap='viridis')
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='blue')

# Set plot title and labels
ax.set_title('Masked Raster Values within California ZCTAs')
ax.set_axis_off()

plt.show()

In [None]:
import geopandas as gpd
import matplotlib.pyplot as plt
import numpy as np  # needed for np.empty below
import pathlib
import rasterio
from rasterio.plot import show
from rasterio.mask import mask
from rasterio.warp import calculate_default_transform, reproject, Resampling

# Define the base path and data directory
base_path = pathlib.Path.cwd().parent
data_dir = base_path / "demo_data"

# Read the raw ZCTA data
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")

# Filter ZCTAs for California ZIP codes (90xxx to 96xxx)
zctas_ca = zctas[zctas['GEOID20'].str[:3].astype(int).between(900, 961)]

# Read the raster data
raster_path = data_dir / "01_raw_data" / "GHS_POP_E2020_GLOBE_R2023A_54009_100_V1_0.tif"
with rasterio.open(raster_path) as src:
    raster_crs = src.crs  # Get the raster CRS

# Convert the ZCTAs to the raster CRS
zctas_ca = zctas_ca.to_crs(raster_crs)

# Mask the raster with the ZCTAs
with rasterio.open(raster_path) as src:
    out_image, out_transform = mask(src, zctas_ca.geometry, crop=True)
    out_meta = src.meta.copy()

# Update the metadata with the new dimensions, transform, and CRS
out_meta.update({
    "driver": "GTiff",
    "height": out_image.shape[1],
    "width": out_image.shape[2],
    "transform": out_transform,
    "crs": raster_crs
})

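# Reproject the masked raster to the Albers Equal Area CRS
from rasterio.transform import array_bounds

albers_crs = "EPSG:5070"
# An Affine transform has no .bounds attribute, so compute (left, bottom, right, top) from the array shape
src_bounds = array_bounds(out_image.shape[1], out_image.shape[2], out_transform)
transform, width, height = calculate_default_transform(
    raster_crs, albers_crs, out_image.shape[2], out_image.shape[1], *src_bounds)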
kwargs = out_meta.copy()
kwargs.update({
    'crs': albers_crs,
    'transform': transform,
    'width': width,
    'height': height
})

reprojected_image = np.empty((out_image.shape[0], height, width), dtype=out_image.dtype)

for i in range(out_image.shape[0]):
    reproject(
        source=out_image[i],
        destination=reprojected_image[i],
        src_transform=out_transform,
        src_crs=raster_crs,
        dst_transform=transform,
        dst_crs=albers_crs,
        resampling=Resampling.nearest
    )

# Plot the reprojected masked raster values within each ZCTA
fig, ax = plt.subplots(figsize=(10, 10))
show(reprojected_image, transform=transform, ax=ax, cmap='viridis')
zctas_ca.to_crs(albers_crs).boundary.plot(ax=ax, linewidth=0.5, edgecolor='blue')

# Set plot title and labels
ax.set_title('Masked Raster Values within California ZCTAs (Albers Equal Area)')
ax.set_axis_off()

plt.show()

In [None]:





# Read the raster data
raster_path = data_dir / "01_raw_data" / "GHS_POP_E2020_GLOBE_R2023A_54009_100_V1_0.tif"
with rasterio.open(raster_path) as src:
    # Read the CRS while the dataset is still open
    raster_crs = src.crs
    # Mask the raster with the ZCTAs
    out_image, out_transform = mask(src, zctas_ca.geometry, crop=True)
    out_meta = src.meta.copy()

# Update the metadata with the new dimensions, transform, and CRS
out_meta.update({
    "driver": "GTiff",
    "height": out_image.shape[1],
    "width": out_image.shape[2],
    "transform": out_transform,
    "crs": raster_crs
})

# Plot the masked raster values within each ZCTA
fig, ax = plt.subplots(figsize=(10, 10))
show(out_image, transform=out_transform, ax=ax, cmap='viridis')
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='blue')

# Set plot title and labels
ax.set_title('Masked Raster Values within California ZCTAs')
ax.set_axis_off()

plt.show()

In [None]:
# put the ZCTAs in the same CRS as the fires so the two layers line up
zctas_ca = zctas_ca.to_crs(fires_ca.crs)



In [None]:
fig, ax = plt.subplots(figsize=(10, 10))
zctas_ca.boundary.plot(ax=ax, linewidth=0.5, edgecolor='blue', label='ZCTA Boundaries')

# Overlay the fire geometries
fires_ca.plot(ax=ax, color='red', label='Fire Boundaries')

# Set plot title and labels
ax.set_title('California ZCTAs with Fire Boundaries')
ax.set_axis_off()
ax.legend()

plt.show()


#fig, ax = plt.subplots(figsize=(10, 10))
# ax = zctas_ca.boundary.plot(linewidth=0.5, edgecolor='k', label='ZCTA Boundaries')  # Plot ZCTA boundaries
# fires_ca.boundary.plot(ax=ax, linewidth=1, edgecolor='red', alpha=0.5, label='Fire Boundaries')  # Overlay fire boundaries with transparency

# # Set plot title and labels
# ax.set_title('California ZCTAs with Fire Boundaries')
# ax.set_axis_off()
# ax.legend()


In [None]:
zctas = gpd.read_file(data_dir / "01_raw_data" / "tl_2020_us_zcta520" / "tl_2020_us_zcta520.shp")
