# Wildfire Impacts on Air Quality in Colorado

Wildfires in Colorado burn over 70,000 acres of land per year and have a noticeable impact on air quality, both visually and physically. This project examines the relationship between wildfire occurrences and the top two contributors to the Air Quality Index (AQI): PM2.5 and ozone. We examine both temporal trends (2019 through 2024) and geographic trends across Colorado. The EPA provides hourly air quality data from monitoring stations throughout Colorado, and wildfire incidents are reported as they occur by NASA FIRMS using VIIRS satellite detection. By examining variations in air quality during wildfire seasons, we seek actionable insights into seasonal and regional pollution dynamics that contribute to a better understanding of the environmental impacts of wildfires in Colorado. In addition, both primary pollutants (ozone and PM2.5) are associated with multiple health risks, so determining the relationship between pollutant levels and fire incidence can help inform public health decisions.


## Package and Path Management



We'll start by loading the necessary packages:
* **Pandas** - Data manipulation and analysis
* **NumPy** - Numerical arrays
* **Matplotlib** - Data visualization
* **Seaborn** - Statistical data visualization
* **OS** and **Sys** - File paths and the module search path
* **Datetime** - Date and time manipulation
* **GeoPandas** - Geographic data storage and handling
* **Logging** - Internal log handling
* **Typing** - Type hints
* **Shapely** - Geometric objects such as points
* **Folium** - Interactive map plotting
* **Webbrowser** - Opens interactive plots in the browser
* **Plotly** - Interactive plotting
* **Statsmodels** - Time series decomposition

In [1]:
# Standard library
import base64
import datetime
import logging
import os
import sys
import webbrowser
from io import BytesIO
from typing import Optional, List

# Data handling and geospatial
import numpy as np
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
import statsmodels.api as sm

# Plotting (seaborn is used by the scatter plots later in the notebook)
import matplotlib.pyplot as plt
import matplotlib.animation as animation
import seaborn as sns
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import folium
from folium.plugins import HeatMap, HeatMapWithTime, TimestampedGeoJson, MarkerCluster

src_path = os.path.abspath(os.path.join(os.getcwd(), '../', 'src'))
data_path = os.path.abspath(os.path.join(os.getcwd(), '../', 'data'))
visuals_path = os.path.abspath(os.path.join(os.getcwd(), '../', 'visuals'))
if src_path not in sys.path:
    sys.path.append(src_path)
if data_path not in sys.path:
    sys.path.append(data_path)
if visuals_path not in sys.path:
    sys.path.append(visuals_path)

## Data Collection and Processing

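Next, we load and process the data from our two sources: the EPA AQI records and the NASA FIRMS wildfire detections. Processing is skipped when the per-pollutant output files already exist: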

In [2]:
# Import classes
from aqi_wf_processor_2 import *

# Reprocess if either per-pollutant output file is missing
if (not os.path.exists("../data/aqi_data/aqi_processed/pm25_aqi_2019_2024.csv")
        or not os.path.exists("../data/aqi_data/aqi_processed/ozone_aqi_2019_2024.csv")):
    # Paths and settings
    wildfire_csv = "../data/large_data/fire_archive_SV-C2_584955.csv"
    aqi_csv = "../data/large_data/Colorado_AQI_2019_2024.csv"
    wildfire_output_dir = "../data/wildfire_data/wildfire_processed/"
    aqi_output_dir = "../data/aqi_data/aqi_processed/"
    county_shapefile = "../data/co_shapefile/counties/counties_19.shp"
    start_year = 2019
    end_year = 2024

    # Process Wildfire Data
    wildfire_processor = WildfireProcessor(
        wildfire_filepath=wildfire_csv,
        start_year=start_year,
        end_year=end_year,
        output_dir=wildfire_output_dir,
        county_shapefile=county_shapefile
    )
    wildfire_processor.process_wildfire(year_range=(start_year, end_year))

    # Load processed wildfire data for AQI processing
    processed_wildfire_csv = "../data/wildfire_data/wildfire_processed/wildfire_processed_2019_2024_n.csv"

    # Process AQI Data
    aqi_processor = AQIProcessor(
        aqi_filepath=aqi_csv,
        wildfire_filepath=processed_wildfire_csv,
        start_year=start_year,
        end_year=end_year,
        output_dir=aqi_output_dir,
        county_shapefile=county_shapefile)

    aqi_processor.process_aqi(years_to_process=list(range(start_year, end_year+1)))

    # Save one file per pollutant, using the file names that the existence
    # check at the top of this cell and the later cells read from
    df = pd.read_csv(f"../data/aqi_data/aqi_processed/aqi_final_{start_year}_{end_year}_30.csv")
    pm25_df = df[df["Parameter"].str.upper() == "PM2.5"]
    ozone_df = df[df["Parameter"].str.upper() == "OZONE"]
    pm25_df.to_csv(f"../data/aqi_data/aqi_processed/pm25_aqi_{start_year}_{end_year}.csv", index=False)
    ozone_df.to_csv(f"../data/aqi_data/aqi_processed/ozone_aqi_{start_year}_{end_year}.csv", index=False)
else:
    print("Data already processed")

Data already processed
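
The per-pollutant split at the end of the processing cell can be illustrated on a small table. The sketch below is only an illustration: the toy rows and the helper name `split_by_parameter` are assumptions, and only the `Parameter` column name comes from the cell above.

```python
import pandas as pd

# Toy stand-in for the combined AQI table; all values are made up.
toy_aqi = pd.DataFrame({
    "SiteName": ["Site A", "Site A", "Site B", "Site B"],
    "Parameter": ["PM2.5", "OZONE", "pm2.5", "Ozone"],
    "AQI": [42, 55, 61, 38],
})


def split_by_parameter(df: pd.DataFrame) -> dict:
    """Return one sub-table per pollutant, keyed by the upper-cased name.

    Hypothetical helper: upper-casing mirrors the case-insensitive
    comparison used when the notebook filters on "Parameter".
    """
    key = df["Parameter"].str.upper()
    return {name: group.reset_index(drop=True) for name, group in df.groupby(key)}


by_pollutant = split_by_parameter(toy_aqi)
pm25_toy = by_pollutant["PM2.5"]
ozone_toy = by_pollutant["OZONE"]
print(pm25_toy)
print(ozone_toy)
```

Grouping once on a normalized key avoids repeating the string comparison for each pollutant and extends naturally if more parameters are added later.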


## Data Collection

Before describing the data sources and the collection process, we should address the date range used for the analysis. The API `.py` code defaults to March 23 through September 23. If we keep that range, we can justify it here with historical background (e.g. a bad year for forest fires).

For testing purposes, the range was shortened to a single month, which makes the download quicker.

In [3]:
from geo_plots import *

# Processed inputs for the station map
ozone_dp = '../data/aqi_data/aqi_processed/ozone_aqi_2019_2024.csv'
pm25_dp = '../data/aqi_data/aqi_processed/pm25_aqi_2019_2024.csv'
wildfire_dp = '../data/wildfire_data/wildfire_processed/wildfire_processed_2019_2024.csv'
state_shapefile = '../data/co_shapefile/counties/counties_19.shp'

# Example usage with data between 2023 and 2024
geo_plots = GeoPlots(ozone_dp, pm25_dp, wildfire_dp, state_shapefile, visuals_path, 2023, 2024)
geo_plots.plot_stations()


Background on EPA AQI data and API system for how we're pulling it.

We'll check first to see if we already have a dataset downloaded before we go through the process of downloading it again.
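
A minimal sketch of that check is shown here. The helper name `needs_download`, the `min_bytes` parameter, and the example file name are assumptions made for illustration; they are not part of the project's downloader.

```python
import os


def needs_download(path: str, min_bytes: int = 1) -> bool:
    """Return True when `path` is missing or smaller than `min_bytes`.

    Hypothetical helper: the size check guards against an empty file
    left behind by an interrupted download.
    """
    return not (os.path.isfile(path) and os.path.getsize(path) >= min_bytes)


# Hypothetical file name, used only to show the call pattern
candidate = os.path.join("..", "data", "example_download.csv")
if needs_download(candidate):
    print(f"{candidate} not found; a download would be triggered here")
else:
    print(f"{candidate} already present; skipping download")
```

Checking the file size as well as its existence is a design choice; a plain `os.path.exists` test would also work if partial downloads are not a concern.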

In [4]:
# Use the wildfire data downloader to download the 2024 wildfire data.
# NOTE: WildfireDataDownloader must be imported from the project module that
# defines it before this cell is run.
download_info = [
    {
        "download_id": "575994",
        "data_source": "modis-c6.1",
        "url": "https://firms.modaps.eosdis.nasa.gov/data/download/DL_FIRE_M-C61_575994.zip",
    }
]
wf_downloader = WildfireDataDownloader(download_info, output_dir=data_path)
wf_downloader.run()

NameError: name 'WildfireDataDownloader' is not defined

## Data Import

We can load the downloaded CSV into a DataFrame and take a look at the first few rows:

In [None]:
# NOTE: `filename` must be set to the name of the downloaded AQI CSV
# before this cell is run.
AQI_data = pd.read_csv(os.path.join(data_path, filename))
# Store the pollutant name as a categorical column
AQI_data['Parameter'] = AQI_data['Parameter'].astype('category')
AQI_data.head()

Traceback (most recent call last):
  File "c:\Users\adamf\.vscode\extensions\ms-python.python-2025.2.0-win32-x64\python_files\python_server.py", line 133, in exec_user_input
    retval = callable_(user_input, user_globals)
  File "<string>", line 1, in <module>
NameError: name 'filename' is not defined



Next, we work on the wildfire dataset. For reference, `confidence` is the detection confidence (percent), `brightness` is the brightness temperature of the pixel (K), and `frp` is the fire radiative power (MW):

In [None]:
# Convert the extracted JSON data to CSV, then read it into pandas
wf_converter = WildfireDataConverter(
    extracted_dir=data_path,
    start_year=2023,
    end_year=2024,
    output_csv=None,
)
wf_converter.convert_to_csv()
wf_df = pd.read_csv(os.path.join(data_path, "colorado_wildfires_2023_2024.csv"))
# Preview the first rows
wf_df.head()

Traceback (most recent call last):
  File "c:\Users\adamf\.vscode\extensions\ms-python.python-2025.2.0-win32-x64\python_files\python_server.py", line 133, in exec_user_input
    retval = callable_(user_input, user_globals)
  File "<string>", line 2, in <module>
NameError: name 'WildfireDataConverter' is not defined
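
To make the three wildfire fields described above more concrete, the sketch below filters a toy table by detection confidence and summarizes brightness and FRP. The rows are invented, and the 50 percent threshold and the helper name `filter_by_confidence` are assumptions, not values used by the project.

```python
import pandas as pd

# Toy detections using the column names described above; values are invented.
toy_fires = pd.DataFrame({
    "latitude": [39.10, 38.75, 40.20, 37.90],
    "longitude": [-105.30, -106.10, -105.80, -107.40],
    "brightness": [310.5, 335.2, 301.8, 348.9],  # brightness temperature, K
    "confidence": [45, 88, 20, 97],              # detection confidence, percent
    "frp": [5.1, 22.4, 1.3, 40.7],               # fire radiative power, MW
})


def filter_by_confidence(df: pd.DataFrame, min_confidence: int = 50) -> pd.DataFrame:
    """Keep detections at or above `min_confidence` (a hypothetical threshold)."""
    return df[df["confidence"] >= min_confidence].reset_index(drop=True)


confident_fires = filter_by_confidence(toy_fires)
# Mean and maximum of the two intensity-related fields
summary = confident_fires[["brightness", "frp"]].agg(["mean", "max"])
print(summary)
```

This assumes a numeric confidence column, as described in the text; a product that reports confidence as categories would need a different filter.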



Now we'll convert each of the datasets (splitting the AQI data by parameter) into GeoDataFrames for easy plotting and manipulation.
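
One way this conversion could look is sketched below with GeoPandas. The helper name `to_geodataframe`, the toy rows, and the choice of `EPSG:4326` (plain longitude/latitude) as the coordinate reference system are assumptions; the `Latitude`, `Longitude`, and `Parameter` column names follow the AQI table used elsewhere in this notebook.

```python
import pandas as pd
import geopandas as gpd


def to_geodataframe(df, lon_col, lat_col, crs="EPSG:4326"):
    """Attach point geometry built from longitude/latitude columns.

    Hypothetical helper; the default CRS is an assumption.
    """
    geometry = gpd.points_from_xy(df[lon_col], df[lat_col])
    return gpd.GeoDataFrame(df.copy(), geometry=geometry, crs=crs)


# Toy AQI rows; the values are made up for illustration.
toy_aqi = pd.DataFrame({
    "SiteName": ["Site A", "Site A", "Site B"],
    "Parameter": ["PM2.5", "OZONE", "PM2.5"],
    "Latitude": [39.74, 39.74, 38.83],
    "Longitude": [-104.99, -104.99, -104.82],
    "AQI": [42, 55, 61],
})

# One GeoDataFrame per pollutant
aqi_gdfs = {
    name: to_geodataframe(group, "Longitude", "Latitude")
    for name, group in toy_aqi.groupby("Parameter")
}
print(aqi_gdfs["PM2.5"])
```

The same helper would apply to the wildfire table by passing its lowercase `longitude` and `latitude` column names.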

## Initial Views
We start by plotting all of the monitoring stations used to collect data:

In [None]:
# Extract each unique site name with its latitude and longitude
site_df = AQI_data[['SiteName', 'Latitude', 'Longitude']].drop_duplicates()
# Plot the site locations on the state map of Colorado using the shapefile
# (GeoPandas is imported as `gpd` at the top of the notebook)
state_map = gpd.read_file('../src/County_Data_2020.shp')
fig, ax = plt.subplots(figsize=(10, 10))
state_map.boundary.plot(ax=ax, color="black")
# Draw the stations on the same axes as the county boundaries
sns.scatterplot(data=site_df, x='Longitude', y='Latitude', color='red', marker="o", s=10, ax=ax)
plt.grid(False)
ax.set_axis_off()
plt.show()

Traceback (most recent call last):
  File "c:\Users\adamf\.vscode\extensions\ms-python.python-2025.2.0-win32-x64\python_files\python_server.py", line 133, in exec_user_input
    retval = callable_(user_input, user_globals)
  File "<string>", line 2, in <module>
NameError: name 'AQI_data' is not defined



In [None]:
# Geographic map of wildfire detections, colored and sized by brightness
fig2, ax2 = plt.subplots(figsize=(10, 10))
state_map.boundary.plot(ax=ax2, color='black')
sns.scatterplot(data=wf_df, x='longitude', y='latitude', hue='brightness', size='brightness', ax=ax2)
plt.grid(False)
plt.show()

Traceback (most recent call last):
  File "c:\Users\adamf\.vscode\extensions\ms-python.python-2025.2.0-win32-x64\python_files\python_server.py", line 133, in exec_user_input
    retval = callable_(user_input, user_globals)
  File "<string>", line 3, in <module>
NameError: name 'state_map' is not defined

