# Homework 3

## Common Analysis

More and more frequently summers in the western US have been characterized by wildfires with smoke billowing across multiple western states. There are many proposed causes for this: climate change, US Forestry policy, growing awareness, just to name a few. Regardless of the cause, the impact of wildland fires is widespread as wildfire smoke reduces the air quality of many cities. There is a growing body of work pointing to the negative impacts of smoke on health, tourism, property, and other aspects of society.

This notebook has dependencies on [Pyproj](https://pyproj4.github.io/pyproj/stable/index.html), the [geojson](https://pypi.org/project/geojson/) module and on the wildfire user module. Pyproj and geojson can be installed via pip. The wildfire user module should be downloaded from the course website, unzipped, and moved into the folder pointed to by your PYTHONPATH system variable.

### License
Snippets of the code below were taken from examples developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.1 - August 16, 2024

A copy of the referenced examples is also available in this repository, in the `reference_code` folder.

### Preliminaries
First we start with some imports and some constant definitions.

In [1]:
%%capture
%pip install pyproj
%pip install geojson
%pip install polars

In [2]:
#
#    IMPORTS
# 
#    Import some standard python modules
import os, json, time
#
#    The module pyproj is a standard module that can be installed using pip or your other favorite
#    installation tool. This module provides tools to convert between different geodesic coordinate systems
#    and for calculating distances between points (coordinates) in a specific geodesic system.
#
from pyproj import Transformer, Geod
#
#    The 'geojson' module can also be installed using pip. It is used below to load the
#    wildfire dataset file.
#
import geojson

# Other generic imports
from datetime import datetime
import polars as pl

The wildland fire data was downloaded from the [Combined wildland fire datasets for the United States and certain territories, 1800s-Present (combined wildland fire polygons) dataset](https://www.sciencebase.gov/catalog/item/61aa537dd34eb622f699df81). It is saved as a JSON file and is not committed to the project repository, because the file exceeds the maximum file size allowed on GitLab.

In this project, we analyze the wildfires with respect to the city of Stockton, CA.

In [3]:
#
#    CONSTANTS
#
DATA_FILE = "resources/USGS_Wildland_Fire_Combined_Dataset.json"

CITY_LOCATION = {
    'city'   : 'Stockton',
    'latlon' : [37.975556, -121.300833] 
}

## Load the wildfire data using the GeoJSON module

In this example we use the GeoJSON module ([documentation](https://pypi.org/project/geojson/), [GitHub repo](https://github.com/jazzband/geojson)) to load the data file. The module works mostly the way you would expect. GeoJSON is mostly just JSON, so you don't strictly need the GeoJSON module to read it; the module does, however, convert geographic types into more convenient objects. This example, and the examples that follow, do not rely on any Geo-specific features of geojson.
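Before loading the full file, the point that this data is "mostly just JSON" can be checked on a small scale with the standard library alone. The sketch below is illustrative only: the document in `raw_text` is made up (it only mimics the `features` / `attributes` / `geometry` layout seen in this dataset), and nothing here replaces the `geojson` loader used in the next cell.

```python
import json

# A tiny, made-up document that mimics the layout of the wildfire file.
# The values are illustrative and do not come from the real dataset.
raw_text = """
{
    "features": [
        {
            "attributes": {"OBJECTID": 1, "Fire_Year": 1964},
            "geometry": {"rings": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]}
        }
    ]
}
"""

# The standard json module parses it into ordinary dicts and lists
data = json.loads(raw_text)
first_feature = data["features"][0]

print(list(data.keys()))
print(first_feature["attributes"]["Fire_Year"])
print(len(first_feature["geometry"]["rings"][0]), "points in the first ring")
```

The resulting object is navigated with the same dictionary and list indexing that the rest of this notebook uses on `gj_data`.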

In [4]:
#
#    Open a file, load it with the geojson loader
#
print(f"Attempting to open '{DATA_FILE}'")
# The 'with' block closes the file automatically, even if loading fails
with open(DATA_FILE, "r") as geojson_file:
    print(f"Using GeoJSON module to load file '{DATA_FILE}'")
    gj_data = geojson.load(geojson_file)
#
#    Print the keys from the object
#
gj_keys = list(gj_data.keys())
print("The loaded JSON dictionary has the following keys:")
print(gj_keys)
print()
#
#    For all GeoJSON type things, the most important part of the file is the 'features'.
#    In the case of the wildfire dataset, each feature is a polygon (ring) of points that defines the boundary of a fire
#
count = len(gj_data['features'])

print(f"Found {count} features in the variable 'gj_data' ")

Attempting to open 'resources/USGS_Wildland_Fire_Combined_Dataset.json'
Using GeoJSON module to load file 'resources/USGS_Wildland_Fire_Combined_Dataset.json'
The loaded JSON dictionary has the following keys:
['displayFieldName', 'fieldAliases', 'geometryType', 'spatialReference', 'fields', 'features']

Found 135061 features in the variable 'gj_data' 


Next, we check the structure of one of the features to understand the data schema. Each feature contains information about one wildfire.

In [5]:
#
#    print the first item in the list of features
#
gj_feature = gj_data['features'][0]
print("Sample wildfire feature from the loaded gj_data['features']")
print(json.dumps(gj_feature, indent=4))

Sample wildfire feature from the loaded gj_data['features']
{
    "attributes": {
        "OBJECTID": 1,
        "USGS_Assigned_ID": 1,
        "Assigned_Fire_Type": "Wildfire",
        "Fire_Year": 1860,
        "Fire_Polygon_Tier": 1,
        "Fire_Attribute_Tiers": "1 (1)",
        "GIS_Acres": 3940.20708940724,
        "GIS_Hectares": 1594.5452365353703,
        "Source_Datasets": "Comb_National_NIFC_Interagency_Fire_Perimeter_History (1)",
        "Listed_Fire_Types": "Wildfire (1)",
        "Listed_Fire_Names": "Big Quilcene River (1)",
        "Listed_Fire_Codes": "No code provided (1)",
        "Listed_Fire_IDs": "",
        "Listed_Fire_IRWIN_IDs": "",
        "Listed_Fire_Dates": "Listed Other Fire Date(s): 2006-11-02 - NIFC DATE_CUR field (1)",
        "Listed_Fire_Causes": "",
        "Listed_Fire_Cause_Class": "Undetermined (1)",
        "Listed_Rx_Reported_Acres": null,
        "Listed_Map_Digitize_Methods": "Other (1)",
        "Listed_Notes": "",
        "Processing_Not

For this project, we are interested in analyzing the wildfires of the last 60 years, so let's filter the data to keep only the features whose `Fire_Year` falls within that period.
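Before running the filter on the full dataset, the same logic can be tried on a few toy records. This is a minimal sketch, not the notebook's own code: the helper name `filter_features_since` and the toy records are hypothetical, the reference year is pinned to 2024 (an assumption, chosen so the cutoff does not shift when the notebook is re-run in a later year), and the check for a missing `Fire_Year` is a precaution rather than a known property of the data.

```python
def filter_features_since(features, start_year):
    """Keep features whose Fire_Year is a number greater than or equal to start_year."""
    kept = []
    for feature in features:
        year = feature.get("attributes", {}).get("Fire_Year")
        # Skip records with a missing or non-numeric year instead of raising
        if isinstance(year, (int, float)) and year >= start_year:
            kept.append(feature)
    return kept


# Assumption: pin the reference year so the result is reproducible
REFERENCE_YEAR = 2024
toy_start_year = REFERENCE_YEAR - 60

# Made-up records, only for illustration
toy_features = [
    {"attributes": {"OBJECTID": 1, "Fire_Year": 1860}},
    {"attributes": {"OBJECTID": 2, "Fire_Year": 1964}},
    {"attributes": {"OBJECTID": 3, "Fire_Year": None}},
    {"attributes": {"OBJECTID": 4, "Fire_Year": 2020}},
]

recent = filter_features_since(toy_features, toy_start_year)
print([f["attributes"]["OBJECTID"] for f in recent])
```

Only the records from 1964 and 2020 pass the filter here; the 1860 record is too old and the record without a year is skipped.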

In [6]:
# Finding wildfires for the last 60 years.

start_year = datetime.now().year - 60

wildfires_list = [d for d in gj_data['features'] if d['attributes']["Fire_Year"] >= start_year]
print(f"Total number of wildfires since {start_year} is {len(wildfires_list)}")

print("Sample wildfire from the filtered list")
print(json.dumps(wildfires_list[0], indent=4))

Total number of wildfires since 1964 is 117191
Sample wildfire from the filtered list
{
    "attributes": {
        "OBJECTID": 14600,
        "USGS_Assigned_ID": 14600,
        "Assigned_Fire_Type": "Wildfire",
        "Fire_Year": 1964,
        "Fire_Polygon_Tier": 1,
        "Fire_Attribute_Tiers": "1 (1), 3 (3)",
        "GIS_Acres": 65338.8776355638,
        "GIS_Hectares": 26441.70565918891,
        "Source_Datasets": "Comb_National_NIFC_Interagency_Fire_Perimeter_History (1), Comb_National_WFDSS_Interagency_Fire_Perimeter_History (1), Comb_State_California_Wildfire_Polygons (1), Comb_National_USFS_Final_Fire_Perimeter (1)",
        "Listed_Fire_Types": "Wildfire (3), Likely Wildfire (1)",
        "Listed_Fire_Names": "COYOTE (4)",
        "Listed_Fire_Codes": "No code provided (4)",
        "Listed_Fire_IDs": "0 (3)",
        "Listed_Fire_IRWIN_IDs": "",
        "Listed_Fire_Dates": "Listed Wildfire Discovery Date(s): 1964-09-22 (2) | Listed Other Fire Date(s): 1964-09-22 - DATE

## Distance computations

#### Computation using Pyproj

Geodetic distance computations account for Earth's curved, ellipsoidal shape, as distances between points are arcs, not straight lines. Since curvature varies by location and direction, calculating these distances accurately requires specialized tools. Fortunately, libraries like [Pyproj](https://pyproj4.github.io/pyproj/stable/index.html) handle these complexities by converting coordinates between systems and computing distances on Earth’s surface.
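As a rough cross-check on ellipsoidal distances, a spherical (haversine) approximation can be written with the standard library alone. This is only a sketch and is not used anywhere in this notebook: the function name `haversine_miles` and the mean Earth radius of 6371.0088 km are assumptions introduced here, and because a sphere ignores the Earth's flattening, the result can differ from an ellipsoidal calculation by a fraction of a percent.

```python
import math

# Assumption: mean Earth radius in km, converted to miles (1 mile = 1.609344 km)
EARTH_RADIUS_MILES = 6371.0088 / 1.609344


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points on a sphere."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


# Stockton, CA and a point exactly one degree of latitude to the north
stockton = (37.975556, -121.300833)
one_degree_north = (38.975556, -121.300833)

print(round(haversine_miles(*stockton, *one_degree_north), 1), "miles")
```

A value of this kind is useful for spotting gross mistakes, such as swapped latitude and longitude, before trusting the more precise ellipsoidal result.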

#### Computing Distance Between a Location and a Wildfire

To find the distance between a fire and a location (e.g., a city), we need to decide what "distance" means, given that fires have irregular shapes. Options include measuring to the closest point on the perimeter or to the fire's centroid. Sample code for both is available [here](reference_code/wildfire_geo_proximity_example.ipynb).

The first example calculates the shortest distance to the perimeter, returning that distance and the coordinates of the closest point. The second example computes the average distance from all perimeter points to the location, which approximates the distance to the centroid.
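The difference between the two definitions can be seen on a toy ring. The sketch below is not the reference code: it uses plain planar distances (`math.dist`) on made-up coordinates instead of geodesic distances, and the function name `closest_and_average_distance` is hypothetical. Dropping the repeated closing vertex before averaging is a choice made here so that the first point is not counted twice.

```python
import math


def closest_and_average_distance(place, ring):
    """Return (closest distance, closest point, average distance) for a ring of (x, y) points."""
    distances = [math.dist(place, point) for point in ring]
    closest_index = min(range(len(ring)), key=distances.__getitem__)
    # A closed ring repeats its first vertex at the end; leave the repeat out of the average
    is_closed = len(ring) > 1 and ring[0] == ring[-1]
    unique = distances[:-1] if is_closed else distances
    average = sum(unique) / len(unique)
    return distances[closest_index], ring[closest_index], average


# Made-up planar coordinates: a rectangle whose nearest corner is 5 units from the origin
toy_place = (0.0, 0.0)
toy_ring = [(3.0, 4.0), (6.0, 4.0), (6.0, 8.0), (3.0, 8.0), (3.0, 4.0)]

closest, closest_point, average = closest_and_average_distance(toy_place, toy_ring)
print(f"closest: {closest:.2f} at {closest_point}, average: {average:.2f}")
```

The average is always at least as large as the closest distance, so the choice between the two matters most for large fires with part of their perimeter near the city.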

Here, we'll use the first approach: finding the shortest distance from the city to the fire perimeter. Let's start by defining helper functions.

In [7]:
#
#    Transform feature geometry data
#
#    The function takes one parameter, a list of ESRI:102008 coordinates that will be transformed to EPSG:4326
#    The function returns a list of (lat,lon) coordinates in EPSG:4326
#
#    The pyproj transformer that converts from ESRI:102008 to EPSG:4326 is created once, outside the
#    function, so that it is reused for every fire instead of being rebuilt on every call
TO_EPSG4326 = Transformer.from_crs("ESRI:102008","EPSG:4326")

def convert_ring_to_epsg4326(ring_data=None):
    converted_ring = list()
    # We'll run through the list transforming each ESRI:102008 x,y coordinate into a decimal degree lat,lon
    for coord in ring_data:
        lat,lon = TO_EPSG4326.transform(coord[0],coord[1])
        converted_ring.append((lat,lon))
    return converted_ring

In [8]:
#
#    The function takes two parameters
#        place - a coordinate point (list or tuple with two items, (lat,lon)) in decimal degrees EPSG:4326
#        ring_data - a list of ESRI:102008 coordinates for the fire boundary, converted to EPSG:4326 inside the function
#
#    The function returns a list containing the shortest distance to the perimeter (in miles) and the
#    (lat,lon) point where that is. The list is empty if the ring has no points.
#
def shortest_distance_from_place_to_fire_perimeter(place=None,ring_data=None):
    # convert the ring data to the right coordinate system
    ring = convert_ring_to_epsg4326(ring_data)
    # create a geodesic calculator on the WGS84 ellipsoid, which is the ellipsoid EPSG:4326 uses
    geodcalc = Geod(ellps='WGS84')
    closest_point = list()
    # run through each point in the converted ring data
    for point in ring:
        # Geod.inv takes lon,lat order and returns (forward azimuth, back azimuth, distance in meters)
        d = geodcalc.inv(place[1],place[0],point[1],point[0])
        # convert the distance to miles
        distance_in_miles = d[2]*0.00062137
        # if this is the first point, or it's closer to the city than the point we have, save it
        if not closest_point or closest_point[0] > distance_in_miles:
            closest_point = [distance_in_miles, point]
    return closest_point


Now we compute the distance from our city (Stockton, CA) to each wildland fire. We use the function defined above to calculate the shortest distance from the city to the wildland fire perimeter. For this, we first select a few fields of interest.

In [9]:
wildfire_list = list()
fields_of_interest = ['OBJECTID', 
                      'USGS_Assigned_ID', 
                      'Assigned_Fire_Type', 
                      'Fire_Year', 
                      'Fire_Polygon_Tier', 
                      'Fire_Attribute_Tiers', 
                      'GIS_Acres', 
                      'GIS_Hectares', 
                    #   'Source_Datasets', 
                    #   'Listed_Fire_Types', 
                    #   'Listed_Fire_Names', 
                    #   'Listed_Fire_Codes', 
                    #   'Listed_Fire_IDs', 
                    #   'Listed_Fire_IRWIN_IDs', 
                    #   'Listed_Fire_Dates', 
                    #   'Listed_Fire_Causes', 
                    #   'Listed_Fire_Cause_Class', 
                    #   'Listed_Rx_Reported_Acres', 
                    #   'Listed_Map_Digitize_Methods', 
                    #   'Listed_Notes', 
                    #   'Processing_Notes', 
                    #   'Wildfire_Notice', 
                    #   'Prescribed_Burn_Notice', 
                      'Wildfire_and_Rx_Flag', 
                      'Overlap_Within_1_or_2_Flag', 
                      'Circleness_Scale', 
                      'Circle_Flag', 
                      'Exclude_From_Summary_Rasters', 
                      'Shape_Length', 
                      'Shape_Area']


for index, wf_feature in enumerate(wildfires_list):

    attributes = wf_feature['attributes']
    geometry = wf_feature['geometry']

    if 'rings' in geometry:
        ring_data = geometry['rings'][0]
    elif 'curveRings' in geometry:
        ring_data = geometry['curveRings'][0]
    else:
        raise Exception("HEY! No compatible geometry in this fire data!!!")
    
    try:
        # Compute using the shortest distance to any point on the perimeter
        distance = shortest_distance_from_place_to_fire_perimeter(CITY_LOCATION['latlon'],ring_data)

        wildfire =  {k: attributes[k] for k in fields_of_interest}
        wildfire["Closest_Distance_Miles"] = distance[0]

        wildfire_list.append(wildfire)
    except Exception as e:
        print(f"Exception while processing a wildfire. Index:{index}, USGS_Assigned_ID:{attributes['USGS_Assigned_ID']}")
        print(e)


Exception while processing a wildfire. Index:91887, USGS_Assigned_ID:109605
0
Exception while processing a wildfire. Index:92506, USGS_Assigned_ID:110224
0
Exception while processing a wildfire. Index:92921, USGS_Assigned_ID:110639
0
Exception while processing a wildfire. Index:93713, USGS_Assigned_ID:111431
0
Exception while processing a wildfire. Index:94179, USGS_Assigned_ID:111897
0
Exception while processing a wildfire. Index:94692, USGS_Assigned_ID:112410
0
Exception while processing a wildfire. Index:94697, USGS_Assigned_ID:112415
0
Exception while processing a wildfire. Index:95693, USGS_Assigned_ID:113411
0
Exception while processing a wildfire. Index:95947, USGS_Assigned_ID:113665
0
Exception while processing a wildfire. Index:96020, USGS_Assigned_ID:113738
0
Exception while processing a wildfire. Index:96048, USGS_Assigned_ID:113766
0
Exception while processing a wildfire. Index:96087, USGS_Assigned_ID:113805
0
Exception while processing a wildfire. Index:96591, USGS_Assigne

A few wildfires raised errors during the shortest-distance calculation. Because the number of failures is very small compared with the size of the dataset, we exclude those wildfires from the analysis. This should not have a significant impact on our results.
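The bare `0` printed for each failure is what `print(e)` shows for a `KeyError` with key `0`. That would happen if an entry in a `curveRings` geometry is a dictionary describing a curve segment rather than a plain `[x, y]` pair. This reading is an inference from the output above and has not been verified against the data. Under that assumption, and assuming curve entries look like `{"c": [[end_x, end_y], [mid_x, mid_y]]}` with the segment's end point first, a hypothetical helper such as `ring_to_points` could reduce a ring to plain points instead of skipping the fire:

```python
def ring_to_points(ring):
    """Return only [x, y] pairs from a ring, approximating each curve segment by its end point."""
    points = []
    for entry in ring:
        if isinstance(entry, dict):
            # Assumed curve object: the first item of each value is taken as the segment's end point
            for segment in entry.values():
                points.append(list(segment[0]))
        else:
            # Plain coordinate: keep x and y, ignoring any extra values
            points.append(list(entry[:2]))
    return points


# Made-up ring mixing plain points and one curve segment
toy_curve_ring = [[0.0, 0.0], {"c": [[1.0, 1.0], [0.5, 0.8]]}, [2.0, 0.0], [0.0, 0.0]]
print(ring_to_points(toy_curve_ring))
```

Replacing a curve with its end point discards the curved shape in between, so the resulting distance is an approximation; whether that is acceptable depends on how large those segments are.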

We then save this processed wildland fire data, enriched with the distance calculation, to a CSV file. This file is used in the following steps of the analysis.

In [10]:
import pandas as pd

wildfires_df_pandas = pd.DataFrame(wildfire_list)
wildfires_df_pandas.to_csv("generated_files/intermediate/wildfires_with_distances.csv", index=False)

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd
