## Purpose

The purpose of this file is to filter the raw data down to the information we need to create smoke estimates. Specifically, we need to limit the data to:
1. Fires within 1250 miles of Pahrump, Nevada (Nye County)
2. Fires which occurred within the last 60 years (1963-2023)
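Taken together, the two criteria amount to one yes/no test per fire. The sketch below is a minimal illustration. `keep_fire` is a hypothetical name. The strict `< 1250` comparison mirrors the distance filter applied later in this notebook, and 2023 is assumed to be the end of the window.

```python
# Thresholds from the two criteria above (assumption: 2023 closes the window).
MAX_DISTANCE_MILES = 1250
FIRST_YEAR, LAST_YEAR = 1963, 2023


def keep_fire(shortest_dist_miles: float, fire_year: int) -> bool:
    """Return True when a fire meets both the distance and the date criteria."""
    within_distance = shortest_dist_miles < MAX_DISTANCE_MILES
    within_window = FIRST_YEAR <= fire_year <= LAST_YEAR
    return within_distance and within_window


print(keep_fire(300.5, 1990))   # True
print(keep_fire(1300.0, 1990))  # False: too far away
print(keep_fire(300.5, 1950))   # False: too early
```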

We will begin by importing Python libraries. The user may need to install pyproj and geojson using pip. pyproj converts coordinates between geodesic coordinate systems and calculates distances between points within a given system. The 'wildfire' module is a user module, available from the course website. It includes one object, a Reader, that can be used to read the GeoJSON files associated with the wildfire dataset. The module also contains a sample data file. This file is GeoJSON compliant and contains a small number of California wildfires extracted from the main wildfire dataset.

Some of the code below was taken from the wildfire_geo_proximity_example notebook created by Professor McDonald. That notebook is licensed for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.0 - August 13, 2023

In [1]:
import os, json, time
from pyproj import Transformer, Geod
from wildfire.Reader import Reader as WFReader
import geojson
from datetime import datetime
import pandas as pd

Next we transform the feature geometry from the dataset's ESRI:102008 coordinate system into a different coordinate system, EPSG:4326 (decimal-degree latitude and longitude). Code originally written by Professor McDonald, modified by Emily Creeden.

In [2]:
#    Transform feature geometry data
#
#    The function takes one parameter, a list of ESRI:102008 coordinates that will be transformed to EPSG:4326
#    The function returns a list of (lat,lon) coordinates in EPSG:4326
#
#    Building a pyproj Transformer is relatively expensive, so we create it once here and
#    reuse it for every ring instead of rebuilding it on every call.
#    EPSG:4326 uses latitude,longitude axis order, so transform() returns (lat,lon).
TO_EPSG4326 = Transformer.from_crs("ESRI:102008","EPSG:4326")

def convert_ring_to_epsg4326(ring_data=None):
    converted_ring = list()
    # We'll run through the list transforming each ESRI:102008 x,y coordinate into a decimal degree lat,lon
    for coord in ring_data:
        lat,lon = TO_EPSG4326.transform(coord[0],coord[1])
        converted_ring.append((lat,lon))
    return converted_ring
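pyproj handles the projection math inside `convert_ring_to_epsg4326`. The sketch below shows what that inverse step does by implementing the textbook spherical form of the Albers equal-area conic projection. It rests on two assumptions. First, ESRI:102008 is taken to use standard parallels 20°N and 60°N, a latitude of origin of 40°N, and a central meridian of 96°W. Second, a sphere of radius 6,371 km replaces the ellipsoid that pyproj uses, so results will not match pyproj's exactly. `albers_forward` and `albers_inverse` are hypothetical names.

```python
import math

# Assumed ESRI:102008 parameters: standard parallels 20N/60N, origin 40N, central meridian 96W.
# Assumption: a sphere of radius R stands in for the ellipsoid that pyproj uses.
R = 6371000.0
PHI1, PHI2, PHI0, LAM0 = map(math.radians, (20.0, 60.0, 40.0, -96.0))

N = (math.sin(PHI1) + math.sin(PHI2)) / 2
C = math.cos(PHI1) ** 2 + 2 * N * math.sin(PHI1)
RHO0 = R * math.sqrt(C - 2 * N * math.sin(PHI0)) / N


def albers_forward(lat_deg, lon_deg):
    """(lat, lon) in degrees -> (x, y) in metres on the spherical Albers projection."""
    rho = R * math.sqrt(C - 2 * N * math.sin(math.radians(lat_deg))) / N
    theta = N * (math.radians(lon_deg) - LAM0)
    return rho * math.sin(theta), RHO0 - rho * math.cos(theta)


def albers_inverse(x, y):
    """(x, y) in metres -> (lat, lon) in degrees, the direction convert_ring_to_epsg4326 goes."""
    rho = math.hypot(x, RHO0 - y)
    theta = math.atan2(x, RHO0 - y)  # this form is valid because N > 0 for these parallels
    lat = math.asin((C - (rho * N / R) ** 2) / (2 * N))
    lon = LAM0 + theta / N
    return math.degrees(lat), math.degrees(lon)


x, y = albers_forward(36.231143, -116.017339)  # Pahrump, NV
print(albers_inverse(x, y))  # approximately (36.231143, -116.017339)
```

The round trip recovers the input coordinates, which is a quick sanity check on the formulas. For real work, the pyproj transformer above remains the right tool.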

Next we need to find the shortest distance between our town and each fire. We measure the distance from the perimeter of the fire rather than from its center. We do this because we believe proximity to the town is important in determining how much smoke reaches it. The edge of a large fire may be close to town and send a lot of smoke there, whereas its center may be several miles away. We place the town's coordinates in the middle of the town area shown on Google Maps. We want to estimate smoke for the town as a whole, without biasing toward one side or another. We will later calculate the smoke estimate using this proximity to town.

Code originally written by Professor McDonald, modified by Emily Creeden.
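The difference between the two choices can be seen on a toy example. The sketch below compares two distances for a hypothetical square perimeter: the distance to the nearest perimeter vertex, and the distance to the plain average of the vertices (a crude stand-in for the fire's center). It makes two assumptions. The haversine formula on a sphere of radius 3,958.8 miles replaces the WGS84 geodesic calculation that pyproj performs. The ring and the function names are hypothetical.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # assumed mean Earth radius (spherical approximation)


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def edge_and_center_distance(place, ring):
    """Return (miles to the nearest ring vertex, miles to the average of the vertices)."""
    edge = min(haversine_miles(place[0], place[1], lat, lon) for lat, lon in ring)
    center_lat = sum(lat for lat, _ in ring) / len(ring)
    center_lon = sum(lon for _, lon in ring) / len(ring)
    return edge, haversine_miles(place[0], place[1], center_lat, center_lon)


# Hypothetical square fire perimeter, half a degree on a side, east of Pahrump.
pahrump = (36.231143, -116.017339)
ring = [(36.0, -115.5), (36.5, -115.5), (36.5, -115.0), (36.0, -115.0)]
edge, center = edge_and_center_distance(pahrump, ring)
print(f"edge: {edge:.1f} mi, center: {center:.1f} mi")
```

For this ring the nearest edge is noticeably closer to town than the center, and the gap grows with fire size.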

In [3]:
#    The function takes two parameters
#        place - a coordinate point (list or tuple with two items, (lat,lon) in decimal degrees EPSG:4326)
#        ring_data - a list of ESRI:102008 coordinates for the fire boundary (converted to EPSG:4326 below)
#
#    The function returns a list containing the shortest distance (in miles) to the perimeter
#    and the perimeter point where that distance occurs
#
def shortest_distance_from_place_to_fire_perimeter(place=None,ring_data=None):
    # convert the ring data to the right coordinate system
    ring = convert_ring_to_epsg4326(ring_data)
    # create a geodesic calculator on the WGS84 ellipsoid, which EPSG:4326 is based on
    geodcalc = Geod(ellps='WGS84')
    closest_point = list()
    # run through each point in the converted ring data
    for point in ring:
        # Geod.inv takes (lon1,lat1,lon2,lat2) and returns (forward azimuth, back azimuth, distance in metres)
        d = geodcalc.inv(place[1],place[0],point[1],point[0])
        # convert the distance to miles
        distance_in_miles = d[2]*0.00062137
        # if it's closer to the city than the point we have, save it
        if not closest_point or closest_point[0] > distance_in_miles:
            closest_point = [distance_in_miles, point]
    return closest_point
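The running-minimum bookkeeping in `shortest_distance_from_place_to_fire_perimeter` can also be written with Python's built-in `min()` and a key function. The sketch below keeps the same `[miles, point]` return shape. Like the original, it measures to the ring's vertices, not to the straight segments between them. `closest_vertex` and `planar` are hypothetical names. The distance function is passed in; in the notebook, the distance comes from the third value returned by `Geod.inv`.

```python
import math

METERS_TO_MILES = 0.00062137  # the conversion factor used in the function above


def closest_vertex(place, ring, distance_m):
    """Return [miles, vertex] for the ring vertex nearest to place, or [] for an empty ring.

    distance_m(place, vertex) must return a distance in metres.
    """
    if not ring:
        return []
    best = min(ring, key=lambda vertex: distance_m(place, vertex))
    return [distance_m(place, best) * METERS_TO_MILES, best]


def planar(a, b):
    """Hypothetical planar distance for the demo (coordinates already in metres)."""
    return math.dist(a, b)


print(closest_vertex((0.0, 0.0), [(3000.0, 4000.0), (1609.344, 0.0)], planar))
```

`min()` with a key finds the nearest vertex in one pass and avoids the manual list resets.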


Now we will load the GeoJSON data into the wildfire reader. Code originally written by Professor McDonald, modified by Emily Creeden. Users should change SAMPLE_DATA_FILENAME to point to wherever they have stored USGS_Wildland_Fire_Combined_Dataset.json if it is not two directories above the current directory, as specified in the README.

In [4]:
#
#    This bit of code opens a new wildfire reader, gets the header information and prints it to the screen
#
SAMPLE_DATA_FILENAME = '../../USGS_Wildland_Fire_Combined_Dataset.json'
print(f"Attempting to open '{SAMPLE_DATA_FILENAME}' with wildfire.Reader() object")
wfreader = WFReader(SAMPLE_DATA_FILENAME)
print()
#
#    OPTIONAL: Now print the header - it contains some useful information
#
#header_dict = wfreader.header()
#header_keys = list(header_dict.keys())
#print("The header has the following keys:")
#print(header_keys)
#print()
#print("Header Dictionary")
#print(json.dumps(header_dict,indent=4))

Attempting to open '../../USGS_Wildland_Fire_Combined_Dataset.json' with wildfire.Reader() object
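Instead of editing SAMPLE_DATA_FILENAME by hand, the path could be read from an environment variable, with the README location as the fallback. The sketch below is one way to do this. The variable name `WILDFIRE_DATA_FILE` and the function `resolve_data_path` are hypothetical and are not part of the course materials.

```python
import os

# Default location from the README: two directories above the current directory.
DEFAULT_DATA_FILENAME = "../../USGS_Wildland_Fire_Combined_Dataset.json"


def resolve_data_path(env_var="WILDFIRE_DATA_FILE", default=DEFAULT_DATA_FILENAME):
    """Return the path named by the (hypothetical) env_var if set, else the default path."""
    path = os.environ.get(env_var, default)
    if not os.path.exists(path):
        print(f"Warning: '{path}' does not exist; check the README for the expected location.")
    return path
```

With this helper, the reader cell would start with `SAMPLE_DATA_FILENAME = resolve_data_path()`.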



Next we will get a list of features (fires) in the data. This section may take a while to run. Code originally written by Professor McDonald, modified by Emily Creeden.

In [None]:
#Loading all feature data from raw USGS data.
#Uncomment MAX_FEATURE_LOAD (and the check inside the loop) to load only a sample
#MAX_FEATURE_LOAD = 100
feature_list = list()
feature_count = 0
# A rewind() on the reader object makes sure we're at the start of the feature list
# This way, we can execute this cell multiple times and get the same result
wfreader.rewind()
# Now, read through each of the features, saving them as dictionaries into a list
feature = wfreader.next()
while feature:
    feature_list.append(feature)
    feature_count += 1
    # if we're loading a lot of features, print progress
    if (feature_count % 10000) == 0:
        print(f"Loaded {feature_count} features")
    # if we've loaded the max we're allowed, then break
    #if feature_count >= MAX_FEATURE_LOAD:
    #    break
    feature = wfreader.next()
#
#    Print the number of items (features) we think we loaded
print(f"Loaded a total of {feature_count} features")
#
#    Just a validation check - did all the items we loaded get into the list?
print(f"Variable 'feature_list' contains {len(feature_list)} features")
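The loading loop above, including the commented-out MAX_FEATURE_LOAD cap, can be wrapped in a function so that the cap becomes a parameter. This sketch assumes only the `rewind()` and `next()` methods used above, with `next()` returning a falsy value at the end of the data. `load_features` and `FakeReader` are hypothetical names. `FakeReader` exists only so the example runs without the dataset.

```python
def load_features(reader, max_features=None, report_every=10000):
    """Read features until reader.next() returns a falsy value or max_features is reached."""
    reader.rewind()  # start from the beginning so repeated calls give the same result
    features = []
    feature = reader.next()
    while feature:
        features.append(feature)
        if len(features) % report_every == 0:
            print(f"Loaded {len(features)} features")
        if max_features is not None and len(features) >= max_features:
            break
        feature = reader.next()
    return features


class FakeReader:
    """Hypothetical stand-in for wildfire.Reader that serves features from a list."""

    def __init__(self, items):
        self.items = items
        self.pos = 0

    def rewind(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.items):
            return None
        item = self.items[self.pos]
        self.pos += 1
        return item


reader = FakeReader([{"attributes": {"OBJECTID": i}} for i in range(1, 6)])
print(len(load_features(reader)))                  # 5
print(len(load_features(reader, max_features=2)))  # 2
```

In the notebook, this would be called as `feature_list = load_features(wfreader)`. For a quick trial run on a sample, use `load_features(wfreader, max_features=100)`.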

Specify the coordinates of the town you want to measure distances to. As noted above, we use the center of Pahrump as a proxy for the whole town. Code originally written by Professor McDonald, modified by Emily Creeden.

In [6]:
#Building out the city location
CITY_LOCATIONS = {
    'pahrump' :     {'city'   : 'Pahrump', 
                     'latlon' : [36.231143, -116.017339]}}

The code below calculates the distance from the closest edge of each fire to the town. It may take about an hour to run. Code originally written by Professor McDonald, modified by Emily Creeden.

In [9]:
#    Get a city from our CITY_LOCATIONS constant as our starting position
place = CITY_LOCATIONS["pahrump"]

fire_id = []
shortest_dist_from_edge = []
features_processed = 0

for wf_feature in feature_list:
    #Try/Except to catch fires which aren't in a ring shape
    try:
        ring_data = wf_feature['geometry']['rings'][0]
        distance = shortest_distance_from_place_to_fire_perimeter(place['latlon'],ring_data)
        fire_id.append(wf_feature['attributes']['OBJECTID'])
        shortest_dist_from_edge.append(round(distance[0], 2))
    except KeyError:
        print("{0} fire is in {1} shape, ignoring.".format(wf_feature['attributes']['OBJECTID'], list(wf_feature['geometry'].keys())[0]))
    #Incrementing the fires processed counter and saving every 10000 entries to avoid lost work
    features_processed = features_processed + 1
    if features_processed % 1000 == 0:
        print("Processed {0} features".format(features_processed))
    if features_processed % 10000 == 0:
        dist_df = pd.DataFrame({'OBJECTID': fire_id, 'shortest_dist': shortest_dist_from_edge})
        dist_df.to_csv('../intermediate_data/fire_distances.csv', index=False)

#Saving the final file
dist_df = pd.DataFrame({'OBJECTID': fire_id, 'shortest_dist': shortest_dist_from_edge})
dist_df.to_csv('../intermediate_data/fire_distances.csv', index=False)


We read back the file created above. This way, a programmer who saved their outputs earlier can return to the work without rerunning the distance calculation.

In [None]:
#Reading file in as pandas df
fire_dist_df = pd.read_csv('../intermediate_data/fire_distances.csv')
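The distance loop saves fire_distances.csv every 10,000 features, so an interrupted run could resume from that checkpoint instead of starting over. The sketch below uses only the standard library. `load_checkpoint` and `remaining_features` are hypothetical helpers. Fires skipped for having a non-ring shape are not written to the checkpoint, so they would simply be revisited (and skipped again) on resume.

```python
import csv
import os


def load_checkpoint(path):
    """Return (ids, distances) already saved in a checkpoint CSV, or empty lists if none exists."""
    ids, dists = [], []
    if os.path.exists(path):
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                ids.append(int(row["OBJECTID"]))
                dists.append(float(row["shortest_dist"]))
    return ids, dists


def remaining_features(features, done_ids):
    """Yield only the features whose OBJECTID is not yet in the checkpoint."""
    done = set(done_ids)  # set lookup keeps the filter fast for large checkpoints
    for feature in features:
        if feature["attributes"]["OBJECTID"] not in done:
            yield feature
```

To resume, seed the lists with `fire_id, shortest_dist_from_edge = load_checkpoint('../intermediate_data/fire_distances.csv')`. Then loop over `remaining_features(feature_list, fire_id)` instead of `feature_list`.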

Now we will keep only fires that occurred within 1250 miles of our town.

In [None]:
#Keeping only fires <1250 miles away
lim_fires_df = fire_dist_df.loc[fire_dist_df['shortest_dist']< 1250]
print("There are {0} fires within 1250 miles of Pahrump, NV".format(len(lim_fires_df)))

#Saving those fires
lim_fires_df.to_csv('../intermediate_data/close_fires.csv', index = False)

Now we will reload the wildfire data to create a table of fire attributes. This table will later be used to estimate smoke in Pahrump on an annual basis. We keep only fires that occurred in or after 1963. The code below loads the table...

In [None]:
#Loading wildfire data to get features
#Makes a dictionary from the file; the with block closes the file once it has been read
with open('../../USGS_Wildland_Fire_Combined_Dataset.json') as wf_file:
    wf_dict = json.load(wf_file)

...and the following code extracts the relevant columns.

In [None]:
#Creating new lists for relevant columns
objectid = []
Assigned_Fire_Type = []
Fire_Year = []
GIS_Acres = []
Overlap_Within_1_or_2_Flag = []

#Parsing through the wf_dict['features'] list and checking each fire's year;
#if it is 1963 or later, add the fire's attributes to the lists above.
#The counter is initialized once, outside the loop, so the progress message can print.
fire_count = 0
for fire in wf_dict['features']:
    if fire['attributes']['Fire_Year'] >= 1963:
        objectid.append(fire['attributes']['OBJECTID'])
        Assigned_Fire_Type.append(fire['attributes']['Assigned_Fire_Type'])
        Fire_Year.append(fire['attributes']['Fire_Year'])
        GIS_Acres.append(fire['attributes']['GIS_Acres'])
        Overlap_Within_1_or_2_Flag.append(fire['attributes']['Overlap_Within_1_or_2_Flag'])
        fire_count += 1
        if fire_count % 1000 == 0:
            print("Kept {0} fires".format(fire_count))
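The five parallel lists above can also be built from a single list of column names, which keeps the column set in one place. The sketch below takes this approach. `extract_columns` and `COLUMNS` are hypothetical names, and the attribute names are the ones used in the cell above.

```python
COLUMNS = ["OBJECTID", "Assigned_Fire_Type", "Fire_Year", "GIS_Acres", "Overlap_Within_1_or_2_Flag"]


def extract_columns(features, first_year=1963, columns=COLUMNS):
    """Return a dict of column lists for features whose Fire_Year >= first_year."""
    table = {name: [] for name in columns}
    for feature in features:
        attrs = feature["attributes"]
        if attrs["Fire_Year"] >= first_year:
            for name in columns:
                table[name].append(attrs[name])
    return table
```

Because `pd.DataFrame` accepts a dict of equal-length lists, the next cell could then become `feature_df = pd.DataFrame(extract_columns(wf_dict['features']))`.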

Now we will combine the lists created above into a single pandas DataFrame. We save it to the intermediate data folder in case the programmer wants to return to it later.

In [None]:
#Creating the feature DF
feature_df = pd.DataFrame({'OBJECTID':objectid,
                            'Assigned_Fire_Type':Assigned_Fire_Type,
                            'Fire_Year':Fire_Year,
                            'GIS_Acres':GIS_Acres,
                            'Overlap_Within_1_or_2_Flag' : Overlap_Within_1_or_2_Flag})

#Saving the feature_df in the event that the programmer wants to come back to it later
feature_df.to_csv('../intermediate_data/fire_features.csv', index=False)

We open the fire features file again, so that a programmer returning to the work later can start from this point.

In [None]:
#Reading fire features in as pandas df
fire_feature_df = pd.read_csv('../intermediate_data/fire_features.csv')

#Also reading the close fires as a pandas df
close_fires_df= pd.read_csv('../intermediate_data/close_fires.csv')

Next we inner join the attributes of fires from 1963 onward with their distances to town. The smoke estimate will later be calculated from the result. We will save this dataframe for later use in the data_processing script.
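An inner join on OBJECTID keeps only the fires that appear in both tables. The sketch below reproduces that in plain Python to make the behavior concrete. `inner_join_on_objectid` is a hypothetical name. The sketch assumes OBJECTID is unique in the distance table; with duplicate keys, `pd.merge` would instead produce one row per matching pair.

```python
def inner_join_on_objectid(features, distances):
    """Inner join two lists of row dicts on OBJECTID, keeping only IDs present in both."""
    dist_by_id = {row["OBJECTID"]: row for row in distances}  # assumes unique OBJECTIDs
    joined = []
    for row in features:
        match = dist_by_id.get(row["OBJECTID"])
        if match is not None:
            joined.append({**row, **match})
    return joined


features = [{"OBJECTID": 1, "Fire_Year": 1970}, {"OBJECTID": 2, "Fire_Year": 2001}]
distances = [{"OBJECTID": 2, "shortest_dist": 412.3}, {"OBJECTID": 3, "shortest_dist": 80.0}]
print(inner_join_on_objectid(features, distances))
# [{'OBJECTID': 2, 'Fire_Year': 2001, 'shortest_dist': 412.3}]
```

Fire 1 has no distance (it was too far or had no ring) and fire 3 is pre-1963 or otherwise absent from the attribute table, so only fire 2 survives.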

In [None]:
#Inner joining to get only the fires which are after 1963 (inclusive) and within 1250 miles of our town.
filtered_fire_df = pd.merge(fire_feature_df, close_fires_df, how = 'inner', on='OBJECTID')

#Saving file
filtered_fire_df.to_csv('../intermediate_data/filtered_fire_info.csv', index = False)

#Checking the output
if filtered_fire_df['Fire_Year'].max() > 2023:
    print("ERROR - fires after 2023 included in data")
else:
    print("The max fire year is {0}".format(filtered_fire_df['Fire_Year'].max()))
if filtered_fire_df['Fire_Year'].min() < 1963:
    print("ERROR - fires before 1963 included in data")
else:
    print("The min fire year is {0}".format(filtered_fire_df['Fire_Year'].min()))
if filtered_fire_df['shortest_dist'].max() > 1250:
    print("ERROR - fires beyond 1250 miles included in data")
else:
    print("The farthest fire is {0} miles from town".format(filtered_fire_df['shortest_dist'].max()))
    print("The closest fire is {0} miles from town".format(filtered_fire_df['shortest_dist'].min()))
print("There are {0} fires from 1963 onward within 1250 miles of Pahrump, NV".format(len(filtered_fire_df)))
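The checks above print a message and carry on. The sketch below is a variant that collects the problems and lets the cell fail loudly. `validate_filtered_rows` is a hypothetical name. It tests `shortest_dist` with `>= 1250` so that it mirrors the strict `< 1250` filter applied earlier.

```python
def validate_filtered_rows(rows, first_year=1963, last_year=2023, max_miles=1250):
    """Return a list of problems found in the filtered rows; an empty list means all checks passed."""
    if not rows:
        return ["no fires left after filtering"]
    years = [row["Fire_Year"] for row in rows]
    dists = [row["shortest_dist"] for row in rows]
    errors = []
    if max(years) > last_year:
        errors.append(f"fires after {last_year} included in data")
    if min(years) < first_year:
        errors.append(f"fires before {first_year} included in data")
    if max(dists) >= max_miles:  # the filter kept only shortest_dist < max_miles
        errors.append(f"fires {max_miles} miles or farther included in data")
    return errors
```

In the notebook, run it as `problems = validate_filtered_rows(filtered_fire_df.to_dict(orient="records"))`, followed by `assert not problems, problems`. A failed check then stops the run instead of scrolling past as printed text.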