# Project Part 1 - Common Analysis

**Author: Logan O'Brien** 

## Step 1: Create the Fire Smoke Estimates

First, I import the packages I will need for this assignment.

In [47]:
import os, json, time
from pyproj import Transformer, Geod

#importing [4]
from wildfire.Reader import Reader as WFReader

import numpy as np
import requests
from tqdm import tqdm

### Load Data

**We begin by reading in the data:**
- Note: I rely heavily on Dr. McDonald's "wildfire_geo_proximity_example.ipynb" [3] to learn how the wildfire data is structured and how to work with it. Additionally, I reused, with modification, the following _____ cells from that notebook below.

**---------------------------BEGIN PROF CODE REUSE ---------------------------**

In [48]:
# I modified this file path to make it point to where I stored the file
# SAMPLE_DATA_FILENAME = './wildfire/Wildfire_short_sample.json'
SAMPLE_DATA_FILENAME = './data/USGS_Wildland_Fire_Combined_Dataset.json'

**TBD remove excess outputs:**

In [49]:
#
#    This bit of code opens a new wildfire reader, gets the header information and prints it to the screen
#
print(f"Attempting to open '{SAMPLE_DATA_FILENAME}' with wildfire.Reader() object")
wfreader = WFReader(SAMPLE_DATA_FILENAME)
print()
#
#    Now print the header - it contains some useful information
#
header_dict = wfreader.header()
header_keys = list(header_dict.keys())
print("The header has the following keys:")

#-----LO: I replaced "gj_keys" in the print statement below with what is there now ------------
print(header_keys)

print()
print("Header Dictionary")
print(json.dumps(header_dict,indent=4))


Attempting to open './data/USGS_Wildland_Fire_Combined_Dataset.json' with wildfire.Reader() object

The header has the following keys:
['displayFieldName', 'fieldAliases', 'geometryType', 'spatialReference', 'fields']

Header Dictionary
{
    "displayFieldName": "",
    "fieldAliases": {
        "OBJECTID": "OBJECTID",
        "USGS_Assigned_ID": "USGS Assigned ID",
        "Assigned_Fire_Type": "Assigned Fire Type",
        "Fire_Year": "Fire Year",
        "Fire_Polygon_Tier": "Fire Polygon Tier",
        "Fire_Attribute_Tiers": "Fire Attribute Tiers",
        "GIS_Acres": "GIS_Acres",
        "GIS_Hectares": "GIS_Hectares",
        "Source_Datasets": "Source Datasets",
        "Listed_Fire_Types": "Listed Fire Types",
        "Listed_Fire_Names": "Listed Fire Names",
        "Listed_Fire_Codes": "Listed Fire Codes",
        "Listed_Fire_IDs": "Listed Fire IDs",
        "Listed_Fire_IRWIN_IDs": "Listed Fire IRWIN IDs",
        "Listed_Fire_Dates": "Listed Fire Dates",
        "Listed

In [50]:
#
#    This sample code will load the whole sample file, or a small amount of the complete dataset.
#
MAX_FEATURE_LOAD = 500
feature_list = list()
feature_count = 0
# A rewind() on the reader object makes sure we're at the start of the feature list
# This way, we can execute this cell multiple times and get the same result 
wfreader.rewind()
# Now, read through each of the features, saving them as dictionaries into a list
feature = wfreader.next()
while feature:
    feature_list.append(feature)
    feature_count += 1
    # if we're loading a lot of features, print progress
    if (feature_count % 100) == 0:
        print(f"Loaded {feature_count} features")
    # loaded the max we're allowed then break
    if feature_count >= MAX_FEATURE_LOAD:
        break
    feature = wfreader.next()
#
#    Print the number of items (features) we think we loaded
print(f"Loaded a total of {feature_count} features")
#
#    Just a validation check - did all the items we loaded get into the list?
print(f"Variable 'feature_list' contains {len(feature_list)} features")




Loaded 100 features
Loaded 200 features
Loaded 300 features
Loaded 400 features
Loaded 500 features
Loaded a total of 500 features
Variable 'feature_list' contains 500 features


Note: based on reading through [3] and using code snippets from it to explore the data, I believe each feature corresponds to a fire. 

In [51]:
#
#    The 'feature_list' variable was created when we read the sample file in a code cell above
#    Now, we're just going to look at one single feature - see what is in there
#
SLOT = 0
wf_feature = feature_list[SLOT]

# Print everything in this dictionary (i.e., wf_feature) - it's long
print(f"The wildfire feature from slot '{SLOT}' of the loaded 'feature_list'")
print(json.dumps(wf_feature, indent=4))

The wildfire feature from slot '0' of the loaded 'feature_list'
{
    "attributes": {
        "OBJECTID": 1,
        "USGS_Assigned_ID": 1,
        "Assigned_Fire_Type": "Wildfire",
        "Fire_Year": 1860,
        "Fire_Polygon_Tier": 1,
        "Fire_Attribute_Tiers": "1 (1)",
        "GIS_Acres": 3940.20708940724,
        "GIS_Hectares": 1594.5452365353703,
        "Source_Datasets": "Comb_National_NIFC_Interagency_Fire_Perimeter_History (1)",
        "Listed_Fire_Types": "Wildfire (1)",
        "Listed_Fire_Names": "Big Quilcene River (1)",
        "Listed_Fire_Codes": "No code provided (1)",
        "Listed_Fire_IDs": "",
        "Listed_Fire_IRWIN_IDs": "",
        "Listed_Fire_Dates": "Listed Other Fire Date(s): 2006-11-02 - NIFC DATE_CUR field (1)",
        "Listed_Fire_Causes": "",
        "Listed_Fire_Cause_Class": "Undetermined (1)",
        "Listed_Rx_Reported_Acres": null,
        "Listed_Map_Digitize_Methods": "Other (1)",
        "Listed_Notes": "",
        "Processing

In [52]:
#
#    Every feature has a 'geometry' which specifies geo coordinates that make up each geographic thing
#    In the case of the wildfire data, most wildfires are bounded shapes, circles, squares, etc. This is
#    represented by shapes called 'rings' in GeoJSON.
# 
# Get the geometry for the feature we pulled from the feature_list
wf_geometry = wf_feature['geometry']
# The largest shape (ring) is supposed to be item zero in the list of 'rings'
wf_bigest_ring = wf_geometry['rings'][0]

print(f"The largest ring of wf_feature['features'][{SLOT}]['rings'] consists of {len(wf_bigest_ring)} points.")

The largest ring of wf_feature['features'][0]['rings'] consists of 768 points.
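Since each feature appears to correspond to one fire, the few fields used later in this notebook (`OBJECTID`, `Fire_Year`, `GIS_Acres`, and the first ring) could be pulled out with a small helper. This is a minimal sketch, assuming each feature is a plain dictionary with the `attributes` and `geometry` layout printed above; `summarize_feature` and the toy feature (with made-up coordinates) are hypothetical.

```python
def summarize_feature(feature):
    """Return (OBJECTID, Fire_Year, GIS_Acres, outer ring) for one feature dict.

    Hypothetical helper: assumes the 'attributes'/'geometry' layout shown above.
    """
    attrs = feature["attributes"]
    # The first ring is treated as the fire's outer (largest) boundary, as in [3]
    outer_ring = feature["geometry"]["rings"][0]
    return attrs["OBJECTID"], attrs["Fire_Year"], attrs["GIS_Acres"], outer_ring


# Toy feature shaped like the real records; the coordinates are made up
toy_feature = {
    "attributes": {"OBJECTID": 1, "Fire_Year": 1860, "GIS_Acres": 3940.2},
    "geometry": {"rings": [[[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 0.0]]]},
}

fire_id, year, acres, ring = summarize_feature(toy_feature)
print(fire_id, year, acres, len(ring))  # 1 1860 3940.2 4
```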


### Convert geometry system:

In [53]:
#
#    Transform feature geometry data
#
#    The function takes one parameter, a list of ESRI:102008 coordinates that will be transformed to EPSG:4326
#    The function returns a list of coordinates in EPSG:4326
def convert_ring_to_epsg4326(ring_data=None):
    converted_ring = list()
    #
    # We use a pyproj transformer that converts from ESRI:102008 to EPSG:4326 to transform the list of coordinates
    to_epsg4326 = Transformer.from_crs("ESRI:102008","EPSG:4326")
    # We'll run through the list transforming each ESRI:102008 x,y coordinate into a decimal degree lat,lon
    for coord in ring_data:
        lat,lon = to_epsg4326.transform(coord[0],coord[1])
        new_coord = lat,lon
        converted_ring.append(new_coord)
    return converted_ring

In [54]:
#
#   Convert one ring from the default to EPSG
#
#   There are two options here - depending upon whether you loaded data using GeoJSON or the wildfire.Reader
#
#ring_in_epsg4326 = convert_ring_to_epsg4326(gj_bigest_ring)
#
ring_in_epsg4326 = convert_ring_to_epsg4326(wf_bigest_ring)
#
print(f"Ring consists of {len(ring_in_epsg4326)} points.")
#
#    If you want to print them out you can see what they look like converted.
# print(ring_in_epsg4326)
# for point in ring_in_epsg4326:
#    print(f"{point[0]},{point[1]}")


Ring consists of 768 points.


## Locate Fires Closest to City:

Next, according to the assignment instructions, my smoke estimate should only consider fires that occurred in 1963 or later and that lie within 1,250 miles of my assigned city, Richland, WA. So, let us filter for fires that meet both criteria.

"The second bit of code calculates the average distance of all perimeter points to the city (place) and returns that average as the distance. This is not quite what the centroid would be, but it is probably fairly close." [3].

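**Modification:** I removed the first function from [3] (the shortest distance to the closest perimeter point) and kept only the average-distance function below.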
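To illustrate the averaging idea quoted above without depending on pyproj, here is a minimal sketch. It swaps the notebook's WGS84 ellipsoid calculation (`Geod.inv`) for the haversine great-circle formula on a sphere with a mean radius of 3,958.8 miles, so its results will differ slightly from the function in the next cell. It assumes the ring has already been converted to (lat, lon) decimal degrees and, like that function, drops the repeated closing point before averaging. `haversine_miles` and `average_perimeter_distance` are hypothetical names.

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius; spherical approximation, not WGS84


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))


def average_perimeter_distance(place, ring):
    """Average distance (miles) from place to every perimeter point of a closed ring.

    A closed ring repeats its first point at the end, so the first point is
    dropped to avoid counting it twice.
    """
    points = ring[1:]
    distances = [haversine_miles(place[0], place[1], lat, lon) for lat, lon in points]
    return sum(distances) / len(distances)


richland = (46.28569070, -119.28446210)
# Toy closed ring: a tiny square drawn around Richland itself
toy_ring = [(46.29, -119.29), (46.29, -119.28), (46.28, -119.28),
            (46.28, -119.29), (46.29, -119.29)]
print(f"{average_perimeter_distance(richland, toy_ring):.3f} miles")
```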

In [55]:
#    
#    The function takes two parameters
#        A place - which is coordinate point (list or tuple with two items, (lat,lon) in decimal degrees EPSG:4326
#        Ring_data - a list of decimal degree coordinates for the fire boundary
#
#    The function returns the average miles from boundary to the place
#
def average_distance_from_place_to_fire_perimeter(place=None,ring_data=None):
    # convert the ring data to the right coordinate system
    ring = convert_ring_to_epsg4326(ring_data)    
    # create a geodesic calculator on the WGS84 ellipsoid, which is the ellipsoid EPSG:4326 uses
    geodcalc = Geod(ellps='WGS84')
    # create a list to store our results
    distances_in_meters = list()
    # run through each point in the converted ring data
    for point in ring:
        # calculate the distance
        d = geodcalc.inv(place[1],place[0],point[1],point[0])
        distances_in_meters.append(d[2])
    #print("Got the following list:",distances_in_meters)
    # convert meters to miles
    distances_in_miles = [meters*0.00062137 for meters in distances_in_meters]
    # the esri polygon shape (the ring) requires that the first and last coordinates be identical to 'close' the region
    # we remove one of them so that we don't bias our average by having two of the same point
    distances_in_miles_no_dup = distances_in_miles[1:]
    # now, average miles
    average = sum(distances_in_miles_no_dup)/len(distances_in_miles_no_dup)
    return average

Compute the distances.

**Note: I used Google to find the latitude and longitude for my city [5].**

I created a constant in a form that would play nicely with the code snippets I reused from [3].

In [56]:
#LOGAN's Code:
# Define the data for my city, Richland WA:
CITY = {
    'city' : 'Richland',
    'latlon' : [46.28569070, -119.28446210]
}

**Modified cell below by replacing Redding with my city as the starting location and removing resulting irrelevant code and comments. I also added a line to extract the id from each feature/fire**
- removed portion computing shortest distance to closest point approach
- Commented out feature attributes I didn't need
- Commented out the two print statements below the distance variable definition
- added the `fires_in_range_list` list (switching it to a dictionary is still TBD)


In [57]:
fires_in_range_list = []

for wf_feature in tqdm(feature_list):
    #print(f"{place['city']}")
    #--------LO added next line--------
    wf_id = wf_feature['attributes']['OBJECTID']
    
    wf_year = wf_feature['attributes']['Fire_Year']
#     wf_name = wf_feature['attributes']['Listed_Fire_Names'].split(',')[0]
#     wf_size = wf_feature['attributes']['GIS_Acres']
#     wf_type = wf_feature['attributes']['Assigned_Fire_Type']
    ring_data = wf_feature['geometry']['rings'][0]
    #
    #     Compute using the average distance to all points on the perimeter
    #
    distance = average_distance_from_place_to_fire_perimeter(CITY['latlon'],ring_data)
#     print(f"Fire '{wf_name}', ID: {wf_id} ({wf_size:1.2f} acres) from {wf_year} was an average {distance:1.2f} miles to {CITY['city']}")
    ring = convert_ring_to_epsg4326(ring_data)
    perimeter_start = ring[0]
#     print(f"\tOne perimeter point lat,lon {perimeter_start[0]},{perimeter_start[1]}")
    
    #--------LO: new code below -----------
    # Make sure the fire occurred within the required time range and distance from Richland
    if (1963 <= wf_year) and (distance <= 1250):        
        #-----------TBD add more fields as needed and switch from list to dictionary
        fires_in_range_list.append(wf_id)
        
print(fires_in_range_list)
    


100%|█████████████████████████████████████████████████████████████████████████████████| 500/500 [00:47<00:00, 10.46it/s]

[]
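The cell above printed an empty list. A plausible explanation, which I have not verified, is that only the first `MAX_FEATURE_LOAD = 500` features were loaded, and those may all predate 1963 (the feature in slot 0 is from 1860). Separately, the TBD comment in the cell mentions switching from a list to a dictionary; the sketch below shows one way that could look, storing the year, acreage, and distance for each fire keyed by its `OBJECTID`. The helper `record_if_in_range` and the sample values are hypothetical.

```python
MIN_YEAR = 1963         # earliest fire year kept (same as the filter above)
MAX_DISTANCE_MI = 1250  # maximum average distance from Richland, in miles


def record_if_in_range(fires, fire_id, year, acres, distance_mi):
    """Add one fire to the `fires` dict if it passes the year and distance filters.

    Returns True if the fire was recorded. Hypothetical helper: the notebook
    itself only appends OBJECTIDs to a list.
    """
    if year >= MIN_YEAR and distance_mi <= MAX_DISTANCE_MI:
        fires[fire_id] = {"year": year, "acres": acres, "distance_mi": distance_mi}
        return True
    return False


fires_in_range = {}
# Made-up values for illustration only
record_if_in_range(fires_in_range, 1, 1860, 3940.2, 210.0)   # too early: skipped
record_if_in_range(fires_in_range, 2, 1990, 12000.0, 150.0)  # kept
record_if_in_range(fires_in_range, 3, 2005, 800.0, 1600.0)   # too far: skipped
print(fires_in_range)  # {2: {'year': 1990, 'acres': 12000.0, 'distance_mi': 150.0}}
```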





I wanted to test how long it takes to 

**---------------------------END PROF CODE REUSE ---------------------------**

## Create an Estimate for the Smoke Severity



Factors to consider:
- Fire size: `GIS_Acres`
- Distance from the city
- Duration?
    * What period is the EPA AQI reported for: yearly, or per fire? If it is annual, I should factor the number of fires into the equation.
    * If I wanted to be fancy, I could use a decaying estimate where each week that passes without a fire lowers the estimate, while each consecutive week with a fire multiplies it.
    * How do I take fire season into account?
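The weekly idea in the duration notes (lowering the estimate in fire-free weeks and multiplying it in consecutive fire weeks) could look like the following. This is a hypothetical sketch: the function name and the `growth`, `decay`, and `baseline` values are placeholders I made up rather than tuned numbers, and it assumes a simple per-week True/False record of whether a fire was burning.

```python
def weekly_smoke_index(fire_weeks, growth=1.5, decay=0.5, baseline=1.0):
    """Return one index value per week.

    fire_weeks: sequence of booleans, True if a fire burned that week.
    Each week with a fire multiplies the index by `growth`, so consecutive
    fire weeks compound; each week without a fire multiplies it by `decay`,
    but the index never drops below `baseline`.
    All parameter values are hypothetical placeholders.
    """
    index = baseline
    history = []
    for has_fire in fire_weeks:
        if has_fire:
            index *= growth
        else:
            index = max(baseline, index * decay)
        history.append(index)
    return history


print(weekly_smoke_index([True, True, True, False, False]))  # [1.5, 2.25, 3.375, 1.6875, 1.0]
```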

Size:
- What does the distribution of fire sizes look like (EDA)? According to lecture, about 94% of all fires are between 0 and 5,000 acres, but sizes go up to 1.4M acres.
- So I need to allow for a huge range of values without my estimate blowing up.
- A thought: in a simplistic way, total acreage burned in a year could represent duration, because large fires probably burn for longer. Also, though more of a stretch, a year with more frequent fires may have a higher total acreage burned.
    * However, a counterpoint: if there was one really big fire, the AQI would be really bad for a short period but fine for the rest of the year.
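The "total acreage burned per year" idea is simple to compute once each fire's year and acreage are available (for example, from the `Fire_Year` and `GIS_Acres` attributes). A minimal sketch with made-up (year, acres) pairs; `total_acres_by_year` is a hypothetical helper.

```python
from collections import defaultdict


def total_acres_by_year(fires):
    """Sum burned acres per fire year.

    fires: iterable of (year, acres) pairs.
    Returns a dict ordered by year.
    """
    totals = defaultdict(float)
    for year, acres in fires:
        totals[year] += acres
    return dict(sorted(totals.items()))


# Made-up values for illustration only
sample = [(2015, 1200.0), (2015, 300.0), (2016, 50000.0), (2017, 75.5)]
print(total_acres_by_year(sample))  # {2015: 1500.0, 2016: 50000.0, 2017: 75.5}
```

In the sample, 2016's total comes from a single fire, which is the caveat in the last bullet: a yearly total cannot tell one huge fire apart from many moderate ones.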

Distance from city:
- Close fires should produce much worse air quality than faraway ones.
- Distance from the city matters a lot more than size, because if a fire is far away, the AQI won't be bad at all.

Summary: how many fires occurred in a given year that were "close" to the city and significantly large?
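The summary question can be answered with a per-year count. Both thresholds below (250 miles for "close", 5,000 acres for "large") are hypothetical choices for illustration; the 5,000-acre cutoff simply echoes the lecture figure in the size notes. `count_close_large_fires` and the sample tuples are made up.

```python
from collections import Counter

CLOSE_MI = 250      # hypothetical "close" threshold, in miles
LARGE_ACRES = 5000  # hypothetical "large" threshold, in acres


def count_close_large_fires(fires, close_mi=CLOSE_MI, large_acres=LARGE_ACRES):
    """Count fires per year that are both close to the city and large.

    fires: iterable of (year, acres, distance_mi) tuples.
    Returns a dict ordered by year; years with no qualifying fires are omitted.
    """
    counts = Counter(
        year
        for year, acres, distance_mi in fires
        if distance_mi <= close_mi and acres >= large_acres
    )
    return dict(sorted(counts.items()))


sample = [
    (2015, 12000.0, 90.0),   # close and large
    (2015, 800.0, 40.0),     # close but small
    (2015, 9000.0, 120.0),   # close and large
    (2016, 60000.0, 900.0),  # large but far
]
print(count_close_large_fires(sample))  # {2015: 2}
```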

**Description of my Smoke Estimate:**

Our next task is to create a smoke estimate.
First, I consider what range of values to allow. According to airnow.gov, the AQI ranges from 0 to 500, where 0 is the cleanest air and 500 is the worst [6]. Because the assignment requires me to compare my smoke estimate to the AQI information from the EPA, I would like most of my estimates to fall in a similar range.

Next, I consider which factors to incorporate into my smoke severity estimate. The assignment instructions provide a helpful idea: "It seems reasonable that a large fire, that burns a large number of acres, and that is close to a city would put more smoke into a city than a small fire that is much further away." After giving it some thought, I will base my estimate on (1) each fire's size, measured in acres burned, and (2) its distance from Richland, as calculated above.

Let's think about each of these factors, starting with acres burned. It makes sense that my smoke estimate should increase for larger fires and decrease for smaller fires, all else being equal. Additionally, a fire closer to Richland should result in poorer smoke conditions than a fire far away. These two factors pull in opposite directions (size raises the estimate, distance lowers it), so I decided to take their ratio for my smoke estimate, yielding: ***acres_burned / distance_from_city***.

There are several more changes to make to improve this estimate. First, I convert acres burned to square miles burned so that the numerator and denominator are both expressed in miles-based units. As an aside, the resulting units are then miles (mi²/mi), and to be honest I am not sure how to interpret a smoke estimator with a unit attached to it, so for the sake of this assignment I will treat the estimate as unitless.

While designing my estimator, I revisited the course lecture notes from 10-30-23 and looked at the professor's findings from his exploratory data analysis. According to lecture, about 94% of all fires are between 0 and 5,000 acres (approx. 7.8 square miles), but the largest was 1.4M acres (approx. 2,200 square miles). In contrast, the distances I will consider can be as large as 1,250 miles from Richland. Because most burned areas are small relative to these distances, I multiply the numerator by a scaling factor of 100 so that area is weighted more evenly against distance.
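To see how the unit conversion and the scaling factor of 100 play out, here are a few unclamped values of the ratio for the two fire sizes mentioned in lecture. This is a small arithmetic sketch; `scaled_ratio` is a hypothetical name, and it deliberately leaves out the minimum-distance, minimum-area, and minimum-value rules.

```python
ACRES_PER_SQ_MI = 640  # 1 square mile = 640 acres
SCALING_FACTOR = 100


def scaled_ratio(acres, distance_mi):
    """Unclamped estimate: (square miles burned * 100) / distance in miles."""
    return (acres / ACRES_PER_SQ_MI) * SCALING_FACTOR / distance_mi


for acres in (5000, 1_400_000):
    for distance_mi in (10, 100, 1250):
        value = scaled_ratio(acres, distance_mi)
        print(f"{acres:>9,} acres at {distance_mi:>4} mi -> {value:10.3f}")
```

A 5,000-acre fire gives about 78 at 10 miles but only about 0.6 at 1,250 miles, while a 1.4M-acre fire gives 175 at 1,250 miles and roughly 2,188 at 100 miles, so very large, nearby fires can push the estimate well past the AQI's upper end of 500.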

I also wanted to reduce the frequency of extreme values in my smoke estimates, so I enforce a heuristic that the minimum distance a fire can be from the city is 1 mile: any fire closer than that is treated by the estimate as 1 mile away. Similarly, I enforce a minimum burned area of 0.5 square miles. Lastly, I want to ensure my estimate has a minimum value of 1, so I take the maximum of the ratio and 1. My final estimate is therefore:
$$\text{smoke estimate} = \max\left(\frac{100 \cdot A}{d},\ 1\right)$$
where $A = \max(\text{acres burned} / 640,\ 0.5)$ is the area burned in square miles and $d = \max(\text{distance from city},\ 1)$ is the distance from Richland in miles.

Below is the function I defined to calculate my smoke estimate.

In [12]:
#distance_from_city should be provided in miles
def calc_smoke_severity_est(acres_burned, distance_from_city):
    SCALING_FACTOR = 100
    
    #convert acres to square miles (1 square mile = 640 acres). I got the formula from Google [7]
    square_mi_burned = acres_burned / 640
    
    #to simplify the estimate, set 1 as the minimum
    #distance and 0.5 as the minimum square miles burned
    if distance_from_city < 1:
        distance_from_city = 1
    
    if square_mi_burned < 0.5:
        square_mi_burned = 0.5
    
    # Use max(), not np.argmax(): argmax returns the index of the largest
    # element (0 or 1 here), not the largest value itself [8]
    return max((square_mi_burned * SCALING_FACTOR) / distance_from_city, 1)
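A quick sanity check of the estimate with a few made-up inputs. Note that `np.argmax([78.125, 1])` would return 0, the position of the larger element, rather than 78.125, which is why the formula's floor of 1 is expressed with the built-in `max()`. The standalone function below, `smoke_estimate`, is a hypothetical restatement of the formula above with the same clamping rules.

```python
ACRES_PER_SQ_MI = 640
SCALING_FACTOR = 100
MIN_DISTANCE_MI = 1.0
MIN_SQ_MI_BURNED = 0.5


def smoke_estimate(acres_burned, distance_from_city):
    """max((square miles burned * 100) / distance in miles, 1), with clamped inputs."""
    square_mi_burned = max(acres_burned / ACRES_PER_SQ_MI, MIN_SQ_MI_BURNED)
    distance = max(distance_from_city, MIN_DISTANCE_MI)
    return max(square_mi_burned * SCALING_FACTOR / distance, 1)


print(smoke_estimate(5000, 1250))  # 1       (0.625 is raised to the floor of 1)
print(smoke_estimate(5000, 10))    # 78.125
print(smoke_estimate(100, 0.2))    # 50.0    (area clamped to 0.5, distance to 1)
```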

Next steps:
- Filter the full dataset to get my list of fires
- Apply my estimator to the fires
- Acquire AQI data from the EPA
- Compare my results to the EPA data

## References:
- Learning to use .gitignore:
    * https://stackoverflow.com/questions/19663093/apply-gitignore-on-an-existing-repository-already-tracking-large-number-of-file
    * https://stackoverflow.com/questions/40441450/relative-parent-directory-path-for-gitignore
- Learning to work with GIS data:
    * [3]. wildfire_geo_proximity_example.ipynb. "This code example was developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.0 - August 13, 2023"
- [4]. Professor's wildfire module
- [5]. https://www.latlong.net/place/west-richland-wa-usa-25340.html
- [6]. https://www.airnow.gov/aqi/aqi-basics/
- [7]. https://www.google.com/search?q=how+to+convert+acres+to+square+miles&rlz=1C1RXQR_enUS1019US1019&oq=how+to+convert+from+acres+to+s&gs_lcrp=EgZjaHJvbWUqCAgDEAAYFhgeMgYIABBFGDkyCAgBEAAYFhgeMggIAhAAGBYYHjIICAMQABgWGB4yCAgEEAAYFhgeMgoIBRAAGIYDGIoFMgoIBhAAGIYDGIoF0gEIOTU3MmowajeoAgCwAgA&sourceid=chrome&ie=UTF-8
- [8]. https://www.geeksforgeeks.org/numpy-argmax-python/
- [9]. epa_air_quality_history_example.ipynb. "This code example was developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.1 - September 5, 2023"
- [10]. https://www2.census.gov/geo/docs/reference/codes2020/cousub/st53_wa_cousub2020.txt
- [11]. https://www.census.gov/library/reference/code-lists/ansi.html#cousub