## Purpose

The purpose of this file is to use the filtered fire data created in the data_acquisition script to generate a smoke estimate for Pahrump, NV on an annual and per-fire basis.

### Decisions and Assumptions in Creation of Smoke Estimate
To calculate smoke estimates, we first considered which features of fires might be most relevant. We chose type of fire, acres burned, distance to town, recency of other fires in the same area, and year of fire recording. These features seemed both likely to matter and easily accessible in our dataset, and they align with information from [U.S. Department of the Interior Publications](https://www.doi.gov/wildlandfire/news/increasing-wildfires-are-causing-greater-air-pollution). Given more time, it would also be interesting to pull vegetation data for burned areas (e.g., arid climates may have less fuel to burn and thus produce less smoke) as well as weather conditions (e.g., windy conditions could disperse smoke more readily than stagnant conditions).

We had the following initial set of questions regarding our features:
   1. Do prescribed burns create as much smoke as wildfires?
   2. How are acres burned related to smoke production?
   3. How does distance to the edge of the fire relate to smoke dispersion?
   4. How much does burning an area reduce smoke if it is burned again within 2 years?
   5. How reliable are estimates for burned area and thus smoke per year?

We completed the following research (numbered to match the questions above) and determined a mathematical relationship for each feature.
    
   1. Per David Frisbey's 2008 thesis "[A comparison of smoke emissions from prescribed burns and wildfires](https://scholarworks.sjsu.edu/cgi/viewcontent.cgi?referer=&httpsredir=1&article=4554&context=etd_theses)", "The results suggest that the smoke impacts of a wildfire may not be any greater than a prescribed burn when compared using the methodology. This research demonstrates how a combination of the fuel load and the size of the burn may be more significant in controlling downwind concentration of PM10 than the atmospheric conditions. Even when there is a planned burn under prescribed meteorological conditions there can be significant impacts if the size of the burn and fuel loading are not also considered." Referencing [Forest Service Professionals Prepare for a Prescribed Burn](https://www.fs.usda.gov/features/professionals-prepare-for-prescribed-burn), we can see that the Forest Service does take fuel moisture, forest stand characteristics, historical data, terrain, and elevation into account. Given Frisbey's findings, and assuming the Forest Service correctly accounts for the fuel variables to create less intense blazes, we will multiply the smoke estimate for prescribed burns by 0.50.
    
   2. Fuel composition certainly affects the amount of smoke generated per acre burned. Because we are not currently bringing in additional vegetation information, we will assume a linear relationship between acres burned and smoke produced.
    
   3. Many factors affect smoke dispersion (terrain, wind, other atmospheric conditions), but we will focus on distance to town as our primary variable. We will assume that smoke disperses like light, following an inverse-square (1/distance^2) relationship; see [Nasa.gov](https://imagine.gsfc.nasa.gov/features/yba/M31_velocity/lightcurve/more.html#:~:text=Notice%20that%20as%20the%20distance,proportional%20to%20the%20distance%20squared) for more details.
    
   4. Fires which attempt to burn in areas burnt within 2 years should produce much less smoke than the original fires which burned there (see [Burn out: Frequent fires are changing Western landscapes](https://www.hcn.org/articles/climate-desk-wildfire-burn-out-frequent-fires-are-changing-western-landscapes#:~:text=However%2C%20overall%2C%20re%2Dburns,just%20fine%2C%E2%80%9D%20he%20said.) for more details). We will multiply smoke estimates for fires burning in burned areas by a factor of 0.20. 
    
   5. The USGS wildland fire metadata note that "Areas burned prior to 1984 in this dataset represent only a fraction of what actually burned. While areas burned on or after 1984 are much more accurate and complete, errors still can and do occur." Given the underestimation of acres burned, we will multiply acreage estimates for fires prior to 1984 by 1.5.
    
Using the above research, we arrive at a base smoke estimate of (acres burned)/(distance to town)^2, so "smoke" is measured in acres/miles^2. If some of the acres burned were previously burned within the past 2 years, we multiply those acres by 0.2 to reflect their lower smoke production. For fires prior to 1984 we multiply the entire smoke estimate by 1.5, which is equivalent to scaling the acreage because smoke is linear in acres, and for prescribed fires we multiply the entire smoke estimate by 0.5.
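To make the combined formula concrete, here is a minimal sketch of the per-fire estimate for a single fire. The helper `fire_smoke` and its argument names are hypothetical (they are not part of the notebook), and the input numbers are made up for illustration; only the factors 0.2, 1.5, and 0.5 and the inverse-square distance term come from the research above.

```python
# Hypothetical helper illustrating the per-fire smoke formula described above.
PREBURN_FACTOR = 0.2      # acres reburned within 2 years produce less smoke
PRE_1984_FACTOR = 1.5     # adjustment for under-reported acreage before 1984
PRESCRIBED_FACTOR = 0.5   # prescribed burns assumed to produce half the smoke


def fire_smoke(new_acres, preburn_acres, dist_miles, year, prescribed):
    """Return a smoke estimate in acres/miles^2 for one fire."""
    smoke = (new_acres + PREBURN_FACTOR * preburn_acres) / dist_miles ** 2
    if year < 1984:
        smoke *= PRE_1984_FACTOR
    if prescribed:
        smoke *= PRESCRIBED_FACTOR
    return smoke


# Made-up fire: 1,000 new acres and 500 reburned acres, 100 miles away, 1975 wildfire.
# (1000 + 0.2 * 500) / 100**2 * 1.5 = 1100 / 10000 * 1.5 = 0.165
print(round(fire_smoke(1000, 500, 100, 1975, prescribed=False), 6))  # 0.165
# The same fire 10x farther away contributes 100x less smoke.
print(round(fire_smoke(1000, 500, 1000, 1975, prescribed=False), 8))  # 0.00165
```

The second call shows the practical effect of the inverse-square assumption: distant fires contribute very little unless they are very large.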

Next, we needed to decide how to combine individual fire smoke estimates into an annual estimate of smoke in Pahrump, NV. Because we are compiling smoke estimates at the annual rather than the monthly level, "amortizing" smoke throughout the year did not seem necessary. Further, parsing fires by sub-year dates is unreliable because merging datasets left multiple recorded dates for some fires (even the years are somewhat unreliable; see the USGS metadata for more information), which makes attributing smoke within the year for a monthly/amortized view difficult. Finally, we assume most fires are contained within "fire season", making it unlikely that smoke from one year will bleed into the next.

To create annual smokiness estimates, we will average all individual fire smokiness estimates within each year. We chose to do this because, when comparing with other estimates of pollution (e.g., AQI), we felt an average of values would be most appropriate.
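A per-year average can be sketched with a pandas `groupby`. The values below are made-up toy numbers, not taken from the fire dataset; the column names `Fire_Year` and `smoke_estimate` mirror the ones used in this notebook.

```python
import pandas as pd

# Toy per-fire smoke estimates (made-up values, not from the fire dataset)
fires = pd.DataFrame({
    'Fire_Year': [2000, 2000, 2000, 2001],
    'smoke_estimate': [0.3, 0.1, 0.2, 0.4],
})

# Per-year average of the individual fire estimates
annual_mean = fires.groupby('Fire_Year')['smoke_estimate'].mean()
# Per-year total, for comparison
annual_sum = fires.groupby('Fire_Year')['smoke_estimate'].sum()

print(pd.DataFrame({'mean': annual_mean, 'sum': annual_sum}))
```

The two aggregations agree for a year with a single fire (2001 here) but diverge as the number of fires grows: the sum increases with every additional fire, while the mean does not.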

First, we load common Python libraries and the filtered fire data created in the data_acquisition script.

In [117]:
#Imports
import os, json, time
import pandas as pd
import numpy as np

#Importing the filtered fire data for 1963 onward within 1250 miles of Pahrump, NV
fire_data = pd.read_csv('../intermediate_data/filtered_fire_info.csv')

#Checking the output
if fire_data['Fire_Year'].max() > 2023:
    print("ERROR - fires after 2023 included in data")
else:
    print("The max fire year is {0}".format(fire_data['Fire_Year'].max()))
if fire_data['Fire_Year'].min() < 1963:
    print("ERROR - fires before 1963 included in data")
else:
    print("The min fire year is {0}".format(fire_data['Fire_Year'].min()))
if fire_data['shortest_dist'].max() > 1250:
    print("ERROR - fires beyond 1250 miles included in data")
else:
    print("The farthest fire is {0} miles from town".format(fire_data['shortest_dist'].max()))
    print("The closest fire is {0} miles from town".format(fire_data['shortest_dist'].min()))
print("There are {0} fires from 1963 onward within 1250 miles of Pahrump, NV.".format(len(fire_data)))

#Outputting example of what file looks like
#print()
#print("The filtered fire data looks like:")
#print(fire_data.head())

The max fire year is 2020
The min fire year is 1963
The farthest fire is 1249.99 miles from town
The closest fire is 8.52 miles from town
There are 81351 fires from 1963 onward within 1250 miles of Pahrump, NV.
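The range checks above print an error message but let execution continue. A minimal alternative sketch, using a hypothetical helper `range_violations` (not part of the notebook), collects every failed check into a list so a caller can stop on any problem with `assert not range_violations(fire_data)`. The bounds default to the ones used above, and the toy frame stands in for `fire_data`.

```python
import pandas as pd


def range_violations(df, year_min=1963, year_max=2023, max_dist=1250):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    if df['Fire_Year'].max() > year_max:
        problems.append(f"fires after {year_max} included in data")
    if df['Fire_Year'].min() < year_min:
        problems.append(f"fires before {year_min} included in data")
    if df['shortest_dist'].max() > max_dist:
        problems.append(f"fires beyond {max_dist} miles included in data")
    return problems


# Toy frame standing in for fire_data; the second fire is deliberately too far away
toy = pd.DataFrame({'Fire_Year': [1963, 2020], 'shortest_dist': [8.52, 1300.0]})
print(range_violations(toy))  # ['fires beyond 1250 miles included in data']
```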


Next, we will calculate how many acres were burned within the past 2 years versus how many were not. This is relevant when applying the 0.2 "previous burn reduction factor." In this process we will also replace any NaN values with 0 to enable the calculations.

In [118]:
#Replace Nans with 0s
fire_data = fire_data.fillna(0)

#Defining how to extract preburn_acres from the overlap flag text.
#The acreage is read from the 23rd space-separated token, after dropping its first character.
def burn_splitter(row):
    flag = row['Overlap_Within_1_or_2_Flag']
    if flag == 0:
        #No overlap with a fire from the previous 2 years
        return 0
    try:
        return float(str(flag).split(' ')[22][1:])
    except (ValueError, IndexError):
        #Flag text could not be parsed; mark with -1 so it can be counted below
        return -1

#Apply the function to each row
fire_data['preburn_acres'] = fire_data.apply(burn_splitter, axis=1)

#See if any values could not be parsed
error_burn = len(fire_data.loc[fire_data['preburn_acres'] == -1])
if error_burn > 0:
    print("Warning - {0} preburned acres could not be read.".format(error_burn))
    print("This is {0}% of all fires".format(round((error_burn/len(fire_data))*100, 2)))
    #Compare the fraction of unreadable rows against 10% (0.10)
    if error_burn/len(fire_data) < 0.10:
        print("Because this is < 10% of fires, changing the misattributed preburned acres to 0")
        fire_data.loc[fire_data['preburn_acres'] == -1, 'preburn_acres'] = 0
    else:
        print("This is >= 10% of fires. User should decide what to do with these acres")

#Create a column containing "new_burn_acres" which is GIS_Acres - preburn acres
fire_data['new_burn_acres'] = fire_data['GIS_Acres']-fire_data['preburn_acres']

#View new df
#print(fire_data.loc[fire_data['Overlap_Within_1_or_2_Flag']!=0].head())

This is 6.57% of all fires
Because this is < 10% of fires, changing the misattributed preburned acres to 0
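The row-wise parser can also be written column-wise with pandas string methods. This is a sketch, not the notebook's method: the hypothetical `parse_preburn_acres` returns NaN rather than -1 for unreadable flags, so failures can be counted with `.isna().sum()`. The exact flag format is not documented here, so the example strings below are invented so that the acreage lands at token index 22, matching the index the row-wise parser uses.

```python
import pandas as pd


def parse_preburn_acres(flags: pd.Series) -> pd.Series:
    """Vectorized sketch of burn_splitter: take the 23rd space-separated token,
    drop its first character, and convert it to float.
    Unparseable flags become NaN; flags equal to 0 (no overlap) become 0."""
    token = flags.astype(str).str.split(' ').str[22].str[1:]
    acres = pd.to_numeric(token, errors='coerce')
    return acres.where(flags != 0, 0.0)


# Hypothetical flag text: 22 filler words, then "(123.5" as the 23rd token
good = ' '.join(['w'] * 22 + ['(123.5', 'acres)'])
bad = 'too short to parse'
flags = pd.Series([0, good, bad], dtype=object)
print(parse_preburn_acres(flags).tolist())  # [0.0, 123.5, nan]
```

Indexing past the end of a split list with `.str[22]` yields NaN instead of raising, which is why short strings need no explicit `try`/`except` here.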


Now we will calculate the smoke estimate for each fire, covering both wildfires and prescribed burns.

In [121]:
#Defining the smoke estimate for a single fire, in acres/miles^2:
#newly burned acres count fully, acres reburned within 2 years count at 0.2
def smoke_est(row):
    smoke_estimate = (row['new_burn_acres']/(row['shortest_dist']**2) +
                      (row['preburn_acres']*0.2)/(row['shortest_dist']**2))
    #Adjusting for under-reported acreage before 1984
    if row['Fire_Year'] < 1984:
        smoke_estimate = smoke_estimate * 1.5
    #Prescribed burns are assumed to produce half as much smoke
    if row['Assigned_Fire_Type'] in ('Prescribed Fire', 'Unknown - Likely Prescribed Fire'):
        smoke_estimate = smoke_estimate * 0.5
    return smoke_estimate

#Apply the function to each row
fire_data['smoke_estimate'] = fire_data.apply(smoke_est, axis=1)

#Check out the results
#print(fire_data.loc[fire_data['Overlap_Within_1_or_2_Flag']!=0].head())
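Row-wise `apply` is easy to read but slow on tens of thousands of rows. A minimal sketch of a column-wise equivalent of `smoke_est` is below; the function name `smoke_estimates` is hypothetical, and the toy rows (including the `'Wildfire'` type label) are made up for illustration. It uses the same factors and the same prescribed-fire labels as the cell above.

```python
import numpy as np
import pandas as pd

PRESCRIBED_TYPES = ['Prescribed Fire', 'Unknown - Likely Prescribed Fire']


def smoke_estimates(df: pd.DataFrame) -> pd.Series:
    """Column-wise equivalent of smoke_est, computed for all rows at once."""
    base = (df['new_burn_acres'] + 0.2 * df['preburn_acres']) / df['shortest_dist'] ** 2
    year_factor = np.where(df['Fire_Year'] < 1984, 1.5, 1.0)
    type_factor = np.where(df['Assigned_Fire_Type'].isin(PRESCRIBED_TYPES), 0.5, 1.0)
    return base * year_factor * type_factor


# Toy rows: a 1975 wildfire with reburned acres, and a 2000 prescribed fire
toy = pd.DataFrame({
    'new_burn_acres': [1000.0, 1000.0],
    'preburn_acres': [500.0, 0.0],
    'shortest_dist': [100.0, 100.0],
    'Fire_Year': [1975, 2000],
    'Assigned_Fire_Type': ['Wildfire', 'Prescribed Fire'],
})
print(smoke_estimates(toy).round(6).tolist())  # [0.165, 0.05]
```

With this helper, `fire_data['smoke_estimate'] = smoke_estimates(fire_data)` would replace the `apply` call.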

Next we will combine the individual fire smoke estimates to create annual smoke estimates.

In [132]:
#Grouping by year and summing all smokiness estimates
annual_smoke_estimates = pd.DataFrame({
    'Annual_Smoke_Estimate': fire_data.groupby(['Fire_Year'])['smoke_estimate'].sum()}).reset_index()

#Save the file to a csv
annual_smoke_estimates.to_csv('../clean_data/annual_smoke_estimate.csv')

#Print results
print("The annual smoke estimates are as follows:")
print(annual_smoke_estimates)

The annual smoke estimates are as follows:
    Fire_Year  Annual_Smoke_Estimate
0        1963               2.123318
1        1964               7.581578
2        1965               2.318274
3        1966               9.220398
4        1967               9.041263
5        1968               8.405211
6        1969               3.905857
7        1970              27.210594
8        1971               5.544000
9        1972               5.892116
10       1973              10.040598
11       1974               7.296173
12       1975              11.195640
13       1976               6.579943
14       1977               9.379401
15       1978               4.047137
16       1979              16.806874
17       1980              21.217983
18       1981              67.954338
19       1982              19.109597
20       1983              10.964045
21       1984              16.247834
22       1985              25.112015
23       1986              16.580361
24       1987              26.01