# Part 1 - Common Analysis | Wildfire Data Pull

Common Analysis is the first step of our broader project, which focuses on the impact of wildfires in the US. In this common analysis, we first analyze in detail the fires around our allotted city and come up with a rough estimate of their impact. Wildfires might have a profound negative effect on the city's health, tourism, property, and other aspects of society.

# Step 0 - Data Acquisition

## Data Description

Assigned City - Bismarck, North Dakota [46.825905, -100.778275]

Objective - Analyze wildfire impacts on the city assigned

In this step we acquire the data that contains information on all recorded wildfires from the 1800s to the present: [Combined wildland fire datasets for the United States and certain territories, 1800s-Present (combined wildland fire polygons)](https://www.sciencebase.gov/catalog/item/61aa537dd34eb622f699df81). The initial task is to ingest this data and extract the portion relevant to the assigned city (Bismarck).

The data resides in a GeoJSON zip file (“Wildland Fire Polygons Fire Feature Data Open Source GeoJSON Files”). The file that contains the data of interest is in [USGS_Wildland_Fire_Combined_Dataset.json](https://drive.google.com/file/d/14BO0R0uLR8kYZ_r7529Pa61C2-qJd0dx/view?usp=drive_link).

## Data Ingestion

For data ingestion I will be referring to the code sample provided [here](https://drive.google.com/file/d/1RFVyIQ5bfrGZH72V8ItnRrnFiTaHYb3e/view?usp=drive_link). It contains code that illustrates how to perform some basic geodetic computations related to the Wildfire course project. The notebook is structured as a set of examples that illustrate something about the structure of the data or a way to compute specific values. This notebook is not a tutorial on performing geodetic computations, but it illustrates a number of key concepts and should provide enough information to complete the Wildfire assignment. The complete [Wildfire dataset](https://www.sciencebase.gov/catalog/item/61aa537dd34eb622f699df81) can be retrieved from a US government repository. I have noticed that the repository is sometimes "down" and does not respond, so it probably makes sense to get the dataset as soon as possible.

This notebook has dependencies on [Pyproj](https://pyproj4.github.io/pyproj/stable/index.html), the [geojson](https://pypi.org/project/geojson/) module and on the wildfire user module. Pyproj and geojson can be installed via pip. The wildfire user module should be downloaded from the course website, unzipped, and moved into the folder pointed to by your PYTHONPATH system variable.
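
Before running the setup cells, it can help to confirm that Python can actually locate each dependency. The following is a minimal sketch, assuming the module names used by the imports in this notebook; the helper name `find_missing_modules` is made up for illustration and is not part of the course code.

```python
import importlib.util

# Module names taken from the imports used in this notebook. "Reader" is the
# wildfire user module and is only found once its folder is on sys.path.
REQUIRED_MODULES = ["pyproj", "geojson", "geopy", "pandas", "numpy", "Reader"]


def find_missing_modules(module_names):
    """Return the top-level module names that Python cannot currently locate."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print(f"Missing modules: {missing}")
else:
    print("All required modules were found.")
```

If `Reader` shows up as missing, the wildfire folder is probably not yet on `sys.path` or `PYTHONPATH`.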

### License
This code example was developed by Dr. David W. McDonald for use in DATA 512, a course in the UW MS Data Science degree program. This code is provided under the [Creative Commons](https://creativecommons.org) [CC-BY license](https://creativecommons.org/licenses/by/4.0/). Revision 1.0 - August 13, 2023

## Setup

We first set up the dependencies and constants that are required to process the GeoJSON. Sample code has been obtained from [this notebook](https://drive.google.com/file/d/1RFVyIQ5bfrGZH72V8ItnRrnFiTaHYb3e/view).

The setup contains the following steps:
1. Import all relevant packages.
2. Define all the relevant constants that will be used throughout the script.
3. Import the GeoJSON Reader module from the [sample code](https://drive.google.com/drive/folders/1MskdQFkAoYK414BkRJjCGvGw0MQd-P_6?usp=drive_link).

In [1]:
# pip install geojson

In [2]:
# Import the sys module, which provides access to variables used or maintained by the Python interpreter
import sys

# Insert a directory path to the system path, allowing Python to search for modules in that directory
sys.path.insert(1, '../dependencies/wildfire/wildfire')

# Import the pandas library and alias it as pd
import pandas as pd

# Import the numpy library and alias it as np
import numpy as np

# Import the json module, which provides methods for working with JSON data
import json

# Import the geodesic function from the geopy.distance module
from geopy.distance import geodesic

# Import the Transformer and Geod classes from the pyproj module
from pyproj import Transformer, Geod

# Import the Reader class from the Reader module and alias it as WFReader
from Reader import Reader as WFReader

# There is a GeoJSON reader that you might try if you wanted to read the data. It has its own quirks.
# There will be an example below that reads the sample file "Wildfire_short_sample.json"

import geojson
import time


In [4]:
SAMPLE_DATA_FILENAME = r"C:\Users\shwet\Downloads\Data512 - ProjectPart1_CommonAnalysis_ShwetaManjunath\dependencies\wildfire\wildfire\Wildfire_short_sample.json"

# print out where we think we're going to find the sample data
print(f"{SAMPLE_DATA_FILENAME=}")

SAMPLE_DATA_FILENAME='C:\\Users\\shwet\\Downloads\\Data512 - ProjectPart1_CommonAnalysis_ShwetaManjunath\\dependencies\\wildfire\\wildfire\\Wildfire_short_sample.json'


In [12]:
# The 'Wildfire_short_sample.json' is an extraction from the full 'USGS_Wildland_Fire_Combined_Dataset.json'
# dataset extracting some of the major wildfires in California. These were extracted by name, with possible names
# coming from https://en.wikipedia.org/wiki/List_of_California_wildfires

# The sample file includes data for 13 fires, mostly oriented around the uniqueness of the name. Naming conventions
# for wildfires is really adhoc, which makes finding any named fire in the dataset a disambiguation mess. The point
# of the sample is to provide something small to test with before committing to processing the much larger full dataset.

# data file that contains the GeoJSON 
EXTRACT_FILENAME = r"C:\Users\shwet\Downloads\GeoJSON Files\GeoJSON Exports\USGS_Wildland_Fire_Combined_Dataset.json"
EXTRACT_FILENAME

'C:\\Users\\shwet\\Downloads\\GeoJSON Files\\GeoJSON Exports\\USGS_Wildland_Fire_Combined_Dataset.json'

In [6]:
# A dictionary holding the location of the assigned city (Bismarck, North Dakota).
CITY_DATA = {
    'bismarck' :     {'city'   : 'bismarck', 
                    'latlon' : [46.825905, -100.778275]}}

You have a number of different options for working with the wildfire data. Python has GeoJSON readers and ArcGIS file readers as distribution modules you can install. If you are a pandas pro, it supposedly has GeoJSON and ArcGIS data readers.

# Step 1 - Load the GeoJSON Data

This information is primarily used to obtain the fire estimate. The common research question that you are to answer is:
What are the estimated smoke impacts on your assigned city for the last 60 years? 
We have to create an annual estimate of wildfire smoke in our assigned city. This estimate is just a number that we can eventually use to build a predictive model. Technically, smoke impact should probably be considered the health, tourism, economic or other social problems that result from the smoke. For this we'll generically call your estimate the wildfire smoke impact. We will consider other potential social and economic impacts during Part 2 of the course project. For now, we need some kind of number to represent an estimate of the smoke our city saw during each annual fire season.
 

Why is this an estimate of fire smoke? These are estimates because of a number of problems that are not easy to resolve, and because of simplifications that make this course project reasonable for just a few weeks of work. One example is that actual smoke impact depends on wind direction over the course of several days, the intensity of the fire, and its duration. However, the fire polygon data only gives a year for each fire. It does not provide specific start and end dates for the fire.

Your smoke estimate should adhere to the following conditions:
1. The estimate only considers the last 60 years of wildland fires (1963-2023).
2. The estimate only considers fires that are within 1250 miles of your assigned city.
3. An annual fire season will run from May 1st through October 31st.
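
The first two conditions can be written down as a simple filter. This is only a sketch: the function name `fire_qualifies` and the example values are hypothetical. The fire-season condition is not checked, since the polygon data only gives a year for each fire.

```python
# Thresholds copied from the conditions listed above.
FIRST_YEAR = 1963
LAST_YEAR = 2023
MAX_DISTANCE_MILES = 1250


def fire_qualifies(fire_year, distance_miles):
    """True when a fire falls inside the year window and the distance limit."""
    if fire_year is None or distance_miles is None:
        return False
    return (FIRST_YEAR <= fire_year <= LAST_YEAR
            and distance_miles <= MAX_DISTANCE_MILES)


# Hypothetical (year, distance) pairs, not taken from the dataset.
examples = [(1932, 300.0), (1988, 640.5), (2005, 1400.0)]
for year, dist in examples:
    print(year, dist, fire_qualifies(year, dist))
```

Only the 1988 example passes: the first is too old and the third is too far away.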


## Load the wildfire data using the geojson module

In this example we use the GeoJSON module ([documentation](https://pypi.org/project/geojson/), [GitHub repo](https://github.com/jazzband/geojson)) to load the sample file. This module works mostly the way you would expect. GeoJSON is mostly just JSON, so actually, you don't even really need to use the GeoJSON module. However, that module will do some conversion of Geo type things to something useful. However, this example, and the examples that follow, do not rely on specific Geo features from geojson.

In [7]:
# Open a file, load it with the geojson loader

print(f"Attempting to open '{SAMPLE_DATA_FILENAME}'")
geojson_file = open(SAMPLE_DATA_FILENAME,"r")
print(f"Using GeoJSON module to load sample file '{SAMPLE_DATA_FILENAME}'")
gj_data = geojson.load(geojson_file)
geojson_file.close()

Attempting to open 'C:\Users\shwet\Downloads\Data512 - ProjectPart1_CommonAnalysis_ShwetaManjunath\dependencies\wildfire\wildfire\Wildfire_short_sample.json'
Using GeoJSON module to load sample file 'C:\Users\shwet\Downloads\Data512 - ProjectPart1_CommonAnalysis_ShwetaManjunath\dependencies\wildfire\wildfire\Wildfire_short_sample.json'
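
Since GeoJSON is mostly just JSON, the standard `json` module can load the same kind of structure. The sketch below uses a tiny, made-up document held in memory rather than the real sample file; it assumes only the 'features', 'attributes', and 'rings' layout shown in this notebook.

```python
import io
import json

# A tiny, made-up document that mimics the layout seen in the sample file:
# top-level 'features', each with 'attributes' and a 'geometry' holding 'rings'.
toy_text = json.dumps({
    "features": [
        {
            "attributes": {"OBJECTID": 1, "Fire_Year": 1999},
            "geometry": {"rings": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]},
        }
    ],
})

# json.load accepts any file-like object, so StringIO stands in for open(...).
with io.StringIO(toy_text) as toy_file:
    toy_data = json.load(toy_file)

print(list(toy_data.keys()))
print(len(toy_data["features"]))
```

With a real file, the `io.StringIO(...)` call would be replaced by `open(SAMPLE_DATA_FILENAME, "r")`.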


In [8]:

# Print the keys from the object

gj_keys = list(gj_data.keys())
print("The loaded JSON dictionary has the following keys:")
print(gj_keys)
print()

# For all GeoJSON type things, the most important part of the file is the 'features'. 
# In the case of the wildfire dataset, each feature is a polygon (ring) of points that define the boundary of a fire

count = 0
for feature in gj_data['features']:
    count += 1
    #print(json.dumps(feature,indent=4))
    time.sleep(0.5)   # this slows the output to fix output rate limits for Jupyter

print(f"Found {count} features in the variable 'gj_data' ")

The loaded JSON dictionary has the following keys:
['displayFieldName', 'fieldAliases', 'geometryType', 'spatialReference', 'fields', 'features']

Found 13 features in the variable 'gj_data' 


In [9]:
# Get the first item in the list of features

SLOT = 0
gj_feature = gj_data['features'][SLOT]

# Print everything in this dictionary (i.e., gj_feature) - it's long

print(f"The wildfire feature from slot '{SLOT}' of the loaded gj_data['features']")
print(json.dumps(gj_feature, indent=4))


The wildfire feature from slot '0' of the loaded gj_data['features']
{
    "attributes": {
        "OBJECTID": 4956,
        "USGS_Assigned_ID": 4956,
        "Assigned_Fire_Type": "Wildfire",
        "Fire_Year": 1932,
        "Fire_Polygon_Tier": 1,
        "Fire_Attribute_Tiers": "1 (1), 3 (3)",
        "GIS_Acres": 219999.23754748085,
        "GIS_Hectares": 89030.53273921262,
        "Source_Datasets": "Comb_National_NIFC_Interagency_Fire_Perimeter_History (1), Comb_National_USFS_Final_Fire_Perimeter (1), Comb_National_WFDSS_Interagency_Fire_Perimeter_History (1), Comb_State_California_Wildfire_Polygons (1)",
        "Listed_Fire_Types": "Wildfire (3), Likely Wildfire (1)",
        "Listed_Fire_Names": "MATILIJA (4)",
        "Listed_Fire_Codes": "No code provided (4)",
        "Listed_Fire_IDs": "0 (3)",
        "Listed_Fire_IRWIN_IDs": "",
        "Listed_Fire_Dates": "Listed Wildfire Discovery Date(s): 1932-09-07 (2) | Listed Other Fire Date(s): 1899-12-30 - REVDATE field (1), 

In [10]:

# Every feature has a 'geometry' which specifies geo coordinates that make up each geographic thing
# In the case of the wildfire data, most wildfires are bounded shapes, circles, squares, etc. This is
# represented by shapes called 'rings' in GeoJSON.

# Get the geometry for the feature we pulled from the feature_list
gj_geometry = gj_feature['geometry']
# The largest shape (ring) is supposed to be item zero in the list of 'rings'
gj_bigest_ring = gj_geometry['rings'][0]

print(f"The largest ring of gj_feature['features'][{SLOT}]['rings'] consists of {len(gj_bigest_ring)} points.")

The largest ring of gj_feature['features'][0]['rings'] consists of 1132 points.


## Load the wildfire data using the WildFire Reader module

In the following cells we provide small code snippets that do the following:

1. Create a wildfire Reader() object and use it to open the sample data file. Once opened, we have access to the header information, so we print that to show the structure of that data.
2. Use the Reader() object and the next() method to read a set of wildfire features. The small sample file should have 13 of them.
3. Print one example feature showing the dictionary data structure of a feature.
4. Access the geometry of one specific feature to get the 'ring' boundary of that specific fire, which is a list of geodetic coordinates.

Note that some of the output cells are quite long. Once you understand what they are illustrating you might want to collapse them or comment out the print statements that generate the output.

Another note regarding terminology. In the GeoJSON standard, something that is to be represented geographically is generically called a 'feature'. In the case of the wildfire dataset every 'feature' is a wildfire. These terms are used somewhat interchangeably below.

We use the custom-built Reader from the wildfire module to read the header of the JSON file and understand the underlying metadata.

In [13]:
print(f"Attempting to open '{EXTRACT_FILENAME}' with wildfire.Reader() object")
wfreader = WFReader(EXTRACT_FILENAME)
print()

header_dict = wfreader.header()
header_keys = list(header_dict.keys())
#print("The header has the following keys:")
#print(gj_keys)
print()
print("Header Dictionary")
print(json.dumps(header_dict,indent=4))

Attempting to open 'C:\Users\shwet\Downloads\GeoJSON Files\GeoJSON Exports\USGS_Wildland_Fire_Combined_Dataset.json' with wildfire.Reader() object


Header Dictionary
{
    "displayFieldName": "",
    "fieldAliases": {
        "OBJECTID": "OBJECTID",
        "USGS_Assigned_ID": "USGS Assigned ID",
        "Assigned_Fire_Type": "Assigned Fire Type",
        "Fire_Year": "Fire Year",
        "Fire_Polygon_Tier": "Fire Polygon Tier",
        "Fire_Attribute_Tiers": "Fire Attribute Tiers",
        "GIS_Acres": "GIS_Acres",
        "GIS_Hectares": "GIS_Hectares",
        "Source_Datasets": "Source Datasets",
        "Listed_Fire_Types": "Listed Fire Types",
        "Listed_Fire_Names": "Listed Fire Names",
        "Listed_Fire_Codes": "Listed Fire Codes",
        "Listed_Fire_IDs": "Listed Fire IDs",
        "Listed_Fire_IRWIN_IDs": "Listed Fire IRWIN IDs",
        "Listed_Fire_Dates": "Listed Fire Dates",
        "Listed_Fire_Causes": "Listed Fire Causes",
        "Listed_Fire_Cause_Class"

Now we will load all the features of our dataset. The GeoJSON file contains information on fires from the 1800s to the present, which makes it very large and hard for the system to read in one pass. Hence, we read the features one at a time and store them in a list for easy access. We also log the number of features processed.

In [14]:
#
#    This code loads every feature in the file. MAX_FEATURE_LOAD is kept from the
#    sample code but is not enforced by the loop below.
#
MAX_FEATURE_LOAD = 100
feature_list = list()
feature_count = 0
# A rewind() on the reader object makes sure we're at the start of the feature list
# This way, we can execute this cell multiple times and get the same result 
wfreader.rewind()
# Now, read through each of the features, saving them as dictionaries into a list
feature = wfreader.next()
while feature:
    feature_list.append(feature)
    feature_count += 1
    # if we're loading a lot of features, print progress
    if (feature_count % 100) == 0:
        print(f"Loaded {feature_count} features")
   
    feature = wfreader.next()
#
#    Print the number of items (features) we think we loaded
print(f"Loaded a total of {feature_count} features")
#
#    Just a validation check - did all the items we loaded get into the list?
print(f"Variable 'feature_list' contains {len(feature_list)} features")




Loaded 100 features
Loaded 200 features
Loaded 300 features
...
Loaded 116100 features
...
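
The loading loop reads until the reader runs out of features. If a cap on the number of loaded features is wanted while testing, one possible sketch is shown below. `ToyReader` is a hypothetical stand-in that mimics only the `rewind()` and `next()` calls used in this notebook, and `load_features` is an illustrative helper, not part of the wildfire module.

```python
class ToyReader:
    """Hypothetical stand-in that mimics only rewind() and next()."""

    def __init__(self, features):
        self._features = list(features)
        self._pos = 0

    def rewind(self):
        self._pos = 0

    def next(self):
        # Return the next feature, or None when the list is exhausted
        if self._pos >= len(self._features):
            return None
        feature = self._features[self._pos]
        self._pos += 1
        return feature


def load_features(reader, max_features=None):
    """Collect features from the reader, stopping early at max_features."""
    loaded = []
    reader.rewind()
    feature = reader.next()
    while feature:
        if max_features is not None and len(loaded) >= max_features:
            break
        loaded.append(feature)
        feature = reader.next()
    return loaded


toy_reader = ToyReader({"attributes": {"OBJECTID": i}} for i in range(1, 251))
print(len(load_features(toy_reader, max_features=100)))
print(len(load_features(toy_reader)))
```

Passing `max_features=None` keeps the load-everything behaviour, so the same helper could serve both quick tests and full runs.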

In [15]:
# Get the first item in the list of features

SLOT = 0
gj_feature = gj_data['features'][SLOT]

# Print everything in this dictionary (i.e., gj_feature) - it's long

print(f"The wildfire feature from slot '{SLOT}' of the loaded gj_data['features']")
print(json.dumps(gj_feature, indent=4))


The wildfire feature from slot '0' of the loaded gj_data['features']
{
    "attributes": {
        "OBJECTID": 4956,
        "USGS_Assigned_ID": 4956,
        "Assigned_Fire_Type": "Wildfire",
        "Fire_Year": 1932,
        "Fire_Polygon_Tier": 1,
        "Fire_Attribute_Tiers": "1 (1), 3 (3)",
        "GIS_Acres": 219999.23754748085,
        "GIS_Hectares": 89030.53273921262,
        "Source_Datasets": "Comb_National_NIFC_Interagency_Fire_Perimeter_History (1), Comb_National_USFS_Final_Fire_Perimeter (1), Comb_National_WFDSS_Interagency_Fire_Perimeter_History (1), Comb_State_California_Wildfire_Polygons (1)",
        "Listed_Fire_Types": "Wildfire (3), Likely Wildfire (1)",
        "Listed_Fire_Names": "MATILIJA (4)",
        "Listed_Fire_Codes": "No code provided (4)",
        "Listed_Fire_IDs": "0 (3)",
        "Listed_Fire_IRWIN_IDs": "",
        "Listed_Fire_Dates": "Listed Wildfire Discovery Date(s): 1932-09-07 (2) | Listed Other Fire Date(s): 1899-12-30 - REVDATE field (1), 

In [16]:

# Every feature has a 'geometry' which specifies geo coordinates that make up each geographic thing
# In the case of the wildfire data, most wildfires are bounded shapes, circles, squares, etc. This is
# represented by shapes called 'rings' in GeoJSON.

# Get the geometry for the feature we pulled from the feature_list
gj_geometry = gj_feature['geometry']
# The largest shape (ring) is supposed to be item zero in the list of 'rings'
gj_bigest_ring = gj_geometry['rings'][0]

print(f"The largest ring of gj_feature['features'][{SLOT}]['rings'] consists of {len(gj_bigest_ring)} points.")

The largest ring of gj_feature['features'][0]['rings'] consists of 1132 points.


# Step 2 - Pulling Wildfire Data

## Getting Distance information

Once you are able to read the wildfire data, you need to be able to find all of the fires within a specified distance from your city. This is a geodetic distance computation. If you are a GIS wizard, you'll know how to do this in your sleep. However, for most of us, who have never done this before, an example is really helpful. 
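
As a rough cross-check on any geodetic distance, the haversine formula gives the great-circle distance on a sphere using only the standard library. This is a sketch under a spherical-Earth assumption, not the WGS84 ellipsoid computation used by the helper functions in this notebook, so small differences are expected. The second point is made up for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


bismarck = (46.825905, -100.778275)
# A made-up point one degree of latitude north of the city.
north_point = (47.825905, -100.778275)
print(round(haversine_miles(*bismarck, *north_point), 1))
```

One degree of latitude comes out at roughly 69 miles, which is a handy sanity check for the more precise results.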

In [17]:
# helper function
def convert_ring_to_epsg4326(ring_data=None):
    """
    Converts a list of coordinates from a specified coordinate system (ESRI:102008) to EPSG:4326 (WGS84).

    Parameters:
        ring_data (list): List of coordinate pairs in the original coordinate system.

    Returns:
        list: List of coordinate pairs in EPSG:4326 (WGS84).
    """
    
    # Initialize an empty list to store the converted coordinates
    converted_ring = list()
    
    # Create a Transformer from ESRI:102008 to EPSG:4326
    to_epsg4326 = Transformer.from_crs("ESRI:102008", "EPSG:4326")
    
    # Iterate through each coordinate in the input ring_data
    for coord in ring_data:
        # Transform the coordinate to EPSG:4326
        lat, lon = to_epsg4326.transform(coord[0], coord[1])
        new_coord = lat, lon
        # Append the transformed coordinate to the converted_ring list
        converted_ring.append(new_coord)
    
    # Return the list of converted coordinates
    return converted_ring


In [18]:
# helper function
def shortest_distance_from_place_to_fire_perimeter(place=None, ring_data=None):
    """
    Calculates the shortest distance from a specified place to a fire perimeter defined by a ring of coordinates.

    Parameters:
        place (tuple): A tuple representing the (latitude, longitude) of the specified place.
        ring_data (list): List of coordinate pairs defining the fire perimeter.

    Returns:
        list or None: A list containing the shortest distance in miles and the corresponding coordinate.
                      Returns None if the input data is invalid or empty.
    """
    
    # Convert the ring coordinates to EPSG:4326 (WGS84)
    ring = convert_ring_to_epsg4326(ring_data)
    
    # Create a Geod object for geodetic calculations
    geodcalc = Geod(ellps='WGS84')
    
    # Initialize a list to store the closest point and distance
    closest_point = list()

    # Iterate through each point in the ring
    for point in ring:
        # Calculate the geodetic distance between the specified place and the current point in the ring
        d = geodcalc.inv(place[1], place[0], point[1], point[0])
        
        # Convert the distance to miles
        distance_in_miles = d[2] * 0.00062137
        
        # Check if closest_point is empty or if the current distance is shorter than the stored distance
        if not closest_point or closest_point[0] > distance_in_miles:
            # Update closest_point with the current distance and point
            closest_point = [distance_in_miles, point]

    # Return the closest_point list, or None if no valid data is available
    return closest_point if closest_point else None
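
The search above keeps the smallest distance seen while walking the ring. The same idea can be sketched with `min()` and a pluggable distance function. The helper name `closest_vertex`, the toy coordinates, and the flat-plane metric are all assumptions for illustration. Like the function above, it only compares perimeter vertices, so it measures the distance to the nearest vertex rather than to the nearest point on an edge.

```python
import math


def closest_vertex(place, ring, distance_fn):
    """Return (distance, vertex) for the ring vertex nearest to place, or None."""
    if not ring:
        return None
    return min(((distance_fn(place, vertex), vertex) for vertex in ring),
               key=lambda pair: pair[0])


# Toy planar coordinates and a flat-plane metric, purely for illustration.
toy_place = (0.0, 0.0)
toy_ring = [(3.0, 4.0), (6.0, 8.0), (1.0, 1.0), (3.0, 4.0)]
print(closest_vertex(toy_place, toy_ring, math.dist))
```

Swapping `math.dist` for a geodetic distance function would give the behaviour needed for real latitude and longitude pairs.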


In [58]:
def process_wf_feature(wf_feature, city_latlon):
    """
    Process a wildfire feature and calculate the shortest distance from the specified city.

    Parameters:
        wf_feature (dict): Wildfire feature with geometry and attributes.
        city_latlon (tuple): Tuple representing the (latitude, longitude) of the city.

    Returns:
        tuple or None: A tuple containing the OBJECTID and the shortest distance from the city.
                      Returns None if the input data is invalid or if the feature has an irregular shape.
    """
    try:
        # Extract ring data from the wildfire feature
        ring_data = wf_feature['geometry']['rings'][0]
        
        # Calculate the shortest distance from the city to the fire perimeter
        distance = shortest_distance_from_place_to_fire_perimeter(city_latlon, ring_data)
        
        # Return OBJECTID and the rounded shortest distance
        return wf_feature['attributes']['OBJECTID'], round(distance[0], 2)
    except KeyError:
        # Handle the case where the feature has an irregular shape
        print("{0} fire has an irregular shape; skipping.".format(wf_feature['attributes']['OBJECTID']))
    return None

def process_features(feature_list, city_latlon):
    """
    Process a list of wildfire features and create a DataFrame with OBJECTID and shortest distances.

    Parameters:
        feature_list (list): List of wildfire features.
        city_latlon (tuple): Tuple representing the (latitude, longitude) of the city.

    Returns:
        pd.DataFrame: DataFrame containing OBJECTID and shortest distances.
    """
    fire_ids = []
    shortest_dist_from_edge_list = []

    features_processed = 0
    for wf_feature in feature_list:
        # Process each wildfire feature and obtain the result
        result = process_wf_feature(wf_feature, city_latlon)
        
        # Check if the result is valid
        if result:
            fire_ids.append(result[0])
            shortest_dist_from_edge_list.append(result[1])

        features_processed += 1
        if features_processed % 1000 == 0:
            print("Processed {0} features".format(features_processed))
        if features_processed % 10000 == 0:
            # Create a DataFrame and save it to a CSV file periodically
            dist_df = pd.DataFrame({'OBJECTID': fire_ids, 'shortest_dist': shortest_dist_from_edge_list})
            dist_df.to_csv('distance.csv', index=False)

    # Create the final DataFrame with OBJECTID and shortest distances
    return pd.DataFrame({'OBJECTID': fire_ids, 'shortest_dist': shortest_dist_from_edge_list})

In [19]:
# Example usage
city_data = CITY_DATA["bismarck"]
# city_data = {'latlon': city_data['latlon']}  # Replace with actual city coordinates
distance_df = process_features(feature_list, city_data['latlon'])
distance_df.to_csv('distance.csv', index=False)

Processed 1000 features
Processed 2000 features
Processed 3000 features
...
Processed 40000 features
...

## Getting other attributes of the fire

It seems reasonable that a large fire, that burns a large number of acres, and that is close to a city would put more smoke into a city than a small fire that is much further away. One task is to define your smoke estimate and then apply it to every fire for your city. Should your smoke estimate be cumulative during each year or somehow amortized over the fire season? Document what goes into your decision making and what you did to create your smoke estimate.
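
To make the intuition concrete, here is one hypothetical way to turn size and distance into a number. It is an illustrative sketch only, not the estimate used in this project: each fire contributes its burned acres divided by the squared distance, and contributions are summed cumulatively per year. The function names and records are made up.

```python
from collections import defaultdict


def fire_score(acres, distance_miles, min_distance=1.0):
    """Hypothetical per-fire score: burned area damped by squared distance."""
    # Clamp the distance so a fire at the city edge does not divide by zero.
    return acres / max(distance_miles, min_distance) ** 2


def annual_scores(fires):
    """Sum per-fire scores by year; fires is an iterable of (year, acres, miles)."""
    totals = defaultdict(float)
    for year, acres, distance_miles in fires:
        totals[year] += fire_score(acres, distance_miles)
    return dict(totals)


# Made-up records, not taken from the dataset.
toy_fires = [(2000, 10000.0, 100.0), (2000, 400.0, 20.0), (2001, 2500.0, 50.0)]
print(annual_scores(toy_fires))
```

In this toy data a small nearby fire counts as much as a large distant one, which is the trade-off any real estimate has to justify.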

**Note** - I have considered only the fires that burned in 1963 or later.

In [52]:
# Extracting fire attributes for the Bismarck, North Dakota analysis: Combined dataset (11 GB)

filtered_fires = []
feature_count = 0
cnt = 0

wf_reader = WFReader(EXTRACT_FILENAME)
wf_feature = wf_reader.next()

In [53]:
while wf_feature:
    
    if  wf_feature['attributes']['Fire_Year'] >= 1963:
        if 'rings' in wf_feature['geometry']:
            filtered_fires.append(wf_feature['attributes'])

        
    feature_count += 1
    if feature_count % 1000 == 0:
        print("Processed {0} fires".format(feature_count))
    
    wf_feature = wf_reader.next()


Processed 1000 fires
Processed 2000 fires
Processed 3000 fires
...
Processed 45000 fires
...

In [54]:
len(filtered_fires)

117543

In [51]:
filtered_fires

[{'OBJECTID': 14299,
  'USGS_Assigned_ID': 14299,
  'Assigned_Fire_Type': 'Wildfire',
  'Fire_Year': 1963,
  'Fire_Polygon_Tier': 1,
  'Fire_Attribute_Tiers': '1 (1), 3 (3)',
  'GIS_Acres': 40992.45827111476,
  'GIS_Hectares': 16589.05930244248,
  'Source_Datasets': 'Comb_National_NIFC_Interagency_Fire_Perimeter_History (1), Comb_SubState_MNSRBOPNCA_Wildfires_Historic (1), Comb_SubState_BLM_Idaho_NOC_FPER_Historica_Fire_Polygons (1), Comb_National_BLM_Fire_Perimeters_LADP (1)',
  'Listed_Fire_Types': 'Wildfire (1), Likely Wildfire (3)',
  'Listed_Fire_Names': 'RATTLESNAKE (4)',
  'Listed_Fire_Codes': 'No code provided (4)',
  'Listed_Fire_IDs': '1963-NA-000000 (2)',
  'Listed_Fire_IRWIN_IDs': '',
  'Listed_Fire_Dates': 'Listed Wildfire Discovery Date(s): 1963-08-06 (3) | Listed Wildfire Controlled Date(s): 1963-12-31 (3)',
  'Listed_Fire_Causes': 'Unknown (3)',
  'Listed_Fire_Cause_Class': 'Undetermined (4)',
  'Listed_Rx_Reported_Acres': None,
  'Listed_Map_Digitize_Methods': 'Digitiz

In [55]:
filtered_fires_df = pd.DataFrame(filtered_fires)

In [56]:
filtered_fires_df.head()

Unnamed: 0,OBJECTID,USGS_Assigned_ID,Assigned_Fire_Type,Fire_Year,Fire_Polygon_Tier,Fire_Attribute_Tiers,GIS_Acres,GIS_Hectares,Source_Datasets,Listed_Fire_Types,...,Processing_Notes,Wildfire_Notice,Prescribed_Burn_Notice,Wildfire_and_Rx_Flag,Overlap_Within_1_or_2_Flag,Circleness_Scale,Circle_Flag,Exclude_From_Summary_Rasters,Shape_Length,Shape_Area
0,14299,14299,Wildfire,1963,1,"1 (1), 3 (3)",40992.458271,16589.059302,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (3)",...,,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,,,0.385355,,No,73550.428118,165890600.0
1,14300,14300,Wildfire,1963,1,"1 (1), 3 (3)",25757.090203,10423.524591,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (2), Likely Wildfire (2)",...,,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,,,0.364815,,No,59920.576713,104235200.0
2,14301,14301,Wildfire,1963,1,"1 (5), 3 (15), 5 (1)",45527.210986,18424.208617,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (6), Likely Wildfire (15)",...,,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,,,0.320927,,No,84936.82781,184242100.0
3,14302,14302,Wildfire,1963,1,"1 (1), 3 (3), 5 (1)",10395.010334,4206.711433,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (2), Likely Wildfire (3)",...,,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,,,0.428936,,No,35105.903602,42067110.0
4,14303,14303,Wildfire,1963,1,"1 (1), 3 (3)",9983.605738,4040.2219,Comb_National_NIFC_Interagency_Fire_Perimeter_...,"Wildfire (1), Likely Wildfire (3)",...,,Wildfire mapping prior to 1984 was inconsisten...,Prescribed fire data in this dataset represent...,,,0.703178,,No,26870.456126,40402220.0


### Join all the information

In [57]:
filtered_fires_df.to_csv('wildfire_attributes.csv', index=False)
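
The two CSV files written in this notebook share the OBJECTID column, so they can later be combined on that key. Below is a minimal standard-library sketch of an inner join using made-up rows. The helper name `inner_join_on_objectid` is hypothetical; with pandas, a merge of the two DataFrames on 'OBJECTID' would be the natural equivalent.

```python
# Made-up rows that mirror some columns of the two CSV files written above.
distance_rows = [
    {"OBJECTID": 1, "shortest_dist": 812.4},
    {"OBJECTID": 2, "shortest_dist": 1533.9},
]
attribute_rows = [
    {"OBJECTID": 1, "Fire_Year": 1963, "GIS_Acres": 5000.0},
    {"OBJECTID": 3, "Fire_Year": 1964, "GIS_Acres": 120.0},
]


def inner_join_on_objectid(left_rows, right_rows):
    """Combine rows that share an OBJECTID; unmatched rows are dropped."""
    right_by_id = {row["OBJECTID"]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_id.get(row["OBJECTID"])
        if match is not None:
            joined.append({**match, **row})
    return joined


print(inner_join_on_objectid(distance_rows, attribute_rows))
```

Only OBJECTID 1 appears in both toy tables, so the result holds a single combined row.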

In this notebook we focused on collecting all the data relevant to the wildfires. In the upcoming notebooks we will manipulate this data and come up with a smoke estimate.