<a href="https://colab.research.google.com/github/mnocerino23/Wildfire-Forecaster/blob/main/Elevation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this Jupyter notebook, I correct a few minor errors in the datasets and add the final feature: the elevation at which each fire took place.

In [148]:
import pandas as pd
from google.colab import drive
drive.mount('/content/drive')

#Read in the two datasets. The first contains over 110,000 fires from 2001-2015 while the second has 1,000 more recent, larger fires.
wildfire_set1 = pd.read_csv('/content/drive/MyDrive/Data_Science_Projects/Wildfires/wildfires1_w_snow.csv')
wildfire_set2 = pd.read_csv('/content/drive/MyDrive/Data_Science_Projects/Wildfires/wildfires2_w_snow.csv')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).




In [149]:
print(wildfire_set1.shape)
print(wildfire_set2.shape)

(114558, 38)
(1197, 38)


Before building our classifiers, I take care of a few small issues and add one more feature. While inspecting the datasets, I found some invalid coordinates (latitude = 0, longitude = 0), so we remove those rows with the code below:

In [150]:
#flag the rows whose coordinates are the invalid placeholder (latitude = 0, longitude = 0) and drop them
invalid1 = (wildfire_set1['Latitude'] == 0) & (wildfire_set1['Longitude'] == 0)
wildfire_set1.drop(wildfire_set1[invalid1].index, inplace = True)

invalid2 = (wildfire_set2['Latitude'] == 0) & (wildfire_set2['Longitude'] == 0)
wildfire_set2.drop(wildfire_set2[invalid2].index, inplace = True)
#reset_index() returns a new dataframe and is not assigned here, so this line only displays the result
#the dataframes themselves keep their original index labels
wildfire_set2.reset_index()

Unnamed: 0.1,index,Unnamed: 0,Year,Name,AcresBurned,Fire Size Rank,Cause,SOURCE_REPORTING_UNIT_NAME,DaysBurn,Discovery Month,...,PRCP_6M,PRCP_RS,DX90_2M,DP10_2M,Receives Snow,Snow Station,River Basin,Mar_SP,Mar_WC,Mar_Dens
0,0,0,2016,Soberanes Fire,132127.0,G,,,83.0,Jul,...,14.11,21.42,0.0,1.0,0,,,0.0,0.0,0.00
1,1,1,2016,Erskine Fire,48019.0,G,,,18.0,Jun,...,4.68,4.88,15.0,4.0,1,mineral_king,Kaweah,36.0,16.0,0.44
2,2,2,2016,Chimney Fire,46344.0,G,,,24.0,Aug,...,2.52,8.09,43.0,0.0,0,,,0.0,0.0,0.00
3,3,3,2016,Blue Cut Fire,36274.0,G,,,7.0,Aug,...,3.41,6.45,43.0,0.0,0,,,0.0,0.0,0.00
4,4,4,2016,Gap Fire,33867.0,G,,,1.0,Aug,...,18.03,54.17,0.0,2.0,1,parks_creek,Shasta,77.0,34.0,0.44
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1151,1192,1192,2019,Eagle Fire,9.0,B,,,,Oct,...,0.49,12.66,48.0,0.0,0,,,0.0,0.0,0.00
1152,1193,1193,2019,Long Fire,2.0,B,,,,Jun,...,67.97,69.29,0.0,17.0,1,eureka_lake,Feather,110.0,48.0,0.44
1153,1194,1194,2019,Cashe Fire,,B,,,,Nov,...,3.29,21.47,13.0,0.0,0,,,0.0,0.0,0.00
1154,1195,1195,2019,Oak Fire,,B,,,,Oct,...,0.00,0.00,0.0,0.0,0,,,0.0,0.0,0.00


Comparing the shapes of the dataframes before and after, we see that this removed 41 rows with invalid coordinates from the second dataset (1197 to 1156) and none from the first.

In [151]:
print(wildfire_set1.shape)
print(wildfire_set2.shape)

(114558, 38)
(1156, 38)


# Add in the final feature: Elevation
We make a POST request to the Open Elevation API (https://developer.mapquest.com/documentation/open/elevation-api/#:~:text=The%20Open%20Elevation%20API%20provides,by%20the%20lat%2Flng%20collection), which returns elevations for a whole collection of latitude/longitude pairs at once. This is far more efficient than requesting each elevation individually, which would take over 24 hours to run.


Below, we create a dictionary with a single key, "locations", mapped to a list of dictionaries that each hold the coordinates of one fire. This is the format required for POST requests to the API, as described in its GitHub documentation (https://github.com/Jorl17/open-elevation/blob/master/docs/api.md).
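
As a minimal sketch of that request format, the snippet below builds the body for two coordinate pairs. The sample points are hypothetical and not taken from the wildfire datasets; only the "locations", "latitude", and "longitude" keys come from the format described above.

```python
import json

# Hypothetical sample coordinates as (latitude, longitude); not from the dataset.
sample_points = [(39.75, -121.0), (36.45, -118.6)]

# The request body maps "locations" to a list of
# {"latitude": ..., "longitude": ...} dictionaries, one per point.
payload = {
    "locations": [
        {"latitude": lat, "longitude": lon} for lat, lon in sample_points
    ]
}

# Serialized body as it would be sent in the POST request.
print(json.dumps(payload, indent=2))
```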

In [152]:
#import python's requests and json libraries for this task
import requests
import json

In [153]:
#create a feature has_elevation that will denote whether the elevation for a fire has already been calculated (to avoid rerunning the same calculation)

wildfire_set1['Has_Elevation'] = 0
wildfire_set2['Has_Elevation'] = 0

The function below creates batches of coordinates (in the API's preferred format) from a wildfires dataset. We feed these coordinates into the API post request in batches of at most 1500, so that we don't overload the API with too large a request.

In [154]:
def batch_of_coordinates(df):
  #use .loc to limit the rows we take in to those that don't have an elevation yet
  not_visited = df.loc[df['Has_Elevation'] == 0]
  coordinates = []
  for index, row in not_visited.iterrows():
    #collect at most 1500 pairs of coordinates at a time to send into the requests.post function
    #capping the batch size keeps us from overloading the API with massive requests
    if len(coordinates) < 1500:
      d = {}
      #add the coordinates into the dictionary d and build a list of dictionaries (the format the API request requires)
      d["latitude"] = df.at[index,"Latitude"]
      d["longitude"] = df.at[index,"Longitude"]
      coordinates.append(d)
      #once this coordinate has been added, we mark the row as visited so it is not requested again
      df.at[index, 'Has_Elevation'] = 1
    else:
      break
  #return a list of at most 1500 coordinates to send to the API in a single post request
  return coordinates
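
For comparison, here is a sketch of the same batching idea that slices a plain list by position instead of tracking a Has_Elevation flag. The helper name `iter_batches` and the sample coordinates are hypothetical and are not part of this notebook.

```python
def iter_batches(coordinates, batch_size=1500):
    """Yield consecutive slices of at most batch_size coordinates."""
    for start in range(0, len(coordinates), batch_size):
        yield coordinates[start:start + batch_size]


# Hypothetical stand-in for the list of coordinate dictionaries.
coords = [{"latitude": 35.0 + i * 0.001, "longitude": -120.0} for i in range(3200)]

# 3200 coordinates split into two full batches and one partial batch.
batch_sizes = [len(batch) for batch in iter_batches(coords)]
print(batch_sizes)  # [1500, 1500, 200]
```

Because nothing is marked as visited before a request succeeds, a failed batch can simply be sent again.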

In [155]:
elevations = []
final = {}

Below, a while loop keeps requesting new batches of coordinates and sending them to the API as long as there are still 0's in the Has_Elevation column (meaning rows that haven't received their elevation yet).

In [156]:
while (wildfire_set1['Has_Elevation'] == 0).any():
  coord = batch_of_coordinates(wildfire_set1)
  final["locations"] = coord

  #requests serializes the dictionary to JSON itself when it is passed through the json parameter
  r = requests.post(url= 'https://api.open-elevation.com/api/v1/lookup', json= final, timeout = 30)
  #stop with an error if the request failed, since these rows have already been marked as visited
  r.raise_for_status()
  #r.json() parses the JSON body returned by the post request into a python dictionary
  y = r.json()
  for item in y['results']:
    elevations.append(item['elevation'])
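
Since the elevations are later attached to the dataframe by position, a batch that returns too few results would silently shift every later row. The sketch below shows one way to guard against that. The names `lookup_elevations`, `send`, and `fake_send` are hypothetical; `send` stands in for whatever performs the actual request and returns the parsed response.

```python
def lookup_elevations(coordinates, send):
    """Send one batch through `send` and return its elevations in response order.

    `send` is a hypothetical callable that takes the payload dictionary and
    returns the parsed JSON response as a dictionary.
    """
    payload = {"locations": coordinates}
    response = send(payload)
    results = response.get("results", [])
    # Guard against partial responses so elevations never shift rows.
    if len(results) != len(coordinates):
        raise ValueError(
            f"expected {len(coordinates)} results, got {len(results)}"
        )
    return [item["elevation"] for item in results]


# Stand-in for the real request, used here only to exercise the function.
def fake_send(payload):
    return {"results": [{"elevation": 100.0} for _ in payload["locations"]]}


batch = [
    {"latitude": 39.75, "longitude": -121.0},
    {"latitude": 36.45, "longitude": -118.6},
]
print(lookup_elevations(batch, fake_send))  # [100.0, 100.0]
```

In this notebook, `send` could wrap the same call used above, for example `lambda p: requests.post('https://api.open-elevation.com/api/v1/lookup', json=p, timeout=30).json()`.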

In [157]:
print(len(elevations))

114558


Now, repeat this same process with the second dataset:

In [158]:
elevations2 = []
final2 = {}

In [159]:
while (wildfire_set2['Has_Elevation'] == 0).any():
  coord = batch_of_coordinates(wildfire_set2)
  final2["locations"] = coord

  #pass the dictionary directly; requests serializes it to JSON
  r = requests.post(url= 'https://api.open-elevation.com/api/v1/lookup', json= final2, timeout = 30)
  #stop with an error if the request failed
  r.raise_for_status()
  y = r.json()
  for item in y['results']:
    elevations2.append(item['elevation'])

In [160]:
print(len(elevations2))

1156


Unit conversion: change elevation in meters to elevation in feet by multiplying by 3.2808
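
As a quick worked check of the factor, the sketch below converts a few sample values. The helper name `meters_to_feet` and the constant name are hypothetical, and the inputs 904 m and 1892 m were chosen because they reproduce the first two Elevation values shown in the wildfire_set1 output later in this notebook (2965.8432 and 6207.2736).

```python
METERS_TO_FEET = 3.2808  # conversion factor used in this notebook


def meters_to_feet(elevations_m):
    """Convert a list of elevations from meters to feet."""
    return [METERS_TO_FEET * ele for ele in elevations_m]


# 904 m is roughly 2965.8432 ft and 1892 m is roughly 6207.2736 ft.
elevations_ft_example = meters_to_feet([0, 904, 1892])
print(elevations_ft_example)
```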

In [161]:
elevations_ft = []
elevations2_ft = []

for ele in elevations:
  elevations_ft.append(3.2808*ele)
for ele2 in elevations2:
  elevations2_ft.append(3.2808*ele2)

In [162]:
wildfire_set1['Elevation'] = elevations_ft
wildfire_set2['Elevation'] = elevations2_ft

In [163]:
wildfire_set1.head(5)

Unnamed: 0.1,Unnamed: 0,Year,Name,AcresBurned,Fire Size Rank,Cause,SOURCE_REPORTING_UNIT_NAME,DaysBurn,Discovery Month,Discovered DOY,...,DX90_2M,DP10_2M,Receives Snow,Snow Station,River Basin,Mar_SP,Mar_WC,Mar_Dens,Has_Elevation,Elevation
0,0,2005,FOUNTAIN,0.1,A,Miscellaneous,Plumas National Forest,1.0,Feb,33.0,...,0.0,19.0,1.0,eureka_lake,Feather,79.6,34.0,0.43,1,2965.8432
1,1,2004,PIGEON,0.25,A,Lightning,Eldorado National Forest,1.0,May,133.0,...,0.0,3.0,1.0,ward_creek_2,Lake Tahoe,108.6,38.1,0.35,1,6207.2736
2,2,2004,SLACK,0.1,A,Debris Burning,Eldorado National Forest,1.0,Jun,152.0,...,0.0,11.0,1.0,ward_creek_2,Lake Tahoe,108.6,38.1,0.35,1,3454.6824
3,3,2004,DEER,0.1,A,Lightning,Eldorado National Forest,5.0,Jun,180.0,...,0.0,3.0,1.0,echo_summit,American,87.2,28.4,0.33,1,7759.092
4,4,2004,STEVENOT,0.1,A,Lightning,Eldorado National Forest,5.0,Jun,180.0,...,0.0,3.0,1.0,echo_summit,American,87.2,28.4,0.33,1,7598.3328


In [164]:
wildfire_set2.head(5)

Unnamed: 0.1,Unnamed: 0,Year,Name,AcresBurned,Fire Size Rank,Cause,SOURCE_REPORTING_UNIT_NAME,DaysBurn,Discovery Month,Discovered DOY,...,DX90_2M,DP10_2M,Receives Snow,Snow Station,River Basin,Mar_SP,Mar_WC,Mar_Dens,Has_Elevation,Elevation
0,0,2016,Soberanes Fire,132127.0,G,,,83.0,Jul,,...,0.0,1.0,0,,,0.0,0.0,0.0,1,961.2744
1,1,2016,Erskine Fire,48019.0,G,,,18.0,Jun,,...,15.0,4.0,1,mineral_king,Kaweah,36.0,16.0,0.44,1,3389.0664
2,2,2016,Chimney Fire,46344.0,G,,,24.0,Aug,,...,43.0,0.0,0,,,0.0,0.0,0.0,1,1049.856
3,3,2016,Blue Cut Fire,36274.0,G,,,7.0,Aug,,...,43.0,0.0,0,,,0.0,0.0,0.0,1,4192.8624
4,4,2016,Gap Fire,33867.0,G,,,1.0,Aug,,...,0.0,2.0,1,parks_creek,Shasta,77.0,34.0,0.44,1,3244.7112


In [165]:
wildfire_set1.to_csv('wildfire_set1_w_allfeatures.csv', index = False)
wildfire_set2.to_csv('wildfire_set2_w_allfeatures.csv', index = False)

I went back and manually checked the elevations of random fires from different parts of each dataset, and the values I checked were accurate. The CSV files created above contain all the necessary features, so we can now perform some final preprocessing, drop null values, and begin model building.