# Fire Extent Prediction from Historical Weather Data

###### Adam Swan, Jeff Warchall, Omar Younis

## Problem Statement

XXX

### County Location Data

From the California [Open Data Portal](https://data.ca.gov/dataset/ca-geographic-boundaries/resource/b0007416-a325-4777-9295-368ea6b710e6) we obtained a shapefile (.shp) of the geographic locations of the counties of California.  A .csv file containing the latitude and longitude of the center of each county was then extracted using [MyGeodata Converter](https://mygeodata.cloud/converter/shp-to-csv) and stored in the data folder.  The county names and coordinates from that file are loaded into the ```location``` variable.

In [None]:
import requests
import pandas as pd
import time
from IPython.display import clear_output

In [None]:
counties = pd.read_csv('data/CA_Counties_Location.csv')
location = counties[['NAMELSAD', 'INTPTLAT', 'INTPTLON']]

### Fire Data

The fire data was obtained from the State of California [Geoportal](https://gis.data.ca.gov/), which generated the URL used to fetch the data in the cell below.  We obtained all of the fire records from 1952 to the present.  These data are described in the data dictionary at the end of Section 02 - Preprocessing.


In [None]:
url = 'https://egis.fire.ca.gov/arcgis/rest/services/FRAP/FirePerimeters_FS/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json'
res = requests.get(url)
data = res.json()
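
ArcGIS feature services can cap the number of records returned by a single query. Whether this particular service truncates the response is not stated here, so the following is only a sketch under that assumption. It pages through a query endpoint with the `resultOffset` and `resultRecordCount` parameters and stops once the response no longer reports `exceededTransferLimit`. The function name `fetch_all_features`, the `get_json` callable, and the page size of 1000 are hypothetical and chosen for illustration.

```python
def fetch_all_features(query_url, get_json, page_size=1000):
    """Collect features from an ArcGIS query endpoint one page at a time.

    get_json is any callable taking (url, params) and returning the decoded
    JSON response as a dict (hypothetical helper, supplied by the caller).
    """
    features = []
    offset = 0
    while True:
        params = {
            'where': '1=1',
            'outFields': '*',
            'outSR': 4326,
            'f': 'json',
            'resultOffset': offset,
            'resultRecordCount': page_size,
        }
        page = get_json(query_url, params)
        batch = page.get('features', [])
        features.extend(batch)
        # Stop when the page is empty or the server reports nothing left to send.
        if not batch or not page.get('exceededTransferLimit', False):
            break
        offset += len(batch)
    return features
```

With `requests`, the callable could be `lambda url, params: requests.get(url, params=params).json()`, and `query_url` would be the address above up to and including `/query`, without the query string.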

The data is returned in JSON format.  The attributes of each feature are stored in a dataframe named ```df```, which is then exported to a .csv file.

In [None]:
# Column names come from the attribute keys of the first feature.
list_of_features = list(data['features'][0]['attributes'].keys())

# Collect one record (a list of attribute values) per fire feature.
list_of_records = []
for feature in data['features']:
    list_of_records.append(list(feature['attributes'].values()))

df = pd.DataFrame(list_of_records, columns = list_of_features)
df.to_csv('./data/raw_fires.csv')

### California Weather Data

Weather data was obtained from [World Weather Online](https://www.worldweatheronline.com/developer/) (WWO).  First, a list of the centers of each county in California is created from the county latitude and longitude data.  Then, since the data available on WWO only goes back to July 1, 2008, lists of the years, months, and month lengths needed to cover the range 07/01/2008 to 12/31/2020 are created.

In [None]:
lats = list(location['INTPTLAT'])
longs = list(location['INTPTLON'])
years = list(range(2008, 2021))
months = ['01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12']
days = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

The WWO API requires a query string to be passed.  These queries are assembled in the following loop, which creates one query per county per month so that together they cover every day in our range of interest.

In [None]:
import calendar

urls = []
for year in years:
    for month in months:
        # WWO history starts on July 1, 2008, so skip the first half of 2008.
        if month > '06' or year > 2008:
            # monthrange gives the true month length, so leap-year Februaries end on the 29th.
            last_day = calendar.monthrange(year, int(month))[1]
            start_date = str(year) + '-' + month + '-' + '01'
            end_date = str(year) + '-' + month + '-' + str(last_day)
            # One request per county center for this month.
            for lat, long in zip(lats, longs):
                url = 'http://api.worldweatheronline.com/premium/v1/past-weather.ashx?key=xxxxxxxxxxxxxxxxxxxxx'
                url = url + '&format=json'
                url = url + '&q=' + str(lat) + ',' + str(long)
                url = url + '&tp=24'
                url = url + '&date=' + start_date
                url = url + '&enddate=' + end_date
                urls.append(url)
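
As an alternative to concatenating strings, the same query can be assembled from a dictionary of parameters. The sketch below uses `urllib.parse.urlencode` from the standard library; the function name `build_weather_url` and the `api_key` argument are hypothetical, and the parameter names simply mirror those in the loop above.

```python
from urllib.parse import urlencode

BASE_URL = 'http://api.worldweatheronline.com/premium/v1/past-weather.ashx'

def build_weather_url(api_key, lat, long, start_date, end_date):
    """Return a past-weather query URL for one location and date range."""
    params = {
        'key': api_key,
        'format': 'json',
        'q': f'{lat},{long}',   # latitude,longitude pair
        'tp': 24,               # one summary per 24 hours
        'date': start_date,
        'enddate': end_date,
    }
    # safe=',' leaves the comma in the coordinate pair unescaped.
    return BASE_URL + '?' + urlencode(params, safe=',')
```

Keeping the parameters in a dictionary makes it harder to drop an `&` or `=` by accident and leaves escaping of unusual characters to the library.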

The list of urls is then passed to the API and the returned .json documents are stored in a list.

In [None]:
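data = []
for i, url in enumerate(urls):
    # Pause between calls to avoid overwhelming the API.
    time.sleep(1)
    res = requests.get(url)
    data.append(res.json())
    # Replace the previous progress message with the current count.
    clear_output()
    print(f'{i+1} of {len(urls)} requests complete')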
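
A long run of requests can be interrupted by a single failed call. One possible safeguard, sketched here as an assumption rather than as part of the original workflow, is to check the HTTP status and retry a few times before giving up. The helper name `fetch_json` and its `get`, `retries`, and `pause` arguments are hypothetical; `get` is expected to behave like `requests.get`.

```python
import time

def fetch_json(url, get, retries=3, pause=1.0):
    """Request url with the supplied get callable and return the decoded JSON.

    Returns None if every attempt fails, so the caller can log the URL
    and continue with the remaining requests.
    """
    for attempt in range(1, retries + 1):
        try:
            res = get(url)
            res.raise_for_status()   # raise on 4xx/5xx responses
            return res.json()
        except Exception as err:     # broad on purpose: network, HTTP, or decode errors
            print(f'attempt {attempt} of {retries} failed for {url}: {err}')
            time.sleep(pause * attempt)  # wait a little longer after each failure
    return None
```

In the loop above, `res = requests.get(url)` and `res.json()` could be replaced by `fetch_json(url, requests.get)`, with `None` results skipped or collected for a later retry.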

Finally, we iterate through the .json documents to compile an array of records and save them to a data frame.  This data frame is then exported to the data folder.

In [None]:
# Column names come from the first daily record, plus a column for the queried location.
list_of_features = list(data[0]['data']['weather'][0].keys())
list_of_features.append('Lat and Long')

list_of_records = []
for response in data:
    # The location string echoed back by the API identifies the county center.
    query = response['data']['request'][0]['query']
    for dictionary in response['data']['weather']:
        list_of_values = list(dictionary.values())
        list_of_values.append(query)
        list_of_records.append(list_of_values)

df = pd.DataFrame(list_of_records, columns = list_of_features)
df.to_csv('./data/clean_daily_weather.csv')
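
If the daily records contain nested fields, those fields are written to the .csv file as stringified lists. The sketch below assumes each daily record carries a nested `hourly` list (a single entry when `tp=24`) and an `astronomy` list; these key names are assumptions about the response layout, not something confirmed above. The helper name `flatten_day` and the `hourly_` column prefix are hypothetical.

```python
def flatten_day(day, query):
    """Merge one daily record with its single 24-hour summary into a flat dict."""
    # Keep the top-level daily fields, leaving out the nested lists.
    record = {k: v for k, v in day.items() if k not in ('hourly', 'astronomy')}
    # Pull the one hourly summary up a level, prefixing its keys to avoid clashes.
    hourly = day.get('hourly') or [{}]
    for k, v in hourly[0].items():
        record[f'hourly_{k}'] = v
    record['Lat and Long'] = query
    return record
```

A list of such dictionaries can be passed directly to `pd.DataFrame`, which aligns columns by key rather than by position.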