## Beginning Analysis Notebook

In this notebook we begin our analysis of the California wildfires dataset. The goal is to find classification techniques with strong predictive power for California wildfires.

In [1]:
#Packages we will use
import noaa_data as noaa
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from datetime import datetime


First, we load the wildfire data.

In [2]:
wf = noaa.get_wf_data(r'Cal_Wildfires/California_Fire_Incidents.csv', 50)

In [3]:
wf['Started']

0       2013-08-17T15:25:00Z
1       2013-05-30T15:28:00Z
2       2013-07-15T13:43:00Z
3       2013-08-10T16:30:00Z
5       2013-07-22T22:15:00Z
                ...         
1555    2019-06-08T14:37:00Z
1556    2019-05-01T16:46:00Z
1633    2019-11-25T12:02:02Z
1634    2019-10-22T19:20:44Z
1635    2019-10-14T15:32:20Z
Name: Started, Length: 984, dtype: object

In [4]:
from datetime import datetime

In [5]:
wf['Started'] = wf['Started'].apply(lambda x: datetime.strptime(str(x[:10]),"%Y-%m-%d"))
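The row-wise `apply` with `strptime` works, but pandas can also parse the whole column in one vectorized call. A minimal sketch on two sample strings copied from the `Started` output above; on the real data the same expression would be applied to the original string column in place of the `apply`.

```python
import pandas as pd

# Toy strings in the same ISO 8601 form as the original 'Started' column
started = pd.Series(["2013-08-17T15:25:00Z", "2019-10-14T15:32:20Z"])

# Keep only the YYYY-MM-DD part and parse the whole column at once
started_dates = pd.to_datetime(started.str[:10], format="%Y-%m-%d")

print(started_dates.dt.year.tolist())
```

Dropping the time of day matches the original cell; if it is needed later, `pd.to_datetime(started, utc=True)` would keep it.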

In [6]:
wf.head()

| | AcresBurned | ArchiveYear | Counties | Extinguished | Latitude | Longitude | Name | Started | coords |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 257314.0 | 2013 | Tuolumne | 2013-09-06T18:30:00Z | 37.857 | -120.086 | Rim Fire | 2013-08-17 | (37.857, -120.086) |
| 1 | 30274.0 | 2013 | Los Angeles | 2013-06-08T18:30:00Z | 34.585595 | -118.423176 | Powerhouse Fire | 2013-05-30 | (34.585595, -118.423176) |
| 2 | 27531.0 | 2013 | Riverside | 2013-07-30T18:00:00Z | 33.7095 | -116.72885 | Mountain Fire | 2013-07-15 | (33.7095, -116.72885) |
| 3 | 27440.0 | 2013 | Placer | 2013-08-30T08:00:00Z | 39.12 | -120.65 | American Fire | 2013-08-10 | (39.12, -120.65) |
| 5 | 22992.0 | 2013 | Fresno | 2013-09-24T20:15:00Z | 37.279 | -119.318 | Aspen Fire | 2013-07-22 | (37.279, -119.318) |


In [7]:
wf.ArchiveYear.max()

2019

In [10]:
# Let's see how many fires occurred in 2019 (count() gives non-null values per column)
wf.loc[wf.ArchiveYear == 2019].count()

AcresBurned     171
ArchiveYear     174
Counties        174
Extinguished    127
Latitude        174
Longitude       174
Name            174
Started         174
coords          174
dtype: int64
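Because `count()` reports non-null values per column, the columns disagree (171 `AcresBurned` values versus 174 rows elsewhere). The number of fire records per year is simply the row count of each year's group. A small sketch on a made-up stand-in for `wf` (the values are illustrative, not from the dataset):

```python
import pandas as pd

# Toy stand-in for the wildfire table; one AcresBurned value is missing
wf = pd.DataFrame({
    "ArchiveYear": [2018, 2019, 2019, 2019],
    "AcresBurned": [100.0, 250.0, None, 40.0],
})

# count() reports non-null values per column, so columns can disagree
print(wf.loc[wf.ArchiveYear == 2019].count())

# The number of fire records per year is the row count of each group
fires_per_year = wf.groupby("ArchiveYear").size()
print(fires_per_year)
```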

Now let's load the climate and location data. We'll take seven years, 2013 through 2019, to be thorough. This will be a lot of data.

In [23]:
tmax = noaa.get_mo_data_noaa('TMAX', 2013, 2019) # Mean Max temp

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001
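The printed numbers look like 1-based start offsets for requests made in pages of 1,000 records, which is typical of a paged web API. This is inferred from the output only, not from the source of `noaa_data`. Under that assumption, a hypothetical helper `page_offsets` reproduces the pattern:

```python
def page_offsets(total_records, page_size=1000):
    """Return 1-based start offsets for fetching total_records in pages.

    Hypothetical helper: assumes offsets start at 1 and advance by page_size.
    """
    # range(start, stop, step) yields 1, 1 + page_size, 1 + 2 * page_size, ...
    return list(range(1, total_records + 1, page_size))


# A year with 7,500 records would need eight requests
print(page_offsets(7500))
```

Read this way, the last offset printed for a year bounds its record count: a final offset of 7001 would mean between 7,001 and 8,000 records for that year.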


In [24]:
tmin = noaa.get_mo_data_noaa('TMIN', 2013, 2019) # Mean Min temp

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001


In [12]:
tavg = noaa.get_mo_data_noaa('TAVG', 2013, 2019) # Average temp for month

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001


In [13]:
emxt = noaa.get_mo_data_noaa('EMXT', 2013, 2019) # Extreme (highest) max temp for month

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001


In [14]:
emnt = noaa.get_mo_data_noaa('EMNT', 2013, 2019) # Extreme (lowest) min temp for month

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001


In [15]:
cldd = noaa.get_mo_data_noaa('CLDD', 2013, 2019) #Cooling degree days

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
working on year 2016
1
1001
2001
3001
4001
5001
6001
working on year 2017
1
1001
2001
3001
4001
5001
6001
working on year 2018
1
1001
2001
3001
4001
5001
6001
working on year 2019
1
1001
2001
3001
4001
5001
6001


In [16]:
prcp = noaa.get_mo_data_noaa('PRCP', 2013, 2019) # Total precipitation for month

working on year 2013
1
1001
2001
3001
4001
5001
6001
7001
8001
working on year 2014
1
1001
2001
3001
4001
5001
6001
7001
8001
9001
working on year 2015
1
1001
2001
3001
4001
5001
6001
7001
8001
9001
working on year 2016
1
1001
2001
3001
4001
5001
6001
7001
8001
9001
working on year 2017
1
1001
2001
3001
4001
5001
6001
7001
8001
9001
working on year 2018
1
1001
2001
3001
4001
5001
6001
7001
8001
9001
working on year 2019
1
1001
2001
3001
4001
5001
6001
7001
8001


In [17]:
snow = noaa.get_mo_data_noaa('SNOW', 2013, 2019) # Total snowfall for month

working on year 2013
1
1001
2001
3001
4001
working on year 2014
1
1001
2001
3001
working on year 2015
1
1001
2001
3001
4001
working on year 2016
1
1001
2001
3001
working on year 2017
1
1001
2001
3001
4001
working on year 2018
1
1001
2001
3001
working on year 2019
1
1001
2001
3001


In [18]:
evap = noaa.get_mo_data_noaa('EVAP', 2013, 2019) # Total evaporation for month

working on year 2013
1
working on year 2014
1
working on year 2015
1
working on year 2016
1
working on year 2017
1
working on year 2018
1
working on year 2019
1


In [19]:
awnd = noaa.get_mo_data_noaa('AWND', 2013, 2019) #Average wind speed for the month

working on year 2013
1
working on year 2014
1
working on year 2015
1
working on year 2016
1
working on year 2017
1
working on year 2018
1
working on year 2019
1


In [25]:
df_tot = tmax.join(tmin, how='outer')

In [27]:
df_tot = df_tot.join(tavg, how = 'outer')

In [28]:
df_tot = df_tot.join(emxt, how= 'outer')

In [30]:
df_tot = df_tot.join(emnt, how = 'outer')

In [31]:
df_tot = df_tot.join(cldd, how = 'outer')

In [32]:
df_tot = df_tot.join(prcp, how ='outer')

In [33]:
df_tot = df_tot.join(snow, how='outer')

In [34]:
df_tot = df_tot.join(evap, how='outer')

In [35]:
df_tot = df_tot.join(awnd, how = 'outer')
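The chain of outer joins above can be written as a single fold over a list of frames with `functools.reduce`. A sketch on toy frames; the real frames have a multi-level index, while these use a simple one for brevity:

```python
from functools import reduce

import pandas as pd

# Toy monthly frames sharing an index but covering different rows
tmax = pd.DataFrame({"TMAX": [30.1, 31.5]}, index=["2013-07", "2013-08"])
tmin = pd.DataFrame({"TMIN": [12.0]}, index=["2013-08"])
prcp = pd.DataFrame({"PRCP": [0.0, 4.2]}, index=["2013-08", "2013-09"])

frames = [tmax, tmin, prcp]

# Fold the list with outer joins, equivalent to chaining .join(..., how="outer")
df_tot = reduce(lambda left, right: left.join(right, how="outer"), frames)
print(df_tot)
```

When every frame has a unique index, `pd.concat(frames, axis=1)` is another option that aligns on the index with an outer join by default.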

In [37]:
# Save this as a CSV so we don't have to download all of this again
df_tot.to_csv('noaa_data_2013_2019.csv')
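Since `df_tot` has at least a two-level index (In [62] calls `droplevel(0)` on it), reading the CSV back needs `index_col` to list both index columns; otherwise they come back as ordinary columns. A round-trip sketch using an in-memory buffer; the level names `date` and `station` are hypothetical, as the notebook does not show them:

```python
import io

import pandas as pd

# Toy frame with a two-level index; level names are hypothetical
index = pd.MultiIndex.from_tuples(
    [("2013-07", "STN_A"), ("2013-07", "STN_B")], names=["date", "station"]
)
df = pd.DataFrame({"TMAX": [30.1, 28.4]}, index=index)

buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)

# index_col must list both index columns to rebuild the MultiIndex
restored = pd.read_csv(buf, index_col=[0, 1])
print(restored.index.nlevels)
```

Date-like index values come back as strings, so they may need `pd.to_datetime` after loading.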

In [62]:
# Number of distinct level-1 index values that have non-null CLDD data
df_tot.CLDD.dropna().droplevel(0).index.unique().shape

(660,)
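The same coverage count can be computed for every variable at once by grouping the non-null mask on index level 1. A sketch on a toy frame; the level names are hypothetical, and which level holds stations versus dates in the real `df_tot` is not shown in the notebook:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2013-07", "STN_A"), ("2013-08", "STN_A"), ("2013-07", "STN_B")],
    names=["date", "station"],
)
df = pd.DataFrame(
    {"CLDD": [120.0, np.nan, np.nan], "TMAX": [30.1, 31.0, 28.4]}, index=index
)

# Same quantity as In [62], for a single column
n_cldd = df.CLDD.dropna().droplevel(0).index.unique().shape[0]

# For every column: how many level-1 keys have at least one non-null value
coverage = df.notna().groupby(level=1).any().sum()
print(coverage)
```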

In [47]:
idx = pd.IndexSlice
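`pd.IndexSlice` makes it easy to select on an inner level of a MultiIndex inside `.loc`. A minimal sketch on a toy frame (level names and values are hypothetical), selecting every row for one second-level key:

```python
import pandas as pd

idx = pd.IndexSlice

index = pd.MultiIndex.from_tuples(
    [("2013-07", "STN_A"), ("2013-07", "STN_B"), ("2013-08", "STN_A")],
    names=["date", "station"],
)
# Sorting the index keeps MultiIndex slicing efficient and avoids UnsortedIndexError
df = pd.DataFrame({"TMAX": [30.1, 28.4, 31.0]}, index=index).sort_index()

# All first-level values, only the "STN_A" second-level key, all columns
stn_a = df.loc[idx[:, "STN_A"], :]
print(stn_a)
```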