# CA Wildfire Causes Analysis

The first module of this notebook analyzes the number of acres burned between 1887 and 2020, broken down by the fire causes defined by Cal Fire.

The second module repeats the analysis for a shorter, more recent period: fires recorded after 2010 (2011 to 2020).


### Configuration:
Let's begin by importing the Python tools necessary for the job.

In [5]:
import pandas as pd
import altair as alt

Import the Cal Fire dataset.

In [6]:
ca_fires = pd.read_csv("fires_data copy.csv") # Load the Cal Fire dataset from a CSV file

### Analysis: 
The data source is now prepared for analysis.

In [7]:
ca_fires.info() # Summarize columns, non-null counts, and dtypes

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21318 entries, 0 to 21317
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   OBJECTID      21318 non-null  int64  
 1   YEAR_         21241 non-null  float64
 2   STATE         21315 non-null  object 
 3   AGENCY        21313 non-null  object 
 4   UNIT_ID       21298 non-null  object 
 5   FIRE_NAME     21195 non-null  object 
 6   INC_NUM       20396 non-null  object 
 7   ALARM_DATE    15954 non-null  object 
 8   CONT_DATE     8638 non-null   object 
 9   CAUSE         21270 non-null  object 
 10  COMMENTS      4064 non-null   object 
 11  REPORT_AC     8767 non-null   float64
 12  GIS_ACRES     21311 non-null  float64
 13  C_METHOD      9096 non-null   object 
 14  OBJECTIVE     21123 non-null  object 
 15  FIRE_NUM      17313 non-null  object 
 16  Shape_Length  21318 non-null  float64
 17  Shape_Area    21318 non-null  float64
dtypes: float64(5), int64(1), object(12)
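
The `Non-Null Count` column above shows that several fields are sparsely populated; for example, `CONT_DATE` has 8,638 values for 21,318 rows. Below is a minimal sketch of how the share of missing values per column could be computed. The `toy` frame and its values are made up for illustration and only stand in for `ca_fires`.

```python
import pandas as pd

# Toy frame standing in for ca_fires; the values are invented for illustration.
toy = pd.DataFrame({
    "YEAR_": [2018.0, 2019.0, None, 2020.0],
    "CAUSE": ["11 - Powerline", "1 - Lightning", "1 - Lightning", None],
    "GIS_ACRES": [10.0, 20.0, 30.0, 40.0],
})

# isna() marks missing cells; the column mean of those booleans is the
# fraction of missing values in each column.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running the same two calls on `ca_fires` would rank the real columns by how incomplete they are.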

### Explore:
We start by exploring the different values of the "CAUSE" column in our dataset.

In [8]:
ca_fires.head(3)

Unnamed: 0,OBJECTID,YEAR_,STATE,AGENCY,UNIT_ID,FIRE_NAME,INC_NUM,ALARM_DATE,CONT_DATE,CAUSE,COMMENTS,REPORT_AC,GIS_ACRES,C_METHOD,OBJECTIVE,FIRE_NUM,Shape_Length,Shape_Area
0,21440,2020.0,California,California Department of Forestry and Fire Pro...,Nevada - Yuba - Placer CAL FIRE,NELSON,13212,6/18/20 0:00,6/23/20 0:00,11 - Powerline,,110.0,109.602501,1 - GPS Ground,Suppression (Wildfire),,3252.52328,443544.7
1,21441,2020.0,California,California Department of Forestry and Fire Pro...,Nevada - Yuba - Placer CAL FIRE,AMORUSO,11799,6/1/20 0:00,6/4/20 0:00,2 - Equipment Use,,670.0,685.585022,1 - GPS Ground,Suppression (Wildfire),,9653.760308,2774464.0
2,21442,2020.0,California,California Department of Forestry and Fire Pro...,Nevada - Yuba - Placer CAL FIRE,ATHENS,18493,8/10/20 0:00,3/1/20 0:00,14 - Unknown / Unidentified,,26.0,27.30048,1 - GPS Ground,Suppression (Wildfire),,1649.643235,110481.1


In [9]:
ca_fires.CAUSE.describe() # Explore "CAUSE"

count                           21270
unique                             18
top       14 - Unknown / Unidentified
freq                             9543
Name: CAUSE, dtype: object

Count the number of wildfires in California attributed to each cause.

In [10]:
ca_fires.CAUSE.value_counts() # Number of fires by "CAUSE"

14 - Unknown / Unidentified      9543
1 - Lightning                    3454
9 - Miscellaneous                3379
2 - Equipment Use                1246
7 - Arson                         903
5 - Debris                        723
10 - Vehicle                      454
11 - Powerline                    412
4 - Campfire                      380
3 - Smoking                       342
8 - Playing with fire             196
18 - Escaped Prescribed Burn       90
6 - Railroad                       80
15 - Structure                     21
19 - Illegal Alien Campfire        17
16 - Aircraft                      14
13 - Non-Firefighter Training      11
12 - Firefighter Training           5
Name: CAUSE, dtype: int64
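
In the counts above, "14 - Unknown / Unidentified" accounts for 9,543 of the 21,270 fires with a recorded cause, roughly 45%. As a sketch, such shares can be computed directly with `value_counts(normalize=True)`. The small `causes` series below is invented for illustration and is not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for ca_fires.CAUSE, including one missing label.
causes = pd.Series(
    ["1 - Lightning", "1 - Lightning", "7 - Arson", None], name="CAUSE"
)

# normalize=True returns fractions instead of counts; missing labels are
# dropped by default, so the fractions are relative to labelled fires only.
shares = causes.value_counts(normalize=True)
print(shares)

# dropna=False keeps the missing labels as their own category.
with_missing = causes.value_counts(dropna=False)
print(with_missing)
```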

In [11]:
ca_fires.CAUSE.value_counts().reset_index() # Reformat table

Unnamed: 0,index,CAUSE
0,14 - Unknown / Unidentified,9543
1,1 - Lightning,3454
2,9 - Miscellaneous,3379
3,2 - Equipment Use,1246
4,7 - Arson,903
5,5 - Debris,723
6,10 - Vehicle,454
7,11 - Powerline,412
8,4 - Campfire,380
9,3 - Smoking,342


### Filter:
You can run this filter with any cause.

In [12]:
my_cause = "11 - Powerline" # Filter by "CAUSE" - This filter can be used with any "CAUSE".

In [13]:
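# Keep rows for the chosen cause; this rebinds my_cause from the label string to a DataFrame, so re-run the cell above before re-running this one.
my_cause = ca_fires[ca_fires.CAUSE == my_cause]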
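
The cell above reuses the name `my_cause` for both the label and the filtered rows. One way to keep the two apart is a small helper that takes the label and returns the matching rows. This is only a sketch: `fires_for_cause` and the `toy` frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def fires_for_cause(fires: pd.DataFrame, cause: str) -> pd.DataFrame:
    """Return the rows of `fires` whose CAUSE equals `cause` (hypothetical helper)."""
    return fires[fires["CAUSE"] == cause]


# Toy frame standing in for ca_fires; the values are invented for illustration.
toy = pd.DataFrame({
    "CAUSE": ["11 - Powerline", "1 - Lightning", "11 - Powerline"],
    "GIS_ACRES": [10.0, 20.0, 30.0],
})

powerline = fires_for_cause(toy, "11 - Powerline")
print(len(powerline))
```

With a helper like this, the label stays a string, so the filter can be re-run for another cause without resetting any variable.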

In [14]:
my_cause.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 412 entries, 0 to 21302
Data columns (total 18 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   OBJECTID      412 non-null    int64  
 1   YEAR_         412 non-null    float64
 2   STATE         412 non-null    object 
 3   AGENCY        412 non-null    object 
 4   UNIT_ID       410 non-null    object 
 5   FIRE_NAME     411 non-null    object 
 6   INC_NUM       404 non-null    object 
 7   ALARM_DATE    402 non-null    object 
 8   CONT_DATE     357 non-null    object 
 9   CAUSE         412 non-null    object 
 10  COMMENTS      88 non-null     object 
 11  REPORT_AC     333 non-null    float64
 12  GIS_ACRES     412 non-null    float64
 13  C_METHOD      361 non-null    object 
 14  OBJECTIVE     410 non-null    object 
 15  FIRE_NUM      196 non-null    object 
 16  Shape_Length  412 non-null    float64
 17  Shape_Area    412 non-null    float64
dtypes: float64(5), int64(1), object(12)

In [15]:
my_cause.head(3)

Unnamed: 0,OBJECTID,YEAR_,STATE,AGENCY,UNIT_ID,FIRE_NAME,INC_NUM,ALARM_DATE,CONT_DATE,CAUSE,COMMENTS,REPORT_AC,GIS_ACRES,C_METHOD,OBJECTIVE,FIRE_NUM,Shape_Length,Shape_Area
0,21440,2020.0,California,California Department of Forestry and Fire Pro...,Nevada - Yuba - Placer CAL FIRE,NELSON,13212,6/18/20 0:00,6/23/20 0:00,11 - Powerline,,110.0,109.602501,1 - GPS Ground,Suppression (Wildfire),,3252.52328,443544.7
14,21454,2020.0,California,Department of Defense,,PAVE PAWS,17717,8/2/20 0:00,8/2/20 0:00,11 - Powerline,Beale Air Force Base,532.0,532.773377,1 - GPS Ground,Suppression (Wildfire),,7847.30461,2156057.0
15,21455,2020.0,California,California Department of Forestry and Fire Pro...,Nevada - Yuba - Placer CAL FIRE,RIOSA,12970,6/15/20 0:00,6/15/20 0:00,11 - Powerline,,13.8,13.69473,1 - GPS Ground,Suppression (Wildfire),,1354.188167,55420.56


### Sorting Values:
Here we analyze the number of acres burned in each fire attributed to our specified cause.

In [16]:
my_cause.sort_values("GIS_ACRES", ascending=False).head(3) # Sorting values - The most destructive fires caused by "my_cause"

Unnamed: 0,OBJECTID,YEAR_,STATE,AGENCY,UNIT_ID,FIRE_NAME,INC_NUM,ALARM_DATE,CONT_DATE,CAUSE,COMMENTS,REPORT_AC,GIS_ACRES,C_METHOD,OBJECTIVE,FIRE_NUM,Shape_Length,Shape_Area
20608,42055,2018.0,California,California Department of Forestry and Fire Pro...,Butte CAL FIRE,CAMP,16737,11/8/18 0:00,11/26/18 0:00,11 - Powerline,,153336.0,153335.5625,1 - GPS Ground,Suppression (Wildfire),,311935.2788,620527017.8
21031,42478,2019.0,California,California Department of Forestry and Fire Pro...,Sonoma - Lake - Napa CAL FIRE,KINCADE,19376,10/23/19 0:00,11/10/19 0:00,11 - Powerline,,,77762.14063,1 - GPS Ground,Suppression (Wildfire),,186114.8953,314692204.7
15004,36451,2002.0,California,California Department of Forestry and Fire Pro...,San Diego CAL FIRE,PINES,5658,7/29/02 0:00,8/11/02 0:00,11 - Powerline,,61690.0,61691.23828,2 - GPS Air,Suppression (Wildfire),777.0,193363.8981,249655581.0


Here we calculate the total number of acres burned due to our specified cause.

In [17]:
my_cause.GIS_ACRES.sum() # Calculation: number of acres burned due to "my_cause"

525121.876570476
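
Note that `Series.sum()` skips missing values by default. That makes no difference for the Powerline subset, where `GIS_ACRES` is complete (412 non-null values for 412 rows), but in the full dataset 7 of the 21,318 rows have no `GIS_ACRES`. A short sketch of the behavior, using an invented `acres` series:

```python
import pandas as pd

# Toy acreage values with one missing entry; invented for illustration.
acres = pd.Series([1.5, None, 2.0], name="GIS_ACRES")

# Default behavior: the missing value is ignored.
total = acres.sum()
print(total)

# skipna=False propagates the missing value, so the result is NaN.
strict_total = acres.sum(skipna=False)
print(strict_total)

# Number of rows that did not contribute to the default total.
n_missing = int(acres.isna().sum())
print(n_missing)
```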

### Calculate Top Causes:
Now we group our dataset by the column "CAUSE", calculate the sum of acres burned per cause, and keep the ten causes with the largest totals.

In [18]:
top_causes = ca_fires.groupby(["CAUSE"]).GIS_ACRES.sum().reset_index().sort_values("GIS_ACRES", ascending=False).head(10)

In [19]:
top_causes.head(10) # Top 10 wildfire causes by total acres burned

Unnamed: 0,CAUSE,GIS_ACRES
5,14 - Unknown / Unidentified,14276770.0
0,1 - Lightning,10256130.0
17,9 - Miscellaneous,7722483.0
15,7 - Arson,1986705.0
10,2 - Equipment Use,1820318.0
12,4 - Campfire,1341582.0
1,10 - Vehicle,696625.5
13,5 - Debris,632788.6
2,11 - Powerline,525121.9
11,3 - Smoking,358641.5
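
The fire counts and the acre totals were computed in separate steps above. As a sketch, both can be produced in a single table with pandas named aggregation, which also makes it easy to derive the average acres per fire. The `toy` frame and the output column names (`fires`, `total_acres`, `acres_per_fire`) are assumptions made for this example.

```python
import pandas as pd

# Toy frame standing in for ca_fires; the values are invented for illustration.
toy = pd.DataFrame({
    "CAUSE": [
        "1 - Lightning", "1 - Lightning", "7 - Arson",
        "11 - Powerline", "11 - Powerline",
    ],
    "GIS_ACRES": [100.0, 300.0, 50.0, 10.0, 20.0],
})

# One pass over the groups: number of fires and total acres per cause.
summary = (
    toy.groupby("CAUSE")
    .agg(fires=("GIS_ACRES", "size"), total_acres=("GIS_ACRES", "sum"))
    .sort_values("total_acres", ascending=False)
    .reset_index()
)

# Average fire size per cause, derived from the two aggregated columns.
summary["acres_per_fire"] = summary["total_acres"] / summary["fires"]
print(summary)
```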


### Examine the result:

In [31]:
alt.Chart(top_causes).mark_bar().encode(x = "GIS_ACRES", y=alt.Y('CAUSE:N', sort='-x')) # Create a chart – CAUSE/ACRES 

In [30]:
alt.Chart(top_causes).mark_bar().encode(x = "GIS_ACRES", y=alt.Y('CAUSE:N', sort='-x')).properties(title="Top Wildfire Causes in CA") # Create a title


## Module II: General Analysis, 2011 to 2020

Here we perform the same analysis, limiting our data frame to fires recorded after 2010 (2011 to 2020).

In [38]:
ca_fires_2010 = ca_fires[ca_fires.YEAR_ > 2010] # Keep fires after 2010, i.e. 2011 to 2020 (2010 itself is excluded)
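
The comparison `YEAR_ > 2010` is strict, so fires from 2010 itself are excluded. Rows with a missing `YEAR_` drop out as well, because comparisons with NaN evaluate to false. The sketch below contrasts the strict comparison with the inclusive `Series.between`, which would be one way to select a closed range of years. The `years` series is toy data.

```python
import pandas as pd

# Toy stand-in for ca_fires.YEAR_, including one missing year.
years = pd.Series([2009.0, 2010.0, 2011.0, 2020.0, None], name="YEAR_")

# Strict comparison, as in the cell above: 2010 is not included.
after_2010 = years[years > 2010]
print(after_2010.tolist())

# between() includes both endpoints by default, so 2010 is kept here.
from_2010 = years[years.between(2010, 2020)]
print(from_2010.tolist())
```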

### How many fires are attributed to each cause? 

In [39]:
ca_fires_2010.CAUSE.value_counts().reset_index() # Reformat table

Unnamed: 0,index,CAUSE
0,14 - Unknown / Unidentified,1180
1,1 - Lightning,713
2,9 - Miscellaneous,440
3,2 - Equipment Use,406
4,10 - Vehicle,235
5,11 - Powerline,190
6,7 - Arson,159
7,5 - Debris,141
8,4 - Campfire,97
9,8 - Playing with fire,37


### How many acres have been burned by each cause? 

Now we group the filtered dataset by the column "CAUSE" and calculate the sum of acres burned per cause.

In [44]:
top_causes_2010 = ca_fires_2010.groupby(["CAUSE"]).GIS_ACRES.sum().reset_index().sort_values("GIS_ACRES", ascending=False).head(10)

In [45]:
top_causes_2010.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 13
Data columns (total 2 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   CAUSE      10 non-null     object 
 1   GIS_ACRES  10 non-null     float64
dtypes: float64(1), object(1)
memory usage: 240.0+ bytes


In [46]:
top_causes_2010.head(10) # Top 10 wildfire causes after 2010 (2011 to 2020)

Unnamed: 0,CAUSE,GIS_ACRES
0,1 - Lightning,5027323.0
3,14 - Unknown / Unidentified,2772650.0
14,9 - Miscellaneous,1206509.0
1,10 - Vehicle,539022.9
9,4 - Campfire,525163.3
12,7 - Arson,328349.9
2,11 - Powerline,280708.1
7,2 - Equipment Use,161898.8
10,5 - Debris,68728.9
13,8 - Playing with fire,15277.17


### Examine the result:

In [47]:
alt.Chart(top_causes_2010).mark_bar().encode(x = "GIS_ACRES", y=alt.Y('CAUSE:N', sort='-x')).properties(title="Top Wildfire Causes in CA, 2011 to 2020") # Create a chart with title