# CA Wildfires Caused by Power Lines

After I requested data on fires related to power equipment, Cal Fire provided a database of all incidents that occurred between 2011 and 2021.

The objective of this notebook is to analyze this database to measure the scale of the problem.

Questions:
- How many power line-related wildfires have occurred in the period covered by the data?
- How many per year? 
- What was the worst year?
- How many acres have been burned each year due to powerline incidents?
- How many lives have been lost?
- What is the cost of these fires?

### Data description
* All Reported CA Wildland Fires Caused by Powerlines/Utilities (Updated February 3, 2022)
* Time range: Includes incident date range of 01/01/2011 through 12/31/2021
* Total records: 3,067

### Data limitations
*
*
*

### Configuration:

Let's begin by importing the Python tools necessary for the analysis.

In [460]:
import pandas as pd
import altair as alt
import datetime

Import Cal Fire Dataset:

In [461]:
power_fires = pd.read_csv("data/raw/All_CA_WLFires_Powerlines.csv") # Import data

In [462]:
power_fires.info() # Check data quality: no column has empty fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3067 entries, 0 to 3066
Data columns (total 9 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   FDID                            3067 non-null   int64  
 1   Inc date                        3067 non-null   object 
 2   Inc Number                      3067 non-null   object 
 3   Address                         3067 non-null   object 
 4   Total Loss                      3067 non-null   float64
 5   Total Acres Burned              3067 non-null   float64
 6   Equipment Involved in Ignition  3067 non-null   object 
 7   Total Injuries                  3067 non-null   int64  
 8   Total Deaths                    3067 non-null   int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 215.8+ KB


In [482]:
power_fires.head(3) 

Unnamed: 0,FDID,Inc date,Inc Number,Address,Total Loss,Total Acres Burned,Equipment Involved in Ignition,Total Injuries,Total Deaths
0,19110,01/14/2011,10797,CORRAL CANYON MALIBU 000000000,0.0,0.21,211 - Electrical power (utility) line.,0,0
1,37555,01/23/2011,864,Grove Rd PAUMA_VALLEY 92061,0.0,0.1,211 - Electrical power (utility) line.,0,0
2,19030,01/28/2011,5510,343 N CALIFORNIA ST Burbank 91505,500.0,0.01,211 - Electrical power (utility) line.,0,0


To make the database easier to work with, I renamed the column headings:

In [464]:
power_fires_format = power_fires.rename(columns={
    "Inc date": "inc_date",
    "Inc Number": "inc_number",
    "Address": "address",
    "Total Loss": "total_lost",
    "Total Acres Burned": "total_acres_burned",
    "Equipment Involved in Ignition": "equipment_involved_in_ignition",
    "Total Injuries": "total_injuries",
    "Total Deaths": "total_deaths",
})

To calculate per-year figures, I need to extract the year from the "inc_date" field:

In [465]:
power_fires_format['year'] = pd.DatetimeIndex(power_fires_format['inc_date']).year
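
The extraction above lets pandas infer the date format. A minimal sketch of a stricter variant follows, assuming the `MM/DD/YYYY` layout seen in the preview rows holds for the whole file. The `sample` frame and the `n_bad` counter are illustrative names, and the three dates are copied from rows shown in this notebook rather than loaded from the full dataset.

```python
import pandas as pd

# Illustrative sample built from dates shown in this notebook (not the full dataset)
sample = pd.DataFrame({"inc_date": ["01/14/2011", "01/23/2011", "11/08/2018"]})

# Parse with an explicit format; errors="coerce" turns malformed dates into NaT
parsed = pd.to_datetime(sample["inc_date"], format="%m/%d/%Y", errors="coerce")

# Count rows that failed to parse before relying on the year column
n_bad = int(parsed.isna().sum())

sample["year"] = parsed.dt.year
print(sample)
print("unparseable dates:", n_bad)
```

Stating the format explicitly avoids any day/month ambiguity, and counting `NaT` values makes parsing failures visible instead of silent.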

Check the data types after reformatting:

In [466]:
power_fires_format.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3067 entries, 0 to 3066
Data columns (total 10 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   FDID                            3067 non-null   int64  
 1   inc_date                        3067 non-null   object 
 2   inc_number                      3067 non-null   object 
 3   address                         3067 non-null   object 
 4   total_lost                      3067 non-null   float64
 5   total_acres_burned              3067 non-null   float64
 6   equipment_involved_in_ignition  3067 non-null   object 
 7   total_injuries                  3067 non-null   int64  
 8   total_deaths                    3067 non-null   int64  
 9   year                            3067 non-null   int64  
dtypes: float64(2), int64(4), object(4)
memory usage: 239.7+ KB


### Explore:
- Which kinds of equipment are mentioned in the database?
- Which type of equipment causes the most incidents?

In [467]:
power_fires_format.equipment_involved_in_ignition.value_counts().reset_index() # Number of fires by "equipment involved in ignition"

Unnamed: 0,index,equipment_involved_in_ignition
0,211 - Electrical power (utility) line.,2536
1,212 - Electrical service supply wires.,324
2,"221 - Transformer, distribution type.",170
3,"223 - Transformer, low voltage.",37
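
To put these counts in proportion, here is a small sketch that converts them to percentages. It rebuilds the counts by hand from the output above, so `counts` and `share_pct` are illustrative names; on the full DataFrame, `value_counts(normalize=True)` should give the same shares directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series(
    {
        "211 - Electrical power (utility) line.": 2536,
        "212 - Electrical service supply wires.": 324,
        "221 - Transformer, distribution type.": 170,
        "223 - Transformer, low voltage.": 37,
    },
    name="incidents",
)

# Share of all incidents per equipment type, as a percentage
share_pct = (counts / counts.sum() * 100).round(1)
print(share_pct)
```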


In [468]:
powerlines_fires_format = power_fires_format[power_fires_format.equipment_involved_in_ignition == '211 - Electrical power (utility) line.']

In [490]:
powerlines_fires_format.total_lost.sum()

2557351934.0

In [491]:
powerlines_fires_format.total_acres_burned.sum()

515155.18

In [None]:
powerlines_fires_format.total_deaths.sum()

In [489]:
powerlines_fires_format.sort_values('total_lost', ascending=False)


Unnamed: 0,FDID,inc_date,inc_number,address,total_lost,total_acres_burned,equipment_involved_in_ignition,total_injuries,total_deaths,year
2028,4555,11/08/2018,16737,Camp Creek RD JARBO_GAP 95965,1999999998.0,153336.0,211 - Electrical power (utility) line.,0,86,2018
2394,49555,10/23/2019,0019376,GEYSERS 9-10 - FUMAROLE Cloverdale 95425,383750000.0,77758.0,211 - Electrical power (utility) line.,0,0,2019
1744,58555,10/08/2017,26269,13916 Cascade WAY Browns Valley 95918,50000000.0,9989.0,211 - Electrical power (utility) line.,0,4,2017
1757,29555,10/09/2017,26279,11228 McCourtney RD Grass Valley 95949,40000000.0,76.0,211 - Electrical power (utility) line.,0,0,2017
1758,17555,10/09/2017,10055,1350 SULPHUR BANK RD CLEARLAKE_OAKS 95423,35110000.0,2207.0,211 - Electrical power (utility) line.,0,0,2017
...,...,...,...,...,...,...,...,...,...,...
538,43080,05/04/2013,1240083,16 CHESTNUT AV LOS GATOS 95030,0.0,0.01,211 - Electrical power (utility) line.,0,0,2013
1553,43080,06/14/2017,1650052,20210 LYNTON CT CUPERTINO 95014,0.0,0.01,211 - Electrical power (utility) line.,0,0,2017
1552,48040,06/12/2017,0000909,1 Esperson CT Rio Vista 94571,0.0,2.0,211 - Electrical power (utility) line.,0,0,2017
1551,20015,06/12/2017,11793,00016329 FAIRVIEW ST MADERA_ACRES 93637,0.0,2.5,211 - Electrical power (utility) line.,0,0,2017


In [470]:
powerlines_fires_format.to_csv('data/processed/powerline_fires_clean.csv', index=False)

- How many power line-related fires occurred per year?
- What was the worst year? 

In [471]:
fires_per_year = powerlines_fires_format.groupby('year').inc_number.count().reset_index()
fires_per_year.rename(columns={'inc_number': 'powerline_fires_per_year'}, inplace=True)
fires_per_year

Unnamed: 0,year,powerline_fires_per_year
0,2011,175
1,2012,228
2,2013,248
3,2014,203
4,2015,174
5,2016,186
6,2017,274
7,2018,192
8,2019,357
9,2020,276
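
The "worst year" question can also be answered programmatically rather than by reading the table. A minimal sketch, assuming only the rows displayed above; `fires_per_year_shown` and `worst` are illustrative names, and the values are copied by hand from that output.

```python
import pandas as pd

# Yearly counts copied from the table above (only the rows shown there)
fires_per_year_shown = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020],
    "powerline_fires_per_year": [175, 228, 248, 203, 174, 186, 274, 192, 357, 276],
})

# idxmax returns the index label of the largest count; .loc fetches that row
worst = fires_per_year_shown.loc[
    fires_per_year_shown["powerline_fires_per_year"].idxmax()
]
print(int(worst["year"]), int(worst["powerline_fires_per_year"]))
```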


In [472]:
fires_per_year.to_csv('data/processed/powerline_fires_per_year.csv', index=False)

In [473]:
alt.Chart(fires_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "powerline_fires_per_year", 
    color = alt.condition(
        alt.datum.year == 2019, 
        alt.value('orange'), 
        alt.value('steelblue')
    )
).properties(title="Incidents per Year")

Explore the damage caused annually by fires related to power lines:

In [474]:
damages_per_year = powerlines_fires_format.groupby("year").agg({'total_lost':'sum', 'total_acres_burned':'sum', 'total_injuries': 'sum', 'total_deaths': 'sum'}).reset_index()

In [475]:
damages_per_year.info() # Check that there are no errors

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11 entries, 0 to 10
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   year                11 non-null     int64  
 1   total_lost          11 non-null     float64
 2   total_acres_burned  11 non-null     float64
 3   total_injuries      11 non-null     int64  
 4   total_deaths        11 non-null     int64  
dtypes: float64(2), int64(3)
memory usage: 568.0 bytes


In [476]:
pd.set_option('display.float_format', lambda x: '{:,}'.format(round(x, 2))) # Show floats with thousands separators instead of scientific notation

In [477]:
damages_per_year.head(10)

Unnamed: 0,year,total_lost,total_acres_burned,total_injuries,total_deaths
0,2011,269514.0,1108.0,0,0
1,2012,140966.0,519.98,2,0
2,2013,400782.0,1163.68,3,0
3,2014,201154.0,467.33,1,0
4,2015,25324597.0,12099.51,2,0
5,2016,110488.0,386.75,0,0
6,2017,139712652.0,226888.68,0,44
7,2018,2000305367.0,154572.54,2,86
8,2019,388854751.0,86356.25,4,0
9,2020,817873.0,21459.94,5,1
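
The peak year for each damage measure can be read off in one step. The sketch below assumes only the ten rows displayed above; `damages_shown` and `worst_year_by_metric` are illustrative names, and the values are copied by hand from that output.

```python
import pandas as pd

# Values copied from the damages_per_year.head(10) output above
damages_shown = pd.DataFrame({
    "year": [2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020],
    "total_lost": [269514.0, 140966.0, 400782.0, 201154.0, 25324597.0,
                   110488.0, 139712652.0, 2000305367.0, 388854751.0, 817873.0],
    "total_acres_burned": [1108.0, 519.98, 1163.68, 467.33, 12099.51,
                           386.75, 226888.68, 154572.54, 86356.25, 21459.94],
    "total_injuries": [0, 2, 3, 1, 2, 0, 0, 2, 4, 5],
    "total_deaths": [0, 0, 0, 0, 0, 0, 44, 86, 0, 1],
}).set_index("year")

# For each metric, idxmax gives the year (index label) with the largest value
worst_year_by_metric = damages_shown.idxmax()
print(worst_year_by_metric)
```

Setting `year` as the index means `idxmax` returns the year itself rather than a row position.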


In [478]:
alt.Chart(damages_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "total_acres_burned", 
    color = alt.condition(
                alt.datum.year == 2017, 
                alt.value('orange'), 
                alt.value('steelblue')
        )
).properties(width=300, title="Acres Burned per Year")

In [479]:
alt.Chart(damages_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "total_lost", 
    color = alt.condition(
                alt.datum.year == 2018, 
                alt.value('orange'), 
                alt.value('steelblue')
        )
).properties(width=300, title="Total Loss per Year")