# CA Wildfires Caused by Power Lines

After I requested data from Cal Fire on power-line-related fires, the department provided a database of all incidents that occurred between 2011 and 2020.

The objective of this notebook is to analyze this database and measure the scale of the problem.

Questions:
- How many power-line-related wildfires have occurred in the last 10 years?
- How many per year?
- What was the worst year?
- How many acres have been burned each year due to power line incidents?
- How many lives have been lost?
- What is the cost of these fires?

### Configuration:

Let's begin by importing the Python tools necessary for the analysis.

In [346]:
import pandas as pd
import altair as alt
import datetime

Import Cal Fire Dataset:

In [347]:
powerlines_fires = pd.read_csv("All_CA_WLFires_Powerlines.csv") # Import data

In [348]:
powerlines_fires.info() # Check the quality of the data: no empty fields

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3067 entries, 0 to 3066
Data columns (total 9 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   FDID                            3067 non-null   int64  
 1   Inc date                        3067 non-null   object 
 2   Inc Number                      3067 non-null   object 
 3   Address                         3067 non-null   object 
 4   Total Loss                      3067 non-null   float64
 5   Total Acres Burned              3067 non-null   float64
 6   Equipment Involved in Ignition  3067 non-null   object 
 7   Total Injuries                  3067 non-null   int64  
 8   Total Deaths                    3067 non-null   int64  
dtypes: float64(2), int64(3), object(4)
memory usage: 215.8+ KB
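The `info()` output shows 3067 non-null values in every column. As a complementary check, the minimal sketch below counts missing values and duplicate rows explicitly. The three-row `sample` frame is hypothetical (one loss value was deliberately left empty to show what a gap looks like); in the notebook the same calls would run on `powerlines_fires`.

```python
import pandas as pd

# Hypothetical sample, not the Cal Fire data: the second row has a missing loss value.
sample = pd.DataFrame({
    "Inc date": ["01/14/2011", "01/23/2011", "01/28/2011"],
    "Total Loss": [0.0, None, 500.0],
    "Total Acres Burned": [0.21, 0.10, 0.01],
})

# Missing values per column; all zeros would mean no empty fields.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Rows that exactly repeat an earlier row.
duplicate_rows = int(sample.duplicated().sum())
print(duplicate_rows)
```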


In [349]:
powerlines_fires.head(3) 

Unnamed: 0,FDID,Inc date,Inc Number,Address,Total Loss,Total Acres Burned,Equipment Involved in Ignition,Total Injuries,Total Deaths
0,19110,01/14/2011,10797,CORRAL CANYON MALIBU 000000000,0.0,0.21,211 - Electrical power (utility) line.,0,0
1,37555,01/23/2011,864,Grove Rd PAUMA_VALLEY 92061,0.0,0.1,211 - Electrical power (utility) line.,0,0
2,19030,01/28/2011,5510,343 N CALIFORNIA ST Burbank 91505,500.0,0.01,211 - Electrical power (utility) line.,0,0


To make the columns easier to work with, I renamed the headings to snake_case:

In [351]:
powerlines_fires_format = powerlines_fires.rename(columns={
    "Inc date": "inc_date",
    "Inc Number": "inc_number",
    "Address": "address",
    "Total Loss": "total_lost",
    "Total Acres Burned": "total_acres_burned",
    "Equipment Involved in Ignition": "equipment_involved_in_ignition",
    "Total Injuries": "total_injuries",
    "Total Deaths": "total_deaths",
})
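The renaming can also be done programmatically instead of with a hand-written mapping. The sketch below is an alternative, not the approach used in this notebook: it lowercases every heading and replaces spaces with underscores. Note that it would produce `fdid` and `total_loss`, whereas the notebook keeps `FDID` and uses `total_lost`, so later cells would need matching names. The empty `raw` frame is a stand-in built from the headings listed by `info()`.

```python
import pandas as pd

# Stand-in frame with the original headings (no rows needed to show the renaming).
raw = pd.DataFrame(columns=[
    "FDID", "Inc date", "Inc Number", "Address", "Total Loss",
    "Total Acres Burned", "Equipment Involved in Ignition",
    "Total Injuries", "Total Deaths",
])

# Lowercase each heading and replace spaces with underscores.
snake = raw.copy()
snake.columns = (
    raw.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
)
print(list(snake.columns))
```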

To calculate figures per year, I need to extract the year from the "inc_date" field:

In [352]:
powerlines_fires_format['year'] = pd.DatetimeIndex(powerlines_fires_format['inc_date']).year
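The dates in this file appear to follow a month/day/year layout (for example `01/14/2011`). Assuming that holds for every row, a slightly more defensive sketch parses them with an explicit format and counts any values that fail to parse. The third value in the `dates` series is a hypothetical bad entry added for illustration.

```python
import pandas as pd

# Two dates from the preview above plus one hypothetical malformed value.
dates = pd.Series(["01/14/2011", "01/23/2011", "not a date"], name="inc_date")

# An explicit format avoids day/month ambiguity; errors="coerce" turns bad values into NaT.
parsed = pd.to_datetime(dates, format="%m/%d/%Y", errors="coerce")

# Year per row (NaN where parsing failed).
years = parsed.dt.year

# Number of rows that could not be parsed.
unparsed_count = int(parsed.isna().sum())
print(years.tolist(), unparsed_count)
```

If `unparsed_count` is zero on the full dataset, the year column can be cast to an integer type safely.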

Check the data types after reformatting:

In [353]:
powerlines_fires_format.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3067 entries, 0 to 3066
Data columns (total 10 columns):
 #   Column                          Non-Null Count  Dtype  
---  ------                          --------------  -----  
 0   FDID                            3067 non-null   int64  
 1   inc_date                        3067 non-null   object 
 2   inc_number                      3067 non-null   object 
 3   address                         3067 non-null   object 
 4   total_lost                      3067 non-null   float64
 5   total_acres_burned              3067 non-null   float64
 6   equipment_involved_in_ignition  3067 non-null   object 
 7   total_injuries                  3067 non-null   int64  
 8   total_deaths                    3067 non-null   int64  
 9   year                            3067 non-null   int64  
dtypes: float64(2), int64(4), object(4)
memory usage: 239.7+ KB


### Explore:
- Which kinds of equipment are mentioned in the database?
- Which type of equipment causes the most incidents?

In [354]:
powerlines_fires_format.equipment_involved_in_ignition.value_counts().reset_index() # Number of fires by "equipment involved in ignition"

Unnamed: 0,index,equipment_involved_in_ignition
0,211 - Electrical power (utility) line.,2536
1,212 - Electrical service supply wires.,324
2,"221 - Transformer, distribution type.",170
3,"223 - Transformer, low voltage.",37
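To compare the equipment types more easily, the counts can be turned into percentages. The sketch below rebuilds the four counts from the table above by hand; on the raw column, `value_counts(normalize=True)` would give the same proportions directly.

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({
    "211 - Electrical power (utility) line.": 2536,
    "212 - Electrical service supply wires.": 324,
    "221 - Transformer, distribution type.": 170,
    "223 - Transformer, low voltage.": 37,
})

# Share of all incidents, as a percentage rounded to one decimal.
shares = (counts / counts.sum() * 100).round(1)
print(shares)
```

With these counts, utility power lines (code 211) account for roughly 83% of the 3,067 incidents.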


- How many power-line-related fires occurred per year?
- What was the worst year? 

In [355]:
fires_per_year = powerlines_fires_format.groupby('year').value_counts().to_frame(name = "number_of_fires").reset_index() # Count identical rows within each year; each row is one fire, so "number_of_fires" is 1 per row

In [356]:
fires_per_year.head(1)

Unnamed: 0,year,FDID,inc_date,inc_number,address,total_lost,total_acres_burned,equipment_involved_in_ignition,total_injuries,total_deaths,number_of_fires
0,2011,1008,06/23/2011,1115287,BRUNS RD LIVERMORE 94550,0.0,5.0,211 - Electrical power (utility) line.,0,0,1


In [357]:
calculation_fires_per_year = fires_per_year.groupby('year')['number_of_fires'].sum().to_frame(name = "fires_per_year").reset_index() 

In [358]:
calculation_fires_per_year.head(20) # Fires per year

Unnamed: 0,year,fires_per_year
0,2011,214
1,2012,270
2,2013,296
3,2014,239
4,2015,228
5,2016,230
6,2017,328
7,2018,241
8,2019,424
9,2020,331
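Because each row in the dataset is one fire, the yearly count can also be obtained in a single step with `groupby(...).size()`, and the worst year can be looked up with `idxmax()` rather than read off the table. This is a sketch of an alternative, not the method used above; the five-row `fires` frame is a hypothetical stand-in for `powerlines_fires_format`.

```python
import pandas as pd

# Hypothetical stand-in: one row per fire, only the year column is needed here.
fires = pd.DataFrame({"year": [2011, 2011, 2012, 2012, 2012]})

# size() counts the rows in each group, i.e. the fires per year.
fires_per_year = fires.groupby("year").size().reset_index(name="fires_per_year")

# Row with the highest yearly count.
worst = fires_per_year.loc[fires_per_year["fires_per_year"].idxmax()]
print(int(worst["year"]), int(worst["fires_per_year"]))
```

Applied to the table above, the same lookup would point to 2019, the year with 424 incidents.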


In [401]:
alt.Chart(calculation_fires_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "fires_per_year", 
    color = alt.condition(
        alt.datum.year == 2019, 
        alt.value('orange'), 
        alt.value('steelblue')
    )
).properties(title="Incidents per Year")

Explore the damage caused annually by fires related to power lines:

In [370]:
damages_per_year = powerlines_fires_format.groupby("year").agg({'total_lost':'sum', 'total_acres_burned':'sum', 'total_injuries': 'sum', 'total_deaths': 'sum'}).reset_index()

In [372]:
damages_per_year.info() # Check that there are no errors

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   year                10 non-null     int64  
 1   total_lost          10 non-null     float64
 2   total_acres_burned  10 non-null     float64
 3   total_injuries      10 non-null     int64  
 4   total_deaths        10 non-null     int64  
dtypes: float64(2), int64(3)
memory usage: 528.0 bytes


In [382]:
pd.set_option('display.float_format', lambda x: '%.5f' % x) # Display floats with fixed decimals instead of scientific notation

In [383]:
damages_per_year.head(10)

Unnamed: 0,year,total_lost,total_acres_burned,total_injuries,total_deaths
0,2011,288074.0,1231.82,0,0
1,2012,244772.0,641.3,2,0
2,2013,454817.0,1834.79,3,0
3,2014,205531.0,977.87,1,0
4,2015,26343101.0,133017.73,2,0
5,2016,125367.0,404.37,0,0
6,2017,139719778.0,228884.33,1,44
7,2018,2000615493.0,154759.97,2,86
8,2019,389488747.0,86469.42,7,0
9,2020,977878.0,23260.55,6,1
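The yearly table can be summed to answer the questions about total lives lost and total cost over the whole period. The sketch below re-enters the ten rows shown above by hand so that it runs on its own; in the notebook the same two steps would apply directly to `damages_per_year`.

```python
import pandas as pd

# Values copied from the damages_per_year table above.
damages = pd.DataFrame({
    "year": list(range(2011, 2021)),
    "total_lost": [288074.0, 244772.0, 454817.0, 205531.0, 26343101.0,
                   125367.0, 139719778.0, 2000615493.0, 389488747.0, 977878.0],
    "total_acres_burned": [1231.82, 641.3, 1834.79, 977.87, 133017.73,
                           404.37, 228884.33, 154759.97, 86469.42, 23260.55],
    "total_injuries": [0, 2, 3, 1, 2, 0, 1, 2, 7, 6],
    "total_deaths": [0, 0, 0, 0, 0, 0, 44, 86, 0, 1],
})

# Totals over the whole period for each damage measure.
totals = damages.drop(columns="year").sum()
print(totals)

# Year with the highest death toll.
deadliest_year = int(damages.loc[damages["total_deaths"].idxmax(), "year"])
print(deadliest_year)
```

On these figures the recorded deaths total 131, and 2018 is both the deadliest and the costliest year.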


In [399]:
alt.Chart(damages_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "total_acres_burned", 
    color = alt.condition(
                alt.datum.year == 2017, 
                alt.value('orange'), 
                alt.value('steelblue')
        )
).properties(width=300, title="Acres Burned per Year")

In [403]:
alt.Chart(damages_per_year).mark_bar(size=20).encode(
    x = "year", 
    y= "total_lost", 
    color = alt.condition(
                alt.datum.year == 2018, 
                alt.value('orange'), 
                alt.value('steelblue')
        )
).properties(width=300, title="Total Loss per Year")