### 1. Importing

In [1]:
import pandas as pd
import altair as alt
import numpy as np

In [2]:
xls = pd.ExcelFile('residential-nonresidential-fire-loss-estimates-2003-2021_fixed.xlsx')
prop_type_totals = pd.read_excel(xls, 'Res Bldg Prop Typ Est 2003-2021')
fire_causes = pd.read_excel(xls, 'Res Bldg Fire Est by Cause')
death_by_cause = pd.read_excel(xls, 'Res Bld Fire Death Est by Cause')
injury_by_cause = pd.read_excel(xls, 'Res Bld Fire Inj Est by Cause')
cost_by_cause = pd.read_excel(xls, 'Res Bld Fire Dollr Est by Cause')

I got this data set from the U.S. Fire Administration. The original sheets (https://docs.google.com/spreadsheets/d/1ftqlXHDH5-mazBV-eiaq2O1GhhUa88bH/edit?usp=drive_link&ouid=109691480649921496130&rtpof=true&sd=true) and the cleaned-up version (https://docs.google.com/spreadsheets/d/1RyRbEZeVrwTApUokM6hvDUTfQj0E3xvT/edit?usp=drive_link&ouid=109691480649921496130&rtpof=true&sd=true) that I used for this notebook are on Google Drive. I followed fire coverage closely this summer, so I explored the data sets on the Fire Administration's site and found this one, which documents the statistics it keeps on fires, especially property fires.

### 2. Viewing the Data

The first data frame shows the overall totals: how many fires there were and their impact (deaths, injuries and financial cost) for 1 & 2 Family Homes, for Multifamily Homes, and for all residential buildings combined. The next four data frames break those totals down by the cause of the fire.

In [3]:
prop_type_totals.tail()

Unnamed: 0,Years,Overall Res Building Fires,1 & 2 Family Fires,Multifamily Fires,Overall Deaths,1 & 2 Family Deaths,Multifamily Deaths,Overall Injuries,1 & 2 Family Injuries,Multifamily Injuries,Overall Dollars Los,1 & 2 Family Dollars Loss,Multifamily Dollars Loss
14,2017,371500,233800,109600,2695,2200,385,10825,6650,3800,8619200000,6081200000,2191400000
15,2018,379600,237000,110200,2790,2220,440,11525,7175,3875,8842700000,6596300000,1811600000
16,2019,354400,220600,100300,2830,2240,375,12625,7950,3950,8338100000,6321400000,1454900000
17,2020,372000,227300,107800,2615,2095,345,11825,7400,3725,9008700000,6539000000,1804900000
18,2021,353500,215700,100500,2840,2235,405,11400,7300,3500,8855900000,6452300000,1808400000


In [4]:
fire_causes

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation,Total Residential*
0,2003,17400,17400,3200,3500,2800,700,8900,5600,2700,...,16100,12200,3300,22000,17100,4000,2200,1800,400,381200
1,2004,17500,17500,3000,3100,2400,600,9000,5600,2800,...,16000,12000,3300,22300,17400,4000,2400,1900,400,389800
2,2005,18000,18000,3300,2900,2300,600,8700,5400,2600,...,15100,11100,3400,22500,17300,4100,2900,2200,500,376500
3,2006,18100,18100,3200,3200,2500,700,9700,6000,3100,...,14700,10800,3400,23400,18200,4100,3300,2500,600,392700
4,2007,19000,19000,3400,3100,2400,600,8900,5700,2700,...,15200,11400,3100,25400,20200,4000,3800,2900,700,390300
5,2008,18300,18300,3300,2700,2200,500,8300,5300,2500,...,13700,10400,2700,24500,19800,3900,3800,3100,600,378200
6,2009,16200,16200,2600,2100,1600,400,7000,4500,2100,...,12300,9300,2500,23100,18800,3500,3400,2700,500,356200
7,2010,16000,16000,2700,2200,1700,400,7600,4900,2300,...,13300,10000,2800,24600,19900,3800,3700,3100,500,362100
8,2011,17400,17400,2800,2300,1800,500,7800,4900,2500,...,13300,10100,2800,24900,20000,3900,3800,3100,600,364500
9,2012,18800,18800,2900,2300,1700,500,9600,6100,3000,...,2900,2400,400,21100,17400,2800,4400,3600,700,374000
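
In the rows displayed above, `Overall Intentional` equals `1 & 2 Fam Intentional` in every year even though `Multi Fam Intentional` is nonzero, which may point to a copy error in the cleaned sheet. Here is a minimal sketch of a check for this. It assumes the overall figure should be at least the sum of the two property types, which is my assumption about how the estimates are built and not something the source states. The three rows are copied from the output above.

```python
import pandas as pd

# Three rows copied from the displayed fire_causes output.
sample = pd.DataFrame({
    "Years": [2003, 2004, 2005],
    "Overall Intentional": [17400, 17500, 18000],
    "1 & 2 Fam Intentional": [17400, 17500, 18000],
    "Multi Fam Intentional": [3200, 3000, 3300],
})

# Assumption: the overall estimate should not be smaller than its parts.
parts = sample["1 & 2 Fam Intentional"] + sample["Multi Fam Intentional"]
suspect = sample[sample["Overall Intentional"] < parts]

# Years where the overall figure looks too small.
suspect_years = suspect["Years"].tolist()
```

If rows are flagged, it is worth comparing them against the original sheet before using the `Overall` column.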


In [5]:
death_by_cause

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Multi Fam Exposure,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation
0,2003,265,220,40,130,130,<3,500,320,120,...,<3,100,95,5,440,365,55,215,205,5
1,2004,255,220,35,60,45,5,505,380,95,...,<3,155,105,25,430,350,60,195,170,25
2,2005,350,280,55,55,30,20,510,350,120,...,15,120,90,30,410,370,30,205,160,40
3,2006,250,175,50,80,75,10,485,360,90,...,10,90,65,20,365,310,35,165,120,35
4,2007,310,265,35,55,50,<3,470,340,100,...,<3,120,115,5,380,325,50,240,205,35
5,2008,310,245,55,45,40,<3,390,275,90,...,<3,90,70,10,430,370,50,265,220,40
6,2009,265,210,25,45,35,10,360,275,70,...,<3,115,100,5,410,320,75,240,205,20
7,2010,260,215,40,60,50,10,350,240,85,...,5,125,120,10,435,385,45,230,200,25
8,2011,240,200,20,30,20,5,305,230,65,...,<3,90,85,5,360,290,55,350,300,25
9,2012,320,285,40,45,25,15,330,240,80,...,<3,40,15,15,295,245,35,285,245,35


In [6]:
injury_by_cause

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Multi Fam Exposure,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation
0,2003,875,570,255,500,405,95,1025,655,305,...,15,900,585,295,1550,1150,340,275,225,45
1,2004,850,625,215,500,345,135,1050,600,355,...,10,1150,815,270,1500,1070,390,325,240,75
2,2005,925,615,280,325,200,115,1025,505,405,...,20,1050,715,295,1500,1075,365,400,275,100
3,2006,750,440,255,425,315,110,1150,620,425,...,10,900,560,310,1350,930,355,375,255,95
4,2007,950,580,310,400,330,80,950,590,315,...,5,825,545,265,1500,1110,360,425,290,120
5,2008,825,505,300,400,320,45,950,580,335,...,20,875,565,280,1550,1040,440,375,290,90
6,2009,775,455,280,300,215,95,900,515,340,...,20,900,585,285,1525,1085,395,450,290,145
7,2010,750,480,225,425,335,100,950,605,300,...,10,975,655,280,1525,1030,430,475,265,195
8,2011,850,555,255,350,275,70,1050,610,380,...,15,950,615,305,1625,1145,425,425,285,110
9,2012,775,495,260,325,235,85,800,435,335,...,20,125,100,10,1050,680,335,500,370,120


In [7]:
cost_by_cause

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Multi Fam Exposure,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation
0,2003,761200000,542300000,153100000,180200000,142600000,34200000,363700000,243000000,103000000,...,116500000,543000000,429800000,96300000,1140600000,962900000,138200000,252100000,220000000,26400000
1,2004,716600000,554900000,125500000,154600000,106900000,43700000,375800000,228500000,133500000,...,110100000,469600000,380800000,72100000,1114600000,848800000,209000000,293600000,214100000,74800000
2,2005,784600000,527700000,212000000,175700000,122100000,48700000,423600000,228900000,158400000,...,104100000,553900000,433000000,82800000,1145900000,967600000,148800000,451100000,313100000,101700000
3,2006,719900000,527100000,104200000,152300000,108400000,40500000,434000000,239200000,176100000,...,76700000,504800000,389200000,96600000,1204100000,963600000,194200000,405000000,293100000,95100000
4,2007,630800000,433400000,171000000,121900000,81900000,37400000,345800000,202000000,132000000,...,109700000,440600000,339300000,68400000,1130600000,930200000,144400000,357400000,233800000,110400000
5,2008,862600000,654200000,187000000,145600000,103400000,40700000,415900000,237800000,154900000,...,163700000,558700000,453700000,95700000,1427700000,1151500000,244400000,399100000,325000000,68800000
6,2009,742000000,523000000,127100000,103700000,80900000,21700000,443000000,243100000,170800000,...,186300000,551500000,443200000,88600000,1575200000,1287600000,236600000,475200000,382700000,65900000
7,2010,607600000,471000000,110100000,101000000,77100000,20000000,355600000,215200000,125600000,...,68300000,580800000,479200000,79400000,1410100000,1158100000,189300000,425800000,339400000,74900000
8,2011,580100000,447300000,111500000,104700000,79300000,21800000,357200000,187000000,156100000,...,131700000,505900000,386600000,103700000,1299000000,1036500000,159800000,437800000,341000000,79000000
9,2012,628800000,450600000,104200000,100300000,71600000,27900000,410800000,238700000,159300000,...,75200000,164200000,136400000,10000000,1321000000,1066900000,195300000,430400000,330600000,90900000


### 3. Changing the dtypes of years into a datetime

I used .info() on each dataframe to check that the dtypes were what I wanted. Most of them were fine: they were integers. But I wanted to change the Years column to a datetime dtype.
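
As a minimal sketch of that conversion, here it is on two toy frames in a single loop. The frames and their values are invented for illustration and are not the real sheets. Casting to string before parsing is a defensive choice in this sketch; the cells below pass the integer column directly.

```python
import pandas as pd

# Toy frames standing in for the real sheets (values are illustrative only).
frames = {
    "toy_totals": pd.DataFrame({"Years": [2003, 2004], "Fires": [10, 12]}),
    "toy_causes": pd.DataFrame({"Years": [2003, 2004], "Smoking": [3, 4]}),
}

for df in frames.values():
    # Cast to str so the '%Y' format is applied to text like "2003".
    df["Years"] = pd.to_datetime(df["Years"].astype(str), format="%Y")

# Each year becomes a timestamp for January 1 of that year.
first = frames["toy_totals"]["Years"].iloc[0]

# The integer year is still recoverable through the .dt accessor.
years_back = frames["toy_causes"]["Years"].dt.year.tolist()
```

Because `%Y` carries no month or day, every value lands on January 1, which is why the converted frames display dates such as 2003-01-01.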

In [8]:
prop_type_totals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype
---  ------                      --------------  -----
 0   Years                       19 non-null     int64
 1   Overall Res Building Fires  19 non-null     int64
 2   1 & 2 Family Fires          19 non-null     int64
 3   Multifamily Fires           19 non-null     int64
 4   Overall Deaths              19 non-null     int64
 5   1 & 2 Family Deaths         19 non-null     int64
 6   Multifamily Deaths          19 non-null     int64
 7   Overall Injuries            19 non-null     int64
 8   1 & 2 Family Injuries       19 non-null     int64
 9   Multifamily Injuries        19 non-null     int64
 10  Overall Dollars Los         19 non-null     int64
 11  1 & 2 Family Dollars Loss   19 non-null     int64
 12  Multifamily Dollars Loss    19 non-null     int64
dtypes: int64(13)
memory usage: 2.1 KB


In [9]:
prop_type_totals['Years'] = pd.to_datetime(prop_type_totals['Years'], format='%Y')

In [10]:
prop_type_totals.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 13 columns):
 #   Column                      Non-Null Count  Dtype         
---  ------                      --------------  -----         
 0   Years                       19 non-null     datetime64[ns]
 1   Overall Res Building Fires  19 non-null     int64         
 2   1 & 2 Family Fires          19 non-null     int64         
 3   Multifamily Fires           19 non-null     int64         
 4   Overall Deaths              19 non-null     int64         
 5   1 & 2 Family Deaths         19 non-null     int64         
 6   Multifamily Deaths          19 non-null     int64         
 7   Overall Injuries            19 non-null     int64         
 8   1 & 2 Family Injuries       19 non-null     int64         
 9   Multifamily Injuries        19 non-null     int64         
 10  Overall Dollars Los         19 non-null     int64         
 11  1 & 2 Family Dollars Loss   19 non-null     int64         
 

In [11]:
prop_type_totals

Unnamed: 0,Years,Overall Res Building Fires,1 & 2 Family Fires,Multifamily Fires,Overall Deaths,1 & 2 Family Deaths,Multifamily Deaths,Overall Injuries,1 & 2 Family Injuries,Multifamily Injuries,Overall Dollars Los,1 & 2 Family Dollars Loss,Multifamily Dollars Loss
0,2003-01-01,381200,249400,108800,3000,2480,380,13425,9200,3650,8376800000,6603600000,1362200000
1,2004-01-01,389800,254600,111700,3050,2485,425,13650,9275,3775,8097100000,6227800000,1509500000
2,2005-01-01,376500,245900,107000,2895,2225,510,13375,8950,3800,9013900000,6929100000,1654400000
3,2006-01-01,392700,253800,113900,2490,1925,445,12550,8225,3725,8932300000,6960600000,1570900000
4,2007-01-01,390300,260700,104600,2765,2285,405,13525,9125,3875,9353500000,7520600000,1445700000
5,2008-01-01,378200,250400,104100,2650,2160,395,13100,8400,4200,10095500000,7750500000,1693700000
6,2009-01-01,356200,234100,100200,2480,1965,375,12600,8125,4050,9169400000,7199300000,1560100000
7,2010-01-01,362100,236900,102700,2555,2025,425,13275,8525,4250,8259900000,6567800000,1359600000
8,2011-01-01,364500,237700,102800,2450,1945,390,13900,8925,4450,8012500000,6183300000,1478100000
9,2012-01-01,374000,242700,106000,2385,1885,400,13050,8300,4325,8383000000,6529600000,1462900000


In [12]:
fire_causes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 47 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   Years                                    19 non-null     int64
 1   Overall Intentional                      19 non-null     int64
 2   1 & 2 Fam Intentional                    19 non-null     int64
 3   Multi Fam Intentional                    19 non-null     int64
 4   Overall Playing with Heat Source         19 non-null     int64
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64
 6   Multi Fam Playing with Heat Source       19 non-null     int64
 7   Overall Smoking                          19 non-null     int64
 8   1 & 2 Fam Smoking                        19 non-null     int64
 9   Mulit Fam Smoking                        19 non-null     int64
 10  Overall Heating                          19 non-null     int64
 11  1 & 2 Fa

In [13]:
fire_causes['Years'] = pd.to_datetime(fire_causes['Years'], format='%Y')

In [14]:
fire_causes.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 47 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       19 non-null     int64         
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

In [15]:
death_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype 
---  ------                                   --------------  ----- 
 0   Years                                    19 non-null     int64 
 1   Overall Intentional                      19 non-null     int64 
 2   1 & 2 Fam Intentional                    19 non-null     int64 
 3   Multi Fam Intentional                    19 non-null     int64 
 4   Overall Playing with Heat Source         19 non-null     int64 
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64 
 6   Multi Fam Playing with Heat Source       19 non-null     object
 7   Overall Smoking                          19 non-null     int64 
 8   1 & 2 Fam Smoking                        19 non-null     int64 
 9   Mulit Fam Smoking                        19 non-null     int64 
 10  Overall Heating                          19 non-null     int64 


In [16]:
death_by_cause['Years'] = pd.to_datetime(death_by_cause['Years'], format='%Y')

In [17]:
death_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       19 non-null     object        
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

In [18]:
injury_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   Years                                    19 non-null     int64
 1   Overall Intentional                      19 non-null     int64
 2   1 & 2 Fam Intentional                    19 non-null     int64
 3   Multi Fam Intentional                    19 non-null     int64
 4   Overall Playing with Heat Source         19 non-null     int64
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64
 6   Multi Fam Playing with Heat Source       19 non-null     int64
 7   Overall Smoking                          19 non-null     int64
 8   1 & 2 Fam Smoking                        19 non-null     int64
 9   Mulit Fam Smoking                        19 non-null     int64
 10  Overall Heating                          19 non-null     int64
 11  1 & 2 Fa

In [19]:
injury_by_cause['Years'] = pd.to_datetime(injury_by_cause['Years'], format='%Y')

In [20]:
injury_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       19 non-null     int64         
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

In [21]:
cost_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype
---  ------                                   --------------  -----
 0   Years                                    19 non-null     int64
 1   Overall Intentional                      19 non-null     int64
 2   1 & 2 Fam Intentional                    19 non-null     int64
 3   Multi Fam Intentional                    19 non-null     int64
 4   Overall Playing with Heat Source         19 non-null     int64
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64
 6   Multi Fam Playing with Heat Source       19 non-null     int64
 7   Overall Smoking                          19 non-null     int64
 8   1 & 2 Fam Smoking                        19 non-null     int64
 9   Mulit Fam Smoking                        19 non-null     int64
 10  Overall Heating                          19 non-null     int64
 11  1 & 2 Fa

In [22]:
cost_by_cause['Years'] = pd.to_datetime(cost_by_cause['Years'], format='%Y')

In [23]:
cost_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       19 non-null     int64         
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

### 4. Cleaning the data

In the death_by_cause dataframe, many values are entered as '<3', meaning fewer than 3. The organization collecting the data is sure the number of deaths is low, but because these are estimates, it may not be sure of the exact total. I plan to focus on what's impacting people most, so I'm removing those '<3' values. This also makes it possible to change the dtype from object to float, which helps when I'm looking at the data. To do this, I'm replacing every '<3' value with NaN and converting the affected columns to float.
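
The same replace-then-convert step can be sketched as one loop over every text column, instead of one cell per column. This is a sketch on a toy frame with made-up values. It assumes '<3' is the only non-numeric entry, so any other stray text would make `astype(float)` raise an error.

```python
import pandas as pd

# Toy frame with the same '<3' placeholder (values are illustrative only).
toy = pd.DataFrame({
    "Years": [2003, 2004, 2005],
    "Overall Smoking": [500, 505, 510],
    "Multi Fam Exposure": ["<3", "<3", 15],
})

# Columns that contain '<3' are read as object dtype; find them automatically.
object_cols = toy.select_dtypes(include="object").columns

for col in object_cols:
    # mask() puts NaN wherever the condition is True, then convert to float.
    toy[col] = toy[col].mask(toy[col] == "<3").astype(float)
```

Selecting by dtype avoids typing each column name by hand, which matters here because some names in the sheet are misspelled (for example '1 & 2 Fam Explosure').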

In [24]:
death_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       19 non-null     object        
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

In [25]:
death_by_cause

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Multi Fam Exposure,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation
0,2003-01-01,265,220,40,130,130,<3,500,320,120,...,<3,100,95,5,440,365,55,215,205,5
1,2004-01-01,255,220,35,60,45,5,505,380,95,...,<3,155,105,25,430,350,60,195,170,25
2,2005-01-01,350,280,55,55,30,20,510,350,120,...,15,120,90,30,410,370,30,205,160,40
3,2006-01-01,250,175,50,80,75,10,485,360,90,...,10,90,65,20,365,310,35,165,120,35
4,2007-01-01,310,265,35,55,50,<3,470,340,100,...,<3,120,115,5,380,325,50,240,205,35
5,2008-01-01,310,245,55,45,40,<3,390,275,90,...,<3,90,70,10,430,370,50,265,220,40
6,2009-01-01,265,210,25,45,35,10,360,275,70,...,<3,115,100,5,410,320,75,240,205,20
7,2010-01-01,260,215,40,60,50,10,350,240,85,...,5,125,120,10,435,385,45,230,200,25
8,2011-01-01,240,200,20,30,20,5,305,230,65,...,<3,90,85,5,360,290,55,350,300,25
9,2012-01-01,320,285,40,45,25,15,330,240,80,...,<3,40,15,15,295,245,35,285,245,35


In [26]:
# if a cell in 'Multi Fam Playing with Heat Source' is '<3', change it to NaN
death_by_cause.loc[
    death_by_cause['Multi Fam Playing with Heat Source'] == '<3', 'Multi Fam Playing with Heat Source'
] = np.nan

In [27]:
death_by_cause.loc[
    death_by_cause['Multi Fam Appliances'] == '<3', 'Multi Fam Appliances'
] = np.nan

In [28]:
death_by_cause.loc[
    death_by_cause['Multi Fam Other Equipment'] == '<3', 'Multi Fam Other Equipment'
] = np.nan

In [29]:
death_by_cause.loc[
    death_by_cause['Multi Fam Natural'] == '<3', 'Multi Fam Natural'
] = np.nan

In [30]:
death_by_cause.loc[
    death_by_cause['Overall Exposure'] == '<3', 'Overall Exposure'
] = np.nan

In [31]:
death_by_cause.loc[
    death_by_cause['1 & 2 Fam Explosure'] == '<3', '1 & 2 Fam Explosure'
] = np.nan

In [32]:
death_by_cause.loc[
    death_by_cause['Multi Fam Exposure'] == '<3', 'Multi Fam Exposure'
] = np.nan

In [33]:
death_by_cause.loc[
    death_by_cause['Multi Fam Equipment Malfunction'] == '<3', 'Multi Fam Equipment Malfunction'
] = np.nan

In [34]:
# convert column to float
death_by_cause['Multi Fam Playing with Heat Source'] = death_by_cause['Multi Fam Playing with Heat Source'].astype(float)

In [35]:
death_by_cause['Multi Fam Appliances'] = death_by_cause['Multi Fam Appliances'].astype(float)

In [36]:
death_by_cause['Multi Fam Other Equipment'] = death_by_cause['Multi Fam Other Equipment'].astype(float)

In [37]:
death_by_cause['Multi Fam Natural'] = death_by_cause['Multi Fam Natural'].astype(float)

In [38]:
death_by_cause['Overall Exposure'] = death_by_cause['Overall Exposure'].astype(float)

In [39]:
death_by_cause['1 & 2 Fam Explosure'] = death_by_cause['1 & 2 Fam Explosure'].astype(float)

In [40]:
death_by_cause['Multi Fam Exposure'] = death_by_cause['Multi Fam Exposure'].astype(float)

In [41]:
death_by_cause['Multi Fam Equipment Malfunction'] = death_by_cause['Multi Fam Equipment Malfunction'].astype(float)

In [42]:
# checking that the values are now NaN
death_by_cause[['Multi Fam Playing with Heat Source']]

Unnamed: 0,Multi Fam Playing with Heat Source
0,
1,5.0
2,20.0
3,10.0
4,
5,
6,10.0
7,10.0
8,5.0
9,15.0


In [43]:
death_by_cause[['Multi Fam Appliances']]

Unnamed: 0,Multi Fam Appliances
0,10.0
1,10.0
2,5.0
3,
4,
5,10.0
6,10.0
7,15.0
8,15.0
9,10.0


In [44]:
death_by_cause[['Multi Fam Other Equipment']]

Unnamed: 0,Multi Fam Other Equipment
0,
1,
2,
3,10.0
4,25.0
5,5.0
6,5.0
7,5.0
8,5.0
9,5.0


In [45]:
death_by_cause[['Multi Fam Natural']]

Unnamed: 0,Multi Fam Natural
0,
1,
2,
3,
4,
5,
6,
7,
8,
9,5.0


In [46]:
death_by_cause[['Overall Exposure']]

Unnamed: 0,Overall Exposure
0,
1,25.0
2,20.0
3,15.0
4,35.0
5,10.0
6,5.0
7,30.0
8,20.0
9,15.0


In [47]:
death_by_cause[['1 & 2 Fam Explosure']]

Unnamed: 0,1 & 2 Fam Explosure
0,
1,25.0
2,5.0
3,5.0
4,35.0
5,10.0
6,5.0
7,15.0
8,20.0
9,15.0


In [48]:
death_by_cause[['Multi Fam Exposure']]

Unnamed: 0,Multi Fam Exposure
0,
1,
2,15.0
3,10.0
4,
5,
6,
7,5.0
8,
9,


In [49]:
death_by_cause[['Multi Fam Equipment Malfunction']]

Unnamed: 0,Multi Fam Equipment Malfunction
0,5.0
1,25.0
2,30.0
3,20.0
4,5.0
5,10.0
6,5.0
7,10.0
8,5.0
9,15.0


In [50]:
# checking that the dtype has changed
death_by_cause.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 46 columns):
 #   Column                                   Non-Null Count  Dtype         
---  ------                                   --------------  -----         
 0   Years                                    19 non-null     datetime64[ns]
 1   Overall Intentional                      19 non-null     int64         
 2   1 & 2 Fam Intentional                    19 non-null     int64         
 3   Multi Fam Intentional                    19 non-null     int64         
 4   Overall Playing with Heat Source         19 non-null     int64         
 5   1 &2 Fam Playing with Heat Source        19 non-null     int64         
 6   Multi Fam Playing with Heat Source       10 non-null     float64       
 7   Overall Smoking                          19 non-null     int64         
 8   1 & 2 Fam Smoking                        19 non-null     int64         
 9   Mulit Fam Smoking                        19 n

In [51]:
death_by_cause

Unnamed: 0,Years,Overall Intentional,1 & 2 Fam Intentional,Multi Fam Intentional,Overall Playing with Heat Source,1 &2 Fam Playing with Heat Source,Multi Fam Playing with Heat Source,Overall Smoking,1 & 2 Fam Smoking,Mulit Fam Smoking,...,Multi Fam Exposure,Overall Equipment Malfunction,1 & 2 Fam Equipment Malfunction,Multi Fam Equipment Malfunction,"Overall Other Unintentional, Careless","1 & 2 Fam Other Unintentional, Careless","Multi Fam Other Unintentional, Careless",Overall Cause Under Investigation,1 & 2 Fam Cause Under Investigation,Multi Fam Cause Under Investigation
0,2003-01-01,265,220,40,130,130,,500,320,120,...,,100,95,5.0,440,365,55,215,205,5
1,2004-01-01,255,220,35,60,45,5.0,505,380,95,...,,155,105,25.0,430,350,60,195,170,25
2,2005-01-01,350,280,55,55,30,20.0,510,350,120,...,15.0,120,90,30.0,410,370,30,205,160,40
3,2006-01-01,250,175,50,80,75,10.0,485,360,90,...,10.0,90,65,20.0,365,310,35,165,120,35
4,2007-01-01,310,265,35,55,50,,470,340,100,...,,120,115,5.0,380,325,50,240,205,35
5,2008-01-01,310,245,55,45,40,,390,275,90,...,,90,70,10.0,430,370,50,265,220,40
6,2009-01-01,265,210,25,45,35,10.0,360,275,70,...,,115,100,5.0,410,320,75,240,205,20
7,2010-01-01,260,215,40,60,50,10.0,350,240,85,...,5.0,125,120,10.0,435,385,45,230,200,25
8,2011-01-01,240,200,20,30,20,5.0,305,230,65,...,,90,85,5.0,360,290,55,350,300,25
9,2012-01-01,320,285,40,45,25,15.0,330,240,80,...,,40,15,15.0,295,245,35,285,245,35
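
The column-by-column checks above can also be done in one pass. This is a minimal sketch on a toy frame that mimics the cleaned result, with made-up values: it confirms no text columns are left and counts how many cells were blanked in each column.

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the cleaned data (values are illustrative only).
toy = pd.DataFrame({
    "Overall Smoking": [500, 505, 510],
    "Multi Fam Exposure": [np.nan, np.nan, 15.0],
    "Multi Fam Natural": [np.nan, np.nan, np.nan],
})

# No text columns should be left after cleaning.
leftover_object_cols = toy.select_dtypes(include="object").columns.tolist()

# Number of blanked ('<3') cells per column, largest first.
missing_counts = toy.isna().sum().sort_values(ascending=False)
```

A column that is almost entirely NaN, as 'Multi Fam Natural' appears to be in the rows shown above, carries little information for later charts.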


### 5. Figuring out which columns I want to focus on

There are a lot of different data frames, but I want to focus on which fire causes affect residential areas most, so I'm summing each column across all years to help decide which columns to focus on.

In [53]:
fire_causes_totals = fire_causes.sum(numeric_only=True).to_frame().reset_index()
fire_causes_totals.columns = ['category', 'annual_totals']
fire_causes_totals

Unnamed: 0,category,annual_totals
0,Overall Intentional,324200
1,1 & 2 Fam Intentional,324200
2,Multi Fam Intentional,51900
3,Overall Playing with Heat Source,40700
4,1 &2 Fam Playing with Heat Source,31400
5,Multi Fam Playing with Heat Source,8200
6,Overall Smoking,155000
7,1 & 2 Fam Smoking,98500
8,Mulit Fam Smoking,46700
9,Overall Heating,863800


Based on this summary table, I plan to focus on 5 columns: 3 for 1 & 2 Family Homes and 2 for Multifamily Homes. For 1 & 2 Family Homes these are Heating, Cooking and Electrical Malfunction; for Multifamily Homes they are Cooking and Heating. The sums of these columns show that they are the most common causes of fires.
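
One way to read the top causes off a table like this is to filter by property type and take the largest sums. This is a sketch only: the numbers and the exact category names are made up, and `top_causes` is a hypothetical helper that is not part of the notebook.

```python
import pandas as pd

# Toy totals table in the same shape as fire_causes_totals (made-up numbers).
totals = pd.DataFrame({
    "category": [
        "Overall Cooking", "1 & 2 Fam Cooking", "Multi Fam Cooking",
        "Overall Heating", "1 & 2 Fam Heating", "Multi Fam Heating",
        "1 & 2 Fam Smoking", "Multi Fam Smoking",
    ],
    "annual_totals": [900, 500, 350, 400, 330, 60, 100, 45],
})


def top_causes(df, prefix, n):
    """Return the n largest categories whose name starts with prefix."""
    subset = df[df["category"].str.startswith(prefix)]
    return subset.nlargest(n, "annual_totals")["category"].tolist()


one_two_fam = top_causes(totals, "1 & 2 Fam", 2)
multi_fam = top_causes(totals, "Multi Fam", 2)
```

On the real sheet, prefix matching would miss columns with inconsistent names such as '1 &2 Fam Playing with Heat Source' and 'Mulit Fam Smoking', so those would need to be renamed first.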

### 6. Exporting the cleaned up data frames

I'm exporting the cleaned-up dataframes to CSV files so they can be analyzed right away in the next notebook without repeating these steps.

In [54]:
prop_type_totals.to_csv('prop_type_totals_clean.csv', index=False)
fire_causes.to_csv('fire_causes_clean.csv', index=False)
death_by_cause.to_csv('death_by_cause_clean.csv', index=False)
injury_by_cause.to_csv('injury_by_cause_clean.csv', index=False)
cost_by_cause.to_csv('cost_by_cause_clean.csv', index=False)
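
CSV files do not store dtypes, so the datetime conversion has to be reapplied when the files are read back. This is a minimal sketch of the round trip on a toy frame with made-up values, using an in-memory buffer instead of a file on disk.

```python
import io

import pandas as pd

# Toy frame with a datetime Years column (values are illustrative only).
toy = pd.DataFrame({
    "Years": pd.to_datetime(["2003", "2004"], format="%Y"),
    "Multi Fam Exposure": [float("nan"), 15.0],
})

# Write to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)

# Without parse_dates, Years would come back as plain text.
restored = pd.read_csv(buffer, parse_dates=["Years"])
```

The NaN cells are written as empty fields and are read back as NaN, so the float columns keep their dtype.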