In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sqlite3
import re

In [3]:
pd.set_option('display.max_columns', None)

In [4]:
seed = 538

# Get basic stats on fire incidents from NFIRS by city

In this notebook, we'll get basic summary stats for our cities based on the NFIRS data. The goal is to calculate:
* Average fire spread (defined by number of other structures impacted)
* Average injuries and deaths per fire reported
* Average property damage per fire reported
* Average number of alarms tripped

In [5]:
WORKING_DIRECTORY = 'D:/Fire Project/data/'

We'll load our cleaned REAC data for comparison.

In [6]:
public = pd.read_csv(WORKING_DIRECTORY + 'clean_agg_public.csv', sep=',')
multi = pd.read_csv(WORKING_DIRECTORY + 'clean_agg_multi.csv', sep=',')

In [7]:
public['CITYSTATE'] = public.CITY + ',' + public.STATE
multi['CITYSTATE'] = multi.CITY + ',' + multi.STATE
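The `CITYSTATE` key only helps if it is spelled the same way on both sides of a later join. The sketch below is one hedged way to normalize the key and check the match rate. The helper `add_citystate` and both toy frames are hypothetical, and it is an assumption (not verified here) that the REAC files may contain mixed case or stray whitespace.

```python
import pandas as pd

def add_citystate(df):
    """Return a copy with an upper-cased, whitespace-trimmed CITYSTATE key."""
    out = df.copy()
    city = out['CITY'].str.strip().str.upper()
    state = out['STATE'].str.strip().str.upper()
    # A missing CITY or STATE propagates to a missing key.
    out['CITYSTATE'] = city + ',' + state
    return out

# Toy rows standing in for the REAC and NFIRS tables.
reac_demo = pd.DataFrame({
    'CITY': [' Pierre', 'Shoshone', 'Nowhere'],
    'STATE': ['sd', 'ID', 'ZZ'],
})
nfirs_demo = pd.DataFrame({
    'CITYSTATE': ['PIERRE,SD', 'SHOSHONE,ID'],
    'SUPPORT': [493, 85],
})

# indicator=True adds a _merge column showing which rows found a partner.
merged = add_citystate(reac_demo).merge(
    nfirs_demo, on='CITYSTATE', how='left', indicator=True
)
match_rate = (merged['_merge'] == 'both').mean()
print(merged[['CITYSTATE', 'SUPPORT', '_merge']])
print(f'Share of REAC rows with an NFIRS match: {match_rate:.2f}')
```

A low match rate would suggest the two sources spell city names differently and need further cleaning before comparison.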

## Calculate average spread

In [8]:
conn = sqlite3.connect(WORKING_DIRECTORY + 'fire_data.db')

Let's check the proportion of missing values in the fields we care about.

* EXP_NO is the number of other structures or vehicles to which the fire spread. Averaging the exposure number by city shows which cities are most likely to see fires spread.
* PROP_LOSS and CONT_LOSS are the dollar values of lost property and lost contents (anything inside a burning structure or vehicle). PROP_VAL and CONT_VAL also exist, but they measure the pre-fire value.
* DET_ALERT (detector alert) indicates whether a fire detector alerted the occupants. Options are 1 (alerted), 2 (not alerted), U (unknown), and NULL (not reported).
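A minimal sketch of the missing-value check, run against an in-memory toy table rather than the real database. The table name `demo_incident`, its rows, and the helper `missing_share` are hypothetical. With the real data, you would pass `conn` and the name of whichever table holds each field.

```python
import sqlite3
import pandas as pd

# Toy stand-in for an NFIRS table (all values are made up).
demo_conn = sqlite3.connect(':memory:')
demo_conn.executescript("""
    CREATE TABLE demo_incident (
        INCIDENT_KEY TEXT, EXP_NO INTEGER, PROP_LOSS REAL,
        CONT_LOSS REAL, DET_ALERT TEXT
    );
    INSERT INTO demo_incident VALUES
        ('a', 0, 1000, 200, '1'),
        ('b', 1, NULL, NULL, 'U'),
        ('c', 0, 500, NULL, NULL),
        ('d', 0, NULL, 50, '2');
""")

def missing_share(conn, table, columns):
    """Return the fraction of NULL values per column as a Series."""
    # COUNT(col) skips NULLs, so 1 - COUNT(col) / COUNT(*) is the NULL share.
    exprs = ', '.join(
        f'1.0 - COUNT({col}) * 1.0 / COUNT(*) AS {col}' for col in columns
    )
    return pd.read_sql(f'SELECT {exprs} FROM {table}', conn).iloc[0]

missing = missing_share(
    demo_conn, 'demo_incident',
    ['EXP_NO', 'PROP_LOSS', 'CONT_LOSS', 'DET_ALERT'],
)
print(missing)
```

Note that this counts only NULL as missing. A DET_ALERT value of U (unknown) is a reported value and would need its own tally.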

To extract the year from INC_DATE in SQL, take the last four characters of the string:
* `SUBSTR(ia.INC_DATE, LENGTH(ia.INC_DATE) - 3, 4)`
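The expression can be checked on a couple of sample strings. This sketch assumes INC_DATE is stored as text with the four-digit year last and a month that may lack a leading zero, which would explain why the offset is computed from the end. The table `demo_dates` and its values are hypothetical. SQLite also accepts a negative start in `SUBSTR`, which counts from the right and gives the same result.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_conn.executescript("""
    CREATE TABLE demo_dates (INC_DATE TEXT);
    INSERT INTO demo_dates VALUES ('1012010'), ('12312011');
""")

rows = demo_conn.execute("""
    SELECT INC_DATE,
           SUBSTR(INC_DATE, LENGTH(INC_DATE) - 3, 4) AS YEAR_FROM_LENGTH,
           SUBSTR(INC_DATE, -4) AS YEAR_FROM_END
    FROM demo_dates
""").fetchall()

# Map each raw date string to the two extracted years.
years = {raw: (by_length, from_end) for raw, by_length, from_end in rows}
print(years)
```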

In [9]:
# Takes about 4 minutes to run
spread = pd.read_sql("""
    SELECT ia.CITY || ',' || ia.STATE 
                AS CITYSTATE, 
            AVG(ia.EXP_NO) 
                AS AVG_SPREAD,
            AVG(bi.ALARMS)
                AS AVG_ALARMS,
            AVG(bi.PROP_LOSS + bi.CONT_LOSS)
                AS AVG_MONEY_LOST,
            AVG(bi.OTH_DEATH + bi.FF_DEATH)
                AS AVG_FATALITIES,
            AVG(bi.OTH_INJ + bi.FF_INJ)
                AS AVG_INJURED,
            COUNT(*)
                AS SUPPORT
    FROM basic_incident bi JOIN incident_address ia
        USING (INCIDENT_KEY)
    GROUP BY ia.CITY, ia.STATE
""", conn)
spread.sample(n=5, random_state=seed)

| | CITYSTATE | AVG_SPREAD | AVG_ALARMS | AVG_MONEY_LOST | AVG_FATALITIES | AVG_INJURED | SUPPORT |
|---|---|---|---|---|---|---|---|
| 5838 | PIERRE,SD | 0.002028 | 0.0 | 18582.818533 | 0.0 | 1.0 | 493 |
| 4420 | MANASSA,CO | 0.037975 | NaN | 7705.405405 | NaN | NaN | 79 |
| 6020 | POWERS,MI | 0.0 | NaN | 3256.756757 | NaN | NaN | 41 |
| 6770 | SHOSHONE,ID | 0.0 | 0.0 | 0.0 | NaN | NaN | 85 |
| 215 | ARANSAS PASS,TX | 0.000779 | 0.997608 | 2206.695652 | 1.0 | 0.0 | 1283 |
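One thing to keep in mind when reading these averages: in SQL, `a + b` is NULL when either operand is NULL, and `AVG` skips NULL rows. An expression like `AVG(PROP_LOSS + CONT_LOSS)` therefore averages only the incidents where both fields were reported, while `COUNT(*)` counts every row. That is one plausible reason a city can have a sizeable SUPPORT but missing averages. The sketch below illustrates the effect on a hypothetical table `demo_loss` with made-up values. The `COALESCE` variant is shown for contrast only, since treating an unreported loss as zero is itself an assumption.

```python
import sqlite3

demo_conn = sqlite3.connect(':memory:')
demo_conn.executescript("""
    CREATE TABLE demo_loss (PROP_LOSS REAL, CONT_LOSS REAL);
    INSERT INTO demo_loss VALUES (1000, 200), (500, NULL), (NULL, NULL);
""")

strict_avg, coalesced_avg, support = demo_conn.execute("""
    SELECT AVG(PROP_LOSS + CONT_LOSS),
           AVG(COALESCE(PROP_LOSS, 0) + COALESCE(CONT_LOSS, 0)),
           COUNT(*)
    FROM demo_loss
""").fetchone()

# strict_avg uses only the first row; coalesced_avg uses all three.
print(strict_avg, coalesced_avg, support)
```

Here the strict average rests on a single incident even though the support is 3, so the two columns should be read together.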


### Get average time from alarm to response

### Average severity of fires in city, as measured by deaths per fire, number of alarms triggered? Other information about property damage?

### Demographic factors related to deaths.