In [6]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import sqlite3
import re

In [7]:
pd.set_option('display.max_columns', None)

In [8]:
seed = 538

# Get basic stats on fire incidents from NFIRS by city

In this notebook, we'll get basic summary stats for our cities based on the NFIRS data. The goal is to calculate:
* Average fire spread (defined by number of other structures impacted)
* Average injuries and deaths per fire reported
* Average property damage per fire reported
* Average number of alarms tripped

In [9]:
WORKING_DIRECTORY = 'D:/Fire Project/data/'

## Calculate basic statistics by city

In [10]:
conn = sqlite3.connect(WORKING_DIRECTORY + 'fire_data.db')

EXP_NO is the number of other structures/vehicles to which the fire spreads. By calculating the average exposure number by city, we can see which cities are most likely to see fires spread.

* PROP_LOSS and CONT_LOSS are the dollar values of lost property and lost contents (anything inside a burning structure or vehicle). PROP_VAL and CONT_VAL also exist, but they measure the pre-fire value.
* DET_ALERT (detector alert) indicates whether a fire detector alerted the occupants. Options are 1 (alerted), 2 (not alerted), U (unknown), and NULL (not reported).
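
As a rough illustration of the DET_ALERT coding described above, the sketch below computes the share of incidents in which a detector alerted the occupants, counting only rows with a known answer (1 or 2). The table `demo_fire`, its rows, and the choice to store the codes as text are assumptions made up for this example; they are not taken from the NFIRS database.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
det_demo = sqlite3.connect(":memory:")
det_demo.execute("CREATE TABLE demo_fire (CITY TEXT, DET_ALERT TEXT)")
det_demo.executemany(
    "INSERT INTO demo_fire VALUES (?, ?)",
    [("A", "1"), ("A", "2"), ("A", "U"), ("A", None), ("A", "1"),
     ("B", "U"), ("B", None)],
)

# CASE without ELSE yields NULL for 'U' and NULL rows, and AVG skips NULLs,
# so the rate is computed over known answers only.
alert_rates = dict(det_demo.execute("""
    SELECT CITY,
           AVG(CASE DET_ALERT WHEN '1' THEN 1.0 WHEN '2' THEN 0.0 END)
               AS ALERT_RATE
    FROM demo_fire
    GROUP BY CITY
""").fetchall())

print(alert_rates)
```

A city with no known answers (city B here) gets NULL rather than 0, which keeps "no information" distinct from "detectors never alerted".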

Because we want to use data from 2019 and later as a validation set, we'll exclude 2019 and later years from the statistics that we calculate.

The year is extracted from INC_DATE by taking its last four characters:
* SUBSTR(ia.INC_DATE, LENGTH(ia.INC_DATE) - 3, 4)
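
To check what this expression does, the sketch below runs it in an in-memory SQLite database on a few made-up date strings. The sample values assume INC_DATE is stored as MMDDYYYY text, possibly with the leading zero dropped; that format is an assumption, not something confirmed by the notebook.

```python
import sqlite3

year_demo = sqlite3.connect(":memory:")

# Made-up sample dates (assumed MMDDYYYY, leading zero possibly missing).
sample_dates = ["1012015", "12312018", "7042019"]

# SUBSTR is 1-indexed in SQLite, so starting at LENGTH - 3 and taking
# 4 characters returns the last four characters regardless of length.
extracted = {
    d: year_demo.execute(
        "SELECT CAST(SUBSTR(?, LENGTH(?) - 3, 4) AS INTEGER)", (d, d)
    ).fetchone()[0]
    for d in sample_dates
}

# Apply the same cutoff as the query's WHERE clause.
kept = [d for d, year in extracted.items() if year < 2019]

print(extracted)
print(kept)
```

Anchoring on the end of the string means the extraction does not depend on whether the month has one or two digits.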

In [11]:
# Takes about 4 minutes to run
city_stats = pd.read_sql("""
    SELECT ia.CITY, 
            ia.STATE,
            ia.CITY || ',' || ia.STATE 
                AS CITYSTATE, 
            AVG(ia.EXP_NO)
                AS AVG_SPREAD_ADJ,
            AVG(bi.ALARMS)
                AS AVG_ALARMS,
            AVG(bi.PROP_LOSS + bi.CONT_LOSS)
                AS AVG_MONEY_LOST,
            AVG(bi.OTH_DEATH + bi.FF_DEATH)
                AS AVG_FATALITIES,
            AVG(bi.OTH_INJ + bi.FF_INJ)
                AS AVG_INJURED,
            COUNT(bi.ID)
                AS SUPPORT
    FROM basic_incident bi JOIN incident_address ia
        USING (INCIDENT_KEY)
    WHERE CAST(SUBSTR(ia.INC_DATE, LENGTH(ia.INC_DATE) - 3, 4) AS INTEGER) < 2019
    GROUP BY ia.CITY, ia.STATE
""", conn)

In [12]:
city_stats

Unnamed: 0,CITY,STATE,CITYSTATE,AVG_SPREAD_ADJ,AVG_ALARMS,AVG_MONEY_LOST,AVG_FATALITIES,AVG_INJURED,SUPPORT
0,ABBEVILLE,AL,"ABBEVILLE,AL",0.000000,0.500000,0.000000,,,6
1,ABBEVILLE,GA,"ABBEVILLE,GA",0.000000,0.820717,11990.486486,,,259
2,ABBEVILLE,LA,"ABBEVILLE,LA",0.017306,0.577381,12703.393939,0.5,1.5,1329
3,ABBEVILLE,SC,"ABBEVILLE,SC",0.000000,0.500000,479.192901,0.5,1.0,1135
4,ABBOTSFORD,WI,"ABBOTSFORD,WI",0.000000,0.000000,13815.000000,,,51
...,...,...,...,...,...,...,...,...,...
8213,ZION CROSSROADS,VA,"ZION CROSSROADS,VA",0.000000,1.000000,0.000000,,,3
8214,ZIONSVILLE,IN,"ZIONSVILLE,IN",0.001681,0.044964,11379.409283,,,595
8215,ZUMBROTA,MN,"ZUMBROTA,MN",0.000000,,44286.538462,,,108
8216,ZUNI,VA,"ZUNI,VA",0.000000,1.000000,4540.606061,0.5,0.5,93
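
The missing values in AVG_FATALITIES and AVG_INJURED are consistent with how SQL handles NULL: a sum such as OTH_DEATH + FF_DEATH is NULL if either operand is NULL, and AVG skips NULL rows. The sketch below shows that behavior on a made-up table (`demo_casualties` and its rows are hypothetical) and compares it with a COALESCE variant. Treating NULL as zero is an assumption that may not suit the NFIRS data, since "not reported" is not the same as "none".

```python
import sqlite3

# Hypothetical in-memory table; names and rows are illustrative only.
null_demo = sqlite3.connect(":memory:")
null_demo.execute(
    "CREATE TABLE demo_casualties (CITY TEXT, OTH_DEATH INTEGER, FF_DEATH INTEGER)"
)
null_demo.executemany(
    "INSERT INTO demo_casualties VALUES (?, ?, ?)",
    [("A", 1, None), ("A", None, None), ("A", 0, 1), ("A", 2, 0),
     ("B", None, None), ("B", None, 0)],
)

# STRICT: rows where either column is NULL drop out of the average.
# FILLED: NULLs are treated as 0, so every row contributes.
avg_by_city = {
    city: (strict, filled)
    for city, strict, filled in null_demo.execute("""
        SELECT CITY,
               AVG(OTH_DEATH + FF_DEATH) AS STRICT,
               AVG(COALESCE(OTH_DEATH, 0) + COALESCE(FF_DEATH, 0)) AS FILLED
        FROM demo_casualties
        GROUP BY CITY
    """)
}

print(avg_by_city)
```

In this toy data the strict average for city A uses only two of its four rows, and city B has no usable rows at all, so its strict average is NULL (shown as NaN once loaded into pandas).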


In [19]:
city_stats.to_csv(path_or_buf=WORKING_DIRECTORY + 'nfirs_stats_pre2019.csv')

### Get average time from alarm to response

### Average severity of fires by city (deaths per fire, number of alarms triggered, property damage)

### Demographic factors related to deaths.