# Objective:
As part of this project, we need to focus on understanding, cleaning, describing, visualizing, and 
transforming data to bring data to life and tell your own version of the story and unveil some hidden insights. 
Primary objective is to come up with creative features using given data and external sources and 
focus less on making a highly accurate model. 

Visualization Tool: You have the option to visualize your charts using Python (library of choice), 
    Tableau, Domo, PowerBI, Google Data Studio or any other visualization tool of choice, 
    so you can focus on feature engineering part.
    
Label: Create a label (target variable) using Civilian_Casualties by converting it into a binary variable 0 / 1 so the modelling problem becomes a classification problem. 
Label = 0 if  Civilian_Casualties = 0
Label = 1 if  Civilian_Casualties > 0

## @Surnjani Djoko Nov 2021
## Notes: this notebook joins incident dataset with fire station dataset, and create time related features plus distance between incident and fire station (km)

## Summary of the output dataset:

In [1]:
import pandas as pd
import numpy as np
import holidays

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

## User defined functions

In [2]:
# Compute distance between fire location and its fire station
# Distance Feature

def haversine_distance(row):
    '''
    distance is in km
    '''
    lat_p, lon_p = row['Latitude'], row['Longitude'] # fire location
    lat_d, lon_d = row['fs_Longitude'], row['fs_Longitude'] # fire station location
    radius = 6371 # km

    # convert decimal degrees to radians 
    dlat = np.radians(lat_d - lat_p)
    dlon = np.radians(lon_d - lon_p)
    a = np.sin(dlat/2) * np.sin(dlat/2) + np.cos(np.radians(lat_p)) * np.cos(np.radians(lat_d)) * np.sin(dlon/2) * np.sin(dlon/2)
    c = 2 * np.arctan2(np.sqrt(a), np.sqrt(1-a))
    distance = radius * c

    return distance

# (1) Importing dataset

## (1a) incident dataset

In [3]:
# main dataset: https://open.toronto.ca/dataset/fire-incidents/
incident=pd.read_csv('group_data/Fire_incidents_Data.csv')
print(incident.shape)
incident.head(3)

(17536, 43)


Unnamed: 0,_id,Area_of_Origin,Building_Status,Business_Impact,Civilian_Casualties,Count_of_Persons_Rescued,Estimated_Dollar_Loss,Estimated_Number_Of_Persons_Displaced,Exposures,Ext_agent_app_or_defer_time,Extent_Of_Fire,Final_Incident_Type,Fire_Alarm_System_Impact_on_Evacuation,Fire_Alarm_System_Operation,Fire_Alarm_System_Presence,Fire_Under_Control_Time,Ignition_Source,Incident_Number,Incident_Station_Area,Incident_Ward,Initial_CAD_Event_Type,Intersection,Last_TFS_Unit_Clear_Time,Latitude,Level_Of_Origin,Longitude,Material_First_Ignited,Method_Of_Fire_Control,Number_of_responding_apparatus,Number_of_responding_personnel,Possible_Cause,Property_Use,Smoke_Alarm_at_Fire_Origin,Smoke_Alarm_at_Fire_Origin_Alarm_Failure,Smoke_Alarm_at_Fire_Origin_Alarm_Type,Smoke_Alarm_Impact_on_Persons_Evacuating_Impact_on_Evacuation,Smoke_Spread,Sprinkler_System_Operation,Sprinkler_System_Presence,Status_of_Fire_On_Arrival,TFS_Alarm_Time,TFS_Arrival_Time,TFS_Firefighter_Casualties
0,1946929,81 - Engine Area,,,0,0,15000.0,,,2018-02-25T02:12:00,,01 - Fire,,,,2018-02-25T02:15:40,999 - Undetermined,F18020956,441,1.0,Vehicle Fire,Dixon Rd / 427 N Dixon Ramp,2018-02-25T02:38:31,43.686558,,-79.599419,47 - Vehicle,1 - Extinguished by fire department,1,4,99 - Undetermined,"896 - Sidewalk, street, roadway, highway, hwy ...",,,,,,,,"7 - Fully involved (total structure, vehicle, ...",2018-02-25T02:04:29,2018-02-25T02:10:11,0
1,1946930,"75 - Trash, rubbish area (outside)",,,0,0,50.0,,,2018-02-25T02:29:42,,01 - Fire,,,,2018-02-25T02:32:24,999 - Undetermined,F18020969,116,18.0,Fire - Grass/Rubbish,Sheppard Ave E / Clairtrell Rd,2018-02-25T02:35:58,43.766135,,-79.390039,97 - Other,1 - Extinguished by fire department,1,4,03 - Suspected Vandalism,"896 - Sidewalk, street, roadway, highway, hwy ...",,,,,,,,2 - Fire with no evidence from street,2018-02-25T02:24:43,2018-02-25T02:29:31,0
2,1946931,,,,0,0,,,,,,"03 - NO LOSS OUTDOOR fire (exc: Sus.arson,vand...",,,,,,F18021182,221,21.0,Fire - Highrise Residential,Danforth Rd / Savarin St,2018-02-25T19:14:03,43.74323,,-79.245061,,,6,22,,891 - Outdoor general auto parking,,,,,,,,,2018-02-25T18:29:59,2018-02-25T18:36:49,0


## (1b) fire station location

In [4]:
station=pd.read_csv('group_data/fire-station-locations_clean.csv')
print(station.shape)
station.head(3)

(84, 23)


Unnamed: 0,_id,ID,NAME,ADDRESS,ADDRESS_POINT_ID,ADDRESS_ID,CENTRELINE_ID,MAINT_STAGE,ADDRESS_NUMBER,LINEAR_NAME_FULL,POSTAL_CODE,GENERAL_USE,CLASS_FAMILY_DESC,ADDRESS_ID_LINK,PLACE_NAME,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUNICIPALITY_NAME,OBJECTID,geometry
0,1,21,FIRE STATION 211,900 TAPSCOTT RD,4236992,363382,4236991,REGULAR,900,Tapscott Rd,,Fire/Ambulance Stations,"Land, Structure, Structure Entrance",,"Fire Station 211, Ambulance Station 27",,,,,Scarborough North (23),Scarborough,1520443,"{u'type': u'Point', u'coordinates': (-79.24287..."
1,2,60,FIRE STATION 342,106 ASCOT AVE,764237,70190,1140634,REGULAR,106,Ascot Ave,,Fire Station,"Land, Structure, Structure Entrance",,Fire Station 342,,,,,Davenport (9),former Toronto,1541526,"{u'type': u'Point', u'coordinates': (-79.44862..."
2,3,61,FIRE STATION 343,65 HENDRICK AVE,819425,127148,1140587,REGULAR,65,Hendrick Ave,,Fire Station,"Land, Structure, Structure Entrance",,Fire Station 343,,,,,Toronto-St. Paul's (12),former Toronto,1543317,"{u'type': u'Point', u'coordinates': (-79.43075..."


In [5]:
# extract longitude and latitude from geometry column
station['fs_Longitude'] = station['geometry'].str[37:51].astype(float)
station['fs_Latitude'] = station['geometry'].str[53:-2].astype(float)

In [6]:
# extract station area so it can be used to join with incident dataset
station['Station_Area']= station.NAME.str.extract('(\d+)')
station['Station_Area'] = station['Station_Area'].apply(pd.to_numeric)

In [7]:
incident.Incident_Station_Area.unique()

array([441, 116, 221, 133, 132, 215, 235, 231, 332, 426, 225, 325, 226,
       341, 421, 244, 141, 115, 415, 431, 331, 413, 314, 333, 311, 145,
       143, 342, 443, 312, 223, 134, 214, 434, 423, 233, 114, 112, 224,
       326, 212, 343, 135, 125, 315, 234, 324, 113, 142, 146, 313, 442,
       222, 241, 345, 232, 121, 432, 425, 334, 411, 445, 243, 323, 435,
       213, 422, 412, 123, 344, 111, 242, 321, 433, 245, 211, 131, 322,
       444, 122, 335, 227, 346, 424])

In [8]:
incident = pd.merge(incident,station, how='left',left_on='Incident_Station_Area', right_on = 'Station_Area')

In [9]:
print(incident.shape)
incident.sample(3)

(17536, 69)


Unnamed: 0,_id_x,Area_of_Origin,Building_Status,Business_Impact,Civilian_Casualties,Count_of_Persons_Rescued,Estimated_Dollar_Loss,Estimated_Number_Of_Persons_Displaced,Exposures,Ext_agent_app_or_defer_time,Extent_Of_Fire,Final_Incident_Type,Fire_Alarm_System_Impact_on_Evacuation,Fire_Alarm_System_Operation,Fire_Alarm_System_Presence,Fire_Under_Control_Time,Ignition_Source,Incident_Number,Incident_Station_Area,Incident_Ward,Initial_CAD_Event_Type,Intersection,Last_TFS_Unit_Clear_Time,Latitude,Level_Of_Origin,Longitude,Material_First_Ignited,Method_Of_Fire_Control,Number_of_responding_apparatus,Number_of_responding_personnel,Possible_Cause,Property_Use,Smoke_Alarm_at_Fire_Origin,Smoke_Alarm_at_Fire_Origin_Alarm_Failure,Smoke_Alarm_at_Fire_Origin_Alarm_Type,Smoke_Alarm_Impact_on_Persons_Evacuating_Impact_on_Evacuation,Smoke_Spread,Sprinkler_System_Operation,Sprinkler_System_Presence,Status_of_Fire_On_Arrival,TFS_Alarm_Time,TFS_Arrival_Time,TFS_Firefighter_Casualties,_id_y,ID,NAME,ADDRESS,ADDRESS_POINT_ID,ADDRESS_ID,CENTRELINE_ID,MAINT_STAGE,ADDRESS_NUMBER,LINEAR_NAME_FULL,POSTAL_CODE,GENERAL_USE,CLASS_FAMILY_DESC,ADDRESS_ID_LINK,PLACE_NAME,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUNICIPALITY_NAME,OBJECTID,geometry,fs_Longitude,fs_Latitude,Station_Area
3008,1949937,24 - Cooking Area or Kitchen,01 - Normal (no change),8 - Not applicable (not a business),0,0,0.0,0.0,,2018-11-25T01:35:10,1 - Confined to object of origin,01 - Fire,3 - No one (at risk) evacuated as a result of ...,1 - Fire alarm system operated,1 - Fire alarm system present,2018-11-25T01:35:10,"11 - Stove, Range-top burner",F18126026,331,10.0,Alarm Residential,Stafford St / Wellington St W,2018-11-25T01:39:14,43.641013,003,-79.41025,"56 - Paper, Cardboard",3 - Extinguished by occupant,5,16,47 - Improper handling of ignition source or i...,232 - Halfway/Transitional House,3 - Floor/suite of fire origin: Smoke alarm pr...,4 - Remote from fire – smoke did not reach alarm,2 - Hardwired (standalone),3 - No one (at risk) evacuated as a result of ...,2 - Confined to part of room/area of origin,2 - Did not activate: remote from fire,2 - Partial sprinkler system present,1 - Fire extinguished prior to arrival,2018-11-25T01:27:52,2018-11-25T01:31:48,0,13,54,FIRE STATION 331,33 CLAREMONT ST,6522583,152968,1146109,REGULAR,33,Claremont St,,Fire Station,"Land, Structure, Structure Entrance",,Fire Station 331,,,,,Spadina-Fort York (10),former Toronto,1701571,"{u'type': u'Point', u'coordinates': (-79.40977...",-79.409774,43.647336,331
6240,1953169,42 - Garage,01 - Normal (no change),8 - Not applicable (not a business),0,0,100.0,0.0,,2011-06-04T19:32:00,1 - Confined to object of origin,01 - Fire,9 - Undetermined,1 - Fire alarm system operated,1 - Fire alarm system present,2011-06-04T19:33:41,999 - Undetermined,F11062514,116,23.0,VEFU,Bayview Ave / Bayview Mews Lane,2011-06-04T20:48:31,43.76938,B01,-79.38866,"46 - Rubbish, Trash, Waste",3 - Extinguished by occupant,6,18,99 - Undetermined,"303 - Attached Dwelling (eg. rowhouse, townhou...",1 - Floor/suite of fire origin: No smoke alarm,98 - Not applicable: Alarm operated OR presenc...,8 - Not applicable - no smoke alarm or presenc...,9 - Undetermined,3 - Spread to entire room of origin,3 - Did not activate: fire too small to trigge...,2 - Partial sprinkler system present,1 - Fire extinguished prior to arrival,2011-06-04T19:22:27,2011-06-04T19:28:06,0,46,6,FIRE STATION 116,255 ESTHER SHINER BLVD,8731578,1464220,30005958,REGULAR,255,Esther Shiner Blvd,,Fire Station,"Structure, Structure Entrance",484876.0,Fire Station 116,,,,,Don Valley North (17),North York,2607109,"{u'type': u'Point', u'coordinates': (-79.36506...",-79.365064,43.769146,116
11098,1958027,53 - Chimney/Flue Pipe,01 - Normal (no change),8 - Not applicable (not a business),0,0,1000.0,0.0,,2017-02-07T22:35:00,1 - Confined to object of origin,01 - Fire,"8 - Not applicable: No fire alarm system, no p...",8 - Not applicable (no system),8 - Not applicable (bldg not classified by OBC...,2017-02-07T23:15:00,17 - Wood burning stove,F17012288,142,8.0,FIR,Derrydown Rd / Keegan Cres,2017-02-07T23:42:17,43.75359,B01,-79.5051,"44 - Creosote (chimney, flue pipe)",1 - Extinguished by fire department,6,21,20 - Design/Construction/Installation/Maintena...,301 - Detached Dwelling,3 - Floor/suite of fire origin: Smoke alarm pr...,4 - Remote from fire – smoke did not reach alarm,1 - Battery operated,7 - Not applicable: Occupant(s) first alerted ...,2 - Confined to part of room/area of origin,8 - Not applicable - no sprinkler system present,3 - No sprinkler system,3 - Fire with smoke showing only - including v...,2017-02-07T22:23:31,2017-02-07T22:29:09,0,57,17,FIRE STATION 142,2753 JANE ST,531061,461520,442012,REGULAR,2753,Jane St,,Fire/Ambulance Stations,"Land, Structure, Structure Entrance",,"Fire Station 142, Ambulance Station 15",,,,,Humber River-Black Creek (7),North York,2982185,"{u'type': u'Point', u'coordinates': (-79.51437...",-79.514371,43.745983,142


## process target

In [10]:
incident['LABEL'] = 0
incident.loc[incident.Civilian_Casualties > 0, 'LABEL']=1

In [11]:
pd.crosstab(incident.LABEL, incident.Civilian_Casualties)

Civilian_Casualties,0,1,2,3,4,5,6,7,8,15
LABEL,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1
0,16543,0,0,0,0,0,0,0,0,0
1,0,807,116,40,20,2,4,1,2,1


In [12]:
incident.LABEL.value_counts(normalize=True)

0    0.943374
1    0.056626
Name: LABEL, dtype: float64

## (2) Reformat date related columns

In [13]:
#Convert the string to datetime format

date_var = ["TFS_Alarm_Time", "TFS_Arrival_Time", "Fire_Under_Control_Time", "Last_TFS_Unit_Clear_Time"]
incident[date_var] = incident[date_var].apply(pd.to_datetime)

## (3.1) Create distance feature

In [14]:
# what is the distance between incident and fire station (in km)
incident['DISTANCE_INCIDENT_FIRESTATION'] = incident.apply(haversine_distance, axis = 1)

## (3.2) Create dates related features
### minutes to arrive, minutes to leave, incident period of the day, is holiday, and a day or a day after is a holidy

In [15]:
incident['INCIDENT_DATE']= incident['TFS_Alarm_Time'].apply(lambda x: x.date())
incident['INCIDENT_DATE'] = incident['INCIDENT_DATE'].apply(pd.to_datetime)
incident['DOW']= incident['INCIDENT_DATE'].apply(lambda x: x.weekday())
incident['IS_WEEKEND'] = incident['INCIDENT_DATE'].apply(lambda x: 1 if x.weekday() in (5, 6) else 0)

In [16]:
incident['YEAR'] = pd.DatetimeIndex(incident['INCIDENT_DATE']).year
incident['MONTH'] = pd.DatetimeIndex(incident['INCIDENT_DATE']).month

In [17]:
incident.sample(3)

Unnamed: 0,_id_x,Area_of_Origin,Building_Status,Business_Impact,Civilian_Casualties,Count_of_Persons_Rescued,Estimated_Dollar_Loss,Estimated_Number_Of_Persons_Displaced,Exposures,Ext_agent_app_or_defer_time,Extent_Of_Fire,Final_Incident_Type,Fire_Alarm_System_Impact_on_Evacuation,Fire_Alarm_System_Operation,Fire_Alarm_System_Presence,Fire_Under_Control_Time,Ignition_Source,Incident_Number,Incident_Station_Area,Incident_Ward,Initial_CAD_Event_Type,Intersection,Last_TFS_Unit_Clear_Time,Latitude,Level_Of_Origin,Longitude,Material_First_Ignited,Method_Of_Fire_Control,Number_of_responding_apparatus,Number_of_responding_personnel,Possible_Cause,Property_Use,Smoke_Alarm_at_Fire_Origin,Smoke_Alarm_at_Fire_Origin_Alarm_Failure,Smoke_Alarm_at_Fire_Origin_Alarm_Type,Smoke_Alarm_Impact_on_Persons_Evacuating_Impact_on_Evacuation,Smoke_Spread,Sprinkler_System_Operation,Sprinkler_System_Presence,Status_of_Fire_On_Arrival,TFS_Alarm_Time,TFS_Arrival_Time,TFS_Firefighter_Casualties,_id_y,ID,NAME,ADDRESS,ADDRESS_POINT_ID,ADDRESS_ID,CENTRELINE_ID,MAINT_STAGE,ADDRESS_NUMBER,LINEAR_NAME_FULL,POSTAL_CODE,GENERAL_USE,CLASS_FAMILY_DESC,ADDRESS_ID_LINK,PLACE_NAME,X,Y,LATITUDE,LONGITUDE,WARD_NAME,MUNICIPALITY_NAME,OBJECTID,geometry,fs_Longitude,fs_Latitude,Station_Area,LABEL,DISTANCE_INCIDENT_FIRESTATION,INCIDENT_DATE,DOW,IS_WEEKEND,YEAR,MONTH
5969,1952898,81 - Engine Area,,,0,0,60000.0,,,2011-03-24T14:12:00,,01 - Fire,,,,2011-03-24 14:42:37,999 - Undetermined,F11033388,432,5.0,FICI,Paxman Rd / The West Mall,2011-03-24 15:31:12,43.62407,,-79.55955,99 - Undetermined (formerly 98),1 - Extinguished by fire department,6,21,99 - Undetermined,903 - Large Truck (Excluding Truck Trailer),,,,,,,,5 - Flames showing from large area (more than ...,2011-03-24 14:02:37,2011-03-24 14:10:38,0,47,76,FIRE STATION 432,155 THE EAST MALL,8100063,6586,8100062,REGULAR,155,The East Mall,,Fire/Ambulance Stations,"Land, Structure, Structure Entrance",,"Fire Station 432, Ambulance Station 39",,,,,Etobicoke-Lakeshore (3),Etobicoke,2609714,"{u'type': u'Point', u'coordinates': (-79.54935...",-79.549354,43.623341,432,0,13696.259828,2011-03-24,3,0,2011,3
7984,1954913,24 - Cooking Area or Kitchen,01 - Normal (no change),1 - No business interruption,0,0,50000.0,7.0,,2017-07-30T19:25:31,2 - Confined to part of room/area of origin,01 - Fire,"8 - Not applicable: No fire alarm system, no p...",8 - Not applicable (no system),2 - No Fire alarm system,2017-07-30 19:35:25,15 - Range Hood,F17071688,132,15.0,FIR,Hotspur Rd / Neptune Dr,2017-07-30 22:46:51,43.73236,002,-79.43572,"74 - Cooking Oil, Grease",1 - Extinguished by fire department,11,34,"28 - Routine maintenance deficiency, eg creoso...",321 - Multi-Unit Dwelling - 2 to 6 Units,3 - Floor/suite of fire origin: Smoke alarm pr...,"5 - Separated from fire (e.g. wall, etc)",1 - Battery operated,2 - Some persons (at risk) self evacuated as a...,6 - Multi unit bldg: spread to separate suite(s),8 - Not applicable - no sprinkler system present,3 - No sprinkler system,3 - Fire with smoke showing only - including v...,2017-07-30 19:15:43,2017-07-30 19:20:24,0,15,12,FIRE STATION 132,476 LAWRENCE AVE W,9847351,100477,9694792,REGULAR,476,Lawrence Ave W,,Fire Station,"Land, Structure, Structure Entrance",,Fire Station 132,,,,,Eglinton-Lawrence (8),North York,1852933,"{u'type': u'Point', u'coordinates': (-79.42857...",-79.428572,43.719798,132,0,13694.870792,2017-07-30,6,1,2017,7
13625,1960554,28 - Office,01 - Normal (no change),1 - No business interruption,0,0,1000.0,0.0,,2014-05-26T14:30:16,1 - Confined to object of origin,01 - Fire,1 - All persons (at risk of injury) evacuated ...,1 - Fire alarm system operated,1 - Fire alarm system present,2014-05-26 14:30:16,98 - Other,F14044910,134,22.0,FICI,Eglinton Ave E / Taunton Rd,2014-05-26 14:45:57,43.70865,B01,-79.38888,97 - Other,3 - Extinguished by occupant,6,21,52 - Electrical Failure,405 - General Business Office,2 - Floor/suite of fire origin: Smoke alarm pr...,98 - Not applicable: Alarm operated OR presenc...,4 - Interconnected,1 - All persons (at risk of injury) self evacu...,2 - Confined to part of room/area of origin,8 - Not applicable - no sprinkler system present,3 - No sprinkler system,1 - Fire extinguished prior to arrival,2014-05-26 14:25:02,2014-05-26 14:30:15,0,11,14,FIRE STATION 134,16 MONTGOMERY AVE,843876,106325,1138230,REGULAR,16,Montgomery Ave,,Fire Station,"Land, Structure, Structure Entrance",,Fire Station 134,,,,,Eglinton-Lawrence (8),former Toronto,1662464,"{u'type': u'Point', u'coordinates': (-79.39976...",-79.399768,43.709657,134,0,13689.031561,2014-05-26,0,0,2014,5


In [18]:
# how long does it take for fire services to arrive at the incident location
# get the number of minutes it took for Fire Services to arrive to the emergency
incident["MINUTES_TO_ARRIVE"] = np.around((incident["TFS_Arrival_Time"] - 
                            incident["TFS_Alarm_Time"]) / np.timedelta64(1, "m"), decimals=3)

In [19]:
incident['TFS_ARR_DAY'] = incident['TFS_Arrival_Time'].dt.day
incident['TFS_ALM_DAY'] = incident['TFS_Alarm_Time'].dt.day

incident['TFS_ARR_HOUR'] = incident['TFS_Arrival_Time'].dt.hour
incident['TFS_ALM_HOUR'] = incident['TFS_Alarm_Time'].dt.hour

In [20]:
# Once the fires crew arrives, how long does it take to clear the fire
incident["MINUTES_TO_LEAVE"] = np.around((incident["Last_TFS_Unit_Clear_Time"] -
              incident["TFS_Arrival_Time"]) / np.timedelta64(1, "m"), decimals=3)

In [21]:
incident["TFS_Alarm_Time"].min()

Timestamp('2011-01-01 05:06:48')

In [22]:
incident["TFS_Alarm_Time"].max()

Timestamp('2019-07-01 03:47:46')

In [23]:
# create categorical feature to indicate the incident period 
day_mapping = {1: 'Late Night',
               2: 'Early Morning',
               3: 'Morning',
               4: 'Noon',
               5: 'Evening',
               6: 'Night'}
incident['INCIDENT_PERIOD_NUM'] = (incident['TFS_ALM_HOUR']%24+4)//4
incident['INCIDENT_PERIOD_CAT'] = incident['INCIDENT_PERIOD_NUM'].replace(day_mapping)

In [24]:
pd.crosstab(incident.INCIDENT_PERIOD_NUM,incident.INCIDENT_PERIOD_CAT)

INCIDENT_PERIOD_CAT,Early Morning,Evening,Late Night,Morning,Night,Noon
INCIDENT_PERIOD_NUM,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
1,0,0,3531,0,0,0
2,2186,0,0,0,0,0
3,0,0,0,1561,0,0
4,0,0,0,0,0,2405
5,0,3611,0,0,0,0
6,0,0,0,0,4242,0


In [25]:
pd.crosstab(incident.TFS_ALM_HOUR,incident.INCIDENT_PERIOD_CAT)

INCIDENT_PERIOD_CAT,Early Morning,Evening,Late Night,Morning,Night,Noon
TFS_ALM_HOUR,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
0,0,0,1005,0,0,0
1,0,0,962,0,0,0
2,0,0,830,0,0,0
3,0,0,734,0,0,0
4,623,0,0,0,0,0
5,552,0,0,0,0,0
6,564,0,0,0,0,0
7,447,0,0,0,0,0
8,0,0,0,393,0,0
9,0,0,0,336,0,0


In [26]:
incident.INCIDENT_PERIOD_NUM.value_counts(normalize=True)

6    0.241902
5    0.205919
1    0.201357
4    0.137146
2    0.124658
3    0.089017
Name: INCIDENT_PERIOD_NUM, dtype: float64

In [27]:
# prep for the holiday calculation
years = incident['INCIDENT_DATE'].dt.year.unique()
years = sorted(years)
years_ = [int(item) for item in years]

In [28]:
years = incident['INCIDENT_DATE'].dt.year.unique()
CA_holidays = []
for year in years_:
    for date in holidays.CA(prov='ON',years=year).items():
        CA_holidays.append(str(date[0]))
print(CA_holidays)

['2011-01-01', '2010-12-31', '2011-02-21', '2011-04-22', '2011-05-23', '2011-07-01', '2011-08-01', '2011-09-05', '2011-10-10', '2011-12-25', '2011-12-26', '2011-12-27', '2012-01-01', '2012-01-02', '2012-02-20', '2012-04-06', '2012-05-21', '2012-07-01', '2012-07-02', '2012-08-06', '2012-09-03', '2012-10-08', '2012-12-25', '2012-12-26', '2013-01-01', '2013-02-18', '2013-03-29', '2013-05-20', '2013-07-01', '2013-08-05', '2013-09-02', '2013-10-14', '2013-12-25', '2013-12-26', '2014-01-01', '2014-02-17', '2014-04-18', '2014-05-19', '2014-07-01', '2014-08-04', '2014-09-01', '2014-10-13', '2014-12-25', '2014-12-26', '2015-01-01', '2015-02-16', '2015-04-03', '2015-05-18', '2015-07-01', '2015-08-03', '2015-09-07', '2015-10-12', '2015-12-25', '2015-12-28', '2016-01-01', '2016-02-15', '2016-03-25', '2016-05-23', '2016-07-01', '2016-08-01', '2016-09-05', '2016-10-10', '2016-12-25', '2016-12-26', '2016-12-27', '2017-01-01', '2017-01-02', '2017-02-20', '2017-04-14', '2017-05-22', '2017-07-01', '2017

In [29]:
incident['IS_HOLIDAY'] = [1 if str(val).split()[0] in CA_holidays else 0 for val in incident['INCIDENT_DATE']]

In [30]:
# 1 day before holiday
incident['IS_HOLIDAY_LAG1'] = [1 if str(val).split()[0] in CA_holidays \
                               else 0 for val in (incident['INCIDENT_DATE']+pd.Timedelta(days=-1))]
# 1 day after holiday
incident['IS_HOLIDAY_LEAD1'] = [1 if str(val).split()[0] in CA_holidays \
                               else 0 for val in (incident['INCIDENT_DATE']+pd.Timedelta(days=1))]

In [31]:
def is_holiday_season(row):
    row_sum = row['IS_HOLIDAY'] + row['IS_HOLIDAY_LEAD1'] + row['IS_HOLIDAY_LAG1']
    return (1 if row_sum >= 1 else 0)

In [32]:
incident['IS_HOLIDAY_SEASON'] = incident.apply(is_holiday_season, axis=1)

In [33]:
incident[['IS_HOLIDAY','IS_HOLIDAY_LAG1','IS_HOLIDAY_LEAD1','IS_HOLIDAY_SEASON']].nunique()

IS_HOLIDAY           2
IS_HOLIDAY_LAG1      2
IS_HOLIDAY_LEAD1     2
IS_HOLIDAY_SEASON    2
dtype: int64

In [88]:
incident.to_csv('group_data/incident_SD.csv', index=False)

## Quick check on the data

In [57]:
incident.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 17536 entries, 0 to 17535
Data columns (total 88 columns):
 #   Column                                                         Non-Null Count  Dtype         
---  ------                                                         --------------  -----         
 0   _id_x                                                          17536 non-null  int64         
 1   Area_of_Origin                                                 15623 non-null  object        
 2   Building_Status                                                11216 non-null  object        
 3   Business_Impact                                                11214 non-null  object        
 4   Civilian_Casualties                                            17536 non-null  int64         
 5   Count_of_Persons_Rescued                                       17536 non-null  int64         
 6   Estimated_Dollar_Loss                                          15627 non-null  float64       


In [207]:
# Deterime the pecentage of nan per column
incident.isnull().sum()/incident.shape[0]*100

_id                                                               0.000000
Area_of_Origin                                                   10.908987
Building_Status                                                  36.040146
Business_Impact                                                   0.000000
Civilian_Casualties                                               0.000000
Count_of_Persons_Rescued                                          0.000000
Estimated_Dollar_Loss                                            10.886177
Estimated_Number_Of_Persons_Displaced                            36.045849
Exposures                                                        98.101049
Ext_agent_app_or_defer_time                                      10.914690
Extent_Of_Fire                                                    0.000000
Final_Incident_Type                                               0.000000
Fire_Alarm_System_Impact_on_Evacuation                           36.051551
Fire_Alarm_System_Operati

In [None]:
# ##################