### We would like to use a neural network, now
- The data should follow a comprehensive workflow to ensure data is well-prepared and the model is appropriately designed and evaluated

#### Steps:
1. Import Data
2. Train-Val-Test
3. Preprocess Data
    - Handle Missing Values
    - Normalize/Scale Data
    - Encode Categorical Data
    - Text Processing
    - Handle Outliers
4. Create Pipeline for Numerical and Categorical Data
5. Build the nn
6. Compile and Train Model
7. Evaluate Model
8. Tune Hyperparameters
9. Model Interpretation and Feature Importance
10. Save/Load Model
11. Deploy Model

### 1. Import the data

In [85]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [86]:
fire_csv = '../../datasets/Fire-Incidents.csv'

fire_df = pd.read_csv(fire_csv)
fire_df.head()

Unnamed: 0,Area_of_Origin,Business_Impact,Civilian_Casualties,Count_of_Persons_Rescued,Estimated_Dollar_Loss,Estimated_Number_Of_Persons_Displaced,Ext_agent_app_or_defer_time,Extent_Of_Fire,Fire_Alarm_System_Impact_on_Evacuation,Fire_Alarm_System_Operation,...,Longitude,Material_First_Ignited,Method_Of_Fire_Control,Possible_Cause,Property_Use,Smoke_Alarm_at_Fire_Origin_Alarm_Failure,Smoke_Alarm_at_Fire_Origin_Alarm_Type,Status_of_Fire_On_Arrival,TFS_Alarm_Time,TFS_Arrival_Time
0,Porch or Balcony,No business interruption,0,86,3000,0,2018-08-24T17:06:26,Confined to object of origin,Not applicable: Occupant(s) first alerted by o...,Fire alarm system did not operate,...,-79.412479,Undetermined (formerly 98),Extinguished by fire department,Undetermined,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Hardwired (standalone),Fire with smoke showing only - including vehic...,2018-08-24T16:49:36,2018-08-24T16:54:09
1,Cooking Area or Kitchen,Undetermined,0,28,50000,28,2018-11-24T07:19:00,Confined to part of room/area of origin,Not applicable: Occupant(s) first alerted by o...,Fire alarm system operated,...,-79.530419,Plastic,Extinguished by occupant,Under Investigation,Infirmary,Not applicable: Alarm operated OR presence/ope...,Interconnected,Fire extinguished prior to arrival,2018-11-24T07:09:12,2018-11-24T07:14:23
2,"Living Area (e.g. living, TV, recreation, etc)",Not applicable (not a business),1,16,1000000,130,2017-02-09T18:02:13,"Spread beyond room of origin, same floor",Some persons (at risk) evacuated as a result o...,Fire alarm system operated,...,-79.37346,Undetermined (formerly 98),Extinguished by fire department,Undetermined,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Hardwired (standalone),Flames showing from small area (one storey or ...,2017-02-09T17:45:07,2017-02-09T17:48:49
3,Undetermined (formerly 98),May not resume operations,0,12,1000000,14,2012-10-30T00:52:04,Entire Structure,Undetermined,Fire alarm system operation undetermined,...,-79.3937,Undetermined (formerly 98),Extinguished by fire department,Undetermined,"Clothing Store, Accessories, fur",Not applicable: Alarm operated OR presence/ope...,Type undetermined,Flames showing from large area (more than one ...,2012-10-30T00:42:01,2012-10-30T00:44:58
4,"Sleeping Area or Bedroom (inc. patients room, ...",Not applicable (not a business),8,11,125000,2,2018-07-08T04:35:00,"Spread beyond room of origin, same floor",Some persons (at risk) evacuated as a result o...,Fire alarm system operated,...,-79.511539,Bedding,Extinguished by fire department,Suspected Arson,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Interconnected,Fire with no evidence from street,2018-07-08T04:08:50,2018-07-08T04:13:54


In [87]:
fire_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11214 entries, 0 to 11213
Data columns (total 27 columns):
 #   Column                                    Non-Null Count  Dtype  
---  ------                                    --------------  -----  
 0   Area_of_Origin                            11214 non-null  object 
 1   Business_Impact                           11214 non-null  object 
 2   Civilian_Casualties                       11214 non-null  int64  
 3   Count_of_Persons_Rescued                  11214 non-null  int64  
 4   Estimated_Dollar_Loss                     11214 non-null  int64  
 5   Estimated_Number_Of_Persons_Displaced     11214 non-null  int64  
 6   Ext_agent_app_or_defer_time               11214 non-null  object 
 7   Extent_Of_Fire                            11214 non-null  object 
 8   Fire_Alarm_System_Impact_on_Evacuation    11214 non-null  object 
 9   Fire_Alarm_System_Operation               11214 non-null  object 
 10  Fire_Alarm_System_Presence        

In [88]:
# Change the column names so they are uniform

fire_df.columns = ['origin', 'busi_impact', 'num_casualties', 'num_rescued', 'est_loss', 
                   'est_displaced', 'ext_agent_app_or_defer_time', 'extent', 'fire_alarm_impact_on_evac',
                   'fire_alarm_sys_op', 'fire_alarm_presence', 'fire_under_control_time', 'ignition_source',
                   'incident_station_area', 'incident_ward', 'last_tfs_unit_clear_time', 'latitude',
                   'longitude', 'material_first_ignited', 'method_of_fire_control', 'possible_cause',
                   'property_use', 'smoke_alarm_at_fire_origin_alarm_failure',
                   'smoke_alarm_at_fire_origin_alarm_type', 'status_of_fire_on_arrival',
                   'alarm_time', 'arrival_time']
fire_df['alarm_time'] = pd.to_datetime(fire.alarm_time)
fire_df['arrival_time'] = pd.to_datetime(fire.arrival_time)

### 2. Split the data

In [89]:
from sklearn.model_selection import train_test_split as tts

train_set, test_set = tts(fire_df, test_size=0.2, random_state=42)

In [90]:
train_set

Unnamed: 0,origin,busi_impact,num_casualties,num_rescued,est_loss,est_displaced,ext_agent_app_or_defer_time,extent,fire_alarm_impact_on_evac,fire_alarm_sys_op,...,longitude,material_first_ignited,method_of_fire_control,possible_cause,property_use,smoke_alarm_at_fire_origin_alarm_failure,smoke_alarm_at_fire_origin_alarm_type,status_of_fire_on_arrival,alarm_time,arrival_time
4743,"Trash, Rubbish Storage (inc garbage chute room...",No business interruption,0,0,10000,0,2014-03-27T08:15:00,Confined to object of origin,Not applicable: Occupant(s) first alerted by o...,Not applicable (no system),...,-79.352480,Undetermined (formerly 98),Extinguished by fire department,Mechanical Failure,Waste Transfer Station,Reason for inoperation undetermined,Not applicable - no smoke alarm or presence un...,Fire with smoke showing only - including vehic...,2014-03-27 08:02:56,2014-03-27 08:09:11
3790,"Trash, rubbish area (outside)",No business interruption,0,0,40000,0,2014-08-09T16:31:00,Confined to part of room/area of origin,"Not applicable: No fire alarm system, no perso...",Not applicable (no system),...,-79.472250,"Rubbish, Trash, Waste",Extinguished by fire department,Improperly Discarded,Detached Garage,Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,Flames showing from small area (one storey or ...,2014-08-09 16:22:10,2014-08-09 16:29:07
9395,Concealed Ceiling Area,May resume operations within a week,0,0,10000,0,2014-02-27T01:51:00,Confined to part of room/area of origin,Undetermined,Fire alarm system operation undetermined,...,-79.370140,Insulation,Extinguished by fire department,Used or Placed too close to combustibles,Bank,Other reason,Type undetermined,Fire with smoke showing only - including vehic...,2014-02-27 01:35:00,2014-02-27 01:41:56
7440,"HVAC Equipment Room (furnace room, water heate...",No business interruption,0,0,2000,0,2017-11-07T20:42:15,Confined to object of origin,No one (at risk) evacuated as a result of hear...,Fire alarm system operated,...,-79.237790,Undetermined (formerly 98),Extinguished by fire department,Mechanical Failure,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Type undetermined,Fire with no evidence from street,2017-11-07 20:27:58,2017-11-07 20:32:52
4823,Locker (apartment storage),No business interruption,0,0,200,0,2014-04-03T17:21:00,Confined to part of room/area of origin,Some persons (at risk) evacuated as a result o...,Fire alarm system operated,...,-79.331240,Multiple diverse objects ignited,Extinguished by automatic system,"Other unintentional cause, not classified",Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Interconnected,Fire extinguished prior to arrival,2014-04-03 17:05:03,2014-04-03 17:11:34
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5734,"Trash, Rubbish Storage (inc garbage chute room...",No business interruption,0,0,0,0,2012-12-03T11:48:30,Confined to object of origin,All persons (at risk of injury) evacuated as a...,Fire alarm system operated,...,-79.526610,"Rubbish, Trash, Waste",Action taken unclassified,"Other unintentional cause, not classified",Multi-Unit Dwelling - 2 to 6 Units,Not applicable: Alarm operated OR presence/ope...,Hardwired (standalone),Fire extinguished prior to arrival,2012-12-03 11:43:26,2012-12-03 11:48:29
5191,"Trash, Rubbish Storage (inc garbage chute room...",No business interruption,0,0,500,0,2015-11-17T23:27:05,Confined to object of origin,Not applicable: Occupant(s) first alerted by o...,Fire alarm system operation undetermined,...,-79.172160,"Rubbish, Trash, Waste",Extinguished by fire department,"Other unintentional cause, not classified","Attached Dwelling (eg. rowhouse, townhouse, etc.)",Other reason,Not applicable - no smoke alarm or presence un...,Flames showing from small area (one storey or ...,2015-11-17 23:20:37,2015-11-17 23:25:32
5390,"Trash, rubbish area (outside)",No business interruption,0,0,3000,0,2017-03-11T14:26:59,Confined to roof/exterior structure,Undetermined,Fire alarm system operation undetermined,...,-79.398350,Plastic,Extinguished by fire department,Improperly Discarded,Attached Dwelling with Business,Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,Fire with smoke showing only - including vehic...,2017-03-11 14:20:32,2017-03-11 14:25:09
860,Cooking Area or Kitchen,Not applicable (not a business),0,0,10000,4,2018-12-14T09:10:00,Confined to part of room/area of origin,All persons (at risk of injury) evacuated as a...,Fire alarm system operated,...,-79.516894,Multiple diverse objects ignited,Action taken unclassified,Electrical Failure,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Hardwired (standalone),Fire extinguished prior to arrival,2018-12-14 09:02:19,2018-12-14 09:04:35


In [91]:
test_set

Unnamed: 0,origin,busi_impact,num_casualties,num_rescued,est_loss,est_displaced,ext_agent_app_or_defer_time,extent,fire_alarm_impact_on_evac,fire_alarm_sys_op,...,longitude,material_first_ignited,method_of_fire_control,possible_cause,property_use,smoke_alarm_at_fire_origin_alarm_failure,smoke_alarm_at_fire_origin_alarm_type,status_of_fire_on_arrival,alarm_time,arrival_time
3703,"Washroom or Bathroom (toilet,restroom/locker r...",Not applicable (not a business),0,0,0,0,2011-10-26T07:13:00,Confined to object of origin,"Not applicable: No fire alarm system, no perso...",Not applicable (no system),...,-79.55728,"Fabric - Natural (eg. cotton, wool, etc.)",Extinguished by occupant,Design/Construction/Installation/Maintenance D...,Semi-Detached Dwelling,Remote from fire – smoke did not reach alarm,Battery operated,Fire extinguished prior to arrival,NaT,NaT
9792,Cooking Area or Kitchen,No business interruption,0,0,400000,0,2016-01-26T15:54:56,"Spread to other floors, confined to building",Undetermined,Fire alarm system operation undetermined,...,-79.47693,Undetermined (formerly 98),Extinguished by fire department,"Other unintentional cause, not classified",Detached Dwelling,Not applicable: Alarm operated OR presence/ope...,Type undetermined,Fire with smoke showing only - including vehic...,NaT,NaT
9821,Multiple Areas of Origin,May not resume operations,0,0,500000,0,2012-02-01T01:50:00,Entire Structure,Undetermined,Fire alarm system operation undetermined,...,-79.57588,Undetermined (formerly 98),Extinguished by fire department,Undetermined,Motor Vehicle Repair Garage,Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,"Fully involved (total structure, vehicle, spre...",NaT,NaT
303,Cooking Area or Kitchen,Not applicable (not a business),0,1,75000,2,2012-01-11T11:20:00,Spread to entire room of origin,"Not applicable: No fire alarm system, no perso...",Not applicable (no system),...,-79.44345,Cabinetry,Extinguished by fire department,"Other unintentional cause, not classified",Detached Dwelling,Reason for inoperation undetermined,Type undetermined,Fire with smoke showing only - including vehic...,NaT,NaT
4862,Cooking Area or Kitchen,No business interruption,0,0,10000,2,2013-04-21T16:09:29,Confined to part of room/area of origin,Undetermined,Fire alarm system operated,...,-79.37150,"Books, Magazines, Newspapers",Extinguished by fire department,Unattended,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Interconnected,Fire with smoke showing only - including vehic...,NaT,NaT
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
7298,"Trash, rubbish area (outside)",No business interruption,0,0,500,0,2016-11-15T00:41:26,Confined to part of room/area of origin,Undetermined,Fire alarm system operation undetermined,...,-79.39209,"Rubbish, Trash, Waste",Extinguished by fire department,Improperly Discarded,School - Pre-Elementary,Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,Fire with smoke showing only - including vehic...,NaT,NaT
6010,"Ducting - Exhaust (inc cooking, fumes, etc.)",Undetermined,0,0,3000,10,2016-09-15T08:04:00,Confined to object of origin,Not applicable: Occupant(s) first alerted by o...,Not applicable (no system),...,-79.27104,Other,Extinguished by occupant,Design/Construction/Installation/Maintenance D...,"Mfg: Secondary Processing (eg finished goods, ...",Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,Fire extinguished prior to arrival,NaT,NaT
6431,"Washroom or Bathroom (toilet,restroom/locker r...",No business interruption,0,0,5000,10,2015-06-12T03:55:00,Confined to part of room/area of origin,All persons (at risk of injury) evacuated as a...,Fire alarm system operated,...,-79.30174,Structural Member,Fire self extinguished,Design/Construction/Installation/Maintenance D...,Detached Dwelling,Not applicable: Alarm operated OR presence/ope...,Battery operated,Fire with smoke showing only - including vehic...,NaT,NaT
2629,Porch or Balcony,Not applicable (not a business),0,0,1500,0,2017-04-28T16:24:30,Confined to part of room/area of origin,"Not applicable: No fire alarm system, no perso...",Fire alarm system operated,...,-79.36330,Multiple diverse objects ignited,Extinguished by fire department,Improperly Discarded,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Type undetermined,Flames showing from small area (one storey or ...,NaT,NaT


### 3. Preprocess the data

In [92]:
fire = train_set.copy()

In [93]:
# Add response_time column

fire['response_time'] = (fire.arrival_time - fire.alarm_time).dt.total_seconds()
fire.head()

Unnamed: 0,origin,busi_impact,num_casualties,num_rescued,est_loss,est_displaced,ext_agent_app_or_defer_time,extent,fire_alarm_impact_on_evac,fire_alarm_sys_op,...,material_first_ignited,method_of_fire_control,possible_cause,property_use,smoke_alarm_at_fire_origin_alarm_failure,smoke_alarm_at_fire_origin_alarm_type,status_of_fire_on_arrival,alarm_time,arrival_time,response_time
4743,"Trash, Rubbish Storage (inc garbage chute room...",No business interruption,0,0,10000,0,2014-03-27T08:15:00,Confined to object of origin,Not applicable: Occupant(s) first alerted by o...,Not applicable (no system),...,Undetermined (formerly 98),Extinguished by fire department,Mechanical Failure,Waste Transfer Station,Reason for inoperation undetermined,Not applicable - no smoke alarm or presence un...,Fire with smoke showing only - including vehic...,2014-03-27 08:02:56,2014-03-27 08:09:11,375.0
3790,"Trash, rubbish area (outside)",No business interruption,0,0,40000,0,2014-08-09T16:31:00,Confined to part of room/area of origin,"Not applicable: No fire alarm system, no perso...",Not applicable (no system),...,"Rubbish, Trash, Waste",Extinguished by fire department,Improperly Discarded,Detached Garage,Not applicable: Alarm operated OR presence/ope...,Not applicable - no smoke alarm or presence un...,Flames showing from small area (one storey or ...,2014-08-09 16:22:10,2014-08-09 16:29:07,417.0
9395,Concealed Ceiling Area,May resume operations within a week,0,0,10000,0,2014-02-27T01:51:00,Confined to part of room/area of origin,Undetermined,Fire alarm system operation undetermined,...,Insulation,Extinguished by fire department,Used or Placed too close to combustibles,Bank,Other reason,Type undetermined,Fire with smoke showing only - including vehic...,2014-02-27 01:35:00,2014-02-27 01:41:56,416.0
7440,"HVAC Equipment Room (furnace room, water heate...",No business interruption,0,0,2000,0,2017-11-07T20:42:15,Confined to object of origin,No one (at risk) evacuated as a result of hear...,Fire alarm system operated,...,Undetermined (formerly 98),Extinguished by fire department,Mechanical Failure,Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Type undetermined,Fire with no evidence from street,2017-11-07 20:27:58,2017-11-07 20:32:52,294.0
4823,Locker (apartment storage),No business interruption,0,0,200,0,2014-04-03T17:21:00,Confined to part of room/area of origin,Some persons (at risk) evacuated as a result o...,Fire alarm system operated,...,Multiple diverse objects ignited,Extinguished by automatic system,"Other unintentional cause, not classified",Multi-Unit Dwelling - Over 12 Units,Not applicable: Alarm operated OR presence/ope...,Interconnected,Fire extinguished prior to arrival,2014-04-03 17:05:03,2014-04-03 17:11:34,391.0


In [94]:
# What do the correlations look like

fire_num = fire.select_dtypes(include=['number', 'datetime'])
fire_cat = fire.select_dtypes(include=['object'])

In [95]:
fire_num.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8971 entries, 4743 to 7270
Data columns (total 11 columns):
 #   Column                 Non-Null Count  Dtype         
---  ------                 --------------  -----         
 0   num_casualties         8971 non-null   int64         
 1   num_rescued            8971 non-null   int64         
 2   est_loss               8971 non-null   int64         
 3   est_displaced          8971 non-null   int64         
 4   incident_station_area  8971 non-null   int64         
 5   incident_ward          8942 non-null   float64       
 6   latitude               8971 non-null   float64       
 7   longitude              8971 non-null   float64       
 8   alarm_time             8971 non-null   datetime64[ns]
 9   arrival_time           8971 non-null   datetime64[ns]
 10  response_time          8971 non-null   float64       
dtypes: datetime64[ns](2), float64(4), int64(5)
memory usage: 841.0 KB


In [96]:
fire_cat.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8971 entries, 4743 to 7270
Data columns (total 17 columns):
 #   Column                                    Non-Null Count  Dtype 
---  ------                                    --------------  ----- 
 0   origin                                    8971 non-null   object
 1   busi_impact                               8971 non-null   object
 2   ext_agent_app_or_defer_time               8971 non-null   object
 3   extent                                    8971 non-null   object
 4   fire_alarm_impact_on_evac                 8971 non-null   object
 5   fire_alarm_sys_op                         8971 non-null   object
 6   fire_alarm_presence                       8971 non-null   object
 7   fire_under_control_time                   8971 non-null   object
 8   ignition_source                           8971 non-null   object
 9   last_tfs_unit_clear_time                  8971 non-null   object
 10  material_first_ignited                    8971 non

In [97]:
corr_matrix = fire_num.corr()
corr_matrix['est_loss'].sort_values(ascending=False)

est_loss                 1.000000
est_displaced            0.124613
arrival_time             0.023541
alarm_time               0.023541
num_rescued              0.022099
incident_station_area    0.018305
num_casualties           0.013342
response_time            0.004353
latitude                -0.001024
longitude               -0.012278
incident_ward           -0.018928
Name: est_loss, dtype: float64

In [98]:
# Add new features
fire_num['displaced_by_time'] = fire_num.est_displaced / (fire_num.response_time)
fire_num

Unnamed: 0,num_casualties,num_rescued,est_loss,est_displaced,incident_station_area,incident_ward,latitude,longitude,alarm_time,arrival_time,response_time,displaced_by_time
4743,0,0,10000,0,333,30.0,43.645310,-79.352480,2014-03-27 08:02:56,2014-03-27 08:09:11,375.0,0.000000
3790,0,0,40000,0,423,13.0,43.662240,-79.472250,2014-08-09 16:22:10,2014-08-09 16:29:07,417.0,0.000000
9395,0,0,10000,0,321,26.0,43.697050,-79.370140,2014-02-27 01:35:00,2014-02-27 01:41:56,416.0,0.000000
7440,0,0,2000,0,221,35.0,43.737820,-79.237790,2017-11-07 20:27:58,2017-11-07 20:32:52,294.0,0.000000
4823,0,0,200,0,125,26.0,43.710750,-79.331240,2014-04-03 17:05:03,2014-04-03 17:11:34,391.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...
5734,0,0,0,0,443,4.0,43.682330,-79.526610,2012-12-03 11:43:26,2012-12-03 11:48:29,303.0,0.000000
5191,0,0,500,0,234,44.0,43.769200,-79.172160,2015-11-17 23:20:37,2015-11-17 23:25:32,295.0,0.000000
5390,0,0,3000,0,311,22.0,43.679040,-79.398350,2017-03-11 14:20:32,2017-03-11 14:25:09,277.0,0.000000
860,0,0,10000,4,142,7.0,43.751816,-79.516894,2018-12-14 09:02:19,2018-12-14 09:04:35,136.0,0.029412


In [99]:
corr_matrix = fire_num.corr()
corr_matrix['est_loss'].sort_values(ascending=False)

est_loss                 1.000000
est_displaced            0.124613
displaced_by_time        0.108217
arrival_time             0.023541
alarm_time               0.023541
num_rescued              0.022099
incident_station_area    0.018305
num_casualties           0.013342
response_time            0.004353
latitude                -0.001024
longitude               -0.012278
incident_ward           -0.018928
Name: est_loss, dtype: float64

### 3b. Clean the data

In [100]:
fire_X = train_set.drop(columns=['est_loss'])
fire_y = train_set['est_loss']

In [101]:
fire_X.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8971 entries, 4743 to 7270
Data columns (total 26 columns):
 #   Column                                    Non-Null Count  Dtype         
---  ------                                    --------------  -----         
 0   origin                                    8971 non-null   object        
 1   busi_impact                               8971 non-null   object        
 2   num_casualties                            8971 non-null   int64         
 3   num_rescued                               8971 non-null   int64         
 4   est_displaced                             8971 non-null   int64         
 5   ext_agent_app_or_defer_time               8971 non-null   object        
 6   extent                                    8971 non-null   object        
 7   fire_alarm_impact_on_evac                 8971 non-null   object        
 8   fire_alarm_sys_op                         8971 non-null   object        
 9   fire_alarm_presence             

In [102]:
# We're going to fill the missing values of incident_ward with the median value

In [103]:
from sklearn.impute import SimpleImputer

imputer = SimpleImputer(strategy='median')

In [107]:
fire_X_num = fire_X.select_dtypes(include=['number'])

In [108]:
imputer.fit(fire_X_num)

In [109]:
imputer.statistics_

array([  0.        ,   0.        ,   0.        , 314.        ,
        19.        ,  43.69670845, -79.40365   ])

In [110]:
fire_X_num.median().values

array([  0.        ,   0.        ,   0.        , 314.        ,
        19.        ,  43.69670845, -79.40365   ])

In [111]:
X = imputer.transform(fire_X_num)

In [112]:
fire_X_tr = pd.DataFrame(X, columns=fire_X_num.columns, index=fire_X_num.index)

In [113]:
fire_X_tr.info()

<class 'pandas.core.frame.DataFrame'>
Index: 8971 entries, 4743 to 7270
Data columns (total 7 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   num_casualties         8971 non-null   float64
 1   num_rescued            8971 non-null   float64
 2   est_displaced          8971 non-null   float64
 3   incident_station_area  8971 non-null   float64
 4   incident_ward          8971 non-null   float64
 5   latitude               8971 non-null   float64
 6   longitude              8971 non-null   float64
dtypes: float64(7)
memory usage: 560.7 KB


In [114]:
# Create a custom function to add attributes

from sklearn.base import BaseEstimator, TransformerMixin

displaced, alarm_time, arrival_time = 2, 24, 25

class CustomAttr(BaseEstimator, TransformerMixin):
    def __init__(self, add_response_time = True, add_displaced_by_time = True):
        self.add_response_time = add_response_time
        self.add_displaced_by_time = add_displaced_by_time

    def fit(self, X, y = None):
        return self

    def transform(self, X, y = None):
        response_time = 0
        ala_time = pd.to_datetime(X[:, alarm_time])
        arr_time = pd.to_datetime(X[:, arrival_time])
        if self.add_response_time:
            response_time = (arr_time - ala_time).total_seconds()
            return np.c_[X, ala_time, arr_time, response_time]
        else:
            return np.c_[X, ala_time, arr_time]
            
        if self.add_displaced_by_time:
            disp_by_time = X[:, displaced] / (response_time)
            return np.c_[X, disp_by_time]
        else:
            return np.c_[X]

attr_adder = CustomAttr(add_response_time=True, add_displaced_by_time=True)
fire_extra_attribs = attr_adder.transform(fire_X.values)

In [115]:
fire_extra_attribs

array([['Trash, Rubbish Storage (inc garbage chute room, garbage/industri',
        'No business interruption', 0, ..., 1395907376000000000,
        1395907751000000000, 375.0],
       ['Trash, rubbish area (outside)', 'No business interruption', 0,
        ..., 1407601330000000000, 1407601747000000000, 417.0],
       ['Concealed Ceiling Area', 'May resume operations within a week',
        0, ..., 1393464900000000000, 1393465316000000000, 416.0],
       ...,
       ['Trash, rubbish area (outside)', 'No business interruption', 0,
        ..., 1489242032000000000, 1489242309000000000, 277.0],
       ['Cooking Area or Kitchen', 'Not applicable (not a business)', 0,
        ..., 1544778139000000000, 1544778275000000000, 136.0],
       ['Porch or Balcony', 'Not applicable (not a business)', 0, ...,
        1467564280000000000, 1467564604000000000, 324.0]], dtype=object)

In [116]:
# Make a pipeline

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

num_pipeline = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('attribs_adder', CustomAttr()),
    ('std_scaler', StandardScaler()),
])

fire_num_tr = num_pipeline.fit_transform(fire_X_num)

IndexError: index 24 is out of bounds for axis 1 with size 7

In [117]:
# Need to figure out a way to work with the object/datetime alarm_time and arrival_time to create a response_time column (float)
# At the moment, we can't pass datetime or object objects through the pipeline because of the imputer