# Technical Report on the Response Time of LAFD Emergency Reports in LA County

## Introduction:

In early 2012, the Los Angeles Fire Department admitted to the [Los Angeles Times](https://latimesblogs.latimes.com/lanow/2012/03/la-fire-department-admits-exaggerating-response-times.html) that it had overstated its performance data, making it appear that firefighters were responding to emergencies faster than they actually were. The original figures claimed that in 2008 the department responded to emergencies within five minutes about 86 percent of the time. It turned out that the LAFD had counted 6-minute responses the same as 5-minute responses. The corrected data showed that the department met the five-minute benchmark only 64 percent of the time in 2008. The same LA Times report notes that "Federal guidelines call for first responders to arrive on scene in under five minutes 90% of the time."

Response times can be the difference between life and death, and people have died while waiting for medical help to arrive. Reducing wait times should be a top priority for the LAFD and city officials; nevertheless, response times appear to have increased in recent years. Through this project we hope to discover whether the region of LA in which an incident occurs is a factor in the LAFD's response time.

The findings of this project can be used by LA County residents to better understand the LAFD's likely response time in their neighborhood council region, based on certain factors. The City of LA can also use them to determine which areas or factors need evaluation and/or improvement:

* Idle ambulances can be put back into service
* Technical equipment can be upgraded
* A more efficient system can be developed for firefighters responding to calls that require full PPE and fire equipment
 

*We also want to end on the note that we have great appreciation for the dedication and commitment of all emergency responders in Los Angeles County, and we simply hope that this report can be of some use in improving their service to all residents of LA.*



## Problem Statement:

We will use two datasets from the City of Los Angeles open data [website](https://data.lacity.org/A-Safe-City/All-Stations-Response-Metrics/kszm-sdw4), the LAFD response metrics and the Census data for Los Angeles neighborhoods, to explore emergency metrics and demographics across the LA Neighborhood Councils. The main objectives of this project are as follows:

**1) We want to analyze whether location (N.C.s/regions) within Los Angeles County has any effect on the time it takes LAFD responders to arrive at the scene. In particular, we investigate whether factors such as the poverty rate, the majority ethnic group, or the region of a location in LA are correlated with the LAFD's average response time.**

**2) We will build a regression model to predict the LAFD's response time to EMS reports, given certain information about a particular location in the County of Los Angeles. Model performance will be evaluated by RMSE and the test R2 score.**
 
 
 
### Dataset Collection:

The City of LA publishes a variety of datasets. The two main datasets we use throughout the project are the LAFD Response Metrics Raw Data and the Census Data by N.C. The data dictionary PDF can be found on the source page linked below, under "attachment".

[Source](https://data.lacity.org/A-Safe-City/LAFD-Response-Metrics-Raw-Data/n44u-wxe4)
* LAFD Response Metrics - Raw Data
    - Data last updated on April 8, 2019
    
[Source](https://data.lacity.org/A-Safe-City/All-Stations-Response-Metrics/kszm-sdw4)
* Archived response time metrics for all LAFD stations
    - Contains response metrics for all stations from 2011 to 2017
    
[Source](https://data.lacity.org/A-Safe-City/Fire-Stations/sfzi-8n8k)
* LA County Fire Station locations (geo-spatial data)
    - Data last updated in November 2016
    
[Source](https://data.lacity.org/A-Livable-and-Sustainable-City/Census-Data-by-Neighborhood-Council/nwj3-ufba)
* Census 2010 population/demographic data approximated from block groups to 97 LA Neighborhood Councils
    - Data last updated in April 2018

[Source](https://data.lacity.org/A-Well-Run-City/Neighborhood-Councils-Certified-/fu65-dz2f)
* Neighborhood Councils - Certified (geo-spatial data)
    - Official Certified Neighborhood Council boundaries in the City of Los Angeles, created and maintained by the Bureau of Engineering / GIS Mapping Division.
    - Data last updated in November 2017


The LAFD response metrics raw data contains 4.7 million rows of LAFD incident reports. We collected 4 million rows as JSON via the Socrata API and then converted them into a pandas DataFrame.
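As a rough illustration of the download step, the sketch below pages through a dataset in fixed-size chunks until a row cap is reached. The `fetch_page` callable is a hypothetical stand-in for a real Socrata request; here it is replaced by an in-memory stub so the example runs offline. The page size and row cap are assumptions, not the values we used.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def fetch_rows(fetch_page: Callable[[int, int], List[Row]],
               page_size: int, max_rows: int) -> List[Row]:
    """Collect up to max_rows rows by requesting successive pages."""
    rows: List[Row] = []
    offset = 0
    while len(rows) < max_rows:
        limit = min(page_size, max_rows - len(rows))
        page = fetch_page(limit, offset)
        if not page:  # the source has no more rows
            break
        rows.extend(page)
        offset += len(page)
    return rows


# Hypothetical stand-in for the remote dataset: 10 fake incident rows.
FAKE_DATA = [{"first_in_district": str(i % 3 + 1)} for i in range(10)]


def fake_fetch_page(limit: int, offset: int) -> List[Row]:
    """Return one slice of the fake data, mimicking limit/offset paging."""
    return FAKE_DATA[offset:offset + limit]


sample = fetch_rows(fake_fetch_page, page_size=4, max_rows=9)
print(len(sample))  # 9
```

The resulting list of dictionaries can then be passed to `pandas.DataFrame.from_records` to obtain a DataFrame.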


## Prior Cleaning Breakdown:

Before doing any manual data cleaning, we first had to determine which Neighborhood Councils and fire stations we would be working with, so that the different datasets could be merged effectively. Unfortunately, the LAFD Raw Response Metric data does not include the address of the incident as one of its columns (for privacy reasons). However, it does provide a "First In District" column, described as "the location where the incident occurred in terms of a Fire Station district. The area where a particular fire station responds as well as where the incident occurred." Working from the assumption that an incident is most likely in the same region as the responding fire station, we then needed to find the boundaries of the Neighborhood Councils.

We needed to make sure that the station number in this column matched the fire-station number used in all the other datasets in this project. Once we confirmed that they matched, we plotted the fire stations that fall within the boundaries of the Neighborhood Councils in our datasets.

<img src="./images/nc_color.png" alt="N.C" width="800"/>

*Created via Tableau*
<img src="./images/label_1.png" alt="Fire Stations in NC" width="200"/>
<img src="./images/label_2.png" alt="Fire Stations in NC" width="200"/>
<img src="./images/label_3.png" alt="Fire Stations in NC" width="200"/>

These labeled fire stations and their corresponding N.C. names are the ones we considered throughout the project.
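The station-to-council matching above was done visually in Tableau. A programmatic equivalent could use a point-in-polygon test. Below is a minimal ray-casting sketch, assuming each council boundary is a simple polygon given as a list of (longitude, latitude) vertices. The square boundary and the station coordinates are made-up values for illustration, not real LAFD or council data.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a rightward ray crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical council boundary (a unit square) and two hypothetical stations.
council = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(point_in_polygon((0.5, 0.5), council))  # True
print(point_in_polygon((1.5, 0.5), council))  # False
```

For real boundary files, a GIS library would usually be a better choice than hand-written geometry; this sketch only shows the idea.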

### Dataset Cleaning:

**The LAFD Raw Metric Null Values:**

| Column | Null values |
| --- | --- |
| en_route_time_gmt | 110020 |
| first_in_district | 425 |
| incident_creation_time_gmt | 0 |
| on_scene_time_gmt | 573804 |

We removed all rows with null values at the very start. Our entire project relies on valid response times for each incident: if the on-scene time and/or the en-route time is missing, the target time is not available for modeling.

In addition, around 190,000 incidents received a response from a fire station outside our selected range of Neighborhood Councils, so we dropped those rows as well.
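The two filtering steps can be sketched as follows. This is a simplified illustration on plain dictionaries rather than our actual DataFrame code; the sample rows and the set of in-scope stations are made up.

```python
from typing import Dict, List, Optional, Set

Incident = Dict[str, Optional[str]]

REQUIRED = ("incident_creation_time_gmt", "en_route_time_gmt",
            "on_scene_time_gmt", "first_in_district")


def clean_incidents(rows: List[Incident], stations_in_scope: Set[str]) -> List[Incident]:
    """Keep rows with all required fields present and a district inside our scope."""
    kept = []
    for row in rows:
        if any(row.get(col) is None for col in REQUIRED):
            continue  # a missing timestamp or district means no usable target
        if row["first_in_district"] not in stations_in_scope:
            continue  # district lies outside the selected Neighborhood Councils
        kept.append(row)
    return kept


# Made-up rows: one valid, one missing on-scene time, one out-of-scope station.
rows = [
    {"incident_creation_time_gmt": "10:00:00", "en_route_time_gmt": "10:01:30",
     "on_scene_time_gmt": "10:06:00", "first_in_district": "12"},
    {"incident_creation_time_gmt": "11:00:00", "en_route_time_gmt": "11:01:00",
     "on_scene_time_gmt": None, "first_in_district": "12"},
    {"incident_creation_time_gmt": "12:00:00", "en_route_time_gmt": "12:01:00",
     "on_scene_time_gmt": "12:05:00", "first_in_district": "999"},
]
cleaned = clean_incidents(rows, stations_in_scope={"12", "27"})
print(len(cleaned))  # 1
```

With pandas, the same effect is typically achieved with `DataFrame.dropna(subset=...)` followed by a boolean mask built with `Series.isin`.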


**Time metrics:**
We converted the string-typed time values provided by the dataset into the 4 numeric features below, which are easier to read and to use when modeling the response time of each incident.

***Response Time:***
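The time interval that starts when first contact is made (either through 911 or the fire dispatch center) and ends when the first responders arrive on scene. Note that the formula we computed, ***(turnout time + travel time)***, spans dispatch to arrival and therefore does not include call processing time.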
 
***Call Processing Time:***
The time interval that starts when the call‐taker is presented with the call and creates the incident in CAD and ends when a fire station is notified and provided dispatch instructions via the LAFD’s Fire Station Alerting System. ***(dispatch time - incident creation time)***
 
***Turnout Time:***
The time interval between the time of dispatch and the en-route (wheels rolling) time.  Both station alarm and en-route times are required to measure this for each unit that responds.  Turnout time is calculated for each unit dispatched to each incident. ***(en route time - dispatch time)***
 
***Travel Time:***
The time interval that begins when the respondent is en-route to the incident and ends upon arrival on scene.  This requires one valid en-route time and one valid on-scene time for the incident.  Travel time can differ considerably amongst stations. ***(on scene time - en route time)***

According to the LAFD website, many factors may impact travel time, including traffic, topography, road width, public events and unspecified incident locations.
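The formulas above can be sketched in code as follows. This is a minimal illustration, assuming the raw values are time-of-day strings in `HH:MM:SS` form and that an interval may cross midnight at most once. The actual raw format may differ, and the argument names are chosen for this example only.

```python
from datetime import datetime
from typing import Dict

TIME_FORMAT = "%H:%M:%S"  # assumed format of the raw time strings
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds from start to end, allowing the clock to pass midnight once."""
    t0 = datetime.strptime(start, TIME_FORMAT)
    t1 = datetime.strptime(end, TIME_FORMAT)
    return int((t1 - t0).total_seconds()) % SECONDS_PER_DAY


def time_metrics(created: str, dispatched: str, en_route: str, on_scene: str) -> Dict[str, int]:
    """Compute the four time features (in seconds) for one incident."""
    turnout = seconds_between(dispatched, en_route)
    travel = seconds_between(en_route, on_scene)
    return {
        "call_process_time": seconds_between(created, dispatched),
        "turnout_time": turnout,
        "travel_time": travel,
        "response_time": turnout + travel,
    }


# Made-up incident that crosses midnight between dispatch and en-route.
metrics = time_metrics("23:58:30", "23:59:10", "00:00:05", "00:04:45")
print(metrics)
```

The modulo step is what keeps the turnout time positive in this example, where the unit is dispatched just before midnight and rolls out just after.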

**Year & Quarter features**
Using the 'Randomized Incident Number' feature, whose first 6 digits encode the year and quarter of the incident and are followed by 6 random digits (privacy law), we were able to add year and quarter features to the dataset.
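A sketch of this extraction is shown below. The exact layout of the 6-digit prefix is an assumption here: a 4-digit year followed by a 2-digit quarter (for example `201503` for the third quarter of 2015). The sample incident number is made up.

```python
from typing import Tuple


def parse_year_quarter(incident_number: str) -> Tuple[int, str]:
    """Split an incident number into (year, quarter) under the assumed YYYYQQ prefix."""
    if len(incident_number) < 6 or not incident_number[:6].isdigit():
        raise ValueError(f"unexpected incident number: {incident_number!r}")
    year = int(incident_number[:4])
    quarter = int(incident_number[4:6])
    if quarter not in (1, 2, 3, 4):
        raise ValueError(f"unexpected quarter in: {incident_number!r}")
    return year, f"Q{quarter}"


print(parse_year_quarter("201503482913"))  # (2015, 'Q3')
```

If the real prefix uses a different layout, only the two slices need to change.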

## LAFD Raw Response Metrics Data Exploration:

We began the data exploration with the LAFD Raw Response Metrics.

![](./images/total_requests_nc.png)

According to this data, Downtown LA received the most LAFD emergency calls over the period covered (2013 to 2017). This makes sense, as the population density in DTLA is most likely higher than in any other region within LA County. Many of the Neighborhood Councils among the top 10 for LAFD emergency calls are "less-affluent" neighborhoods in LA. The fewest LAFD emergency calls, on the other hand, came from the Bel-Air Beverly area. Compared to DTLA, Bel-Air is a much more affluent neighborhood with a much lower population density. Most of the locations with fewer emergency incidents are in the San Fernando Valley or North East LA.


### Average wait time for LAFD arrival:

***avg_travel_time_ems responded in less than 5 mins: 88.54 %***

***avg_travel_time_non_ems responded in less than 5 mins: 81.89 %***

After the scandal over the 2008 figures, the LAFD and Mayor Eric Garcetti both pledged to speed up Fire Department responses to emergencies in order to meet Federal guidelines consistently. However, the data suggests otherwise. Based on the plots below, the average travel time of LAFD first-unit responders increased from 2013 to 2017. The average travel time for non-EMS incidents decreased slightly in 2015 compared with the year before, but it rose again the very next year. The average travel time for EMS incidents, on the other hand, increased consistently throughout the period: from 2013 to 2017 it grew by nearly half a minute, roughly a 10% increase.
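The two kinds of figures quoted in this section, the share of incidents under five minutes and the relative increase between two years, can be computed as in the sketch below. The input values are made up for illustration and are not taken from the dataset.

```python
from typing import Sequence

FIVE_MINUTES = 5 * 60  # benchmark in seconds


def share_within(times_sec: Sequence[float], threshold: float = FIVE_MINUTES) -> float:
    """Percentage of incidents whose time is strictly below the threshold."""
    return 100.0 * sum(t < threshold for t in times_sec) / len(times_sec)


def percent_change(old: float, new: float) -> float:
    """Relative change from old to new, in percent."""
    return 100.0 * (new - old) / old


# Made-up travel times in seconds.
travel_times = [180, 240, 299, 300, 420]
print(share_within(travel_times))    # 60.0
print(percent_change(300.0, 330.0))  # 10.0
```

The strict comparison is deliberate: a response of exactly five minutes is not counted as "under five minutes", which is the kind of boundary choice that affected the figures discussed in the introduction.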

At this point, we should consider why response times are increasing instead of decreasing.
* Perhaps the distances in some parts of Los Angeles County make it extremely difficult for firefighters to reach the location quickly?
* Perhaps there is some inefficiency in the allocation of personnel and equipment?
* Perhaps the fire station closest to the incident location is on another call?

![](./images/avg_travel_time_ems.png)
![](./images/avg_travel_time_non_ems.png)

Both EMS and NON-EMS incidents fall short of the Federal guideline of arriving in under 5 minutes 90 percent of the time. According to the data dictionary, all of the data consists of emergency-related calls. The EMS category includes incident types that require minimum personal protective equipment (PPE) and a turnout time of 60 seconds. The majority of the incidents reported in this dataset are medical in nature and therefore do not require protective equipment for the firefighters. The NON-EMS category covers incidents that require full PPE and a turnout time of 80 seconds. The majority of these incidents require fire tools and equipment.

Given the above, it makes sense that the average travel time for NON-EMS calls is longer, since these calls require more preparation. For that very reason, we must find ways to reduce the processing time within response times, especially for NON-EMS calls, which in many cases are the higher priority.


### Total Number of Incidents reported by the year:

The number of LAFD emergency calls has been increasing steadily over the past few years. Across quarters, the number of emergency responses is relatively stable from year to year, with a slight decline in the second quarter (April-June) compared with the other quarters.

![](./images/lafd_total_incidents_year.png)
![](./images/lafd_total_incidents_quarter.png)


### Total Number of Incidents reported by region segments:

We can see that the North East, East Side, and Harbor regions of LA have the lowest numbers of LAFD emergency reports, while the San Fernando Valley and South Central have significantly higher numbers of reported incidents. Granted, the SF Valley and South Central regions contain many more neighborhood councils than the East Side and North East regions.

![](./images/total_requests_regions.png)

#### Total No. of Incidents in West side/Hollywood regions (affluent areas) VS South Central region (non-affluent areas):

Unfortunately, the data did not provide any field that groups the individual Neighborhood Councils in a meaningful way. As a result, we manually grouped Neighborhood Councils into regions to gain more overall insight. We referred to this [website](https://wikitravel.org/en/Los_Angeles_County) for potential regions in the County of Los Angeles, looked at the boundaries of each Neighborhood Council in our dataset, and placed each one in its appropriate region.

![](./images/nc_into_regions_list.png)

Overall, more incidents were reported in the South Central region than in its more affluent Hollywood and West Side counterparts. From early 2013 to the beginning of 2016, the number of LAFD emergency incidents rose steadily in both areas; after that, it continued to increase in the Hollywood and West Side regions but dropped slightly in South Central.

![](./images/total_request_West_vs_SCentral.png)


**Average time it takes for LAFD responders to arrive on-site from the moment a fire station is notified:**
* **Hollywood - 6 minutes & 3 seconds**
* **West Side - 6 minutes & 38 seconds**
* **South Central - 5 minutes & 51 seconds**
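Per-region averages like these can be produced with a simple group-and-average step. The sketch below uses made-up response times in seconds, chosen only so that the formatted output resembles the list above; it is not our actual aggregation code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_by_region(records: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the response time (seconds) of the incidents in each region."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for region, seconds in records:
        totals[region] += seconds
        counts[region] += 1
    return {region: totals[region] / counts[region] for region in totals}


def format_min_sec(seconds: float) -> str:
    """Render a duration in seconds as 'M minutes & S seconds'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} minutes & {secs} seconds"


# Made-up incidents: (region, response time in seconds).
records = [("Hollywood", 350.0), ("Hollywood", 376.0), ("South Central", 351.0)]
averages = mean_by_region(records)
for region, avg in averages.items():
    print(region, "-", format_min_sec(avg))
```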



## Predicting Models

At this stage, we used the data to build a model that best predicts the response time of LAFD responders from the census metrics and a few LAFD response metrics.


These are the columns we selected as potential features that may impact the response time of LAFD responders to an emergency call in a particular region of LA County.

| Features |  Data Type  |  Description |
| ---   |  ---   |  ---        |
| **dispatch_status** | **Object** | The status of the responding unit at the time of dispatch. For example, status “QTR” means a unit responded from quarters, “RAD” means a unit responded from a radio call in and was not in quarters at the time, “AVI” means the unit is available, typically when released from an incident, and “ONS” means the unit is on-scene. |
| **first_in_district** | **Int** | The location where the incident occurred in terms of a Fire Station district. The area where a particular fire station responds as well as where the incident occurred. |
| **ppe_level** | **Object** | EMS category includes incident types that require minimum PPE* and a Turnout Time of 60 seconds. The majority of these incidents are medical in nature and do not require fire suppression tools and equipment to mediate. The NON-EMS category includes incidents that require full PPE* and a Turnout Time of 80 seconds. The majority of these incidents require fire suppression tools and equipment to mediate and may result in patients that require medical evaluation and treatment. *Personal Protective Equipment* |
| **unit_type** | **Object** | The type of responding unit. |
| **total_pop** | **Float** | Total population of residents in the neighborhood councils |
| **nc_name** | **Object** | Los Angeles County Neighborhood Councils |
| **call_process_time** | **Int** | The time interval that starts when the call is created in CAD by a Fire Dispatcher until the initial Fire or EMS unit is dispatched |
| **turnout_time** | **Int** | The time interval between the activation of station alerting devices to when the first responders put on their PPE and are aboard apparatus and en-route (wheels rolling).  Both station alarm and en-route times are required to measure this for each unit that responds.  Turnout time is calculated for each unit dispatched to each incident |
| **response_time** | **Int** | The time interval from dispatch until the first responders arrive on scene (turnout time + travel time). This requires one valid en-route time and one valid on-scene time for the incident |
| **year** | **Int** | Year the incident was reported |
| **quarter** | **Object** | Quarter of the year the incident was reported |
| **regions** | **Object** | Region of the incident |
| **white_perc** | **Float** | White population percentage |
| **black_perc** | **Float** | Black population percentage |
| **asian_perc** | **Float** | Asian population percentage |
| **hawn_pi_perc** | **Float** | Native Hawaiian and Pacific Islander population percentage |
| **other_perc** | **Float** | Other race population percentage |
| **multi_perc** | **Float** | Multiracial population percentage |
| **poverty_perc** | **Float** | Percentage of the population in poverty |
| **owner_occ_perc** | **Float** | Owner-occupant population percentage |
| **renter_occ_perc** | **Float** | Renter-occupant population percentage |


All models are scored by RMSE and the model score (R2). Both metrics assess model performance through the magnitude of the model's residual errors (the difference between the actual response time and the predicted response time).

#### BASELINE ACCURACY:

**Root Mean Squared Error: 221.671 ; R2: 0.0**

Root Mean Squared Error: The baseline RMSE is significantly large given our dataset. This metric represents the typical distance, in seconds, between the actual response time and the prediction, which for the baseline is the mean response time. We need to minimize this score in order to have an accurate model.

R2 score: The baseline model does not explain any of the target's variability around its mean, so its predictions are uninformative. The goal is to bring the R2 score as close to 1 (100%) as possible, so that the model explains the variability in the data.
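To make the baseline concrete, the sketch below implements both metrics from their standard definitions and applies them to a model that always predicts the mean. The response times are made-up values in seconds. By construction, such a baseline has an R2 of 0 and an RMSE equal to the standard deviation of the target.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target (seconds here)."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - (residual sum of squares / total sum of squares around the mean)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up response times in seconds, for illustration only.
y = [200.0, 300.0, 400.0, 500.0]
baseline = [sum(y) / len(y)] * len(y)  # always predict the mean (350.0)

print(round(rmse(y, baseline), 3))  # 111.803
print(r2(y, baseline))              # 0.0
```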

### Linear Regression Model:

* Train R2 Score: 0.16100812885771
* Test R2 Score: 0.15931963671763205

![](./images/Lin_reg_important_feat.png)



### Random Forest Regression Model: 

The Linear Regression model predicted response times poorly, with a test R2 score of about 0.16. We attempted to improve on this with a Random Forest Regression model, using the same training and testing sets.

* Training Score: 0.609
* Testing Score: 0.057

The training R2 score jumped significantly, which is somewhat expected from a Random Forest model. However, the testing R2 score dropped even further, suggesting extreme over-fitting.

### XGBoost Regressor Model:

* Training Score: 0.267
* Testing Score: 0.230
* RMSE for XGBRegressor: 0.434

## Conclusion & Limitations & Improvements:

Unfortunately, the predictive models did not produce favorable results. Based on our analysis, there does not seem to be enough signal in the datasets we collected.

Our limitations stemmed from the lack of accessible census data, which forced us to make a variety of assumptions. One of the primary assumptions was that the 'first-in-district' value in each LAFD emergency response row corresponds to a fire station in our fire-station geo-spatial dataset. Building on that, we also assumed that the fire station that first responded to an incident is in the same geographic area as the incident itself. We then plotted the boundaries of the 97 Neighborhood Councils throughout LA County and matched the fire stations located within each N.C.'s boundaries.

By inferring that the first_in_district number corresponds to a Neighborhood Council, we were able to merge the datasets.
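The merge logic implied by these assumptions can be sketched as two lookups: station number to council name, then council name to census metrics. The mapping, council names and census values below are hypothetical placeholders, not real data.

```python
from typing import Any, Dict, List

# Hypothetical lookups; the real ones come from the station map and the census file.
STATION_TO_NC = {1: "NC A", 2: "NC A", 3: "NC B"}
CENSUS_BY_NC = {
    "NC A": {"total_pop": 52000.0, "poverty_perc": 21.5},
    "NC B": {"total_pop": 31000.0, "poverty_perc": 9.0},
}


def attach_census(incidents: List[Dict[str, Any]],
                  station_to_nc: Dict[int, str],
                  census_by_nc: Dict[str, Dict[str, float]]) -> List[Dict[str, Any]]:
    """Add nc_name and census columns to each incident; drop unmatched incidents."""
    merged = []
    for incident in incidents:
        nc_name = station_to_nc.get(incident["first_in_district"])
        if nc_name is None or nc_name not in census_by_nc:
            continue  # station not matched to any council in scope
        merged.append({**incident, "nc_name": nc_name, **census_by_nc[nc_name]})
    return merged


incidents = [
    {"first_in_district": 2, "response_time": 310},
    {"first_in_district": 9, "response_time": 280},  # unmapped station, dropped
]
merged = attach_census(incidents, STATION_TO_NC, CENSUS_BY_NC)
print(merged)
```

Every incident handled by a given station inherits the census profile of that station's council, which is exactly where the assumptions described above enter the data.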

Another major limitation was that we underestimated how computationally expensive running the models would be. We will consider moving the processing to the cloud via AWS in order to run a GridSearch for the best hyperparameters of each model.
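To give a sense of why an exhaustive grid search becomes expensive, the sketch below counts the model fits required by a small search space. The grid values and the number of cross-validation folds are hypothetical and were not used in this project.

```python
from itertools import product

# Hypothetical search space for a random forest; values are illustrative only.
param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [4, 8, 12, None],
    "min_samples_leaf": [1, 5],
}
cv_folds = 5  # assumed number of cross-validation folds

# Every combination of one value per hyperparameter is a candidate model.
combinations = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]
total_fits = len(combinations) * cv_folds

print(len(combinations), total_fits)  # 24 120
```

Each of those fits would train on millions of rows, so even a modest grid multiplies the cost of a single model run many times over.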


