# Erhan Asad Javed

## Research question/interests

### Emergency response

Faster emergency response times are critical in reducing fatalities and mitigating long-term injuries from road accidents. This research area focuses on identifying bottlenecks in emergency services to optimize resource allocation and improve outcomes. 

#### Key research questions
- How does the emergency response time correlate with the number of fatalities? 
- Are urban accidents associated with faster response times than rural ones? 
- How does traffic volume at the time of an accident affect emergency response times?
- What role does population density play in determining emergency response times?


## EDA

### Imports

In [2]:
import os

import altair as alt
import pandas as pd
from toolz.curried import pipe

# Create a new data transformer that stores the files in a directory
def json_dir(data, data_dir='altairdata'):
    os.makedirs(data_dir, exist_ok=True)
    return pipe(data, alt.to_json(filename=data_dir + '/{prefix}-{hash}.{extension}') )

# Register and enable the new transformer
alt.data_transformers.register('json_dir', json_dir)
alt.data_transformers.enable('json_dir')

# Handle large data sets (by default Altair raises a MaxRowsError above 5000 rows)
# See here: https://altair-viz.github.io/user_guide/data_transformers.html
alt.data_transformers.disable_max_rows()

alt.renderers.enable("jupyter", offline=True)


RendererRegistry.enable('jupyter')

### Loading in the data

In [3]:
accidents = pd.read_csv('../../data/raw/road_accident_dataset.csv')
accidents.head()

|   | Country | Year | Month | Day of Week | Time of Day | Urban/Rural | Road Type | Weather Conditions | Visibility Level | Number of Vehicles Involved | ... | Number of Fatalities | Emergency Response Time | Traffic Volume | Road Condition | Accident Cause | Insurance Claims | Medical Cost | Economic Loss | Region | Population Density |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2002 | October | Tuesday | Evening | Rural | Street | Windy | 220.414651 | 1 | ... | 2 | 58.62572 | 7412.75276 | Wet | Weather | 4 | 40499.856982 | 22072.878502 | Europe | 3866.273014 |
| 1 | UK | 2014 | December | Saturday | Evening | Urban | Street | Windy | 168.311358 | 3 | ... | 1 | 58.04138 | 4458.62882 | Snow-covered | Mechanical Failure | 3 | 6486.600073 | 9534.399441 | North America | 2333.916224 |
| 2 | USA | 2012 | July | Sunday | Afternoon | Urban | Highway | Snowy | 341.286506 | 4 | ... | 4 | 42.374452 | 9856.915064 | Wet | Speeding | 4 | 29164.412982 | 58009.145124 | South America | 4408.889129 |
| 3 | UK | 2017 | May | Saturday | Evening | Urban | Main Road | Clear | 489.384536 | 2 | ... | 3 | 48.554014 | 4958.646267 | Icy | Distracted Driving | 3 | 25797.212566 | 20907.151302 | Australia | 2810.822423 |
| 4 | Canada | 2002 | July | Tuesday | Afternoon | Rural | Highway | Rainy | 348.34485 | 1 | ... | 4 | 18.31825 | 3843.191463 | Icy | Distracted Driving | 8 | 15605.293921 | 13584.060759 | South America | 3883.645634 |


In [4]:
accidents.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132000 entries, 0 to 131999
Data columns (total 30 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Country                      132000 non-null  object 
 1   Year                         132000 non-null  int64  
 2   Month                        132000 non-null  object 
 3   Day of Week                  132000 non-null  object 
 4   Time of Day                  132000 non-null  object 
 5   Urban/Rural                  132000 non-null  object 
 6   Road Type                    132000 non-null  object 
 7   Weather Conditions           132000 non-null  object 
 8   Visibility Level             132000 non-null  float64
 9   Number of Vehicles Involved  132000 non-null  int64  
 10  Speed Limit                  132000 non-null  int64  
 11  Driver Age Group             132000 non-null  object 
 12  Driver Gender                132000 non-null  object 
 13 

In [5]:
accidents.describe()

|   | Year | Visibility Level | Number of Vehicles Involved | Speed Limit | Driver Alcohol Level | Driver Fatigue | Pedestrians Involved | Cyclists Involved | Number of Injuries | Number of Fatalities | Emergency Response Time | Traffic Volume | Insurance Claims | Medical Cost | Economic Loss | Population Density |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 | 132000.0 |
| mean | 2011.973348 | 275.038776 | 2.501227 | 74.544068 | 0.125232 | 0.500576 | 1.000773 | 0.998356 | 9.508205 | 1.995439 | 32.491746 | 5041.929098 | 4.495621 | 25198.454901 | 50437.505615 | 2506.476223 |
| std | 7.198624 | 129.923625 | 1.117272 | 26.001448 | 0.072225 | 0.500002 | 0.816304 | 0.817764 | 5.774366 | 1.412974 | 15.889537 | 2860.671611 | 2.867347 | 14274.771691 | 28584.290822 | 1440.646352 |
| min | 2000.0 | 50.001928 | 1.0 | 30.0 | 2e-06 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 5.000177 | 100.062626 | 0.0 | 500.11009 | 1000.335085 | 10.002669 |
| 25% | 2006.0 | 162.33886 | 2.0 | 52.0 | 0.06263 | 0.0 | 0.0 | 0.0 | 5.0 | 1.0 | 18.732879 | 2560.601299 | 2.0 | 12836.933596 | 25692.817343 | 1258.158299 |
| 50% | 2012.0 | 274.67299 | 3.0 | 74.0 | 0.125468 | 1.0 | 1.0 | 1.0 | 9.0 | 2.0 | 32.534944 | 5037.909855 | 4.0 | 25188.202669 | 50395.499874 | 2506.203333 |
| 75% | 2018.0 | 388.014111 | 3.0 | 97.0 | 0.187876 | 1.0 | 2.0 | 2.0 | 15.0 | 3.0 | 46.289527 | 7524.638162 | 7.0 | 37529.024899 | 75186.626093 | 3756.65295 |
| max | 2024.0 | 499.999646 | 4.0 | 119.0 | 0.249999 | 1.0 | 2.0 | 2.0 | 19.0 | 4.0 | 59.999588 | 9999.997468 | 9.0 | 49999.93013 | 99999.622968 | 4999.991745 |


In [6]:
print(f"The dataset has {accidents.shape[0]} rows and {accidents.shape[1]} columns.")

The dataset has 132000 rows and 30 columns.


### The `Emergency Response Time` variable

In [7]:
histogram = alt.Chart(accidents).mark_bar().encode(
    alt.X('Emergency Response Time:Q', bin=alt.BinParams(maxbins=20), title='Emergency Response Time (minutes)'),
    alt.Y('count()', title='Frequency'),
    tooltip=['count()']
).properties(
    title='Distribution of Emergency Response Time',
    #width=600,
    #height=400
)

histogram_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('Emergency Response Time:Q', bin=alt.BinParams(maxbins=20), title='Emergency Response Time (minutes)'),
    alt.Y('count()', title='Frequency', scale=alt.Scale(zero=False)),
    tooltip=['count()']
).properties(
    title='Distribution of Emergency Response Time (Zoomed)',
    #width=600,
    #height=400
)

boxplot = alt.Chart(accidents).mark_boxplot().encode(
    alt.X("Emergency Response Time:Q")
).properties(
    title='Boxplot of Emergency Response Time',
)

histogram | histogram_zoomed | boxplot

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'hconcat': [{'mark':…

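We see here that the distribution of emergency response times is close to uniform over a range of roughly 5 to 60 minutes, with a mean of 32.49 and a median of 32.53. Note the scale on the zoomed-in version: its y-axis does not start at zero, so the differences between bins look larger than they are. They are too small to establish any pattern other than a uniform one.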
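
The uniformity claim can also be checked numerically. A uniform distribution on [5, 60] has mean 32.5 and standard deviation 55/√12 ≈ 15.88, which is close to the 32.49 and 15.89 reported by `describe()`. The sketch below compares equal-width bin counts against the count a uniform distribution would give. `uniformity_summary` is a hypothetical helper, and the data is a synthetic stand-in; with the real data one would pass `accidents['Emergency Response Time']` instead.

```python
import numpy as np
import pandas as pd


def uniformity_summary(values: pd.Series, bins: int = 20) -> dict:
    """Compare equal-width bin counts with the count expected under uniformity."""
    counts = pd.cut(values, bins=bins).value_counts(sort=False)
    expected = len(values) / bins
    lo, hi = values.min(), values.max()
    return {
        # Largest gap between an observed bin count and the uniform expectation
        "max_relative_deviation": float((counts - expected).abs().max() / expected),
        "sample_mean": float(values.mean()),
        "uniform_mean": float((lo + hi) / 2),
        "sample_std": float(values.std()),
        "uniform_std": float((hi - lo) / np.sqrt(12)),
    }


# Synthetic stand-in for accidents['Emergency Response Time'] (assumption, not the real data)
rng = np.random.default_rng(0)
demo = pd.Series(rng.uniform(5, 60, size=100_000), name="Emergency Response Time")
summary = uniformity_summary(demo)
print(summary)
```

A small `max_relative_deviation`, together with a sample mean and standard deviation close to the uniform values, would support reading the histogram as uniform.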

Let's look at if and how the variable is influenced by other attributes in the dataset.

In [8]:
weather_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Weather Conditions:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Weather Conditions',
    #width=600,
    #height=400
)

weather_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Weather Conditions:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Weather Conditions (Zoomed)',
    #width=600,
    #height=400
)

weather_bar_plot | weather_bar_plot_zoomed

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'hconcat': [{'mark':…

Regardless of the weather condition, the average emergency response time is very similar. The zoomed-in version indicates an increase of around 0.2 minutes in average response time from foggy weather to clear weather, which is surprising given that clear weather should allow emergency services to arrive more quickly.
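
To judge whether a gap of about 0.2 minutes is meaningful, it may help to put a standard error next to each group mean. The following is a minimal sketch on synthetic stand-in data; `group_means_with_error` is a hypothetical helper, and the weather labels are only illustrative.

```python
import numpy as np
import pandas as pd


def group_means_with_error(df: pd.DataFrame, group_col: str, value_col: str) -> pd.DataFrame:
    """Mean, standard error of the mean, and row count of value_col per group."""
    out = df.groupby(group_col)[value_col].agg(["mean", "sem", "count"])
    return out.sort_values("mean")


# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "Weather Conditions": rng.choice(["Clear", "Foggy", "Rainy", "Snowy", "Windy"], size=50_000),
    "Emergency Response Time": rng.uniform(5, 60, size=50_000),
})
table = group_means_with_error(demo, "Weather Conditions", "Emergency Response Time")
print(table)
```

As a rough guide, if the 132000 rows were split about evenly across five weather levels, each group mean would carry a standard error on the order of 15.89/√26400 ≈ 0.1 minutes, so a 0.2-minute gap between two groups would be only modestly larger than sampling noise.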

In [9]:
fatality_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Number of Fatalities:N', sort = 'x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Number of Fatalities',
    #width=600,
    #height=400
)

fatality_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Number of Fatalities:N', sort = 'x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Number of Fatalities (Zoomed)',
    #width=600,
    #height=400
)

fatality_bar_plot | fatality_bar_plot_zoomed

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'hconcat': [{'mark':…

Regardless of the number of fatalities, the average emergency response time is very similar. The zoomed-in version suggests that a higher number of fatalities may be associated with a longer emergency response time, although the difference is minuscule.
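
The strength of this association can be summarised with a Pearson correlation coefficient and the slope of a least-squares line. This is a sketch on synthetic stand-in data in which the two columns are independent by construction, so it only illustrates the calls and not the result on the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "Number of Fatalities": rng.integers(0, 5, size=50_000),
    "Emergency Response Time": rng.uniform(5, 60, size=50_000),
})

# Pearson correlation between response time and fatalities
r = demo["Emergency Response Time"].corr(demo["Number of Fatalities"])

# Least-squares slope: change in average response time (minutes) per additional fatality
slope, intercept = np.polyfit(
    demo["Number of Fatalities"], demo["Emergency Response Time"], deg=1
)
print(f"r = {r:.4f}, slope = {slope:.4f} minutes per fatality")
```

The slope is often easier to interpret than `r` here, because it is expressed in the same units as the bar chart.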

In [10]:
u_r_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Urban/Rural:N', sort = 'x'),
    tooltip = ['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Urban/Rural',
    #width=600,
    #height=400
)

u_r_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Urban/Rural:N', sort = 'x'),
    tooltip = ['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Urban/Rural (Zoomed)',
    #width=600,
    #height=400
)

u_r_bar_plot | u_r_bar_plot_zoomed

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'hconcat': [{'mark':…

Again, there is almost no difference in emergency response times between rural and urban areas. Exploring the distribution of response times within each type of area may be worthwhile for finding underlying patterns.
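
One way to follow up on this is to compare quantiles of the response time within each area type, rather than only the averages. The sketch below uses synthetic stand-in data; the chosen quantile levels are an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    "Urban/Rural": rng.choice(["Urban", "Rural"], size=50_000),
    "Emergency Response Time": rng.uniform(5, 60, size=50_000),
})

# One row per area type, one column per quantile level
quantiles = (
    demo.groupby("Urban/Rural")["Emergency Response Time"]
    .quantile([0.1, 0.25, 0.5, 0.75, 0.9])
    .unstack()
)
print(quantiles)
```

If the two rows are nearly identical at every quantile, the distributions themselves are similar, not just their means.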

In [11]:
heatmap1 = alt.Chart(accidents).mark_rect().encode(
    x=alt.X('Urban/Rural:N', title='Urban/Rural'),
    y=alt.Y('Number of Fatalities:N', title='Number of Fatalities'),
    color=alt.Color('average(Emergency Response Time):Q', scale=alt.Scale(scheme='blues')),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Average Emergency Response Time by Urban/Rural and Number of Fatalities',
    width=500,
    height=400
)

heatmap1

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'data': {'url': 'alt…

The color scale shows that the differences in emergency response times are quite small, and thus may not be enough to reach any conclusions. However, we do observe a pattern of longer emergency response times with a higher number of fatalities, especially in urban areas.
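
The values behind such a heatmap can be tabulated directly, together with the number of rows in each cell, which helps when deciding how much weight a small difference deserves. This is a minimal sketch on synthetic stand-in data.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(4)
n_rows = 50_000
demo = pd.DataFrame({
    "Urban/Rural": rng.choice(["Urban", "Rural"], size=n_rows),
    "Number of Fatalities": rng.integers(0, 5, size=n_rows),
    "Emergency Response Time": rng.uniform(5, 60, size=n_rows),
})

pivot_args = dict(
    index="Number of Fatalities",
    columns="Urban/Rural",
    values="Emergency Response Time",
)
cell_mean = demo.pivot_table(aggfunc="mean", **pivot_args)    # same quantity as the heatmap colour
cell_count = demo.pivot_table(aggfunc="count", **pivot_args)  # rows contributing to each cell
print(cell_mean.round(2))
print(cell_count)
```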

In [12]:
road_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Road Condition:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Road Conditions',
    #width=600,
    #height=400
)

road_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Road Condition:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Road Conditions (Zoomed)',
    #width=600,
    #height=400
)

road_bar_plot | road_bar_plot_zoomed

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'hconcat': [{'mark':…

There is no noticeable difference in emergency response times across road conditions. In the zoomed-in version, icy road conditions are associated with slightly longer emergency response times, but only by a very small margin.

In [13]:
heatmap = alt.Chart(accidents).mark_rect().encode(
    x=alt.X('Road Condition:N', title='Road Conditions'),
    y=alt.Y('Weather Conditions:N', title='Weather Conditions'),
    color=alt.Color('average(Emergency Response Time):Q', scale=alt.Scale(scheme='blues')),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Average Emergency Response Time by Road and Weather Conditions',
    width=500,
    height=400
)

heatmap

JupyterChart(spec={'config': {'view': {'continuousWidth': 300, 'continuousHeight': 300}}, 'data': {'url': 'alt…

There is no clear pattern in how the average emergency response time varies with weather conditions and road conditions together. One may see a (small) increase in response time for clear weather with dry roads, as well as for windy weather with icy roads. More analysis is required to explain why these conflicting observations are seen.
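
One possible next step is a shuffle test: if response times are randomly reassigned to rows, how large a spread between cell means appears by chance? The sketch below uses synthetic stand-in data. `cell_mean_range` and `shuffled_ranges` are hypothetical helpers, and the number of shuffles is an arbitrary choice.

```python
import numpy as np
import pandas as pd


def cell_mean_range(df, row_col, col_col, value_col):
    """Difference between the largest and smallest cell mean."""
    means = df.groupby([row_col, col_col])[value_col].mean()
    return float(means.max() - means.min())


def shuffled_ranges(df, row_col, col_col, value_col, n_shuffles=100, seed=0):
    """Cell-mean ranges obtained after randomly permuting value_col."""
    rng = np.random.default_rng(seed)
    shuffled = df[[row_col, col_col, value_col]].copy()
    values = df[value_col].to_numpy()
    ranges = []
    for _ in range(n_shuffles):
        shuffled[value_col] = rng.permutation(values)
        ranges.append(cell_mean_range(shuffled, row_col, col_col, value_col))
    return np.array(ranges)


# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(5)
demo = pd.DataFrame({
    "Weather Conditions": rng.choice(["Clear", "Foggy", "Rainy", "Snowy", "Windy"], size=20_000),
    "Road Condition": rng.choice(["Dry", "Wet", "Icy", "Snow-covered"], size=20_000),
    "Emergency Response Time": rng.uniform(5, 60, size=20_000),
})

cols = ("Weather Conditions", "Road Condition", "Emergency Response Time")
observed = cell_mean_range(demo, *cols)
null = shuffled_ranges(demo, *cols)
# Share of shuffles whose spread is at least as large as the observed one
fraction = float((null >= observed).mean())
print(f"observed range = {observed:.3f}, share of shuffles at least as large = {fraction:.2f}")
```

If a large share of shuffles produce a spread at least as large as the observed one, the seemingly conflicting cells are consistent with noise.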

## Task Analysis

### **1. How does emergency response time correlate with the number of fatalities, and how do urban/rural settings and traffic volume influence this relationship?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Number of Fatalities`, `Urban/Rural`, and `Traffic Volume`.
- **Filter**: Filter data by `Urban/Rural` and group `Traffic Volume` into low, medium, and high.
- **Compute Derived Value**: Calculate correlation coefficients between response time and fatalities.
- **Correlate**: Analyze the relationship between response time and fatalities, stratified by urban/rural and traffic volume.
- **Characterize Distribution**: Visualize the distribution of response times and fatalities across urban/rural areas and traffic volume groups.
- **Find Anomalies**: Identify outliers in response times or fatalities.
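
The Filter, Compute Derived Value, and Correlate steps above can be sketched as follows, on synthetic stand-in data. Splitting `Traffic Volume` into terciles with `pd.qcut` is an assumption; the task list does not specify how the low, medium, and high groups are defined, and `Traffic Level` is a hypothetical derived column.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accidents DataFrame (assumption, not the real data)
rng = np.random.default_rng(6)
n_rows = 30_000
demo = pd.DataFrame({
    "Urban/Rural": rng.choice(["Urban", "Rural"], size=n_rows),
    "Traffic Volume": rng.uniform(100, 10_000, size=n_rows),
    "Number of Fatalities": rng.integers(0, 5, size=n_rows),
    "Emergency Response Time": rng.uniform(5, 60, size=n_rows),
})

# Group traffic volume into three equally sized bins (terciles)
demo["Traffic Level"] = pd.qcut(demo["Traffic Volume"], q=3, labels=["low", "medium", "high"])

# Pearson correlation between response time and fatalities within each stratum
rows = []
for (area, level), group in demo.groupby(["Urban/Rural", "Traffic Level"], observed=True):
    r = group["Emergency Response Time"].corr(group["Number of Fatalities"])
    rows.append({"Urban/Rural": area, "Traffic Level": level, "r": r, "n": len(group)})
stratified = pd.DataFrame(rows)
print(stratified)
```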

---

### **2. How do weather conditions and road type interact with emergency response times and accident severity?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Weather Conditions`, and `Road Type`.
- **Filter**: Filter data by `Weather Conditions` and `Road Type`.
- **Compute Derived Value**: Calculate average response times and median accident severity for each weather-road combination.
- **Correlate**: Analyze the relationship between weather, road type, response times, and accident severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across weather and road types.
- **Cluster**: Group similar weather-road combinations based on response times and severity.

---

### **3. How do population density and driver demographics (age group, gender) influence emergency response times and accident outcomes?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Population Density`, `Driver Age Group`, and `Driver Gender`.
- **Filter**: Filter data by `Population Density` (low, medium, high) and `Driver Age Group`.
- **Compute Derived Value**: Calculate average response times and accident severity for each population density and age group.
- **Correlate**: Analyze the relationship between population density, driver demographics, response times, and accident severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across population density and driver demographics.
- **Find Extremum**: Identify the age group or population density with the highest response times or severity.
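
The Find Extremum step could look like the sketch below, which locates the population density and age group combination with the highest average response time. The data is a synthetic stand-in, the age group labels are hypothetical placeholders, and `Density Level` is an assumed tercile split of `Population Density`.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the accidents DataFrame (assumption, not the real data).
# The age group labels are placeholders, not the dataset's actual categories.
rng = np.random.default_rng(7)
n_rows = 30_000
demo = pd.DataFrame({
    "Population Density": rng.uniform(10, 5_000, size=n_rows),
    "Driver Age Group": rng.choice(["<18", "18-25", "26-40", "41-60", "61+"], size=n_rows),
    "Emergency Response Time": rng.uniform(5, 60, size=n_rows),
})

demo["Density Level"] = pd.qcut(
    demo["Population Density"], q=3, labels=["low", "medium", "high"]
)

means = demo.groupby(["Density Level", "Driver Age Group"], observed=True)[
    "Emergency Response Time"
].mean()

# idxmax returns the (density level, age group) pair with the largest mean
slowest = means.idxmax()
print(slowest, round(means.loc[slowest], 2))
```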

---

### **4. What is the combined impact of traffic volume, road conditions, and time of day on emergency response times and accident severity?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Traffic Volume`, `Road Condition`, and `Time of Day`.
- **Filter**: Filter data by `Time of Day` and `Road Condition`.
- **Compute Derived Value**: Calculate average response times and severity for each time-road combination.
- **Correlate**: Analyze the relationship between traffic volume, road conditions, time of day, response times, and severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across traffic volume, road conditions, and time of day.
- **Cluster**: Group similar time-road-traffic combinations based on response times and severity.

---

### **Summary of Low-Level Tasks**
| **Research Question**                                                                 | **Low-Level Tasks**                                                                 |
|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| How does emergency response time correlate with fatalities, considering urban/rural and traffic volume? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Find Anomalies |
| How do weather conditions and road type interact with response times and accident severity? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Cluster |
| How do population density and driver demographics influence response times and accident outcomes? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Find Extremum |
| What is the combined impact of traffic volume, road conditions, and time of day on response times and severity? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Cluster |