# Erhan Asad Javed

## Research question/interests

### Emergency response

Faster emergency response times are critical in reducing fatalities and mitigating long-term injuries from road accidents. This research area focuses on identifying bottlenecks in emergency services to optimize resource allocation and improve outcomes. 

#### Key research questions
- How does the emergency response time correlate with the number of fatalities? 
- Are urban accidents associated with faster response times than rural ones? 
- How does traffic volume at the time of an accident affect emergency response times?
- What role does population density play in determining emergency response times?


## EDA

### Imports

In [44]:
import os

import altair as alt
import pandas as pd
from toolz.curried import pipe

# Create a new data transformer that stores the files in a directory
def json_dir(data, data_dir='altairdata'):
    os.makedirs(data_dir, exist_ok=True)
    return pipe(data, alt.to_json(filename=data_dir + '/{prefix}-{hash}.{extension}') )

# Register and enable the new transformer
alt.data_transformers.register('json_dir', json_dir)
alt.data_transformers.enable('json_dir')

# Handle large data sets (by default Altair raises an error above 5000 rows)
# See here: https://altair-viz.github.io/user_guide/data_transformers.html
alt.data_transformers.disable_max_rows()

alt.renderers.enable('jupyterlab')


RendererRegistry.enable('jupyterlab')

### Loading in the data

In [45]:
accidents = pd.read_csv('../../data/raw/road_accident_dataset.csv')
accidents.head()

|  | Country | Year | Month | Day of Week | Time of Day | Urban/Rural | Road Type | Weather Conditions | Visibility Level | Number of Vehicles Involved | ... | Number of Fatalities | Emergency Response Time | Traffic Volume | Road Condition | Accident Cause | Insurance Claims | Medical Cost | Economic Loss | Region | Population Density |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2002 | October | Tuesday | Evening | Rural | Street | Windy | 220.414651 | 1 | ... | 2 | 58.62572 | 7412.75276 | Wet | Weather | 4 | 40499.856982 | 22072.878502 | Europe | 3866.273014 |
| 1 | UK | 2014 | December | Saturday | Evening | Urban | Street | Windy | 168.311358 | 3 | ... | 1 | 58.04138 | 4458.62882 | Snow-covered | Mechanical Failure | 3 | 6486.600073 | 9534.399441 | North America | 2333.916224 |
| 2 | USA | 2012 | July | Sunday | Afternoon | Urban | Highway | Snowy | 341.286506 | 4 | ... | 4 | 42.374452 | 9856.915064 | Wet | Speeding | 4 | 29164.412982 | 58009.145124 | South America | 4408.889129 |
| 3 | UK | 2017 | May | Saturday | Evening | Urban | Main Road | Clear | 489.384536 | 2 | ... | 3 | 48.554014 | 4958.646267 | Icy | Distracted Driving | 3 | 25797.212566 | 20907.151302 | Australia | 2810.822423 |
| 4 | Canada | 2002 | July | Tuesday | Afternoon | Rural | Highway | Rainy | 348.34485 | 1 | ... | 4 | 18.31825 | 3843.191463 | Icy | Distracted Driving | 8 | 15605.293921 | 13584.060759 | South America | 3883.645634 |


In [46]:
accidents.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 132000 entries, 0 to 131999
Data columns (total 30 columns):
 #   Column                       Non-Null Count   Dtype  
---  ------                       --------------   -----  
 0   Country                      132000 non-null  object 
 1   Year                         132000 non-null  int64  
 2   Month                        132000 non-null  object 
 3   Day of Week                  132000 non-null  object 
 4   Time of Day                  132000 non-null  object 
 5   Urban/Rural                  132000 non-null  object 
 6   Road Type                    132000 non-null  object 
 7   Weather Conditions           132000 non-null  object 
 8   Visibility Level             132000 non-null  float64
 9   Number of Vehicles Involved  132000 non-null  int64  
 10  Speed Limit                  132000 non-null  int64  
 11  Driver Age Group             132000 non-null  object 
 12  Driver Gender                132000 non-null  object 
 13 

In [47]:
print(f"The dataset has {accidents.shape[0]} rows and {accidents.shape[1]} columns.")

The dataset has 132000 rows and 30 columns.


In [48]:
histogram = alt.Chart(accidents).mark_bar().encode(
    alt.X('Emergency Response Time:Q', bin=alt.BinParams(maxbins=20), title='Emergency Response Time (minutes)'),
    alt.Y('count()', title='Frequency'),
    tooltip=['count()']
).properties(
    title='Distribution of Emergency Response Time',
    #width=600,
    #height=400
)

histogram_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('Emergency Response Time:Q', bin=alt.BinParams(maxbins=20), title='Emergency Response Time (minutes)'),
    alt.Y('count()', title='Frequency', scale=alt.Scale(zero=False)),
    tooltip=['count()']
).properties(
    title='Distribution of Emergency Response Time (Zoomed)',
    #width=600,
    #height=400
)

histogram | histogram_zoomed

[Chart output: two side-by-side histograms (full scale and zoomed) of Emergency Response Time (minutes) against Frequency]


In [49]:
weather_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Weather Conditions:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Weather Conditions',
    #width=600,
    #height=400
)

weather_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Weather Conditions:N', sort='x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Weather Conditions (Zoomed)',
    #width=600,
    #height=400
)

weather_bar_plot | weather_bar_plot_zoomed

[Chart output: two side-by-side bar charts (full scale and zoomed) of average Emergency Response Time by Weather Conditions]


In [50]:
fatality_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Number of Fatalities:N', sort = 'x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Number of Fatalities',
    #width=600,
    #height=400
)

fatality_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Number of Fatalities:N', sort = 'x'),
    tooltip=['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Number of Fatalities (Zoomed)',
    #width=600,
    #height=400
)

fatality_bar_plot | fatality_bar_plot_zoomed

[Chart output: two side-by-side bar charts (full scale and zoomed) of average Emergency Response Time by Number of Fatalities]


In [51]:
u_r_bar_plot = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q'),
    alt.Y('Urban/Rural:N', sort = 'x'),
    tooltip = ['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Urban/Rural',
    #width=600,
    #height=400
)

u_r_bar_plot_zoomed = alt.Chart(accidents).mark_bar().encode(
    alt.X('average(Emergency Response Time):Q', scale=alt.Scale(zero=False)),
    alt.Y('Urban/Rural:N', sort = 'x'),
    tooltip = ['average(Emergency Response Time):Q']
).properties(
    title='Emergency Response Time vs. Urban/Rural (Zoomed)',
    #width=600,
    #height=400
)

u_r_bar_plot | u_r_bar_plot_zoomed

[Chart output: two side-by-side bar charts (full scale and zoomed) of average Emergency Response Time by Urban/Rural]


## Task Analysis

### **1. How does emergency response time correlate with the number of fatalities, and how do urban/rural settings and traffic volume influence this relationship?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Number of Fatalities`, `Urban/Rural`, and `Traffic Volume`.
- **Filter**: Filter data by `Urban/Rural` and group `Traffic Volume` into low, medium, and high.
- **Compute Derived Value**: Calculate correlation coefficients between response time and fatalities.
- **Correlate**: Analyze the relationship between response time and fatalities, stratified by urban/rural and traffic volume.
- **Characterize Distribution**: Visualize the distribution of response times and fatalities across urban/rural areas and traffic volume groups.
- **Find Anomalies**: Identify outliers in response times or fatalities.
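
The Filter, Compute Derived Value, and Correlate steps above could be sketched roughly as follows. This is only an illustration on a few made-up rows that reuse the dataset's column names. The helper `stratified_correlation`, the `Traffic Group` column, and the choice of terciles for low/medium/high traffic are assumptions, not something the notebook has settled on.

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names; the values are illustrative only.
sample = pd.DataFrame({
    "Emergency Response Time": [10.0, 20.0, 30.0, 40.0, 15.0, 25.0, 35.0, 45.0],
    "Number of Fatalities": [0, 1, 2, 3, 4, 2, 1, 0],
    "Urban/Rural": ["Urban"] * 4 + ["Rural"] * 4,
    "Traffic Volume": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0],
})

# Assumption: "low / medium / high" traffic means terciles of Traffic Volume.
sample["Traffic Group"] = pd.qcut(
    sample["Traffic Volume"], q=3, labels=["low", "medium", "high"]
)


def stratified_correlation(df, by):
    """Pearson correlation of response time and fatalities within each group of `by`."""
    cols = ["Emergency Response Time", "Number of Fatalities"]
    return df.groupby(by, observed=True)[cols].apply(
        lambda g: g[cols[0]].corr(g[cols[1]])
    )


by_setting = stratified_correlation(sample, "Urban/Rural")
by_traffic = stratified_correlation(sample, "Traffic Group")
print(by_setting)
print(by_traffic)
```

On the real data, `sample` would be replaced by `accidents`. Pearson correlation is only one option here; a rank-based measure may suit the discrete fatality counts better.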

---

### **2. How do weather conditions and road type interact with emergency response times and accident severity?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Weather Conditions`, and `Road Type`.
- **Filter**: Filter data by `Weather Conditions` and `Road Type`.
- **Compute Derived Value**: Calculate average response times and median accident severity for each weather-road combination.
- **Correlate**: Analyze the relationship between weather, road type, response times, and accident severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across weather and road types.
- **Cluster**: Group similar weather-road combinations based on response times and severity.
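
One possible shape for the Compute Derived Value step above is a grouped summary per weather-road combination. The sketch below uses made-up rows. The `Accident Severity` labels and the `SEVERITY_RANK` mapping are assumptions for illustration only, since the notebook output above does not show how that column is encoded.

```python
import pandas as pd

# Made-up rows; the severity labels are hypothetical and may not match the real data.
sample = pd.DataFrame({
    "Weather Conditions": ["Clear", "Clear", "Rainy", "Rainy", "Clear", "Rainy"],
    "Road Type": ["Highway", "Street", "Highway", "Street", "Highway", "Street"],
    "Emergency Response Time": [10.0, 20.0, 30.0, 40.0, 20.0, 50.0],
    "Accident Severity": ["Minor", "Moderate", "Severe", "Minor", "Severe", "Severe"],
})

# Assumption: severity is an ordered label, so a median needs a numeric rank.
SEVERITY_RANK = {"Minor": 1, "Moderate": 2, "Severe": 3}
sample["Severity Rank"] = sample["Accident Severity"].map(SEVERITY_RANK)

# One row per weather-road combination: mean response time, median severity, row count.
summary = (
    sample.groupby(["Weather Conditions", "Road Type"])
    .agg(
        mean_response=("Emergency Response Time", "mean"),
        median_severity=("Severity Rank", "median"),
        n=("Emergency Response Time", "size"),
    )
    .reset_index()
)
print(summary)
```

A long-format table like `summary` can be passed directly to `alt.Chart(...)`, for example as a heatmap with `mark_rect()`.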

---

### **3. How do population density and driver demographics (age group, gender) influence emergency response times and accident outcomes?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Population Density`, `Driver Age Group`, and `Driver Gender`.
- **Filter**: Filter data by `Population Density` (low, medium, high) and `Driver Age Group`.
- **Compute Derived Value**: Calculate average response times and accident severity for each population density and age group.
- **Correlate**: Analyze the relationship between population density, driver demographics, response times, and accident severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across population density and driver demographics.
- **Find Extremum**: Identify the age group or population density with the highest response times or severity.
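
The Filter and Find Extremum steps above might look like the sketch below, again on made-up rows. The tercile binning into `Density Group` and the age-group labels are assumptions for illustration; the real `Driver Age Group` categories are not shown in the output above.

```python
import pandas as pd

# Made-up rows; the age-group labels are hypothetical placeholders.
sample = pd.DataFrame({
    "Population Density": [100.0, 200.0, 300.0, 400.0, 500.0, 600.0],
    "Driver Age Group": ["18-25", "26-40", "18-25", "26-40", "18-25", "26-40"],
    "Emergency Response Time": [50.0, 40.0, 30.0, 35.0, 20.0, 10.0],
})

# Assumption: low / medium / high density are terciles of Population Density.
sample["Density Group"] = pd.qcut(
    sample["Population Density"], q=3, labels=["low", "medium", "high"]
)

# Average response time per group.
mean_by_density = sample.groupby("Density Group", observed=True)[
    "Emergency Response Time"
].mean()
mean_by_age = sample.groupby("Driver Age Group")["Emergency Response Time"].mean()

# Find Extremum: the group with the highest average response time.
slowest_density = mean_by_density.idxmax()
slowest_age = mean_by_age.idxmax()
print(slowest_density, slowest_age)
```

Fixed thresholds with `pd.cut` would be an alternative to terciles if domain-specific density cut-offs are available.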

---

### **4. What is the combined impact of traffic volume, road conditions, and time of day on emergency response times and accident severity?**
- **Retrieve Value**: Extract `Emergency Response Time`, `Accident Severity`, `Traffic Volume`, `Road Condition`, and `Time of Day`.
- **Filter**: Filter data by `Time of Day` and `Road Condition`.
- **Compute Derived Value**: Calculate average response times and severity for each time-road combination.
- **Correlate**: Analyze the relationship between traffic volume, road conditions, time of day, response times, and severity.
- **Characterize Distribution**: Visualize the distribution of response times and severity across traffic volume, road conditions, and time of day.
- **Cluster**: Group similar time-road-traffic combinations based on response times and severity.
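
For the time-road averages described above, a pivot table gives a compact matrix view. The sketch below covers only that one step, on made-up rows that reuse the dataset's column names. The category values shown are illustrative and may not match every level in the real data.

```python
import pandas as pd

# Made-up rows that reuse the dataset's column names; the values are illustrative only.
sample = pd.DataFrame({
    "Time of Day": ["Morning", "Morning", "Evening", "Evening", "Morning", "Evening"],
    "Road Condition": ["Dry", "Wet", "Dry", "Wet", "Dry", "Wet"],
    "Emergency Response Time": [10.0, 20.0, 30.0, 40.0, 20.0, 60.0],
})

# Rows: time of day; columns: road condition; cells: mean response time.
pivot = sample.pivot_table(
    index="Time of Day",
    columns="Road Condition",
    values="Emergency Response Time",
    aggfunc="mean",
)
print(pivot)
```

The wide layout is convenient for reading off combinations by eye. For Altair, the long form from `pivot.stack().reset_index()` is usually easier to encode.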

---

### **Summary of Low-Level Tasks**
| **Research Question**                                                                 | **Low-Level Tasks**                                                                 |
|--------------------------------------------------------------------------------------|------------------------------------------------------------------------------------|
| How does emergency response time correlate with fatalities, considering urban/rural and traffic volume? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Find Anomalies |
| How do weather conditions and road type interact with response times and accident severity? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Cluster |
| How do population density and driver demographics influence response times and accident outcomes? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Find Extremum |
| What is the combined impact of traffic volume, road conditions, and time of day on response times and severity? | Retrieve Value, Filter, Compute Derived Value, Correlate, Characterize Distribution, Cluster |