# Quantifying Dynamic Risk Factors for Vehicular Crashes

## Motivation
Briefly state the nature of your project and why you chose it. What specific question or goal did you try to address?

- Show the risk of injury and death associated with four common driver errors.
- Calculate the probability of an accident while driving poorly.

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import altair as alt
from altair import datum

## Data Sources
- where the datasets or API resources are located,
- what formats they returned/used,
- what were the important variables contained in them,
- how many records you used or retrieved (if using an API), and
- what time periods they covered (if there is a time element)

In [13]:
# Import data ##########################################################
df_vehicle = pd.read_csv("data/crss_2021_csv/vehicle.csv")
df_person = pd.read_csv("data/crss_2021_csv/person.csv")
df_accident = pd.read_csv("data/crss_2021_csv/accident.csv")

# Daniel's Topics: Fast driving, unbuckled driving
# relate to accident type, injury, and damage
# indicators: SPEEDREL, VSPD_LIM, TRAV_SP, CASENUM
# TODO look at accident dataframe and generate basic statistics

## Cleaning and Manipulation

- How specifically did you need to manipulate the data?
- How did you handle missing, incomplete, or incorrect data?
- How did you perform conversion or processing steps?
- What variables and steps did you use to join the two data resources to perform your data analysis?
- What challenges did you encounter and how did you solve them?

In [14]:
# Data cleaning and manipulation #######################################

# Replace non-applicable codes with NaN
# for plotting purposes
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
replace_vals_trav_speed = {997: np.nan, 998: np.nan, 999: np.nan}
replace_vals_speed_lim = {98: np.nan, 99: np.nan}
replace_vals_age = {998: np.nan, 999: np.nan}

df_vehicle["TRAV_SP_1"] = df_vehicle["TRAV_SP"].replace(replace_vals_trav_speed)
df_vehicle["VSPD_LIM_1"] = df_vehicle["VSPD_LIM"].replace(replace_vals_speed_lim)
df_person["AGE_1"] = df_person["AGE"].replace(replace_vals_age)

# Merge vehicle and person df by CASENUM
df_merged = df_vehicle.merge(df_person, how="outer", on="CASENUM")
# Merge accident df by CASENUM
df_merged = df_merged.merge(df_accident, how="outer", on="CASENUM")

# Bin travel_speed
bins = np.arange(0, 105, 5)
bin_labels = np.arange(0, 105, 5)[1:]
df_merged["TRAV_SP_1_bins"] = pd.cut(
    df_merged["TRAV_SP_1"], bins=bins, labels=bin_labels
)

# TODO filter based on speed related crashes? df_vehicle['SPEEDREL']
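
One pitfall worth checking in the merge above: joining only on `CASENUM` pairs every vehicle in a crash with every person in that crash, so multi-vehicle cases end up with duplicated rows. The sketch below uses made-up toy frames (not CRSS data) and assumes that both files also carry a per-vehicle key, called `VEH_NO` here; that column name should be verified against the CRSS manual before relying on it.

```python
import pandas as pd

# Toy stand-ins for vehicle.csv and person.csv (all values are made up).
# VEH_NO is an assumed per-vehicle key; verify it against the CRSS manual.
vehicles = pd.DataFrame({
    "CASENUM": [1, 1, 2],
    "VEH_NO": [1, 2, 1],
    "TRAV_SP": [45, 30, 60],
})
persons = pd.DataFrame({
    "CASENUM": [1, 1, 1, 2],
    "VEH_NO": [1, 1, 2, 1],
    "PER_NO": [1, 2, 1, 1],
    "AGE": [34, 8, 51, 27],
})

# Joining on CASENUM alone pairs each person with every vehicle in the case.
by_case = vehicles.merge(persons, how="outer", on="CASENUM")

# Joining on both keys matches each person to their own vehicle only.
# validate raises an error if a vehicle key is duplicated on the left side.
by_vehicle = vehicles.merge(
    persons, how="outer", on=["CASENUM", "VEH_NO"], validate="one_to_many"
)

print(len(persons), len(by_case), len(by_vehicle))
```

If the row count after the merge is larger than the number of person records, counts and percentages in the later charts may be inflated for multi-vehicle crashes.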

In [15]:
# TODO addressed unknown values with imputed
# df_merged[['INJ_SEV','INJSEV_IM']].head(50)

Define imputed values. 
Impact of using imputed values p.141
Percentages of Unknown and Not Reported Values p.14
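
As a rough way to gauge what imputation changes, the share of unknown codes in a raw column can be compared with its imputed counterpart. The sketch below uses a small made-up frame rather than CRSS data. It assumes that code 9 marks "Unknown/Not Reported" in `INJ_SEV` and that `INJSEV_IM` does not contain that code.

```python
import pandas as pd

# Made-up injury codes; 9 is assumed to mean "Unknown/Not Reported".
toy = pd.DataFrame({
    "INJ_SEV":   [0, 9, 2, 9, 4, 0, 1, 0],
    "INJSEV_IM": [0, 0, 2, 1, 4, 0, 1, 0],
})

UNKNOWN_CODE = 9

# Fraction of rows holding the unknown code, per column.
unknown_share = toy.eq(UNKNOWN_CODE).mean()

# Imputed values for the rows that were unknown in the raw column.
filled = toy.loc[toy["INJ_SEV"].eq(UNKNOWN_CODE), "INJSEV_IM"]

print(unknown_share)
print(filled.value_counts().sort_index())
```

Looking at which categories the unknown rows were imputed into gives a first impression of how much the imputed column could shift the severity distribution.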

Used Indicators:

    1. TRAV_SP
        - TRAV_SP_1
        - TRAV_SP_1_bins

    2. VSPD_LIM
        - VSPD_LIM_1
        
    3. INJ_SEVNAME
        - (INJSEV_IM)

    4. EJECTIONNAME
        - (EJECTION, EJECT_IM)
    
    5. REST_USENAME

## Problem Identification and General Statistics

Show general statistics and trends in the data:
- age
- injuries
- deaths
- damage

Car crash injury severity based on vehicle age, make, model, and build.

Probability based on relative incidence:
- Example: deaths at a 65 mph speed limit are 5x those at a 40 mph speed limit
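
The relative-incidence idea can be sketched as a ratio of fatality shares between two speed-limit groups. The frame below is made up so that the ratio comes out at 5, mirroring the example; it is not a CRSS result. The helper name `fatal_share_by_limit` is hypothetical. This compares the share of recorded persons with a fatal injury among crashes already in the data, which is not the same as the probability of crashing.

```python
import pandas as pd

# Made-up person-level records: posted speed limit and injury label.
toy = pd.DataFrame({
    "VSPD_LIM_1": [40] * 10 + [65] * 10,
    "INJSEV_IMNAME": (
        ["Fatal Injury (K)"] * 1 + ["No Apparent Injury (O)"] * 9
        + ["Fatal Injury (K)"] * 5 + ["No Apparent Injury (O)"] * 5
    ),
})

def fatal_share_by_limit(df):
    """Share of records with a fatal injury within each speed limit."""
    is_fatal = df["INJSEV_IMNAME"].eq("Fatal Injury (K)")
    return is_fatal.groupby(df["VSPD_LIM_1"]).mean()

share = fatal_share_by_limit(toy)

# Relative incidence of fatal injury at 65 mph compared with 40 mph.
relative = share.loc[65] / share.loc[40]
print(share)
print(round(relative, 2))
```

Using shares rather than raw counts keeps the comparison from being driven by how many crashes happen to be recorded at each speed limit.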


In [47]:
# 🚗

catdog = pd.DataFrame({
    'FSA': ['M1X', 'M5G', 'M4H', 'M5C', 'M5H'],
    'PropDogs': [76.35, 63.36,54.76, 20.10, 10.5]
})
catdog

catdog['PropCats']=100-catdog['PropDogs']
catdog=catdog.melt(id_vars ='FSA')
catdog

# transform scale of value to 1-10
catdog['value']=(catdog['value']/10)
# add emoji column
catdog['emoji'] =[{'PropCats': '🐈', 'PropDogs': '🐕'}[animal] *int(value) for animal,value in catdog[['variable','value']].values ]
catdog.head()

Unnamed: 0,FSA,variable,value,emoji
0,M1X,PropDogs,7.635,🐕🐕🐕🐕🐕🐕🐕
1,M5G,PropDogs,6.336,🐕🐕🐕🐕🐕🐕
2,M4H,PropDogs,5.476,🐕🐕🐕🐕🐕
3,M5C,PropDogs,2.01,🐕🐕
4,M5H,PropDogs,1.05,🐕


In [48]:
def bar_accident_region(df):

    columns = ['REGIONNAME']
    df = df[columns]
    # Normalize percent of accidents based on REGIONNAME, region name
    bar = alt.Chart(df, title='Percent of Accidents based on region name').transform_aggregate(
        count='count()',
        groupby=['REGIONNAME']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('REGIONNAME:N', title='Region'),
        y=alt.Y('frac:Q', title='Percent of Accidents', axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_accident_region(df_merged)

In [49]:
def bar_injury_sev(df):

    columns = ['INJSEV_IMNAME']
    df = df[columns]

    # Normalize percent of accidents based on INJURY
    bar = alt.Chart(df, title='Percent of Accidents based on Injury').transform_aggregate(
        count='count()',
        groupby=['INJSEV_IMNAME']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('INJSEV_IMNAME:N'),
        y=alt.Y('frac:Q', title='Percent of Accidents',axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_injury_sev(df_merged)

In [50]:
def bar_intersection_type(df):

    columns = ['TYP_INTNAME']
    df = df[columns]

    # Normalize percent of accidents based on TYP_INTNAME, intersection type
    bar = alt.Chart(df, title='Percent of Accidents based on intersection').transform_aggregate(
        count='count()',
        groupby=['TYP_INTNAME']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('TYP_INTNAME:N', title='Intersection Type'),
        y=alt.Y('frac:Q', title='Percent of Accidents', axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_intersection_type(df_merged)

In [51]:
def bar_harmful_event(df):

    columns = ['HARM_EVNAME']
    df = df[columns]

    # Normalize percent of accidents based on HARM_EVNAME, harmful event
    bar = alt.Chart(df, title='Percent of Accidents based on harmful event').transform_aggregate(
        count='count()',
        groupby=['HARM_EVNAME']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('HARM_EVNAME:N', title='Harmful Event'),
        y=alt.Y('frac:Q', title='Percent of Accidents (log scale)', axis=alt.Axis(format='%'), scale=alt.Scale(type='log')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)

    return bar

bar_harmful_event(df_merged)

In [52]:
def bar_accidents_and_junctions(df):

    columns = ['RELJCT2_IMNAME']
    df = df[columns]

    # Normalize percent of accidents based on RELJCT2_IMNAME, relation to junction, specific name
    bar = alt.Chart(df, title='Percent of Accidents based on relation to junction').transform_aggregate(
        count='count()',
        groupby=['RELJCT2_IMNAME']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('RELJCT2_IMNAME:N', title='Relation to Junction'),
        y=alt.Y('frac:Q', title='Percent of Accidents', axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_accidents_and_junctions(df_merged)

## The Fast Driver Profile

All visualizations referenced and explained in the text. Visualizations are complete, including appropriate title, axis labels, etc. Visualizations are annotated appropriately (note: not all visualizations need annotations).

In [17]:
# travel speed vs speed limit ########################################################################
def point_plot(df):

    # Reduce size of dataframe by only including relevant columns
    columns=['TRAV_SP_1_bins', 'VSPD_LIM_1']
    df = df[columns]

    bars = alt.Chart(df).mark_point().encode(
    y=alt.Y('TRAV_SP_1_bins:Q',  axis=alt.Axis(values=list(range(0, 100, 5))), title='Driver Speed'),
    x=alt.X('VSPD_LIM_1:Q', axis=alt.Axis(values=list(range(0, 100, 5))), title='Speed Limit'),
    size=alt.Size('TRAV_SP_1_bins', aggregate='count')
    ).properties(
        width=300,
        height=300
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return bars
        
point_plot(df_merged)

In [18]:
def heatmap_plot(df):
    
    # Reduce size of dataframe by only including relevant columns
    columns=['TRAV_SP_1', 'VSPD_LIM_1', 'TRAV_SP_1_bins']
    df = df[columns]

    heatmap = alt.Chart(df).mark_rect().encode(
        alt.Y('TRAV_SP_1:Q', title='Driver Speed').bin(maxbins=20),
        alt.X('VSPD_LIM_1:Q', title='Speed Limit').bin(maxbins=20),
        alt.Color('count():Q').scale(scheme='greenblue')
    ).transform_filter(
        (datum.TRAV_SP_1_bins <= 80)
    ).properties(
        width=300,
        height=300
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return heatmap

heatmap_plot(df_merged)

In [19]:
# INJSEV_IM
# TODO use pandas cut,
# TODO pandas groupby, plot travel speed vs count(INJ_SEV),

# 0 No Apparent Injury (O)
# 1 Possible Injury (C)
# 2 Suspected Minor Injury (B)
# 3 Suspected Serious Injury (A)
# 4 Fatal Injury (K)
# 5 Injured, Severity Unknown (U)
# 6 Died Prior to Crash
# 9 Unknown/Not Reported

def point_plot_1(df):

    # Reduce size of dataframe by only including relevant columns
    columns=['TRAV_SP_1_bins', 'INJ_SEVNAME']
    df = df[columns]
    bars = alt.Chart(df).mark_point().encode(
        x=alt.X('TRAV_SP_1_bins:Q',  axis=alt.Axis(values=list(range(0, 100, 5))), title='Driver Speed'),
        y=alt.Y('INJ_SEVNAME:N', title='Injury Severity'),
        size=alt.Size('TRAV_SP_1_bins', aggregate='count')
    ).properties(
        width=500,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return bars

point_plot_1(df_merged)

In [21]:
def injury_speed_dist_plot(df):
    
    # Reduce size of dataframe by only including relevant columns
    columns=['VSPD_LIM_1', 'INJ_SEVNAME', 'TRAV_SP_1']
    df = df[columns]

    speed_limit = alt.Chart(df).mark_area().encode(
        x=alt.X('VSPD_LIM_1:Q', title='Speed Limit (mph)'),
        y=alt.Y('count(VSPD_LIM_1):Q', title="").scale(type="log"),  # log axis; a domain starting at 0 is not valid here
        color=alt.Color('INJ_SEVNAME:N', legend=None, title=''),
        row=alt.Row('INJ_SEVNAME:N',header=alt.Header(labelAngle=0, labelAlign='left'),title='COUNT')
    ).transform_filter(
        (datum.INJ_SEVNAME == 'No Apparent Injury (O)') | (datum.INJ_SEVNAME == 'Suspected Minor Injury (B)') |
        (datum.INJ_SEVNAME == 'Suspected Serious Injury (A)') | (datum.INJ_SEVNAME == 'Fatal Injury (K)') |
        (datum.INJ_SEVNAME == 'Unknown/Not Reported') | (datum.INJ_SEVNAME == 'Possible Injury (C)')

    ).properties(height=60, width=400
    )

    driver_speed = alt.Chart(df).mark_area().encode(
        x=alt.X('TRAV_SP_1:Q', title='Driver Speed (mph)'),
        y=alt.Y('count(TRAV_SP_1):Q', title="").scale(type="log"),  # log axis; a domain starting at 0 is not valid here
        color=alt.Color('INJ_SEVNAME:N', legend=None, title=''),
        row=alt.Row('INJ_SEVNAME:N',header=None,title='')
    ).transform_filter(
        (datum.INJ_SEVNAME == 'No Apparent Injury (O)') | (datum.INJ_SEVNAME == 'Suspected Minor Injury (B)') |
        (datum.INJ_SEVNAME == 'Suspected Serious Injury (A)') | (datum.INJ_SEVNAME == 'Fatal Injury (K)') |
        (datum.INJ_SEVNAME == 'Unknown/Not Reported') | (datum.INJ_SEVNAME == 'Possible Injury (C)')

    ).properties(height=60, width=400
    )

    combined = (speed_limit | driver_speed).configure_axis(
        labelFontSize=12,
        titleFontSize=12
    )
    return combined

injury_speed_dist_plot(df_merged)

In [22]:
def bar_driver_speed_dist(df):

    # Reduce size of dataframe by only including relevant columns
    columns=['TRAV_SP_1']
    df = df[columns]

    # Normalize percent of accidents based on driver speed
    bar = alt.Chart(df, title='Percent of Accidents based on Driver Speed').transform_aggregate(
        count='count()',
        groupby=['TRAV_SP_1']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1:Q'),
        y=alt.Y('frac:Q', title='Percent of Accidents',axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_driver_speed_dist(df_merged)

In [23]:
def bar_speed_lim_dist(df):

    # Reduce size of dataframe by only including relevant columns
    columns=['VSPD_LIM_1']
    df = df[columns]

    # Normalize percent of accidents based on speed limit
    bar = alt.Chart(df, title='Percent of Accidents based on Speed Limit').transform_aggregate(
        count='count()',
        groupby=['VSPD_LIM_1']
    ).transform_joinaggregate(
        total='sum(count)'
    ).transform_calculate(
        frac='datum.count / datum.total'
    ).mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1:Q'),
        y=alt.Y('frac:Q', title='Percent of Accidents',axis=alt.Axis(format='%')),
    ).properties(
        width=1000,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)
    return bar

bar_speed_lim_dist(df_merged)

In [25]:


def bars_injury_types_speed_lim(df):

    columns = ['VSPD_LIM_1', 'INJSEV_IMNAME']
    df = df[columns]

    #risk per speed limit
    height= 200
    width = 1000

    fatal = alt.Chart(df, title='Driver Fatalities vs Speed Limit').mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1',title='Speed Limit', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(INJSEV_IMNAME)', title='Frequency of Fatalities'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Fatal Injury (K)')
    ).properties(
        width=width,
        height=height
    )

    severe = alt.Chart(df, title='Severe Injury vs Speed Limit').mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1',title='Speed Limit', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(INJSEV_IMNAME)', title='Frequency of Severe Injuries'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Suspected Serious Injury (A)')
    ).properties(
        width=width,
        height=height
    )

    minor = alt.Chart(df, title='Minor Injury vs Speed Limit').mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1',title='Speed Limit', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(INJSEV_IMNAME)', title='Frequency of Minor Injuries'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Suspected Minor Injury (B)')
    ).properties(
        width=width,
        height=height
    )

    injury = alt.Chart(df, title='Injury vs Speed Limit').mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1',title='Speed Limit', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(INJSEV_IMNAME)', title='Frequency of Injury'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Possible Injury (C)')
    ).properties(
        width=width,
        height=height
    )

    no = alt.Chart(df, title='No Injury vs Speed Limit').mark_bar(size=10).encode(
        x=alt.X('VSPD_LIM_1',title='Speed Limit', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(INJSEV_IMNAME)', title='Frequency of No Injury'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'No Apparent Injury (O)')
    ).properties(
        width=width,
        height=height
    )

    combined = (fatal & severe & minor & injury).resolve_scale(
        x='shared'
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)

    return combined

bars_injury_types_speed_lim(df_merged)

In [26]:

def bars_injury_types_driver_speed(df):

    columns = ['TRAV_SP_1', 'INJSEV_IMNAME']
    df = df[columns]

    # risk per driver speed
    height= 200
    width = 1000

    fatal = alt.Chart(df, title='Driver Fatalities vs Driver Speed').mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1',title='Driver Speed', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(TRAV_SP_1)', title='Frequency of Fatalities'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Fatal Injury (K)')
    ).properties(
        width=width,
        height=height
    )

    severe = alt.Chart(df, title='Severe Injury vs Driver Speed').mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1',title='Driver Speed', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(TRAV_SP_1)', title='Frequency of Severe Injuries'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Suspected Serious Injury (A)')
    ).properties(
        width=width,
        height=height
    )

    minor = alt.Chart(df, title='Minor Injury vs Driver Speed').mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1',title='Driver Speed', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(TRAV_SP_1)', title='Frequency of Minor Injuries'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Suspected Minor Injury (B)')
    ).properties(
        width=width,
        height=height
    )

    injury = alt.Chart(df, title='Injury vs Driver Speed').mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1',title='Driver Speed', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(TRAV_SP_1)', title='Frequency of Injury'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'Possible Injury (C)')
    ).properties(
        width=width,
        height=height
    )

    no = alt.Chart(df, title='No Injury vs Driver Speed').mark_bar(size=10).encode(
        x=alt.X('TRAV_SP_1',title='Driver Speed', axis=alt.Axis(values=np.arange(5,75,5))), #INJSEV_IMNAME
        y=alt.Y('count(TRAV_SP_1)', title='Frequency of No Injury'),
    ).transform_filter(
        (datum.INJSEV_IMNAME == 'No Apparent Injury (O)')
    ).properties(
        width=width,
        height=height
    )

    combined = (fatal & severe & minor & injury).resolve_scale(
        x='shared'
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    ).configure_title(fontSize=24)

    return combined

bars_injury_types_driver_speed(df_merged)

In [46]:
def circle_speed_lim_driver_speed(df):

    columns = ['TRAV_SP_1', 'VSPD_LIM_1', 'INJ_SEVNAME','AGE_1']
    df = df[columns]

    circle = alt.Chart(df).mark_circle().encode(
        alt.X('TRAV_SP_1').scale(zero=False),
        alt.Y('VSPD_LIM_1').scale(zero=False, padding=1),
        color='INJ_SEVNAME',
        size='AGE_1'
    ).properties(
        width=500,
        height=500
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )

    return circle

circle_speed_lim_driver_speed(df_merged)

## The Unbuckled Driver Profile

In [27]:
def bar_ejection(df):

    # Ejected EJECTIONNAME
    # Seatbelt REST_USENAME
    # travel speed TRAV_SP_1_bins
    # injury severity INJ_SEVNAME

    columns = ['EJECTIONNAME']
    df = df[columns]

    bar = alt.Chart(df).mark_bar().encode(
        x=alt.X('count(EJECTIONNAME):Q'),
        y=alt.Y('EJECTIONNAME:N', title='Ejection'),
    ).properties(
        width=1000,
        height=200
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return bar

bar_ejection(df_merged)

In [28]:
def circle_ejection(df):

    columns = ['REST_USENAME', 'EJECT_IMNAME']
    df = df[columns]

    circle = alt.Chart(df).mark_circle().encode(
        x=alt.X('REST_USENAME:N'),
        y=alt.Y('EJECT_IMNAME:N', title='Ejection'),
        # size=alt.Size('count(EJECT_IMNAME)')
    ).properties(
        width=1000,
        height=200
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return circle

circle_ejection(df_merged)

In [29]:
def circle_injury_ejection(df):

    columns = ['EJECTIONNAME', 'INJ_SEVNAME']
    df = df[columns]

    circle = alt.Chart(df, title='Ejection vs Injury Severity').mark_circle().encode(
        x=alt.X('EJECTIONNAME:N', title='Ejection'),
        y=alt.Y('INJ_SEVNAME:N', title='Injury Severity'),
        size=alt.Size('count(INJ_SEVNAME)')
    ).properties(
        width=1000,
        height=200
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return circle

circle_injury_ejection(df_merged)

In [30]:
def circle_driver_speed_ejection(df):

    columns = ['TRAV_SP_1', 'EJECTIONNAME']
    df = df[columns]

    circle = alt.Chart(df, title='Ejection vs Driver Speed').mark_circle().encode(
        x=alt.X('TRAV_SP_1:Q', title='Driver Speed'),
        y=alt.Y('EJECTIONNAME:N', title='Ejection')
    ).properties(
        width=1000,
        height=200
    ).configure_axis(
        labelFontSize=14,
        titleFontSize=14
    )
    return circle

circle_driver_speed_ejection(df_merged)


## Analysis

A key goal of this project is bringing together two different data resources to answer an interesting question or find a new insight that could not have been answered with either data resource alone (which you summarized in the previous section). 

- What interesting relationships or insights did you get from your analysis?
- What didn’t work, and why?

In [44]:
# correlation coefficients

# damage assessment 

# show location of damage for speeders 
# bucklers and non bucklers

# ADVISORIES 

# NMCC
# non motorist contributing circumstances

# weather

# visibility
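
For the correlation coefficients planned above, one possible starting point is a rank correlation between travel speed and injury severity. The sketch below runs on made-up values, not CRSS data. It assumes that only codes 0 to 4 of `INJSEV_IM` form an ordered scale (no injury up to fatal), so other codes are dropped before ranking.

```python
import numpy as np
import pandas as pd

# Made-up speeds and injury codes; code 5 and the missing speed are dropped below.
toy = pd.DataFrame({
    "TRAV_SP_1": [10, 25, 35, 45, 55, 70, np.nan, 30],
    "INJSEV_IM": [0, 0, 1, 2, 3, 4, 2, 5],
})

# Keep only the ordered severity codes 0-4 and rows with a known speed.
ordered = toy[toy["INJSEV_IM"].between(0, 4)].dropna(subset=["TRAV_SP_1"])

# The Pearson correlation of the ranks is the Spearman rank correlation.
rho = ordered["TRAV_SP_1"].rank().corr(ordered["INJSEV_IM"].rank())
print(round(rho, 3))
```

A rank correlation is used here because the severity codes are ordinal: the gap between codes 3 and 4 is not meaningfully the same size as the gap between codes 0 and 1.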