# Airbags and Other Influences on Accident Fatalities
##### Description:
* US data, for 1997-2002, from police-reported car crashes in which there is a harmful event (people or property),<br> and from which at least one vehicle was towed. Data are restricted to front-seat occupants, include only a subset<br> of the variables recorded, and are restricted in other ways also.
##### dvcat
* Ordered factor with levels (estimated impact speeds) 1-9km/h, 10-24, 25-39, 40-54, 55+
##### weight
* Observation weights, albeit of uncertain accuracy, designed to account for varying sampling probabilities.
##### dead
* Factor with levels alive dead
##### airbag
* A factor with levels none airbag
##### seatbelt
* A factor with levels none belted
##### frontal
* A numeric vector; 0 = non-frontal, 1=frontal impact
##### sex
* A factor with levels f m
##### ageOFocc
* Age of occupant in years
##### yearacc
* Year of accident
##### yearVeh
* Year of model of vehicle; a numeric vector
##### abcat
* Did one or more (driver or passenger) airbag(s) deploy? This factor has levels deploy nodeploy unavail
##### occRole
* A factor with levels driver pass
##### deploy
* A numeric vector: 0 if an airbag was unavailable or did not deploy; 1 if one or more bags deployed.
##### injSeverity
* A numeric vector: <br>0=None, 1=Possible Injury, 2=No Incapacity, 3=Incapacity, 4=Killed, 5=Unknown, 6=Prior Death
##### caseid
* A character string created by pasting together the population sampling unit, the case number, and the vehicle number.<br> Within each year, use this to uniquely identify the vehicle.
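
The description lists `dvcat` as an ordered factor. A minimal sketch of how that ordering could be kept in pandas follows, assuming the level labels are exactly those listed above. The toy values and the names `SPEED_LEVELS` and `toy` are illustrative and not part of the notebook.

```python
import pandas as pd

# Impact-speed bands in ascending order, as listed in the dvcat description.
SPEED_LEVELS = ["1-9km/h", "10-24", "25-39", "40-54", "55+"]

# Toy values standing in for the real dvcat column (illustrative only).
toy = pd.Series(["25-39", "10-24", "55+", "1-9km/h"], name="dvcat")

# An ordered categorical dtype records which band is "faster" than another.
speed_dtype = pd.CategoricalDtype(categories=SPEED_LEVELS, ordered=True)
dvcat = toy.astype(speed_dtype)

# Ordered categoricals support comparisons against a level label.
high_speed = (dvcat >= "40-54").tolist()

# Integer codes follow the position of each level in SPEED_LEVELS.
codes = dvcat.cat.codes.tolist()
print(high_speed, codes)
```

The integer codes are one possible numeric encoding for a model that expects numbers; whether that suits the later analysis is a judgment call.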



# Import Dependencies / Machine Learning

In [48]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

# Import Dataset

In [49]:
# Import the dataset from Google Drive:
url = ('https://drive.google.com/file/d/1t3Z8Blgy2BPmBB4FqrQkC_jie9IwYuQb/view?usp=sharing')
path = 'https://drive.google.com/uc?export=download&id='+url.split('/')[-2]
crash_1 = pd.read_csv(path,index_col=0)
crash_1.head()

Unnamed: 0,dvcat,weight,dead,airbag,seatbelt,frontal,sex,ageOFocc,yearacc,yearVeh,abcat,occRole,deploy,injSeverity,caseid
1,25-39,25.069,alive,none,belted,1,f,26,1997,1990.0,unavail,driver,0,3.0,2:3:1
2,10-24,25.069,alive,airbag,belted,1,f,72,1997,1995.0,deploy,driver,1,1.0,2:3:2
3,10-24,32.379,alive,none,none,1,f,69,1997,1988.0,unavail,driver,0,4.0,2:5:1
4,25-39,495.444,alive,airbag,belted,1,f,53,1997,1995.0,deploy,driver,1,1.0,2:10:1
5,25-39,25.069,alive,none,belted,1,f,32,1997,1988.0,unavail,driver,0,3.0,2:11:1
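
The import cell builds a direct-download link by taking the second-to-last `/`-separated piece of the share URL as the file ID. The sketch below wraps that step in a small helper that checks the link shape first. The function name `drive_download_url` is hypothetical, and the `/file/d/<id>/view` layout is assumed from the URL used in this notebook.

```python
from urllib.parse import urlparse

def drive_download_url(share_url: str) -> str:
    """Turn a '/file/d/<id>/view' share link into a direct-download link."""
    parts = urlparse(share_url).path.strip("/").split("/")
    # Expected path shape: file/d/<file_id>/view
    if len(parts) < 3 or parts[:2] != ["file", "d"]:
        raise ValueError(f"Unrecognised Drive share link: {share_url}")
    return "https://drive.google.com/uc?export=download&id=" + parts[2]

share = "https://drive.google.com/file/d/1t3Z8Blgy2BPmBB4FqrQkC_jie9IwYuQb/view?usp=sharing"
direct = drive_download_url(share)
print(direct)
```

Parsing the path rather than splitting the whole string means a link with a different number of segments raises an error instead of silently picking up the wrong piece.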


# Initialize Dataset for Analysis

In [50]:
# Remove unneeded columns:
# (weight holds observation weights of uncertain accuracy; they are not used in this analysis)
# (caseid values are not unique accident identifiers; numerous incidents share a single id)
crash_2 = crash_1.drop(['weight','caseid'], axis=1)

# Rename the columns so they are easier to understand:
crash_2.rename(columns={'dvcat':'est_impact_kmh',
                         'dead':'occupant_status',
                         'airbag':'airbag_available',
                         'frontal':'front_impact',
                         'ageOFocc':'occupant_age',
                         'yearacc':'accident_year',
                         'yearVeh':'vehicle_year',
                         'abcat':'airbag_deployment',
                         'occRole':'occupant_role',
                         'deploy':'deployment',
                         'injSeverity':'injury_severity'},inplace=True)

crash_2

Unnamed: 0,est_impact_kmh,occupant_status,airbag_available,seatbelt,front_impact,sex,occupant_age,accident_year,vehicle_year,airbag_deployment,occupant_role,deployment,injury_severity
1,25-39,alive,none,belted,1,f,26,1997,1990.0,unavail,driver,0,3.0
2,10-24,alive,airbag,belted,1,f,72,1997,1995.0,deploy,driver,1,1.0
3,10-24,alive,none,none,1,f,69,1997,1988.0,unavail,driver,0,4.0
4,25-39,alive,airbag,belted,1,f,53,1997,1995.0,deploy,driver,1,1.0
5,25-39,alive,none,belted,1,f,32,1997,1988.0,unavail,driver,0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
26213,25-39,alive,none,belted,1,m,17,2002,1985.0,unavail,driver,0,0.0
26214,10-24,alive,airbag,belted,1,m,54,2002,2002.0,nodeploy,driver,0,2.0
26215,10-24,alive,airbag,belted,1,f,27,2002,1990.0,deploy,driver,1,3.0
26216,25-39,alive,airbag,belted,1,f,18,2002,1999.0,deploy,driver,1,0.0
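
The note that `caseid` values repeat can be checked before the column is dropped. The sketch below shows one way to count repeated IDs, both overall and within a year, on a toy frame. The values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame; the yearacc and caseid values are invented for illustration.
toy = pd.DataFrame({
    "yearacc": [1997, 1997, 1997, 1998],
    "caseid": ["2:3:1", "2:3:1", "2:5:1", "2:3:1"],
})

# Rows whose caseid appears more than once anywhere in the data.
dup_overall = int(toy["caseid"].duplicated(keep=False).sum())

# Rows whose (yearacc, caseid) pair appears more than once.
dup_within_year = int(toy.duplicated(subset=["yearacc", "caseid"], keep=False).sum())

print(dup_overall, dup_within_year)
```

Because the description says the ID identifies a vehicle within each year, the within-year count is the more relevant of the two; front-seat passengers in the same vehicle would presumably share an ID with the driver.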


In [51]:
# Check the dataset for any null values:
for column in crash_2.columns:
    print(f'Column {column} has {crash_2[column].isnull().sum()}\
    null values')    

Column est_impact_kmh has 0    null values
Column occupant_status has 0    null values
Column airbag_available has 0    null values
Column seatbelt has 0    null values
Column front_impact has 0    null values
Column sex has 0    null values
Column occupant_age has 0    null values
Column accident_year has 0    null values
Column vehicle_year has 1    null values
Column airbag_deployment has 0    null values
Column occupant_role has 0    null values
Column deployment has 0    null values
Column injury_severity has 153    null values
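
The same per-column null check can be written without a loop. This is a minimal sketch on a toy frame, with invented values, that also filters the result down to the columns that actually contain nulls.

```python
import numpy as np
import pandas as pd

# Toy frame with a few missing values (illustrative only).
toy = pd.DataFrame({
    "vehicle_year": [1990.0, np.nan, 1988.0, 1995.0],
    "injury_severity": [3.0, 1.0, np.nan, np.nan],
    "sex": ["f", "m", "f", "m"],
})

# One null count per column, returned as a Series indexed by column name.
null_counts = toy.isnull().sum()

# Keep only the columns that have at least one null.
with_nulls = null_counts[null_counts > 0]
print(with_nulls)
```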


In [52]:
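# Drop every row that contains a null value
# (copy so that later column assignments act on an independent frame):
crash_3 = crash_2.dropna().copy()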
for column in crash_3.columns:
    print(f'Column {column} has {crash_3[column].isnull().sum()}\
    null values')

Column est_impact_kmh has 0    null values
Column occupant_status has 0    null values
Column airbag_available has 0    null values
Column seatbelt has 0    null values
Column front_impact has 0    null values
Column sex has 0    null values
Column occupant_age has 0    null values
Column accident_year has 0    null values
Column vehicle_year has 0    null values
Column airbag_deployment has 0    null values
Column occupant_role has 0    null values
Column deployment has 0    null values
Column injury_severity has 0    null values
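
`dropna()` removes a row once even if several of its columns are null, so the number of rows removed can be smaller than the sum of the per-column null counts. The toy sketch below, with invented values, shows the difference.

```python
import numpy as np
import pandas as pd

# Toy frame: the second row is null in both columns (illustrative only).
toy = pd.DataFrame({
    "vehicle_year": [1990.0, np.nan, 1988.0, 1995.0],
    "injury_severity": [3.0, np.nan, np.nan, 1.0],
})

cleaned = toy.dropna()

# Rows removed versus total null cells across all columns.
rows_dropped = len(toy) - len(cleaned)
nulls_total = int(toy.isnull().sum().sum())
print(rows_dropped, nulls_total)
```

Comparing `len()` before and after the drop is a quick way to confirm how many observations were actually lost.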


In [53]:
# Print out the est_impact_kmh value counts:
impact = crash_3.est_impact_kmh.value_counts()
impact

10-24      12766
25-39       8165
40-54       2965
55+         1491
1-9km/h      676
Name: est_impact_kmh, dtype: int64

In [54]:
# Rename values in est_impact_kmh & airbag_available:
speed = '1-9'
airbagsY = 'yes'
airbagsN = 'no'

crash_3['est_impact_kmh'] = crash_3['est_impact_kmh'].replace({'1-9km/h':speed})
crash_3['airbag_available'] = crash_3['airbag_available'].replace({'airbag':airbagsY,'none':airbagsN})

In [55]:
# Print out the occupant_status value counts:
survive = crash_3.occupant_status.value_counts()
survive

alive    24883
dead      1180
Name: occupant_status, dtype: int64
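
The raw counts are easier to judge as proportions. The sketch below rebuilds the two counts printed above and converts them to shares. On the full frame, `value_counts(normalize=True)` should give the same figures directly.

```python
import pandas as pd

# Counts copied from the occupant_status output above.
counts = pd.Series({"alive": 24883, "dead": 1180}, name="occupant_status")

# Share of each outcome among all remaining rows.
shares = counts / counts.sum()
print(shares.round(4))
```

Fatalities make up a small minority of the rows, which is worth keeping in mind if `occupant_status` is later used as a classification target: a model can reach high accuracy by predicting `alive` for every row.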

In [56]:
# Print out the airbag_available value counts:
airbag_available = crash_3.airbag_available.value_counts()
airbag_available

yes    14336
no     11727
Name: airbag_available, dtype: int64

In [57]:
# Print out the seatbelt value counts:
seatbelt = crash_3.seatbelt.value_counts()
seatbelt

belted    18465
none       7598
Name: seatbelt, dtype: int64

In [58]:
# Change the values to reflect belted or not belted:
crash_3['seatbelt'] = crash_3['seatbelt'].replace({'none':'not_belted'})
belted = crash_3.seatbelt.value_counts()
belted

belted        18465
not_belted     7598
Name: seatbelt, dtype: int64

In [59]:
# Print out the front_impact value counts:
front = crash_3.front_impact.value_counts()
front

1    16775
0     9288
Name: front_impact, dtype: int64

In [23]:
# Print out the vehicle_year value counts: 
year = crash_3.vehicle_year.value_counts()
year

1995.0    2026
1997.0    1885
1994.0    1832
1996.0    1813
1998.0    1809
1993.0    1622
1999.0    1568
1992.0    1415
1991.0    1406
1989.0    1352
1990.0    1317
2000.0    1259
1988.0    1239
1987.0    1019
1986.0     905
2001.0     708
1985.0     708
1984.0     519
2002.0     362
1983.0     267
1982.0     191
1981.0     143
1979.0     129
1978.0     122
1980.0     102
1977.0      58
1976.0      37
1973.0      34
2003.0      31
1975.0      28
1974.0      25
1969.0      23
1972.0      23
1966.0      17
1971.0      17
1970.0      16
1968.0      13
1967.0       9
1963.0       4
1965.0       4
1956.0       2
1961.0       1
1964.0       1
1953.0       1
1959.0       1
Name: vehicle_year, dtype: int64

In [60]:
# See how many vehicles from each year have airbags installed:
crash_3.airbag_available.eq('yes').astype(int).groupby(crash_3.vehicle_year).sum()

vehicle_year
1953.0       0
1956.0       0
1959.0       0
1961.0       0
1963.0       0
1964.0       0
1965.0       0
1966.0       0
1967.0       0
1968.0       0
1969.0       0
1970.0       0
1971.0       0
1972.0       0
1973.0       0
1974.0       0
1975.0       0
1976.0       0
1977.0       0
1978.0       0
1979.0       0
1980.0       0
1981.0       0
1982.0       0
1983.0       0
1984.0       0
1985.0       0
1986.0       4
1987.0       4
1988.0      11
1989.0      37
1990.0     236
1991.0     339
1992.0     498
1993.0     805
1994.0    1217
1995.0    1870
1996.0    1752
1997.0    1858
1998.0    1800
1999.0    1559
2000.0    1249
2001.0     704
2002.0     362
2003.0      31
Name: airbag_available, dtype: int64
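
Absolute airbag counts are hard to compare across model years because the number of vehicles per year varies so much. The sketch below divides the airbag counts by the vehicle counts for a few model years, using figures copied from the two outputs above. On the full frame, replacing `.sum()` with `.mean()` in the groupby should produce the same share for every year.

```python
import pandas as pd

# Selected model years, copied from the vehicle_year and airbag outputs above.
vehicles = pd.Series({1989.0: 1352, 1990.0: 1317, 1995.0: 2026, 2000.0: 1259})
with_airbag = pd.Series({1989.0: 37, 1990.0: 236, 1995.0: 1870, 2000.0: 1249})

# Fraction of occupants in each model year whose vehicle had an airbag.
share = (with_airbag / vehicles).round(3)
print(share)
```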

In [61]:
# Print out the airbag_deployment value counts:
deploy = crash_3.airbag_deployment.value_counts()
deploy

unavail     11727
deploy       8799
nodeploy     5537
Name: airbag_deployment, dtype: int64
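
The `unavail` count here matches the `no` count for `airbag_available`, which suggests the two columns overlap, as does the numeric `deployment` column. A cross-tabulation is one way to confirm this. The sketch below runs on a toy frame with invented rows and is not a result from the dataset.

```python
import pandas as pd

# Toy rows mirroring the three related columns (illustrative only).
toy = pd.DataFrame({
    "airbag_available": ["no", "yes", "yes", "no", "yes"],
    "airbag_deployment": ["unavail", "deploy", "nodeploy", "unavail", "deploy"],
    "deployment": [0, 1, 0, 0, 1],
})

# Counts for each combination of availability and deployment status.
table = pd.crosstab(toy["airbag_available"], toy["airbag_deployment"])
print(table)

# True when deployment == 1 exactly where airbag_deployment == 'deploy'.
consistent = bool(
    ((toy["airbag_deployment"] == "deploy") == (toy["deployment"] == 1)).all()
)
print(consistent)
```

If the real columns turn out to be fully determined by one another, keeping all three as model features adds no information.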