# Exploratory Analysis - Fatality Analysis Reporting System (FARS)

## Hypothesis

Despite the spread of AI-based safety technology in cars (collision detection/avoidance, lane-departure systems, and similar
driver aids), pedestrian, cyclist, and motorist deaths are NOT significantly reduced by these interventions. In fact, the growth
in average vehicle size has made collisions more dangerous on average, especially for people outside the striking vehicle.
Average vehicle size is growing faster than vehicle safety is improving. Further, the safety gains that do exist accrue
disproportionately to drivers and passengers, not to those outside the vehicle.

## Primary Research Questions

* Is there a relationship between vehicle weight and the likelihood that a crash is fatal?
* Are cars with standard or optional AI-based safety features less likely to be involved in fatal accidents?
* Do AI features offset risk factors (weight, alcohol consumption, time of day, etc.)?



## Data Summary

There are multiple data files, and I'll be using both the auxiliary and national files. The auxiliary files contain
commonly extracted information about the vehicles, persons involved, and accidents. The raw files contain a large amount of data,
but I will only be using a few variables. Code definitions can be found [here](https://static.nhtsa.gov/nhtsa/downloads/FARS/Links%20for%20FARS%20Manuals.pdf).


* FARS - Fatality Analysis Reporting System
* CRSS - Crash Report Sampling System

The variable coding is the same for both sources. FARS is limited to fatal crashes, whereas CRSS is a weighted sample of police-reported crashes that is not limited to fatal ones.

### FARS

#### Accident Data

* **accident.csv**
  * ST_CASE - case number; the merge key across files (combined with VEH_NO for vehicle-level files)
  * PERNOTMVIT - number of persons not in a motor vehicle in transport
  * VE_FORMS - number of motor vehicles in transport
  * ROUTE - route name (interstate, local street, etc)
  * TYPE_INT - type of intersection
  * LGT_COND - light condition
  * WEATHER - weather conditions
  * FATALS - number of fatalities
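
Since only a handful of the accident columns are needed, the read can be restricted to them. The sketch below is a minimal illustration using the pandas `usecols` option. The inline CSV is a toy stand-in for accident.csv with illustrative values only, and `VE_FORMS` is assumed to be the column holding the count of vehicles in transport.

```python
import io

import pandas as pd

# Columns of interest from the list above
ACCIDENT_COLS = [
    "ST_CASE", "PERNOTMVIT", "VE_FORMS", "ROUTE",
    "TYPE_INT", "LGT_COND", "WEATHER", "FATALS",
]

# Toy stand-in for accident.csv; the values are illustrative, not real records
toy_csv = io.StringIO(
    "STATE,ST_CASE,PERNOTMVIT,VE_FORMS,ROUTE,TYPE_INT,LGT_COND,WEATHER,FATALS,EXTRA\n"
    "1,10001,0,2,2,1,1,1,1,x\n"
    "1,10005,1,1,1,1,2,1,1,y\n"
)

# usecols drops every column that is not listed; ST_CASE becomes the index
accidents = pd.read_csv(toy_csv, usecols=ACCIDENT_COLS, index_col="ST_CASE")
print(accidents)
```

With the real file, the `io.StringIO` object would be replaced by the path to accident.csv.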

#### Motorist Data

* **vpicdecode.csv**
  * ST_CASE
  * VEH_NO
  * PER_NO
  * VEHICLETYPE
  * MANUFACTURERFULLNAME
  * MODEL
  * MODELYEAR
  * TRIM
  * BODYCLASS_ID / BODYCLASS
  * CURBWEIGHTLB
  * ANTILOCKBRAKESYSTEMID / ANTILOCKBRAKESYSTEM
  * AUTOPEDESTRIANALERTINGSOUNDID / AUTOPEDESTRIANALERTINGSOUND
  * ELECTRONICSTABILITYCONTROLID / ELECTRONICSTABILITYCONTROL
  * TRACTIONCONTROLID / TRACTIONCONTROL
  * SAE AUTOMATION LEVEL FROM / TO - how automated the vehicle can be
  * CRASHIMMINENTBRAKINGID / CRASHIMMINENTBRAKING
  * DYNAMICBRAKESUPPORTID / DYNAMICBRAKESUPPORT
  * PEDESTRIANAUTOEMERGENCYBRAKINGID / PEDESTRIANAUTOEMERGENCYBRAKING
  * ADAPTIVECRUISECONTROLID / ADAPTIVECRUISECONTROL
  * LANEDEPARTUREWARNINGID / LANEDEPARTUREWARNING
  * DAYTIMERUNNINGLIGHTID / DAYTIMERUNNINGLIGHT
  * ENGINEBRAKEHP_FROM / TO
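
To compare vehicles with and without AI-based safety features, the decoded feature columns need to be turned into flags. The sketch below is one possible way to do that, not a settled method. It assumes the decoded columns hold labels such as "Standard", "Optional", and "Not Available"; those labels, the toy rows, and the derived column names `N_AI_FEATURES` and `ANY_AI_FEATURE` are assumptions for illustration and should be checked against the actual file.

```python
import pandas as pd

# A subset of the decoded feature columns listed above
FEATURE_COLS = [
    "CRASHIMMINENTBRAKING",
    "LANEDEPARTUREWARNING",
    "PEDESTRIANAUTOEMERGENCYBRAKING",
]

# Assumed labels meaning the feature could be present on the vehicle
AVAILABLE_LABELS = ["Standard", "Optional"]

# Toy rows; None stands for a value the VIN decode did not provide
vehicles = pd.DataFrame({
    "VEH_NO": [1, 2, 3],
    "CRASHIMMINENTBRAKING": ["Standard", None, "Optional"],
    "LANEDEPARTUREWARNING": ["Standard", "Not Available", None],
    "PEDESTRIANAUTOEMERGENCYBRAKING": ["Optional", None, None],
})

# True where a feature is standard or optional; missing values count as False
has_feature = vehicles[FEATURE_COLS].isin(AVAILABLE_LABELS)

vehicles["N_AI_FEATURES"] = has_feature.sum(axis=1)
vehicles["ANY_AI_FEATURE"] = has_feature.any(axis=1)
print(vehicles[["VEH_NO", "N_AI_FEATURES", "ANY_AI_FEATURE"]])
```

Treating missing values as "feature absent" is a simplification; they may be better handled as unknown in the actual analysis.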

* **drimpair.csv**
  * ST_CASE
  * VEH_NO
  * PER_NO
  * DRIMPAIR - Was the driver impaired or not
* **distract.csv**
  * ST_CASE
  * VEH_NO
  * PER_NO
  * DRDISTRACT - Was the driver distracted or not
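
Because the vehicle-level files share ST_CASE and VEH_NO, they can be joined on that pair and then attached to the accident record through ST_CASE alone. The following is a minimal sketch with toy frames standing in for vpicdecode.csv, drimpair.csv, and accident.csv; all values are made up for illustration.

```python
import pandas as pd

# Toy stand-ins for the three files; values are illustrative only
vehicles = pd.DataFrame({
    "ST_CASE": [10001, 10001, 10002],
    "VEH_NO": [1, 2, 1],
    "CURBWEIGHTLB": [3200, 5100, 4100],
})
impair = pd.DataFrame({
    "ST_CASE": [10001, 10002],
    "VEH_NO": [1, 1],
    "DRIMPAIR": [0, 9],
})
accidents = pd.DataFrame({
    "ST_CASE": [10001, 10002],
    "FATALS": [1, 2],
})

# Vehicle-level join: both keys are needed to identify a vehicle within a case
veh = vehicles.merge(impair, on=["ST_CASE", "VEH_NO"], how="left")

# Crash-level join: many vehicles map to one accident row
merged = veh.merge(accidents, on="ST_CASE", how="left", validate="many_to_one")
print(merged)
```

The left joins keep every vehicle even when no impairment row matches, in which case DRIMPAIR is left as NaN.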


#### Non-motorist data

* **safetyeq.csv**
  * ST_CASE
  * VEH_NO
  * NMHELMET - Helmet Use
  * NMREFCLO - Reflective Clothing
  * NMLIGHT - Nonmotorist use of Lights

* **pbtype.csv**
  * ST_CASE
  * VEH_NO
  * PBAGE - age of cyclist/pedestrian
  * PBPTYPE - person type (pedestrian or cyclist)
  * PEDLOC - location of pedestrian
  * BIKELOC - location of bicycle 
  * PEDPOS - position of pedestrian
  * BIKEPOS - position of bicycle 
  * PEDCGP - crash group pedestrian (right turn / failure to yield, etc)
  * BIKECGP - crash group bicycle (right turn / failure to yield, etc)


### CRSS

In many cases, coding is shared between CRSS and FARS. Most notably, this is the case for VIN-based features.

* CASENUM - case number
* REGION
* URBANICITY - rural/urban
* STRATUM - sampling stratum, which roughly tracks crash severity
* WEIGHT - case weight used to scale the sampled cases up to national estimates
* VEH_NO
* PER_NO

* **accident.csv**
  * PERNOTMVIT - number of people not in vehicle
  * VE_FORMS - Number of vehicles in transport
  * PERMVIT - persons in motor vehicle
  * TYPE_INT - type of intersection
  * LGT_COND - light condition
  * WEATHER - weather conditions
  * MAX_SEV - Maximum injury severity in the crash
  * ALCOHOL / ALCOHOL_IM
* **person.csv**
  * AGE
  * PER_TYP
  * INJ_SEV
  * REST_USE - restraint use
  * HELM_USE - helmet use
  * ALC_RES / DRUGS
  * STR_VEH - number of the vehicle that struck the non-motorist
  * DEVTYPE - non-motorist device type
  * LOCATION - non-motorist location
  * PEDLOC - location of pedestrian
  * BIKELOC - location of bicycle 
  * PEDPOS - position of pedestrian
  * BIKEPOS - position of bicycle 
  * PEDCGP - crash group pedestrian (right turn/ failure to yield, etc)
  * BIKECGP - crash group bicycle (right turn/ failure to yield, etc)
* **distract.csv**
* **drimpair.csv**

* **vpicdecode.csv**
  * ST_CASE
  * VEH_NO
  * PER_NO
  * VEHICLETYPE
  * MANUFACTURERFULLNAME
  * MODEL
  * MODELYEAR
  * TRIM
  * BODYCLASS_ID / BODYCLASS
  * CURBWEIGHTLB
  * ANTILOCKBRAKESYSTEMID / ANTILOCKBRAKESYSTEM
  * AUTOPEDESTRIANALERTINGSOUNDID / AUTOPEDESTRIANALERTINGSOUND
  * ELECTRONICSTABILITYCONTROLID / ELECTRONICSTABILITYCONTROL
  * TRACTIONCONTROLID / TRACTIONCONTROL
  * SAE AUTOMATION LEVEL FROM / TO - how automated the vehicle can be
  * CRASHIMMINENTBRAKINGID / CRASHIMMINENTBRAKING
  * DYNAMICBRAKESUPPORTID / DYNAMICBRAKESUPPORT
  * PEDESTRIANAUTOEMERGENCYBRAKINGID / PEDESTRIANAUTOEMERGENCYBRAKING
  * ADAPTIVECRUISECONTROLID / ADAPTIVECRUISECONTROL
  * LANEDEPARTUREWARNINGID / LANEDEPARTUREWARNING
  * DAYTIMERUNNINGLIGHTID / DAYTIMERUNNINGLIGHT
  * ENGINEBRAKEHP_FROM / TO



In [25]:
import pandas as pd

# non_fatal = polars.read_csv("data/CRSS2022CSV/accident.csv", ignore_errors=True)
# The FARS CSV is not valid UTF-8 (the default codec fails on byte 0xd1),
# so read it with a single-byte encoding; latin-1 is an assumption here.
fatal = pd.read_csv(
    "./data/FARS2022NationalCSV/accident.csv",
    index_col="ST_CASE",
    encoding="latin-1",
)

fatal



In [17]:
all_accidents

STATE,STATENAME,ST_CASE,PEDS,PERNOTMVIT,VE_TOTAL,VE_FORMS,PVH_INVL,PERSONS,PERMVIT,COUNTY,COUNTYNAME,CITY,CITYNAME,MONTH,MONTHNAME,DAY,DAYNAME,DAY_WEEK,DAY_WEEKNAME,YEAR,HOUR,HOURNAME,MINUTE,MINUTENAME,TWAY_ID,TWAY_ID2,ROUTE,ROUTENAME,RUR_URB,RUR_URBNAME,FUNC_SYS,FUNC_SYSNAME,RD_OWNER,RD_OWNERNAME,NHS,NHSNAME,…,STRATUM,STRATUMNAME,PJ,WEIGHT,WKDY_IM,WKDY_IMNAME,YEARNAME,HOUR_IM,HOUR_IMNAME,MINUTE_IM,MINUTE_IMNAME,EVENT1_IM,EVENT1_IMNAME,MANCOL_IM,MANCOL_IMNAME,RELJCT1_IM,RELJCT1_IMNAME,RELJCT2_IM,RELJCT2_IMNAME,LGTCON_IM,LGTCON_IMNAME,WEATHR_IM,WEATHR_IMNAME,INT_HWY,INT_HWYNAME,MAX_SEV,MAX_SEVNAME,MAXSEV_IM,MAXSEV_IMNAME,NUM_INJ,NUM_INJNAME,NO_INJ_IM,NO_INJ_IMNAME,ALCOHOL,ALCOHOLNAME,ALCHL_IM,ALCHL_IMNAME
i64,str,i64,i64,i64,i64,i64,i64,i64,i64,i64,str,i64,str,i64,str,i64,i64,i64,str,i64,i64,str,i64,str,str,str,i64,str,i64,str,i64,str,i64,str,i64,str,…,i64,str,i64,f64,i64,str,i64,i64,str,i64,i64,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str,i64,str
1,"""Alabama""",10001,0,0,2,2,0,3,3,107,"""PICKENS (107)""",0,"""NOT APPLICABLE""",1,"""January""",1,1,7,"""Saturday""",2022,12,"""12:00pm-12:59pm""",30,"""30""","""US-82 SR-6""",,2,"""US Highway""",1,"""Rural""",3,"""Principal Arterial - Other""",1,"""State Highway Agency""",1,"""This section IS ON the NHS""",…,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,"""Alabama""",10002,0,0,2,2,0,5,5,101,"""MONTGOMERY (101)""",0,"""NOT APPLICABLE""",1,"""January""",1,1,7,"""Saturday""",2022,16,"""4:00pm-4:59pm""",40,"""40""","""US-231 SR-53""",,2,"""US Highway""",1,"""Rural""",3,"""Principal Arterial - Other""",1,"""State Highway Agency""",1,"""This section IS ON the NHS""",…,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,"""Alabama""",10003,0,0,1,1,0,2,2,115,"""ST. CLAIR (115)""",0,"""NOT APPLICABLE""",1,"""January""",1,1,7,"""Saturday""",2022,1,"""1:00am-1:59am""",33,"""33""","""CR-KELLY CREEK RD""",,4,"""County Road""",1,"""Rural""",5,"""Major Collector""",2,"""County Highway Agency""",0,"""This section IS NOT on the NHS""",…,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,"""Alabama""",10004,0,0,1,1,0,1,1,101,"""MONTGOMERY (101)""",0,"""NOT APPLICABLE""",1,"""January""",2,2,1,"""Sunday""",2022,14,"""2:00pm-2:59pm""",46,"""46""","""I-65""",,1,"""Interstate""",1,"""Rural""",1,"""Interstate""",1,"""State Highway Agency""",1,"""This section IS ON the NHS""",…,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,"""Alabama""",10005,1,1,1,1,0,1,1,73,"""JEFFERSON (73)""",0,"""NOT APPLICABLE""",1,"""January""",2,2,1,"""Sunday""",2022,18,"""6:00pm-6:59pm""",48,"""48""","""I-20""",,1,"""Interstate""",2,"""Urban""",1,"""Interstate""",1,"""State Highway Agency""",1,"""This section IS ON the NHS""",…,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…,…
,,,0,0,1,1,0,,3,,,,,12,"""December""",,,7,"""Saturday""",2022,12,"""12:00pm-12:59pm""",10,"""10""",,,,,,,,,,,,,…,4,"""Stratum 4 - LMY PV Serious Inj…",1208,13.858565,7,"""Saturday""",2022,12,"""12:00pm-12:59pm""",10,10,5,"""Fell/Jumped from Vehicle""",0,"""The First Harmful Event was No…",0,"""No""",1,"""Non-Junction""",1,"""Daylight""",10,"""Cloudy""",0,"""No""",3,"""Suspected Serious Injury (A)""",3,"""Suspected Serious Injury (A)""",1,"""1""",1,"""1""",2,"""No Alcohol Involved""",2,"""No Alcohol Involved"""
,,,0,0,2,2,0,,5,,,,,12,"""December""",,,7,"""Saturday""",2022,17,"""5:00pm-5:59pm""",18,"""18""",,,,,,,,,,,,,…,10,"""Stratum 10 - Other""",1207,228.960571,7,"""Saturday""",2022,17,"""5:00pm-5:59pm""",18,18,12,"""Motor Vehicle In-Transport""",6,"""Angle""",0,"""No""",2,"""Intersection""",3,"""Dark - Lighted""",10,"""Cloudy""",0,"""No""",0,"""No Apparent Injury (O)""",0,"""No Apparent Injury (O)""",0,"""No Person Injured/Property Dam…",0,"""No Person Injured/Property Dam…",2,"""No Alcohol Involved""",2,"""No Alcohol Involved"""
,,,0,0,2,2,0,,4,,,,,12,"""December""",,,3,"""Tuesday""",2022,13,"""1:00pm-1:59pm""",33,"""33""",,,,,,,,,,,,,…,8,"""Stratum 8 - NLMY PV Minor Inju…",1207,139.072222,3,"""Tuesday""",2022,13,"""1:00pm-1:59pm""",33,33,12,"""Motor Vehicle In-Transport""",1,"""Front-to-Rear""",0,"""No""",3,"""Intersection-Related""",1,"""Daylight""",1,"""Clear""",0,"""No""",2,"""Suspected Minor Injury (B)""",2,"""Suspected Minor Injury (B)""",2,"""2""",2,"""2""",2,"""No Alcohol Involved""",2,"""No Alcohol Involved"""
,,,0,0,2,2,0,,3,,,,,12,"""December""",,,5,"""Thursday""",2022,16,"""4:00pm-4:59pm""",30,"""30""",,,,,,,,,,,,,…,9,"""Stratum 9 - LMY PV No Injuries…",1207,143.622403,5,"""Thursday""",2022,16,"""4:00pm-4:59pm""",30,30,12,"""Motor Vehicle In-Transport""",7,"""Sideswipe - Same Direction""",0,"""No""",1,"""Non-Junction""",3,"""Dark - Lighted""",10,"""Cloudy""",0,"""No""",0,"""No Apparent Injury (O)""",0,"""No Apparent Injury (O)""",0,"""No Person Injured/Property Dam…",0,"""No Person Injured/Property Dam…",2,"""No Alcohol Involved""",2,"""No Alcohol Involved"""
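
The output above has FARS rows with empty CRSS-only columns (STRATUM, WEIGHT, MAX_SEV, ...) and CRSS rows with empty FARS-only columns (STATE, ST_CASE, ...), which is what stacking the two accident tables produces. The cell that built `all_accidents` is not shown, so the following is only a sketch of one way to get a frame of the same shape with pandas; the toy frames and the `SOURCE` column are assumptions added for illustration.

```python
import pandas as pd

# Toy stand-ins for the FARS and CRSS accident tables
fars = pd.DataFrame({
    "ST_CASE": [10001, 10002],
    "VE_FORMS": [2, 2],
    "PERMVIT": [3, 5],
})
crss = pd.DataFrame({
    "VE_FORMS": [1, 2],
    "PERMVIT": [3, 5],
    "WEIGHT": [13.858565, 228.960571],
    "MAX_SEV": [3, 0],
})

# Stack the tables; columns missing from one source are filled with NaN.
# SOURCE is a hypothetical helper column recording where each row came from.
combined = pd.concat(
    [fars.assign(SOURCE="FARS"), crss.assign(SOURCE="CRSS")],
    ignore_index=True,
    sort=False,
)
print(combined)
```

Keeping a source label makes it easy to split the frame again, or to apply WEIGHT only to the CRSS rows.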
