# A comparison of the driving experience across the United States

I have long been interested in the experiences of driving across the United States. Almost everyone I have met in the United States has their own anecdotes about how bad certain states are for drivers. 

- "There is so much construction in Pennsylvania!"
- "Arizona drivers are reckless!"
- "Jersey drivers are THE WORST."

I decided to pull data that are available to me for a slightly tongue-in-cheek exercise to determine which US states actually are most unpleasant for drivers. There are several major factors that come to mind when considering "unpleasantness for drivers":

1. Traffic
2. Construction (related to traffic)
3. Accidents (related to traffic)
4. Difficulty of driving (rough roads, poorly signed roads, single-lane roads)

Traffic is the single biggest item. I don't think anyone would argue that commuting by car in San Francisco or New York City is pleasant. However, this is difficult to quantify across the entire US: in addition to requiring some measure of the speed of traffic flow, we would also need average trip lengths across the country to make a meaningful comparison. Additionally, when browsing Kaggle and the internet in general, I was unable to find a reasonably accessible dataset that would allow me to quantify the concept of "traffic."

I think most people would agree, though, that road construction and accidents both contribute to negative driving experiences. Especially for long-distance driving, where a driver may cross multiple state boundaries, accident and construction frequency may be a better indicator of the driving experience than commuting traffic around major metropolitan areas. Personally, I have had several jobs that regularly required me to drive several hundred miles, and I developed plenty of anecdotal assumptions while doing so (Ohio is OK, Pennsylvania is bad for construction, DC is bad for accidents, etc.). Lastly, I am interested in comparing US states against each other: what is it like for a driver who is "just passing through"? I also think there is some fun to be had, in a sports-like fashion, in using states as the unit of comparison. I have a major grudge against Pennsylvania for the amount of traffic I have endured on my many drives from Pittsburgh to Philadelphia, and I will try not to let this affect my judgement while conducting this analysis.

## Data

I was able to find two interesting datasets being hosted on Kaggle that pertain to accidents and construction, respectively:

- US Accidents (2016 - 2021) - https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents?datasetId=199387&sortBy=voteCount
- US Road Construction and Closures (2016 - 2021) | Kaggle - https://www.kaggle.com/datasets/sobhanmoosavi/us-road-construction-and-closures

These have been assembled and graciously provided by Sobhan Moosavi and their research team:

- Moosavi, Sobhan, Mohammad Hossein Samavatian, Srinivasan Parthasarathy, and Rajiv Ramnath. “A Countrywide Traffic Accident Dataset.” 2019.

Already, we can imagine simply comparing the frequency of accidents and construction (hereafter called "traffic events") across states. However, such a comparison would not be fair without accounting for the size of each state's road network; we would obviously expect California to have more roads than Rhode Island, and so more traffic events in absolute terms. The Federal Highway Administration publishes yearly statistics on the length of roads across the United States, reported in Table HM-60 as lane miles by road class. We can use this to normalize our traffic event counts, and we can even compare road classes (interstates vs other freeways, for example) as well as urban vs rural roads:

- Table HM-60 - Highway Statistics 2020 - Policy | Federal Highway Administration (dot.gov) - https://www.fhwa.dot.gov/policyinformation/statistics/2020/hm60.cfm
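
As a rough illustration of the normalization described above, here is a minimal sketch using tiny made-up tables rather than the real data. It assumes the event tables carry two-letter postal codes in `State` while the road-length table uses full names in `STATE` (as the samples further down suggest), so it needs a lookup between the two. The `STATE_NAMES` dictionary, the function name, and the lane-mile numbers are placeholders for illustration, not part of any dataset.

```python
import pandas as pd

# Hypothetical lookup from postal code to full state name (only three
# states here; a real analysis would need all of them).
STATE_NAMES = {"OH": "Ohio", "PA": "Pennsylvania", "CA": "California"}


def events_per_1000_lane_miles(events, roadlengths):
    """Count events per state and scale by that state's total lane miles."""
    # Translate postal codes to full names; unmapped codes become NaN and
    # are dropped by value_counts().
    counts = events["State"].map(STATE_NAMES).value_counts().rename("events")
    lane_miles = roadlengths.set_index("STATE")["TOTAL LANE MILES"]
    # Keep only states present in both tables.
    out = pd.concat([counts, lane_miles], axis=1, join="inner")
    out["events_per_1000_lane_miles"] = (
        out["events"] * 1000 / out["TOTAL LANE MILES"]
    )
    return out.sort_values("events_per_1000_lane_miles", ascending=False)


# Toy inputs: the lane-mile figures are invented, not taken from HM-60.
toy_events = pd.DataFrame({"State": ["OH", "OH", "OH", "PA", "CA", "CA"]})
toy_roads = pd.DataFrame({
    "STATE": ["Ohio", "Pennsylvania", "California"],
    "TOTAL LANE MILES": [1000.0, 500.0, 4000.0],
})

toy_rates = events_per_1000_lane_miles(toy_events, toy_roads)
print(toy_rates)
```

Swapping `TOTAL LANE MILES` for `RURAL TOTAL` or `URBAN TOTAL` would give a rural or urban denominator, though the events would first have to be classified the same way.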

Finally, and this is especially for fun and some light ribbing, I was able to easily locate the US Federal Transit Administration's (FTA) yearly funding allotment to each state. I am very curious to compare the "least-pleasant" states for driving with the amount of federal funding they receive:

- FTA Allocations for Formula and Discretionary Programs by State FY 1998-2022 Full Year | FTA (dot.gov) - https://www.transit.dot.gov/funding/grants/fta-allocations-formula-and-discretionary-programs-state-fy-1998-2022-full-year
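
One way to line up a per-state unpleasantness measure against funding is to join the two tables on state name and compare ranks. The sketch below is only illustrative: the `toy_rate_table` frame, its `state` and `events_per_1000_lane_miles` columns, and all numbers are invented, and only the `State` and `State Total` column names are taken from the funding sample shown later.

```python
import pandas as pd


def rank_against_funding(rates, funding):
    """Join per-state rates with funding totals and rank both (1 = highest)."""
    merged = rates.merge(
        funding[["State", "State Total"]],
        left_on="state",
        right_on="State",
        how="inner",  # drops rows that appear in only one table
    ).drop(columns="State")
    merged["rate_rank"] = merged["events_per_1000_lane_miles"].rank(ascending=False)
    merged["funding_rank"] = merged["State Total"].rank(ascending=False)
    return merged.sort_values("rate_rank").reset_index(drop=True)


# Toy inputs with invented values.
toy_rate_table = pd.DataFrame({
    "state": ["Ohio", "Pennsylvania", "California"],
    "events_per_1000_lane_miles": [3.0, 2.0, 0.5],
})
toy_funding = pd.DataFrame({
    "State": ["California", "Ohio", "Pennsylvania", "American Samoa"],
    "State Total": [900.0, 200.0, 400.0, 10.0],
})

comparison = rank_against_funding(toy_rate_table, toy_funding)
print(comparison)
```

Ranks sidestep the very different scales of dollars and event rates, which suits a tongue-in-cheek comparison.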

## Specific questions we plan to answer in this analysis

Using the data I have been able to collect, I plan to try and answer the following questions:

1. 

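In [11]:
import pandas as pd
import seaborn as sns

# Show more rows and columns when displaying wide DataFrames
pd.options.display.max_rows = 100
pd.options.display.max_columns = 200

# Input files. These are Windows-style relative paths; on macOS/Linux
# the separators would need to be forward slashes.
FNAME_ACCIDENTS = r"data\raw\US_Accidents_Dec21_updated.csv"
FNAME_CONSTRUCTION = r"data\raw\US_Constructions_Dec21.csv"
FNAME_ROADLENGTHS = r"data\working\hm60_raw.csv"
FNAME_FUNDING = r"data\working\FTA_funding_2021_raw.csv"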

In [2]:
df_accident = pd.read_csv(FNAME_ACCIDENTS)
df_accident.head()

Unnamed: 0,ID,Severity,Start_Time,End_Time,Start_Lat,Start_Lng,End_Lat,End_Lng,Distance(mi),Description,Number,Street,Side,City,County,State,Zipcode,Country,Timezone,Airport_Code,Weather_Timestamp,Temperature(F),Wind_Chill(F),Humidity(%),Pressure(in),Visibility(mi),Wind_Direction,Wind_Speed(mph),Precipitation(in),Weather_Condition,Amenity,Bump,Crossing,Give_Way,Junction,No_Exit,Railway,Roundabout,Station,Stop,Traffic_Calming,Traffic_Signal,Turning_Loop,Sunrise_Sunset,Civil_Twilight,Nautical_Twilight,Astronomical_Twilight
0,A-1,3,2016-02-08 00:37:08,2016-02-08 06:37:08,40.10891,-83.09286,40.11206,-83.03187,3.23,Between Sawmill Rd/Exit 20 and OH-315/Olentang...,,Outerbelt E,R,Dublin,Franklin,OH,43017,US,US/Eastern,KOSU,2016-02-08 00:53:00,42.1,36.1,58.0,29.76,10.0,SW,10.4,0.0,Light Rain,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Night
1,A-2,2,2016-02-08 05:56:20,2016-02-08 11:56:20,39.86542,-84.0628,39.86501,-84.04873,0.747,At OH-4/OH-235/Exit 41 - Accident.,,I-70 E,R,Dayton,Montgomery,OH,45424,US,US/Eastern,KFFO,2016-02-08 05:58:00,36.9,,91.0,29.68,10.0,Calm,,0.02,Light Rain,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Night
2,A-3,2,2016-02-08 06:15:39,2016-02-08 12:15:39,39.10266,-84.52468,39.10209,-84.52396,0.055,At I-71/US-50/Exit 1 - Accident.,,I-75 S,R,Cincinnati,Hamilton,OH,45203,US,US/Eastern,KLUK,2016-02-08 05:53:00,36.0,,97.0,29.7,10.0,Calm,,0.02,Overcast,False,False,False,False,True,False,False,False,False,False,False,False,False,Night,Night,Night,Day
3,A-4,2,2016-02-08 06:51:45,2016-02-08 12:51:45,41.06213,-81.53784,41.06217,-81.53547,0.123,At Dart Ave/Exit 21 - Accident.,,I-77 N,R,Akron,Summit,OH,44311,US,US/Eastern,KAKR,2016-02-08 06:54:00,39.0,,55.0,29.65,10.0,Calm,,,Overcast,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Day,Day
4,A-5,3,2016-02-08 07:53:43,2016-02-08 13:53:43,39.172393,-84.492792,39.170476,-84.501798,0.5,At Mitchell Ave/Exit 6 - Accident.,,I-75 S,R,Cincinnati,Hamilton,OH,45217,US,US/Eastern,KLUK,2016-02-08 07:53:00,37.0,29.8,93.0,29.69,10.0,WSW,10.4,0.01,Light Rain,False,False,False,False,False,False,False,False,False,False,False,False,False,Day,Day,Day,Day


In [3]:
df_construction = pd.read_csv(FNAME_CONSTRUCTION)
df_construction.head()

Unnamed: 0,ID,Severity,Start_Time,End_Time,Start_Lat,Start_Lng,End_Lat,End_Lng,Distance(mi),Description,Number,Street,Side,City,County,State,Zipcode,Country,Timezone,Airport_Code,Weather_Timestamp,Temperature(F),Wind_Chill(F),Humidity(%),Pressure(in),Visibility(mi),Wind_Direction,Wind_Speed(mph),Precipitation(in),Weather_Condition,Amenity,Bump,Crossing,Give_Way,Junction,No_Exit,Railway,Roundabout,Station,Stop,Traffic_Calming,Traffic_Signal,Turning_Loop,Sunrise_Sunset,Civil_Twilight,Nautical_Twilight,Astronomical_Twilight
0,C-1,4,2019-04-05 16:00:00.000000000,2020-09-29 11:53:57.000000000,32.83836,-93.152378,32.85074,-93.164388,1.103497,Construction on LA-534 WB near EDMONDS LOOP Ro...,4200.0,Highway 534,R,Haynesville,Claiborne,LA,71038-7130,US,US/Central,KMNE,2019-04-05 15:55:00,75.0,75.0,58.0,29.72,10.0,S,3.0,0.0,Fair,False,False,False,False,False,False,False,False,False,False,False,False,False,Day,Day,Day,Day
1,C-2,2,2021-11-12 07:59:00.000000000,2021-11-12 08:22:30.000000000,30.221331,-92.008625,30.216642,-92.003809,0.433173,Slow traffic on US-90 E from US-167/Louisiana ...,1098.0,SW Evangeline Trwy,R,Lafayette,Lafayette,LA,70501-8244,US,US/Central,KLFT,2021-11-12 07:59:00,55.0,55.0,100.0,30.09,3.0,CALM,0.0,0.0,Mostly Cloudy,False,False,False,False,False,False,False,False,False,False,False,False,False,Day,Day,Day,Day
2,C-3,2,2021-10-12 07:17:30.000000000,2021-10-12 09:18:55.000000000,39.653153,-104.910224,39.65312,-104.913838,0.192266,Slow traffic on CO-30 from S Tamarac Dr (E Ham...,6779.0,E Hampden Ave,R,Denver,Denver,CO,80224-3007,US,US/Mountain,KBKF,2021-10-12 06:58:00,37.0,33.0,82.0,24.09,10.0,WSW,5.0,0.0,Partly Cloudy,False,False,True,False,False,False,False,False,True,False,False,False,False,Day,Day,Day,Day
3,C-4,4,2021-02-10 02:46:10.000000000,2021-02-17 23:59:00.000000000,33.961506,-118.029339,33.961919,-118.029082,0.032112,Closed road from Whittier to College Ave due t...,13585.0,Whittier Blvd,L,Whittier,Los Angeles,CA,90605-1935,US,US/Pacific,KFUL,2021-02-10 02:53:00,54.0,54.0,83.0,29.92,9.0,CALM,0.0,0.0,Cloudy,False,False,False,False,False,False,False,False,False,False,False,False,False,Night,Night,Night,Night
4,C-5,2,2020-09-24 15:58:00.000000000,2020-09-25 21:04:54.000000000,40.008734,-79.599696,40.022822,-79.595703,0.996057,Construction on US-119 NB near SAMPSON ST Allo...,1144.0,Schley St,R,Connellsville,Fayette,PA,15425-2945,US,US/Eastern,KLBE,2020-09-24 15:53:00,73.0,73.0,,28.78,10.0,SSW,7.0,0.0,Partly Cloudy,False,False,False,False,False,False,False,False,False,False,False,False,False,Day,Day,Day,Day


In [4]:
df_roadlengths = pd.read_csv(FNAME_ROADLENGTHS)
df_roadlengths.head()

Unnamed: 0,STATE,RURAL INTERSTATE,RURAL OTHER FREEWAYS AND EXPRESSWAYS,RURAL OTHER PRINCIPAL ARTERIAL,RURAL MINOR ARTERIAL,RURAL MAJOR COLLECTOR,RURAL MINOR COLLECTOR,RURAL LOCAL (2),RURAL TOTAL,URBAN INTERSTATE,URBAN OTHER FREEWAYS AND EXPRESSWAYS,URBAN OTHER PRINCIPAL ARTERIAL,URBAN MINOR ARTERIAL,URBAN MAJOR COLLECTOR,URBAN MINOR COLLECTOR,URBAN LOCAL (2),URBAN TOTAL,TOTAL LANE MILES
0,Alabama,2434.106,0.0,6132.137,8292.522,24615.908,11847.076,90777.198,144098.947,2229.299,150.761,4888.909,6104.632,7563.354,376.649,44147.498,65461.102,209560.049
1,Alaska,2057.403,0.0,1611.935,867.227,2741.066,2854.07,18974.316,29106.017,319.811,0.0,504.637,476.119,511.584,473.979,4515.416,6801.546,35907.563
2,Arizona,3721.622,76.968,3579.349,4551.759,6764.702,5924.0,59403.344,84021.744,1479.266,1786.303,3791.872,9513.577,6327.305,5438.546,34402.26,62739.129,146760.873
3,Arkansas,1751.711,288.144,5185.903,6398.375,23668.683,13494.408,116003.682,166790.906,1481.653,425.713,2233.968,4375.822,4449.561,483.016,23864.266,37313.999,204104.905
4,California,5480.322,1798.278,8768.01,13001.42,25018.96,15176.46,83503.132,152746.582,9821.58,9042.421,26555.933,30845.963,27891.613,422.128,139290.086,243869.724,396616.306


In [12]:
df_funding = pd.read_csv(FNAME_FUNDING)
df_funding.head()

Unnamed: 0,State,Metropolitan Planning,Statewide Planning,Urbanized Area Formula,Fixed Guideway Capital Investment Grants,Enhanced Mobility for Older Adults and People with Disabilities,Nonurbanized Area Formula,RTAP,Appalachian Dev. Public Trans. Assist. Program,Indian Reserv. Formula,State of Good Repair,Bus and Bus Facilities Formula,Low or No Emission,Safety Research and Demonstration (SRD) Program,Bus Operator Compartment Redesign (BCP) Program,Public Transportation COVID19 RDG Program,Real-Time Transit Infrastructure and Rolling Stock Condition Assessment Research and Demo Program,Tribal Transit Competitive,Research,Technical Assistance Standards & Training,Bus Testing,State Safety Oversight,State Total
0,Alabama,931335,260679,25309848,0,5126196,18097393,306694,5000000,21675,0,6216580.0,4275820,0,0,300000,0,0,0,0,0,0,65846220.0
1,Alaska,467811,130949,17386001,0,463736,9576760,103845,0,630952,24016329,4164008.0,0,0,0,0,0,3821630,0,0,0,0,60762020.0
2,American Samoa,0,0,0,0,12043,367453,15467,0,0,0,1000000.0,0,0,0,0,0,0,0,0,0,0,1394963.0
3,Arizona,2683586,573731,86594000,147110967,6837434,13954184,190477,0,3458596,7516659,12543030.0,611840,0,0,600000,0,445000,0,0,0,672128,283791600.0
4,Arkansas,468769,130949,13761191,0,3652195,14289603,238615,0,0,332874,4958159.0,4900000,0,0,288750,0,0,0,0,0,277078,43298180.0
