# **The Effects of Automobile Dependency in the United States**
## Contributors: Nicholas Breymaier and Zachary Osborne
Nicholas UID: 117920871 <br>
Zachary UID: 117941609 <br>
<a href='https://github.com/nicholasbreymaier/nicholasbreymaier.github.io'> Source Code </a>

# Introduction
A place, whether a single neighborhood or an entire nation, is referred to as automobile dependent (or car dependent) when driving a personal automobile is the only realistic way for its residents to reach necessities. By necessities we mean any destinations needed to live a successful life in modern society, such as educational institutions, workplaces, and grocery stores.

This analysis will focus primarily on automobile dependency in the United States, but data from other countries will be used for comparative purposes.

## The Importance of Automobile Dependency
Some of the information presented in this section is drawn directly from the datasets we analyze in this project, and will be cited as such. Please see the bibliography at the end of this document/website for sources.

## Relevance in the Field of Data Science
Coverage by major media institutions in the United States does not reflect what Americans actually die from [1]. Among other factors, the need for major media institutions to "keep up" with the incessant, immediate news that spreads via social media has led institutions (!cite) to switch to more eye-catching topics to maintain viewership and, therefore, advertising revenue. Relying on the media is therefore not an effective way to build public awareness of systemic problems.

Data science offers a promising alternative to televised reports from major media corporations and to news spread via social media, because conclusions drawn from aggregate data are significantly less prone to outlier bias than coverage driven by individual, attention-grabbing events. Using car dependency as a case study, this project aims to demonstrate that data science can raise awareness of important yet overlooked societal issues.

# Step 1: Data Collection
Thankfully, the US National Highway Traffic Safety Administration (NHTSA) keeps detailed data related to traffic fatalities, which serves as a good starting point for discussing the problem of car dependency.

https://www-fars.nhtsa.dot.gov/Main/index.aspx

In [None]:
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import matplotlib.pyplot as plt

In [None]:

# # Use the python requests library to extract the data from the website.
# web_extract = requests.get(
#     'https://www-fars.nhtsa.dot.gov/Main/index.aspx').text

# # Parse the extracted HTML with BeautifulSoup, naming the parser explicitly.
# web_extract = bs(web_extract, 'html.parser')

# loci = web_extract.find('table')

# # Extract the html tables into pandas dataframes. Tables 1–9 were
# # stylistic elements in the website. Recent pandas versions expect literal
# # HTML to be wrapped in io.StringIO (from io import StringIO).
# nhtsa_nat_stats = pd.read_html(StringIO(str(loci)))[10]

# Load the extracted table from a local feather file (requires pyarrow).
nhtsa_nat_stats = pd.read_feather('nhtsa_nat_stats.feather')
nhtsa_nat_stats.head()

https://www.bts.gov/content/transit-profile-0

Here we make only a rough selection of the categories of data we do and do not want; finer-grained removal happens during data processing (for all three profiles).

In [None]:
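# Spreadsheet rows that are blank section headers.
blank_headers = [2, 45, 73, 140]
# Spreadsheet rows holding categories of data we do not need.
undesired_data = (
    list(range(27, 45)) + list(range(46, 73)) + list(range(74, 92))
    + list(range(101, 140)) + list(range(159, 168))
)

transit_data = pd.read_excel(
    'table_transit_profile_032123.xlsx',
    header=1,
    index_col=0,  # row labels are in the first column
    nrows=165,
    skiprows=blank_headers + undesired_data,
    usecols=[0] + list(range(5, 31))  # label column plus columns 5 to 30
)

transit_data.head()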

https://www.bts.gov/content/highway-profile

In [None]:
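# Spreadsheet rows that are blank section headers.
blank_headers = [2, 24, 67, 86]
# Spreadsheet rows holding categories of data we do not need.
undesired_data = list(range(25, 67)) + list(range(83, 86)) + [89]

highway_data = pd.read_excel(
    'table_highway_profile_072322.xlsx',
    header=1,
    index_col=0,  # row labels are in the first column
    nrows=87,
    skiprows=blank_headers + undesired_data,
    usecols=[0] + list(range(5, 31))  # label column plus columns 5 to 30
)

highway_data.head()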

# Step 2: Data Processing
The NHTSA crash fatality data we extracted is far from tidy, so we shall clean and organise it here. 

First, this DataFrame holds features in its rows and observations in its columns, where the observations are years. We therefore transpose the DataFrame so that the rows are observations and the columns are features.

In [None]:
# We will want these to be the column labels after transposing the DataFrame.
nhtsa_nat_stats = nhtsa_nat_stats.set_index('Unnamed: 0')

# Transpose the DataFrame.
nhtsa_nat_stats = nhtsa_nat_stats.transpose()

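# Label the index column 'Year' (stored as the name of the columns axis,
# which plot_cols below uses as its default x-axis label) and sort the data
# chronologically.
nhtsa_nat_stats.columns.name = 'Year'
nhtsa_nat_stats = nhtsa_nat_stats.sort_index()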

nhtsa_nat_stats.head()
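To make the reshaping concrete, here is a toy sketch of the same `set_index` and `transpose` pattern. The two features, two years, and all numbers below are made-up placeholders, not NHTSA data:

```python
import pandas as pd

# Toy table shaped like the scraped NHTSA table: one row per feature,
# one column per year. The numbers are placeholders, not real statistics.
wide = pd.DataFrame({
    'Unnamed: 0': ['Fatal Crashes', 'Total*'],
    '2019': [100, 110],
    '2020': [120, 130],
})

# Move the feature names into the index, then swap rows and columns so
# that each row is a year (an observation) and each column is a feature.
tidy = wide.set_index('Unnamed: 0').transpose()
tidy.columns.name = 'Year'  # same labelling convention as the cell above
tidy = tidy.sort_index()

print(tidy)
```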

The original data table groups its features into categories using extra, empty features that essentially function as "headers". This is a messy way to encode categories, and the categorization does not need to live inside the table at all. Here, we remove these header columns and rename the remaining ones, both for clarity and to preserve information that would otherwise be lost when the headers are purged.

In [None]:
# Drop empty header columns.
columns_to_remove = [
    'Motor Vehicle Traffic Crashes', 
    'Traffic Crash Fatalities',
    'Vehicle Occupants', 
    'Nonmotorists',
    'Other National Statistics', 
    'National Rates: Fatalities'
]

nhtsa_nat_stats.drop(columns=columns_to_remove, inplace=True)
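Instead of listing the header columns by hand, they could be detected automatically, assuming that every header column contains only missing values (NaN/None) after loading. A minimal sketch on a toy frame; `find_header_columns` is a hypothetical helper, not part of the original notebook:

```python
import numpy as np
import pandas as pd

def find_header_columns(df):
    """Return the labels of columns whose values are all missing.

    Hypothetical helper: assumes the 'header' columns of the scraped
    table contain only NaN/None once the table has been transposed.
    """
    return df.columns[df.isna().all().to_numpy()].tolist()

# Toy example: 'Traffic Crash Fatalities' is an empty header column.
toy = pd.DataFrame({
    'Fatal Crashes': [100, 120],
    'Traffic Crash Fatalities': [np.nan, np.nan],
    'Drivers': [50, 60],
}, index=['2019', '2020'])

headers = find_header_columns(toy)
toy = toy.drop(columns=headers)
print(headers)            # ['Traffic Crash Fatalities']
print(list(toy.columns))  # ['Fatal Crashes', 'Drivers']
```

One caveat of this approach: a genuine data column that happens to be entirely missing would also be dropped, so the explicit list above is the safer choice when the table is small.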

In [None]:
# Rename remaining columns for clarity 
nhtsa_nat_stats = nhtsa_nat_stats.rename(
    columns={
        'Drivers' : 'Driver Fatalities',
        'Passengers' : 'Passenger Fatalities',
        'Unknown' : 'Unknown Vehicle Occupant Fatalities',
        'Sub Total1' : 'Total Vehicle Occupant Fatalities',
        'Motorcyclists' : 'Motorcyclist Fatalities',
        'Pedestrians' : 'Pedestrian Fatalities',
        'Pedalcyclists' : 'Pedalcyclist Fatalities',
        'Other/ Unknown' : 'Other/Unknown Nonmotorist Fatalities',
        'Sub Total2' : 'Total Nonmotorist Fatalities',
        'Total*' : 'Total Fatalities'
    }
)

nhtsa_nat_stats.head()

### Transit Profile

In [None]:
transit_data = transit_data.transpose()

transit_data.head()

In [None]:
transit_data = transit_data.rename(
    columns={
        'Light raila' : 'Light rail',
        'Light raila ' : 'Light rail',
        'Ferryboatb' : 'Ferryboat',
        'Otherc' : 'Other',
        'Operating assistanced, total' : 'Operating assistance, total',
        'Commuter railf' : 'Commuter rail',
        'Injured persons, all modesk' : 'Injured persons, all modes'
    }
)

transit_data.head()

Next, we remove the undesired columns that are not blank headers.

In [None]:
columns_to_remove = [
    'Motor bus',
    'Heavy rail',
    'Light rail',
    'Trolley bus',
    'Demand responsive',
    'Ferryboat',
    'Commuter rail',
    'Other',
    'Other operating revenue',
    'Operating assistance, total',
    'State and local',
    'Federal'
]
transit_data.drop(columns=columns_to_remove, inplace=True)

transit_data.head()

The column "Passenger operating revenues, total (millions of dollars)" functions as a header for "Operating revenues, total" and "Passenger fares, total", and it carries their unit. We therefore add the unit to those two column names before removing the header column.

In [None]:
transit_data.rename(
    columns={
        'Operating revenues, total' : 'Operating revenues, total (millions of dollars)',
        'Passenger fares, total' : 'Passenger fares, total (millions of dollars)'
    },
    inplace=True
)

transit_data.drop(columns='Passenger operating revenues, total (millions of dollars)', inplace=True)

transit_data.head()
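If other header columns need the same treatment, the rename-then-drop step could be wrapped in a small helper. This is a sketch: `fold_header` is a hypothetical name, and it assumes the unit can be read from the parenthesised suffix of the header column's name:

```python
import re
import pandas as pd

def fold_header(df, header, children):
    """Append the header column's unit to each child column, then drop the header.

    Hypothetical helper: assumes the unit is the trailing '(...)' part of
    the header name, e.g. '(millions of dollars)'.
    """
    match = re.search(r'\([^)]*\)\s*$', header)
    if match is None:
        raise ValueError(f'no unit found in {header!r}')
    unit = match.group(0).strip()
    renamed = df.rename(columns={child: f'{child} {unit}' for child in children})
    return renamed.drop(columns=header)

# Toy frame mimicking the three transit columns involved (placeholder values).
toy = pd.DataFrame(
    [[None, 10.0, 7.0]],
    columns=['Passenger operating revenues, total (millions of dollars)',
             'Operating revenues, total', 'Passenger fares, total'],
    index=['2020'],
)
toy = fold_header(
    toy,
    'Passenger operating revenues, total (millions of dollars)',
    ['Operating revenues, total', 'Passenger fares, total'],
)
print(list(toy.columns))
```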

### Highway Profile

In [None]:
highway_data = highway_data.transpose()

highway_data.head()

In [None]:
highway_data = highway_data.rename(
    columns={
        'Highway trust funda' : 'Highway trust fund',
        'Otherb' : 'Other',
        'State highway user tax revenuesc, total (millions of dollars)' : 'State highway user tax revenues, total (millions of dollars)',
        'Other motor fuel receiptsd' : 'Other motor fuel receipts',
        'Other motor vehicle feese' : 'Other motor vehicle fees',
        'Motor carrier taxesf' : 'Motor carrier taxes',
        'Miscellaneous feesg' : 'Miscellaneous fees',
        'Vehicle-miles of travel by functional system (millions), totaln' : 'Vehicle-miles of travel by functional system (millions), total',
        'Collectorj' : 'Collector'
    }
)

highway_data.head()
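Several raw column names in the BTS tables carry a footnote letter, either at the end (for example 'Light raila' or 'Collectorj') or mid-name ('Operating assistanced, total'), sometimes with a trailing space. Because ordinary names also end in letters, these markers cannot be stripped blindly. One hedged alternative to writing every rename by hand is to match raw names against a list of the clean names we expect. `footnote_renames` is a hypothetical helper, and it assumes each marker is exactly one lowercase letter:

```python
def footnote_renames(raw_names, clean_names):
    """Map raw column names to clean ones when they differ by one footnote letter.

    Hypothetical helper: a raw name matches a clean name if deleting a single
    lowercase letter from the whitespace-stripped raw name gives the clean
    name exactly. Raw names with no such match are left out of the mapping.
    """
    mapping = {}
    for raw in raw_names:
        stripped = raw.strip()
        for clean in clean_names:
            if len(stripped) != len(clean) + 1:
                continue
            # Try deleting each lowercase character in turn.
            if any(stripped[i].islower() and stripped[:i] + stripped[i + 1:] == clean
                   for i in range(len(stripped))):
                mapping[raw] = clean
                break
    return mapping

raw = ['Light raila', 'Light raila ', 'Collectorj', 'Motor bus',
       'Operating assistanced, total']
clean = ['Light rail', 'Collector', 'Motor bus', 'Operating assistance, total']
print(footnote_renames(raw, clean))
```

The resulting dictionary could be passed to `DataFrame.rename(columns=...)` in place of the hand-written one above. The clean list must be chosen with care: a legitimate name that differs from a listed clean name by one lowercase letter would be renamed too.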

Next, we remove the undesired columns that are not blank headers.

In [None]:
highway_data.drop(
    columns=[
        'Federal, total',
        'Highway trust fund',
        'Other',
        'State and local, total',
        'State and D.C.',
        'Local',
        'Motor fuel tax',
        'Other motor fuel receipts',
        'Motor vehicle registration fees ',
        'Other motor vehicle fees',
        'Motor carrier taxes',
        'Miscellaneous fees',
        'Rural mileage, total',
        'Interstate',
        'Interstate ',
        'Other freeways and expressways',
        'Other principal arterial',
        'Minor arterial',
        'Collector',
        'Urban mileage, total'
    ],
    inplace=True
)

highway_data.head()

### Automobile Profile

### Combining the data

Note: the cell below still operates on the NHTSA DataFrame alone and should be converted to work with the combined DataFrame.

Some of the units of measurement in the NHTSA national statistics data are poorly chosen, which hinders interpretability. Specifically, the columns "Vehicle Miles Traveled", "Resident Population", "Registered Vehicles", and "Licensed Drivers" are measured in billions or thousands, yet their values still run into the thousands. This produces numbers such as "2,358 billion" and "260,327 thousand". While these are not terribly hard to read, they make the data less convenient to use and interpret. Below, we resolve this by switching these columns to the next larger unit (billions to trillions, thousands to millions) and dividing their values by 1,000 accordingly.

In [None]:
# Store new names in list for better readability/conciseness and because they
# will be used multiple times
new_names = [
    'Vehicle Miles Traveled (Trillions)',
    'Resident Population (Millions)',
    'Registered Vehicles (Millions)',
    'Licensed Drivers (Millions)'
]

nhtsa_nat_stats = nhtsa_nat_stats.rename(
    columns={
        'Vehicle Miles Traveled (Billions)' : new_names[0],
        'Resident Population (Thousands)' : new_names[1],
        'Registered Vehicles (Thousands)' : new_names[2],
        'Licensed Drivers (Thousands)' : new_names[3]
    }
)

nhtsa_nat_stats[new_names] = nhtsa_nat_stats[new_names] / 1000

nhtsa_nat_stats[new_names].head()
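As a check of the arithmetic, the two example values from the paragraph above convert as 2,358 billion = 2.358 trillion and 260,327 thousand = 260.327 million. The sketch below applies the same rename-and-divide step to a toy frame, driving both from a single mapping so the new names and the scale factor cannot drift apart. `UNIT_CHANGES` and `rescale_units` are illustrative names, and the sketch assumes the columns are already numeric; if the real columns held strings with thousands separators such as '2,358', they would first need converting (for example with `pd.to_numeric` after removing the commas).

```python
import pandas as pd

# Old column name -> (new column name, divisor). Dividing by 1,000 moves
# billions -> trillions and thousands -> millions.
UNIT_CHANGES = {
    'Vehicle Miles Traveled (Billions)': ('Vehicle Miles Traveled (Trillions)', 1000),
    'Resident Population (Thousands)': ('Resident Population (Millions)', 1000),
}

def rescale_units(df, changes):
    """Rename each column in `changes` and divide its values by the divisor."""
    out = df.rename(columns={old: new for old, (new, _) in changes.items()})
    for new, divisor in changes.values():
        out[new] = out[new] / divisor
    return out

# Toy frame holding the two example values quoted in the text.
toy = pd.DataFrame(
    {'Vehicle Miles Traveled (Billions)': [2358.0],
     'Resident Population (Thousands)': [260327.0]},
    index=['Example'],
)
print(rescale_units(toy, UNIT_CHANGES))
```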

### Missing Data

The commented-out cell below is kept only as an example in case we need a similar transformation later; it should be removed before submitting.

It essentially converts the header-column structure of the transit table into a MultiIndex column structure.

In [None]:
# # preserving information which would otherwise be lost due to removal of the sub-header formatting of
# # the first 15 columns
# trans_p = trans_p.rename(
#     columns={
#         'Operating revenues, total' : 'Total operating revenues',
#         'Passenger fares, total' : 'Total passenger fares',
#         'Other operating revenue' : 'Non-fare operating revenue',
#         'Operating assistance, total' : 'Total operating assistance',
#         'State and local' : 'State and local operating assistance',
#         'Federal' : 'Federal operating assistance'
#     }
# )
# for i in range(3, 11):
#     trans_p.columns.values[i] += ' passenger fares'

# # Renaming columns to remove information which will soon be contained in the top-level columns
# trans_p = trans_p.rename(
#     columns={
#         'Passenger operating revenues, total (millions of dollars)' : 'Total',
#         'Operating expenses, total (millions of dollars)' : 'Total',
#         'Average passenger revenue per passenger-mile, all modes (dollars)' : 'All modes',
#         'Number of vehicles, total' : 'Total',
#         'Vehicle-miles, total (millions)' : 'Total',
#         'Passenger-miles, total (millions)' : 'Total',
#         'Energy consumption, diesel, total (million gallons)' : 'Total',
#         'Energy consumption, other, total (million gallons)' : 'Total',
#         'Energy consumption, electric power, total (million kWh)' : 'Total',
#         'Fatalities, all modes' : 'All modes',
#         'Injured persons, all modes' : 'All modes',
#         'Incidents, all modes' : 'All modes'
#     }
# )

# headers = (
#     ['Passenger Operating Revenues (Millions USD)'] * 15
#     + ['Operating Expenses (Millions USD)'] * 9
#     + ['Avg Passenger Revenue Per Passenger Mile (USD)'] * 9
#     + ['Number of Vehicles'] * 9
#     + ['Vehicle Miles (Millions)'] * 9
#     + ['Passenger Miles (Millions)'] * 9
#     + ['Diesel Energy Consumption (Million Gallons)'] * 9
#     + ['Other Energy Consumption (Million Gallons)'] * 3
#     + ['Electric Energy Consumption (Million kWh)'] * 9
#     + ['Fatalities'] * 9
#     + ['Injured Persons'] * 9
#     + ['Incidents'] * 9
# )

# trans_p.columns = pd.MultiIndex.from_tuples(list(zip(headers, trans_p.columns)))

# trans_p.head()
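The core idea of the commented-out cell, pairing a flat list of column names with a parallel list of group labels to build a two-level column index, can be seen on a toy frame. The column names, groups, and values here are made up for illustration:

```python
import pandas as pd

# Flat columns, as they look after the header columns have been removed.
# Note the duplicate names, which are ambiguous at this stage.
flat = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=['Motor bus', 'Total', 'Motor bus', 'Total'],
    index=['2020'],
)

# One group label per column, in the same order as flat.columns.
groups = ['Fatalities', 'Fatalities', 'Injured Persons', 'Injured Persons']

# zip pairs each group label with its column name; from_tuples turns the
# pairs into a two-level MultiIndex.
flat.columns = pd.MultiIndex.from_tuples(list(zip(groups, flat.columns)))

# The duplicate names are now unambiguous.
print(flat[('Injured Persons', 'Total')])
```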

# Step 3: Exploratory Analysis & Data Visualization

As part of our exploration and analysis, we will visualize the data with graphs. To aid in this, we define below a function that plots columns of a DataFrame against its index or against another of its columns. Because the function calls `plt.show()` itself, any additional modifications to the plot (such as custom tick marks) must be made before calling it.

In [None]:
def plot_cols(df, y_cols, title, xlabel=None, ylabel=None, x_col=None):
    """
    Plots the columns y_cols of the DataFrame df against either the index of
    the DataFrame or, optionally, a specified column x_col. All lines are drawn
    on the same graph, with a legend to differentiate them.

    If not specified, xlabel defaults to x_col (or, when plotting against the
    index, to the name of df's columns axis, which this notebook uses to label
    the index), and ylabel defaults to y_cols[0].
    """

    # y_cols is expected to be a list, so if the user passes a single column
    # name we convert it to a list containing that one item
    y_cols = [y_cols] if isinstance(y_cols, str) else y_cols
    # y columns are graphed against the DataFrame index if no column is
    # specified for the x-axis
    x = df.index if x_col is None else df[x_col]
    # apply the defaults described above where necessary
    if xlabel is None:
        xlabel = df.columns.name if x_col is None else x_col
    ylabel = y_cols[0] if ylabel is None else ylabel

    # plot one line per column
    for y_col in y_cols:
        plt.plot(x, df[y_col], label=y_col)

    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.legend()
    plt.show()

In [None]:
plt.xticks(range(0, len(nhtsa_nat_stats.index), 5))
plot_cols(nhtsa_nat_stats, 'Fatal Crashes', 'Yearly Fatal Motor Vehicle Crashes in the U.S.')

# Step 4: Modeling and Further Analysis with Machine Learning and Statistics

# Step 5: Data Interpretation and Insight

# Further Information for the Inquisitive Reader

There are many advocates and advocacy groups, from local to international, working to spread awareness of this issue and push for solutions. A few are listed here, along with more general sources that discuss the details of car dependency:
> * <a href='https://en.wikipedia.org/wiki/Car_dependency'>Wikipedia</a> is always a good starting point.
> * <a href='https://www.planetizen.com/definition/automobile-dependency'>Planetizen</a> and the <a href='https://www.vtpi.org/tdm/tdm100.htm'>Victoria Transport Policy Institute</a> both have very good articles defining and detailing some common aspects of car dependency.
> * <a href='https://www.youtube.com/@NotJustBikes/featured'>Not Just Bikes</a>, by Jason Slaughter, was the most popular 'urban planning' channel on YouTube as of 2022. It explores Dutch urban design and transportation engineering, with a focus on comparing them to American and Canadian development. Car dependency is a central topic throughout its videos.
> * <a href='https://www.strongtowns.org'>Strong Towns</a> is an American advocacy organisation focused on local governance, city finances, and urban development. Its founder, Charles Marohn, worked as a professional traffic engineer and advocates for reduced car dependency in tandem with zoning reform and road design reform in the US.

# Bibliography
1. https://ourworldindata.org/does-the-news-reflect-what-we-die-from
2. https://www-fars.nhtsa.dot.gov/Main/index.aspx
3. https://nepis.epa.gov/Exe/ZyPDF.cgi?Dockey=P1013L1O.pdf