# Analysis of US Department of Commerce (USDOC) American Community Survey Commute Data (2010-2019)
### for predicting ideal Robotaxi network locations
The goal of this analysis is to predict ideal locations for deploying Tesla's Robotaxi network. Within the USDOC data, the most relevant indicators are per-state trends in public transportation use and taxi use. Carpool usage trends may also be useful.
_____
## Import data
The data was downloaded from the US Bureau of Transportation Statistics (USBTS), which retrieved it from the USDOC annual commute survey. The file had to be converted to UTF-8 to be readable by pandas.
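
The conversion step itself is not shown in this notebook. A minimal sketch of one way to do it, assuming the original file used a single-byte encoding such as Latin-1 (the actual source encoding is not stated here); `reencode_to_utf8` is a hypothetical helper name:

```python
from pathlib import Path


def reencode_to_utf8(src, dst, src_encoding="latin-1"):
    """Read `src` using `src_encoding` and write the same text to `dst` as UTF-8.

    `src_encoding` is an assumption; replace it with the file's real encoding.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
    return dst
```

It could be called as `reencode_to_utf8('data/commute_survey_data.csv', 'data/commute_survey_data_utf8.csv')`, where the input file name is hypothetical. Alternatively, `pd.read_csv` accepts an `encoding` argument, which avoids writing a second file.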


In [58]:
import pandas as pd
import us

raw = pd.read_csv('data/commute_survey_data_utf8.csv')
raw = raw.rename(columns={'Commute mode share (percent)': 'percent'})

# Exclude the national 'United States' record so only individual states remain
raw = raw[raw.State != 'United States']
raw = raw.reset_index()

# Look up the two-letter abbreviation for each state name
raw['StateAbbr'] = raw['State'].map(lambda name: us.states.lookup(name).abbr)
wrk = raw.copy()

## Analysis Using 2019 Data

### Top Five States for Robotaxi Network, Using Survey Responses Reporting Use of Public Transportation
Calculated using a threshold of 9%: a state qualifies if at least 9% of its survey respondents report primarily using public transportation. The threshold was chosen by lowering it until exactly five states appear. Based on this, the ideal states for an initial Robotaxi deployment are:
1. **District of Columbia**, 34.2% of respondents report primarily using public transportation.
2. **New York**, 27.7% of respondents report primarily using public transportation.
3. **New Jersey**, 11.6% of respondents report primarily using public transportation.
4. **Massachusetts**, 10.4% of respondents report primarily using public transportation.
5. **Illinois (Chicago)**, 9.7% of respondents report primarily using public transportation.
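
The threshold search described above can be sketched in plain Python. This is an illustration under stated assumptions, not the procedure actually run for this notebook: `lower_threshold_until` is a hypothetical helper, the starting cutoff and step size are arbitrary, and the two "Placeholder" entries are made-up values standing in for the remaining states.

```python
def lower_threshold_until(shares, n=5, start=0.50, step=0.001):
    """Lower the cutoff from `start` in `step` decrements until at least `n` entries pass.

    Returns (threshold, selected). With ties, more than `n` entries may be selected.
    """
    threshold = start
    while threshold > 0:
        selected = {name: share for name, share in shares.items() if share >= threshold}
        if len(selected) >= n:
            return threshold, selected
        # Round to avoid accumulating floating-point drift across many decrements
        threshold = round(threshold - step, 6)
    return 0.0, dict(shares)


shares_2019 = {
    # The first five values are the 2019 public transportation shares reported in this section
    "District of Columbia": 0.341522,
    "New York": 0.277237,
    "New Jersey": 0.115526,
    "Massachusetts": 0.10425,
    "Illinois": 0.0965,
    # Placeholder values (NOT survey data) standing in for all other states
    "Placeholder A": 0.06,
    "Placeholder B": 0.02,
}

threshold, selected = lower_threshold_until(shares_2019)
print(threshold, sorted(selected, key=selected.get, reverse=True))
```

Any cutoff between the fifth-largest and sixth-largest share selects the same five states, so the exact value found depends on the step size.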

In [11]:
# Consider a state 'key' if its 2019 public transportation share is at least the threshold
wrk[(wrk.Year == 2019) & (wrk.Mode == 'Public transportation') & (wrk.percent >= 0.09)]



| | index | State | Mode | Year | percent | StateAbbr |
|---|---|---|---|---|---|---|
| 3271 | 3334 | District of Columbia | Public transportation | 2019 | 0.341522 | DC |
| 3306 | 3369 | Illinois | Public transportation | 2019 | 0.0965 | IL |
| 3362 | 3425 | Massachusetts | Public transportation | 2019 | 0.10425 | MA |
| 3425 | 3488 | New Jersey | Public transportation | 2019 | 0.115526 | NJ |
| 3439 | 3502 | New York | Public transportation | 2019 | 0.277237 | NY |


### Top Five States for Robotaxi Network, Using Survey Responses Reporting Use of Taxi, motorcycle, other
Calculated using a threshold of 1.8%: a state qualifies if at least 1.8% of its survey respondents report primarily using taxi, motorcycle, or other means of transport. The threshold was chosen by lowering it until exactly five states appear. Based on this, the ideal states for an initial Robotaxi deployment are:
1. **Alaska**, 5.01% of respondents report primarily using taxi, motorcycle, or other.
2. **Nevada**, 2.62% of respondents report primarily using taxi, motorcycle, or other.
3. **District of Columbia**, 2.56% of respondents report primarily using taxi, motorcycle, or other.
4. **Hawaii**, 2.2% of respondents report primarily using taxi, motorcycle, or other.
5. **Florida**, 1.97% of respondents report primarily using taxi, motorcycle, or other.

In [23]:
# Keep 2019 rows where the 'Taxi, motorcycle, or other' share is at least the threshold
wrk[(wrk.Year == 2019) & (wrk.Mode == 'Taxi, motorcycle, or other') & (wrk.percent >= 0.018)]



| | index | State | Mode | Year | percent | StateAbbr |
|---|---|---|---|---|---|---|
| 3225 | 3288 | Alaska | Taxi, motorcycle, or other | 2019 | 0.050144 | AK |
| 3274 | 3337 | District of Columbia | Taxi, motorcycle, or other | 2019 | 0.025578 | DC |
| 3281 | 3344 | Florida | Taxi, motorcycle, or other | 2019 | 0.019734 | FL |
| 3295 | 3358 | Hawaii | Taxi, motorcycle, or other | 2019 | 0.021964 | HI |
| 3414 | 3477 | Nevada | Taxi, motorcycle, or other | 2019 | 0.026218 | NV |


### Top Five States for Robotaxi Network, Using Survey Responses Reporting Use of Carpool
Calculated using a threshold of 10.63%: a state qualifies if at least 10.63% of its survey respondents report primarily carpooling. The threshold was chosen by lowering it until exactly five states appear. Based on this, the ideal states for an initial Robotaxi deployment are:
1. **Hawaii**, 13.3% of respondents report primarily carpooling.
2. **Alaska**, 12% of respondents report primarily carpooling.
3. **Wyoming**, 11.5% of respondents report primarily carpooling.
4. **Arkansas**, 10.7% of respondents report primarily carpooling.
5. **Arizona**, 10.6% of respondents report primarily carpooling.

In [34]:
# Keep 2019 rows where the carpool share is at least the threshold
wrk[(wrk.Year == 2019) & (wrk.Mode == 'Carpool') & (wrk.percent >= 0.1063)]



| | index | State | Mode | Year | percent | StateAbbr |
|---|---|---|---|---|---|---|
| 3221 | 3284 | Alaska | Carpool | 2019 | 0.119993 | AK |
| 3228 | 3291 | Arizona | Carpool | 2019 | 0.106488 | AZ |
| 3235 | 3298 | Arkansas | Carpool | 2019 | 0.107092 | AR |
| 3291 | 3354 | Hawaii | Carpool | 2019 | 0.13256 | HI |
| 3564 | 3627 | Wyoming | Carpool | 2019 | 0.115136 | WY |


### Visualizing These Percentages
All modes of transport and their reported use can be explored in an interactive chart. Click a state on the inner "ring" to see each form of transit for that state in detail.

In [3]:
import plotly.express as px

# Create a 2019-only working table in case any changes need to be made
wrk = raw[raw.Year == 2019].copy()

# Visualize usage: the inner ring is State, the outer ring is Mode
fig = px.sunburst(wrk, path=['State', 'Mode'], values='percent', color='State',
                  color_continuous_scale='RdBu', maxdepth=2)
fig.show()

### What does this mean for selecting where to deploy Robotaxi networks?
#### Conclusions from public transport usage
Each of these states is known for an extensive, heavily used public transportation network. Each should be examined in more depth to identify the particular municipalities where transit use is popular. Some public transportation users may find a new transport method more reliable, while others will have a hard time transitioning. Still, demand for public transportation is clearly present in these states, and a new transportation network may alleviate some of the transit congestion commonly found in these areas.
#### Conclusions from usage of taxi, motorcycle, other forms of transit
This response category is not a valid indicator. For example, residents of Alaska are unlikely to have much use for a taxi network, despite Alaska having the highest percentage in this category. Conclusions cannot be drawn with certainty because the category covers too broad a range of transport modes. Travel in Alaska often requires a plane or snowmobile (both fall under the "other" category), which explains its high percentage of responses here. Therefore, this data is not representative of demand for a taxi network.
#### Conclusions from usage of carpool
Carpool usage is worth examining, since people who carpool may be more likely to use a ridesharing service such as Uber or Lyft, and so may be more open to a Robotaxi network. However, conclusions derived from reported carpool usage should be treated with caution, as its predictive value is unknown.

## Continued Analysis Using 2010-2019 Data
Now that we've established the current picture, we examine how reported transit use in the 'key' states changed from 2010 to 2019.

### Public Transportation Usage Trends
One of the most significant points discussed earlier was reported use of public transportation. We will examine the trends in the states DC, NY, NJ, MA, and IL, using an ordinary-least-squares linear regression model.

In [67]:
public_transport = raw[raw.Mode == 'Public transportation']
public_transport = public_transport[public_transport.StateAbbr.isin(['DC', 'NY', 'NJ', 'MA', 'IL'])]

fig2 = px.scatter(public_transport, x='Year', y='percent', trendline='ols', color='State', title='Public Transportation Usage Trends')
fig2.show()

# results = px.get_trendline_results(fig2)
# results = results.iloc[0]["px_fit_results"].summary()
# print(results)
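
For reference, the slope that an OLS trendline estimates for a single state can be computed in closed form as the covariance of year and share divided by the variance of year. The sketch below is a plain-Python illustration, not the statsmodels fit that Plotly runs for `trendline='ols'`; `ols_fit` is a hypothetical helper and the sample shares are made-up placeholders, not survey values.

```python
def ols_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = list(range(2010, 2020))
# Placeholder shares (NOT survey values): a steady rise of 0.2 percentage points per year
shares = [0.100 + 0.002 * (year - 2010) for year in years]

slope, intercept = ols_fit(years, shares)
print(f"change in share per year: {slope:+.4f}")
```

A positive slope means the reported share grew over the period, and a negative slope means it declined. Comparing slopes across states gives a rough numeric counterpart to the trendlines in the plot.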

### Carpool Usage Trends
A possible predictor of demand for a Robotaxi network is reported carpool usage, as discussed earlier. We will look at the trends in AK, AR, AZ, HI, and WY using a LOWESS (locally weighted scatterplot smoothing) trendline. Reported carpool usage in each of these states varied too much from year to year for a linear regression to show clear trends, so the locally weighted smoother is a better fit. The volatility could indicate that carpool use changes quickly.

In [68]:
carpool = raw[raw.Mode == 'Carpool']
carpool = carpool[carpool.StateAbbr.isin(['AK', 'AR', 'AZ', 'HI', 'WY'])]

fig3 = px.scatter(carpool, x='Year', y='percent', trendline='lowess', color='State', title='Carpool Usage Trends')
fig3.show()
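
LOWESS fits a small weighted regression around each point rather than one global line, which is why it can follow year-to-year swings. The sketch below is a simplified, single-pass local linear smoother with tricube weights, written in plain Python for illustration. It omits the robustness iterations of full LOWESS and is not the statsmodels routine behind Plotly's `trendline='lowess'`. The name `lowess_smooth`, the `frac` default, and the sample shares are assumptions made for this example.

```python
import math


def lowess_smooth(xs, ys, frac=2 / 3):
    """Return one locally weighted linear fit value per input x (no robustness pass)."""
    n = len(xs)
    k = min(n, max(2, math.ceil(frac * n)))  # number of neighbours in each local window
    fitted = []
    for x0 in xs:
        dists = [abs(x - x0) for x in xs]
        h = sorted(dists)[k - 1]  # bandwidth: distance to the k-th nearest point
        if h == 0:
            weights = [1.0 if d == 0 else 0.0 for d in dists]
        else:
            # Tricube kernel: weight falls smoothly to zero at distance h
            weights = [(1 - min(d / h, 1.0) ** 3) ** 3 for d in dists]
        total = sum(weights)
        mean_x = sum(w * x for w, x in zip(weights, xs)) / total
        mean_y = sum(w * y for w, y in zip(weights, ys)) / total
        sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(weights, xs))
        sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(weights, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


years = list(range(2010, 2020))
# Placeholder shares (NOT survey values) with year-to-year swings
shares = [0.110, 0.104, 0.112, 0.101, 0.109, 0.103, 0.111, 0.102, 0.108, 0.105]

smoothed = lowess_smooth(years, shares)
print([round(value, 4) for value in smoothed])
```

A smaller `frac` uses fewer neighbours per fit and follows the data more closely, while a larger `frac` gives a flatter curve.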

## Conclusions
### Carpool Usage Trends
Reported carpool usage was notably volatile from year to year across the five selected states. The most visible change is a sudden, significant increase in Wyoming's carpool use. Further analysis would be needed to identify the causes of that increase and whether it correlates with an increase in ridesharing service usage. Otherwise, the trends are mostly flat over time, with little overall change in carpool usage.
### Public Transportation Usage Trends
Several trends appear in reported use of public transportation over time. The most significant is in the District of Columbia, where reported use of public transit has decreased steadily over the last decade or so. Why is this the case?

In [70]:
dc = raw[raw.StateAbbr == 'DC']

fig4 = px.scatter(dc, x='Year', y='percent', color='Mode', trendline='ols', title='Reported Use of Transportation Modes (2010-2019), District of Columbia')
fig4.show()

The plot above offers some insight into why public transit use in DC declined overall from 2010 to 2019. Driving alone stays relatively constant, while all other modes of transport see a slight increase over the period. It is possible that commuters are switching to other modes because of dissatisfaction with transit, or because of a lack of transit availability. Further analysis would be needed to determine exactly why transit methods in DC are shifting.

In the other listed states (NY, NJ, MA, and IL), there is a noticeable, steady increase in reported use of public transit over the same period. That increase may correlate with population growth, or with more people working in those states.

### What does this mean for deploying the Robotaxi network?
First, deploying the network in DC is risky because public transportation use there is declining. The upside is that residents are adopting other means of transit, which could include Robotaxi. In the other states, a steady, conservative rollout could have a greater chance of success, because the overall increase in public transit use suggests growing demand. Residents are likely to adopt the network over time if it is introduced conservatively.