### Scooters in Austin

![scooter picture](Lyft_Scooters.jpg "Title")

#### Our Questions:
• Do some neighborhoods complain about scooters more than others, even after adjusting per capita?\
• Or do the neighborhoods with the most scooter ride endpoints also have the most scooter-related complaints?\
• Are people more likely to complain about scooters in a neighborhood where scooters are less commonly found?\
• Do local events, month of year, or day of week impact the total number of scooter rides?\
• Where are the scooters going?\
• What are the average, minimum, and maximum trip distances?\
• Do longer scooter rides to farther-out neighborhoods result in more complaints?

#### Our Resources:
• City of Austin 311 OpenData: https://data.austintexas.gov/resource/i26j-ai4z.json \
• Austin Shared Mobility API: https://data.austintexas.gov/resource/7d8e-dm7r.json \
• FCC API call for lat/long to census tract conversion: https://geo.fcc.gov/api/census/ \
• Flat file for census tract to zip code: https://www.huduser.gov/portal/datasets/usps_crosswalk.html
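
A minimal sketch of the lat/long to census tract step, for records that carry coordinates. It assumes the FCC `block/find` endpoint with `latitude`, `longitude`, and `format` parameters, and a JSON response containing a 15-digit block FIPS code under `Block` → `FIPS`; both helper names are ours, and the sample payload is made up for illustration. The first 11 digits of a block FIPS code (state + county + tract) form the tract GEOID.

```python
from urllib.parse import urlencode

# Assumed endpoint under the FCC base URL listed above.
FCC_BLOCK_FIND = "https://geo.fcc.gov/api/census/block/find"


def build_block_find_url(lat, lon):
    """Build the request URL for one coordinate pair (hypothetical helper)."""
    query = urlencode({"latitude": lat, "longitude": lon, "format": "json"})
    return f"{FCC_BLOCK_FIND}?{query}"


def tract_geoid_from_block_response(payload):
    """Return the 11-digit tract GEOID from a block lookup response.

    Assumes the payload looks like {"Block": {"FIPS": "<15 digits>"}, ...}.
    """
    block_fips = payload["Block"]["FIPS"]
    return block_fips[:11]  # state (2) + county (3) + tract (6)


# Illustrative payload only; the FIPS value is not a real lookup result.
sample_payload = {"Block": {"FIPS": "484530011001000"}}
print(build_block_find_url(30.2672, -97.7431))
print(tract_geoid_from_block_response(sample_payload))
```

In practice the URL would be fetched with a library such as `requests` and the parsed JSON passed to the second helper.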

#### What we think we'll see:
• We think some neighborhoods complain about scooters regardless of the number of scooter rides per person in the population. \
• We think that scooter rides, and complaints about them, soar when events are happening in town, along with a rise every weekend. \
• We think complaints rise as people take scooters into neighborhoods where scooters are less common.

#### Our method:

• First, we used austintexas.gov data to build a data frame from the Austin Shared Micromobility dataset. It tracks every scooter ride in town, regardless of brand or parent company, and reports ride start and end locations by census tract. \
• Then, we appended the census tract to zip code key, which tells us the zip code for each ride. \
• From there, we could join our tables by zip code and compare them to the 311 complaints dataset for scooters.
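
The steps above can be sketched on toy data as follows. This is only an illustration under stated assumptions: the `tract`, `zip`, and `tot_ratio` column names are hypothetical stand-ins for the crosswalk file's columns, the complaints table is reduced to a single `zip` column, and every value is made up.

```python
import pandas as pd

# Toy stand-ins for the three inputs; all values are invented.
trips = pd.DataFrame({
    "Trip ID": ["t1", "t2", "t3", "t4"],
    "GEOID End": ["48453001100", "48453001100", "48453000601", "48453000601"],
})
crosswalk = pd.DataFrame({
    "tract": ["48453001100", "48453001100", "48453000601"],
    "zip": ["78701", "78703", "78705"],
    "tot_ratio": [0.9, 0.1, 1.0],  # share of the tract's addresses in that zip
})
complaints = pd.DataFrame({"zip": ["78701", "78701", "78705"]})

# A tract can span several zip codes, so keep the zip with the largest share.
best_zip = (
    crosswalk.sort_values("tot_ratio", ascending=False)
    .drop_duplicates("tract")[["tract", "zip"]]
)

# Append a zip code to every trip based on where the ride ended.
trips_zip = trips.merge(best_zip, left_on="GEOID End", right_on="tract", how="left")

# Count ride endpoints and complaints per zip code, then line them up.
rides_per_zip = trips_zip.groupby("zip").size().rename("rides")
complaints_per_zip = complaints.groupby("zip").size().rename("complaints")
summary = pd.concat([rides_per_zip, complaints_per_zip], axis=1).fillna(0).astype(int)
summary["complaints_per_ride"] = summary["complaints"] / summary["rides"]
print(summary)
```

Keeping only the dominant zip per tract is a simplification; weighting rides by the crosswalk ratios would be an alternative.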



#### Dependencies and packages

In [1]:
import os
import math
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
#import seaborn as sns
from sodapy import Socrata

#### Dataframes

##### Shared Mobility:

In [None]:
output_data_file = "Output_Data/shared_mobility.csv"
# url = https://data.austintexas.gov/resource/7d8e-dm7r.json
# Data Extraction:
client = Socrata("data.austintexas.gov", None)
results = client.get("7d8e-dm7r", limit=8300000)
# Convert to pandas DataFrame
results_df = pd.DataFrame.from_records(results)
results_df.head()



In [None]:
clean_results_df = results_df.copy()
clean_results_df = clean_results_df.rename(columns = {
    "trip_id": "Trip ID",
    "device_id": "Device ID",
    "modified_date": "Data Modified Date",
    "vehicle_type": "Vehicle Type",
    "trip_duration": "Trip Duration",
    "trip_distance": "Trip Distance",
    "start_time": "Trip Start Time",
    "end_time": "Trip End Time",
    "hour": "Hour",
    "day_of_week": "Day Of Week",
    "month": "Month",
    "year": "Year",
    "census_geoid_start": "GEOID Start",
    "census_geoid_end": "GEOID End",
    "council_district_start": "Start Council District",
    "council_district_end" : "Return Council District"
})
# Drop all the null values
clean_results_df = clean_results_df.dropna(how='any')
# Change the time and date format for columns - 'Data Modified Date', 'Trip Start time' and 'Trip End Time'
clean_results_df['Trip Start Time'] = pd.to_datetime(clean_results_df['Trip Start Time'])
clean_results_df['Trip End Time'] = pd.to_datetime(clean_results_df['Trip End Time'])
clean_results_df['Data Modified Date'] = pd.to_datetime(clean_results_df['Data Modified Date'])
clean_results_df.head()
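
One of our questions asks for the average, minimum, and maximum trip distances. A minimal sketch on made-up values, assuming the distance column arrives as text (as JSON API values often do) and therefore needs a numeric conversion first:

```python
import pandas as pd

# Made-up sample with the same column name as the cleaned data frame.
sample_trips = pd.DataFrame({"Trip Distance": ["1200", "350", "4100", "0"]})

# Convert text to numbers; anything unparseable becomes NaN and is ignored by agg.
distance = pd.to_numeric(sample_trips["Trip Distance"], errors="coerce")

# Average, minimum, and maximum in one call.
distance_stats = distance.agg(["mean", "min", "max"])
print(distance_stats)
```

The same three lines should apply to `clean_results_df['Trip Distance']`; zero-length trips like the last sample row may be worth filtering out before summarizing.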

##### Austin API:

#### Visualizations

In [None]:
# Count trips per day of week. Values may arrive as strings, so convert to integers before sorting.
daily_total = pd.to_numeric(clean_results_df['Day Of Week']).astype(int).value_counts().sort_index().rename('Trips').to_frame()

# Map day of week for better labels. Data documentation indicates "0=Sunday and so on."
days = ['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday']
daily_total['Day'] = [days[i] for i in daily_total.index]

# Plot
daily_total.plot(kind='bar', x='Day', y='Trips', title='Total Trip Counts by Day of Week', figsize=(10, 5), rot=30, legend=False)
plt.xlabel('Day of Week')
plt.ylabel('Number of Trips')
plt.savefig("Plots/trips_per_week.png")
plt.show()

In [None]:
# Count trips per hour. Convert the hour to an integer first so the index
# sorts numerically (0-23) rather than as text ('0', '1', '10', ...).
hourly_total = pd.to_numeric(clean_results_df['Hour']).astype(int).value_counts().sort_index()

hourly_total.plot(kind='bar', title='Total Trip Counts by Hour', figsize=(10, 5), rot=0, legend=False)
plt.xlabel('Hour of Day')
plt.ylabel('Number of Trips')
plt.savefig("Plots/trips_per_hour.png")
plt.show()

In [None]:
import calendar

# Count trips per month. Convert the month to an integer so the bars run in
# calendar order rather than text order ('1', '10', '11', '12', '2', ...).
monthly_total = pd.to_numeric(clean_results_df['Month']).astype(int).value_counts().sort_index()

# Translate month numbers into month names for the x-axis labels
month_names = [calendar.month_name[m] for m in monthly_total.index]

monthly_chart = monthly_total.plot.bar(title="Total Trips per Month", width=0.75, figsize=(10, 5), rot=30, legend=False)
monthly_chart.set_xticklabels(month_names)
monthly_chart.set_xlabel("Trip Months")
monthly_chart.set_ylabel("Total Trip Count")
plt.savefig("Plots/trips_per_month1.png")
plt.show()
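
To look at the event and weekend question more directly, trips can also be counted per calendar day. A rough sketch on made-up timestamps, assuming a parsed `Trip Start Time` column like the one in the cleaned data frame; days with unusually high counts could then be compared against a list of local event dates, which we do not have in this notebook.

```python
import numpy as np
import pandas as pd

# Made-up trip start times: a Saturday, a Sunday, and a Tuesday in March 2019.
sample_trips = pd.DataFrame({
    "Trip Start Time": pd.to_datetime([
        "2019-03-09 10:00", "2019-03-09 14:30", "2019-03-09 21:15",
        "2019-03-10 09:05",
        "2019-03-12 08:00", "2019-03-12 17:45",
    ])
})

# Count trips per calendar day; days with no trips show up as 0.
per_day = sample_trips.set_index("Trip Start Time").resample("D").size()

# Label each day as weekend (Saturday/Sunday) or weekday, then average the counts.
day_type = np.where(per_day.index.dayofweek >= 5, "weekend", "weekday")
avg_by_day_type = per_day.groupby(day_type).mean()

print(per_day)
print(avg_by_day_type)
```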