## Bikeshare Comparisons Across Three Cities 

We are interested in bikeshare ridership in three major U.S. cities, one each from the East Coast, the Midwest, and the West Coast, during 2018. We explore how ridership relates to user characteristics such as age and gender. We also look at the relationship between ridership and weather using the US weather data API, and we use the US Census API to investigate the relationship between the median household income of a geographic area and its volume of rides.

* East Coast: Boston, MA
    * Data source: [Blue Bikes](https://www.bluebikes.com/system-data)
* Midwest: Minneapolis, MN
    * Data source: [niceride](https://www.niceridemn.com/system-data)
* West Coast: Portland, OR
    * Data source: [BIKETOWN](https://www.biketownpdx.com/system-data)


![alt text](images/bike-share.jpg "Niceride")

The bikeshare company in each city makes its data openly available in one-month intervals. These can be downloaded as CSV files. Some data fields include:
* Origin station name and coordinates
* Destination station name and coordinates
* Date and time
* Rider age and gender

We also looked at [BikeShare-Research.org](https://bikeshare-research.org/). This is an open API with bikeshare data from around the world. We decided not to use this resource because the data was only broken down by number of rides per day in each city. We wanted to pursue an option where we could look at individual trips and riders.


In [None]:
# bring in our mods
import pandas as pd
import requests
from pprint import pprint
import scipy.stats as st
import matplotlib.pyplot as plt
from scipy.stats import linregress
import numpy as np

# import census
from census import Census
from us import states
# Census API Key
from config import api_key

## Reading in the bike data

The first step was to read in the data from each city. Because the data came as one file per month and we wanted data for the entire year, we needed to add a field for the month and then append each month's data to a new data frame. Below is the example for Minneapolis:

In [None]:
# Read in the individual bike files, append a new column for month
m_df1 = pd.read_csv("resources/Minneapolis/201804-niceride-tripdata.csv")
m_df1["Month"] = "April"
m_df2 = pd.read_csv("resources/Minneapolis/201805-niceride-tripdata.csv")
m_df2["Month"] = "May"
m_df3 = pd.read_csv("resources/Minneapolis/201806-niceride-tripdata.csv")
m_df3["Month"] = "June"
m_df4 = pd.read_csv("resources/Minneapolis/201807-niceride-tripdata.csv")
m_df4["Month"] = "July"
m_df5 = pd.read_csv("resources/Minneapolis/201808-niceride-tripdata.csv")
m_df5["Month"] = "August"
m_df6 = pd.read_csv("resources/Minneapolis/201809-niceride-tripdata.csv")
m_df6["Month"] = "September"
m_df7 = pd.read_csv("resources/Minneapolis/201810-niceride-tripdata.csv")
m_df7["Month"] = "October"
m_df8 = pd.read_csv("resources/Minneapolis/201811-niceride-tripdata.csv")
m_df8["Month"] = "November"

# combine the individual monthly data frames into one data frame for the year
m_df = pd.concat([m_df1, m_df2, m_df3, m_df4, m_df5, m_df6, m_df7, m_df8])
# save the combined data; the geopandas script further down reads this file
m_df.to_csv('census_output/m_bike.csv')
m_df.tail()
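
The month-by-month reads above could also be written as a loop. The following is a minimal sketch, not the code we ran. The helper name `load_monthly_trips` is hypothetical, and it assumes every file follows the `YYYYMM-niceride-tripdata.csv` naming pattern shown above.

```python
import calendar
from pathlib import Path

import pandas as pd


def load_monthly_trips(folder, year, months, suffix="niceride-tripdata.csv"):
    """Read one CSV per month, tag each row with the month name, and stack them.

    `load_monthly_trips` is a hypothetical helper; the file-name pattern is
    assumed to match the Minneapolis files used above.
    """
    frames = []
    for month in months:
        path = Path(folder) / f"{year}{month:02d}-{suffix}"
        df = pd.read_csv(path)
        df["Month"] = calendar.month_name[month]  # e.g. 4 -> "April"
        frames.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

With the folder layout above, a call such as `load_monthly_trips("resources/Minneapolis", 2018, range(4, 12))` would cover April through November.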

## Gathering census and cleaning data

One of our goals with this study was to determine if median household income has any relationship to the number of rides. 

To do this, we needed to gather some demographic data in a geography that can be spatially related to bike share trips. Census tracts are a common geography to map demographic data. Therefore, we used the US Census API to gather Median Household Income (referred to as MHI below), Poverty Rate, and percentage of people who drive to work per census tract.

We could not determine how to extract only the census tracts within the boundaries of each city through the API, so instead we extracted the census tracts for the county or counties in which each city lies. Further down, we filter these to the boundaries of each city using shapefiles.

Below is an example of the census tract extraction for Minneapolis:

In [None]:
# Get census data for Minneapolis
c = Census(api_key, year=2017)
m_census_data = c.acs5.get(('NAME','B01003_001E', 'B19013_001E', 'B17001_002E', 'B08301_001E', 'B08301_003E', 'B08101_041E',), geo={'for': 'tract:*',
                       'in': 'state:{} county:053'.format(states.MN.fips)}) #  county:053 is Hennepin County
# Convert to DataFrame
m_census_pd = pd.DataFrame(m_census_data)

# Column Renaming
m_census_pd = m_census_pd.rename(columns={"B01003_001E": "Population",
                                      "B19013_001E": "Median Household Income",
                                      "B17001_002E": "Poverty count",
                                      "B08301_001E": "Commuting count",
                                      "B08301_003E": "Commuting by car count",
                                      "B08101_041E": "Commuting OTHER count",
                                      "NAME": "Name", "tract": "Census Tract"})

# Add in Poverty Rate (Poverty Count / Population)
m_census_pd["Poverty Rate"] = 100 * \
    m_census_pd["Poverty count"].astype(
        int) / m_census_pd["Population"].astype(int)

# Calculate commute by car
m_census_pd["Car Rate"] = 100 * \
    m_census_pd["Commuting by car count"].astype(
        int) / m_census_pd["Commuting count"].astype(int)

# Calculate commute by OTHER
m_census_pd["Commute OTHER rate"] = 100 * \
    m_census_pd["Commuting OTHER count"].astype(
        int) / m_census_pd["Commuting count"].astype(int)

# Calculate GEOID <= this is used to join to census geography below
m_census_pd["GEOID"] = m_census_pd["state"].astype(str)+m_census_pd["county"]+m_census_pd["Census Tract"]
m_census_pd.head()
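
The rate and GEOID calculations above can be illustrated on a tiny hand-made table. This is a sketch under stated assumptions: the values are invented, and both the zero-padding with `zfill` and the handling of zero-population tracts are additions that are not in the original code.

```python
import numpy as np
import pandas as pd

# Invented example rows; real values come from the Census API call above.
demo = pd.DataFrame({
    "state": ["27", "27"],
    "county": ["053", "053"],
    "Census Tract": ["000101", "000102"],
    "Population": [4000, 0],
    "Poverty count": [400, 0],
})

# Tract GEOID = 2-digit state + 3-digit county + 6-digit tract code.
# zfill guards against codes that lost their leading zeros (an assumption;
# the original code concatenates the strings directly).
demo["GEOID"] = (
    demo["state"].astype(str).str.zfill(2)
    + demo["county"].astype(str).str.zfill(3)
    + demo["Census Tract"].astype(str).str.zfill(6)
)

# Replace a zero population with NaN so the rate is NaN rather than inf.
population = demo["Population"].astype(float).replace(0, np.nan)
demo["Poverty Rate"] = 100 * demo["Poverty count"].astype(float) / population
```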

Are these bikeshare companies providing their services in disadvantaged neighborhoods? Is everyone granted the same access to these healthy transportation alternatives? 

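One way to investigate this is to look at how the census tracts with the lowest Median Household Income perform, and likewise the census tracts with the highest poverty rates. Below, we add two columns to our census data that flag whether a tract falls in the low-MHI group (`MHI_25`) or the high-poverty group (`PVT_25`). Note that the code selects the 25 tracts with the lowest MHI and the 25 tracts with the highest poverty rate (`head(25)`), which is not the same as a 25th-percentile cutoff unless a county has roughly 100 tracts.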

In [None]:
# # find the lowest median household income tracts and highest poverty tracts

# <<<<<<< Minneapolis >>>>>>>
# Sort by median household income, export low 25 to new df, create new column indicating if it is a low 25 census tract
m_census_pd = m_census_pd.sort_values("Median Household Income", ascending=True)
m_mhi_25 = m_census_pd.head(25)
m_census_pd["MHI_25"] = np.where(m_census_pd['Median Household Income']<=m_mhi_25["Median Household Income"].max(), 'yes', 'no')

# Sort by poverty rate, export top 25 to new df, create new column indicating if it is a top 25 tract for poverty rate
m_census_pd = m_census_pd.sort_values("Poverty Rate", ascending=False)
m_pvt_25 = m_census_pd.head(25)
m_census_pd["PVT_25"] = np.where(m_census_pd['Poverty Rate']>=m_pvt_25["Poverty Rate"].min(), 'yes', 'no')
m_census_pd.head()
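
The cell above flags a fixed number of tracts with `head(25)`. If a percentile-based cutoff is preferred, one option is `Series.quantile`. The following is a minimal sketch with invented values; it is not the method used to produce the charts in this notebook.

```python
import numpy as np
import pandas as pd

# Invented MHI values for eight tracts, for illustration only.
tracts = pd.DataFrame({
    "GEOID": range(1, 9),
    "Median Household Income": [30000, 42000, 55000, 61000,
                                70000, 82000, 95000, 120000],
})

# 25th percentile of MHI across tracts (linear interpolation by default).
mhi_cutoff = tracts["Median Household Income"].quantile(0.25)

# Flag tracts at or below the cutoff, mirroring the 'yes'/'no' labels used above.
tracts["MHI_25"] = np.where(
    tracts["Median Household Income"] <= mhi_cutoff, "yes", "no"
)
```

The same pattern with `quantile(0.75)` and `>=` would flag the highest-poverty tracts.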

Now, we need to determine how many rides originated in each census tract. 

We can't perform a merge, because there are no common columns between our census data and our bike data. 

However, our bike data has coordinates for the origin of each bike trip. We'll need to make use of those coordinates to determine how many trips start in each census tract. Although the demographic data that we extracted from the US Census API did not include any spatial information (geometry), we can find geometry for these census tracts from other sources.

We downloaded shapefiles of census tracts for Massachusetts, Minnesota and Oregon from the US Census Bureau. Due to some limitations in how the US Census Bureau provides its data, we could not find a straightforward way to filter down to only the census tracts in each city, so some GIS software was used to filter these shapefiles down to just the boundaries of Boston, Minneapolis and Portland.

Because we are now working with data that has spatial geometry, we needed to use a new library: [GeoPandas](http://geopandas.org/)

We initially had issues running geopandas in jupyter lab/notebook, and we also only needed one of us to download/learn/use geopandas. Therefore, we created a standalone python file to perform the geopandas work.

This .py file uses geopandas to import the filtered shapefiles. Then, using the GEOID that we generated in the Census API data above, we joined our demographic data to these census shapefiles. Now our census data is related to a spatial location in a "geo data frame".

Then we use the coordinate fields of the bike data to create a geo data frame. 

The final step in this .py file is to perform a spatial join and join each bike trip to a census tract. Therefore, we can produce demographic data for each bike trip. See the geopandas code below (don't run, it takes forever). 
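
Conceptually, the spatial join asks, for each trip origin, which tract polygon contains that point. The following pure-Python sketch illustrates the idea with a ray-casting test on two made-up square "tracts". It is not how GeoPandas implements the join, and the function names, tract ids and coordinates are all hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: toggle 'inside' each time a ray going right from
    (x, y) crosses a polygon edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_tract(x, y, tracts):
    """Return the id of the first tract whose polygon contains the point, else None."""
    for tract_id, polygon in tracts.items():
        if point_in_polygon(x, y, polygon):
            return tract_id
    return None


# Two made-up square "tracts" side by side (hypothetical ids and coordinates).
tracts = {
    "tract_A": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "tract_B": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

# Made-up trip origins as (x, y) pairs, standing in for (longitude, latitude).
trip_origins = [(0.5, 0.5), (1.5, 0.2), (3.0, 3.0)]
assigned = [assign_tract(x, y, tracts) for x, y in trip_origins]
```

Once every trip carries a tract id, the tract's demographic columns can be attached with an ordinary merge, which is what the spatial join does in one step.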

In [None]:
<THIS LINE IS INTENDED TO BREAK THIS CODE BLOCK, WE DON'T WANT TO RUN IT>
import geopandas as gpd
import numpy as np
import pandas as pd


# # Read in the shapefiles of city census tracts ( with their geometry)
m_census_gdf = gpd.read_file("Census_subsets/Mpls_CensusTracts.shp")
# # convert the GEOID field to an integer for merging later
m_census_gdf = m_census_gdf.astype({'GEOID': 'int64'})
b_census_gdf = gpd.read_file("Census_subsets/Bos_CensusTracts.shp")
b_census_gdf = b_census_gdf.astype({'GEOID': 'int64'})
p_census_gdf = gpd.read_file("Census_subsets/Por_CensusTracts.shp")
p_census_gdf = p_census_gdf.astype({'GEOID': 'int64'})

# # Read in the ACS census data from the census api
m_census_df = pd.read_csv('census_output/m_census_25.csv', index_col=0)
b_census_df = pd.read_csv('census_output/b_census_25.csv', index_col=0)
p_census_df = pd.read_csv('census_output/p_census_25.csv', index_col=0)

# # Merge the census api data to the census geometry data and export to csv
m_merged = m_census_gdf.merge(m_census_df, on='GEOID')
# m_merged.to_csv('census_output/m_merged.csv')
b_merged = b_census_gdf.merge(b_census_df, on='GEOID')
# b_merged.to_csv('census_output/b_merged.csv')
p_merged = p_census_gdf.merge(p_census_df, on='GEOID')
# p_merged.to_csv('census_output/p_merged.csv')

# # Read in the bike trips info
m_bike_df = pd.read_csv('census_output/m_bike.csv', index_col=0)
m_bike_df = m_bike_df.rename(columns={"start station longitude":"s_longitude", "start station latitude":"s_latitude",
    "end station longitude":"e_longitude", "end station latitude":"e_latitude"})
b_bike_df = pd.read_csv('census_output/b_bike.csv', index_col=0)
b_bike_df = b_bike_df.rename(columns={"start station longitude":"s_longitude", "start station latitude":"s_latitude",
    "end station longitude":"e_longitude", "end station latitude":"e_latitude"})
p_bike_df = pd.read_csv('census_output/p_bike.csv', index_col=0)
p_bike_df = p_bike_df.rename(columns={"StartLongitude":"s_longitude", "StartLatitude":"s_latitude",
    "EndLongitude":"e_longitude", "EndLatitude":"e_latitude"})


# Create geo dfs from the city bike data
m_bike_gdf = gpd.GeoDataFrame(
    m_bike_df, geometry=gpd.points_from_xy(m_bike_df.s_longitude, m_bike_df.s_latitude))
b_bike_gdf = gpd.GeoDataFrame(
    b_bike_df, geometry=gpd.points_from_xy(b_bike_df.s_longitude, b_bike_df.s_latitude))
p_bike_gdf = gpd.GeoDataFrame(
    p_bike_df, geometry=gpd.points_from_xy(p_bike_df.s_longitude, p_bike_df.s_latitude))
print("------------------------------------")
print(" Geo dfs have been created from the bike data")
#
# Warning---- takes a long time to run!!!!!
# Spatial join
# (`predicate` replaces the deprecated `op` keyword in geopandas 0.10 and later)
m_sjoin = gpd.sjoin(m_merged, m_bike_gdf, how="inner", predicate='intersects')
m_sjoin.to_csv('census_output/m_sjoin.csv')
print("------------------------------------")
print("Minneapolis data has been joined")
b_sjoin = gpd.sjoin(b_merged, b_bike_gdf, how="inner", predicate='intersects')
b_sjoin.to_csv('census_output/b_sjoin.csv')
print("------------------------------------")
print("Boston data has been joined")
p_sjoin = gpd.sjoin(p_merged, p_bike_gdf, how="inner", predicate='intersects')
p_sjoin.to_csv('census_output/p_sjoin.csv')
print("------------------------------------")
print("Portland data has been joined")

In [None]:
<THIS LINE IS MEANT TO BREAK THIS CODE BLOCK, TAKES TOO LONG TO RUN>
# read the spatial join output written by the geopandas script above
m_census_bike = pd.read_csv('census_output/m_sjoin.csv', index_col=0)
m_census_bike.head()

![alt text](Images/Mpls_Map.png "Minneapolis Map")

Now we have the MHI and poverty rate of the census tract in which each bike trip originates. 

First we'll remove the fields that we don't care about.

Then, let's see what proportion of trips originate in the census tracts flagged as lowest MHI and highest poverty rate.

In [None]:
< THIS LINE MEANT TO BREAK THIS CODE BLOCK>
# Extract a subset of the data frames containing only the fields that we care about
m_census_bike_sub = m_census_bike[["GEOID", "Population", "Median Household Income", "Poverty Rate", "MHI_25", "PVT_25", "start station name"]]

# Create a function to create pie charts for MHI
def piechart_mhi(df_subset, city):
    # Find out how many trips are in each group
    mhi_groups = df_subset.groupby('MHI_25')
    # # Chart our data, give it a title
    explode = (.1, 0)
    mhi_chart = mhi_groups['MHI_25'].count().plot(kind="pie", title=(f"{city} bike trips in lowest 25% of Median Household Income"),
                                               autopct="%1.1f%%", explode = explode, startangle=140, shadow=True,)
    mhi_chart.set_xlabel("")
    mhi_chart.set_ylabel("")

    # apply the layout before showing the figure
    plt.tight_layout()
    plt.show()
    
# Create a function to create pie charts for Poverty Rate
def piechart_pvt(df_subset, city):
    # Find out how many trips are in each group
    pvt_groups = df_subset.groupby('PVT_25')
    # # Chart our data, give it a title
    explode = (.1, 0)
    pvt_chart = pvt_groups['PVT_25'].count().plot(kind="pie", title=(f"{city} bike trips in highest 25% of Poverty Rate"),
                                               autopct="%1.1f%%", colors = ['red', 'purple'], explode = explode, startangle=140, shadow=True,)
    pvt_chart.set_xlabel("")
    pvt_chart.set_ylabel("")

    # apply the layout before showing the figure
    plt.tight_layout()
    plt.show()

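# run functions (Minneapolis shown; the other cities follow the same pattern)
piechart_mhi(m_census_bike_sub, "Minneapolis")
piechart_pvt(m_census_bike_sub, "Minneapolis")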


![alt text](Images/Boston_PVT.png "Boston pvt")
![alt text](Images/Portland_PVT.png "Portland PVT")
![alt text](Images/Minneapolis_PVT.png "Minneapolis PVT")
![alt text](Images/Boston_MHI.png "Boston mhi")
![alt text](Images/Portland_MHI.png "Portland mhi")
![alt text](Images/Minneapolis_MHI.png "Minneapolis MHI")


We performed a linear regression of bike trips against MHI for each city:

* Boston MHI vs bike trips is: 0.28845154131829437
* Portland MHI vs bike trips is: -0.43017099537257414
* Minneapolis MHI vs bike trips is: -0.010342502738792694


![alt text](Images/Boston_linregress.png "Boston linregress")
![alt text](Images/Portland_linregress.png "Portland linregress")
![alt text](Images/Minneapolis_linregress.png "Minneapolis linregress")
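
A regression like the one summarized above could be set up as follows. This is a sketch with invented numbers, assuming one row per census tract with a trip count and an MHI value. The column names `trips` and `mhi` are hypothetical, and the output does not reproduce the figures reported above.

```python
import pandas as pd
from scipy.stats import linregress

# Invented per-tract data: MHI of the tract and number of trips starting there.
per_tract = pd.DataFrame({
    "mhi": [30000, 45000, 60000, 75000, 90000],
    "trips": [120, 140, 190, 200, 250],
})

# linregress returns slope, intercept, rvalue, pvalue and stderr.
result = linregress(per_tract["mhi"], per_tract["trips"])

print(f"slope: {result.slope:.6f}")
print(f"r-value: {result.rvalue:.3f}")
print(f"p-value: {result.pvalue:.4f}")
```

Here `rvalue` is the correlation coefficient between the two columns, and `pvalue` tests the null hypothesis that the slope is zero.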

Conduct an independent t-test:

![alt text](Images/Trips_vs_MHI_scatter.png "Scatter")

In [None]:
< THIS LINE MEANT TO BREAK THIS CODE BLOCK> 
# Extract individual groups
m_geoid_grouped = m_census_bike_sub.groupby(['GEOID'])
b_geoid_grouped = b_census_bike_sub.groupby(['GEOID'])
p_geoid_grouped = p_census_bike_sub.groupby(['GEOID'])


# Note: Setting equal_var=False performs Welch's t-test, which does
# not assume equal population variance
def ttest(dataset1, dataset2, name1, name2):
    # compare the number of trips per census tract between two cities
    test = st.ttest_ind(dataset1['GEOID'].count(),
             dataset2['GEOID'].count(),
             equal_var=False)
    print(f"{name1} and {name2} number of trips dataset pvalue: {round(test[1], 5)}")
ttest(m_geoid_grouped, b_geoid_grouped, "Minneapolis", "Boston" )
ttest(m_geoid_grouped, p_geoid_grouped, "Minneapolis", "Portland" )
ttest(b_geoid_grouped, p_geoid_grouped, "Boston", "Portland" )

* Minneapolis and Boston number of trips dataset pvalue: 0.02204
* Minneapolis and Portland number of trips dataset pvalue: 0.08612
* Boston and Portland number of trips dataset pvalue: 0.0001
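
For readers unfamiliar with `ttest_ind`, here is a minimal sketch of the same kind of comparison on invented data: trips are counted per census tract for two cities, and the two sets of per-tract counts are compared with Welch's t-test. The GEOID values and counts are made up.

```python
import pandas as pd
import scipy.stats as st

# Invented trip records: one row per trip, tagged with the origin tract's GEOID.
city_a = pd.DataFrame({"GEOID": [1] * 10 + [2] * 12 + [3] * 11 + [4] * 9})
city_b = pd.DataFrame({"GEOID": [7] * 30 + [8] * 34 + [9] * 29 + [10] * 33})

# Trips per tract: the size of each GEOID group.
trips_a = city_a.groupby("GEOID").size()
trips_b = city_b.groupby("GEOID").size()

# equal_var=False selects Welch's t-test (no equal-variance assumption).
statistic, pvalue = st.ttest_ind(trips_a, trips_b, equal_var=False)
print(f"Welch t-test p-value: {pvalue:.5f}")
```

A small p-value suggests that the mean number of trips per tract differs between the two cities.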

Beginning of Micah's Description

Data Cleaning for Bike Sharing System Data:

In [None]:
# MINNEAPOLIS 'NICERIDE' BIKE SHARE INFORMATION DATAFRAME FOR 2018 ------------------------------
# SELECTING DESIRED COLUMNS - MONTH : 'Month', USER TYPE : 'usertype', GENDER : 'gender', TRIP DURATION : 'tripduration'
# AND START DATE : 'start_time'
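# .copy() makes this an independent data frame so columns can be added later without warnings
m_df_desired = m_df[["Month", "usertype", "gender", "tripduration", "start_time"]].copy()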

# CLEANING DATAFRAME COLUMN NAMES FOR MINNEAPOLIS BIKE SHARE INFORMATION
m_df_clean_m = m_df_desired.rename(columns = {"usertype": "User Type",
                                              "gender": "User Gender",
                                              "tripduration" : "Trip Duration",
                                              "start_time" : "Trip Date"})

Data Cleaning for Weather Data:

In [None]:
# =================================================================================================
# GETTING HISTORICAL WEATHER DATA FROM 01 JANUARY 2018 THROUGH 31 DECEMBER 2018 (OBTAINED THROUGH NOAA)
# =================================================================================================

# READING IN .CSV FILE CALLED 'Weather-Data.csv'
weather_data_df = pd.read_csv("resources/Weather-Data.csv")

weather_data_df

# CLEANING DATAFRAME COLUMN NAMES
weather_data_df = weather_data_df.rename( columns = {'AWND' : 'Average Wind Speed',
                                                     'NAME' : 'Name',
                                                     'DATE' : 'Date',
                                                     'MDPR' : 'Multiday Precipitation Total',
                                                     'PGTM' : 'Peak Gust Time',
                                                     'PRCP' : 'Precipitation',
                                                     'PSUN' : 'Daily Percent of Possible Sunshine',
                                                     'SNOW' : 'Snowfall',
                                                     'SNWD' : 'Snow Depth',
                                                     'TAVG' : 'Average Temperature',
                                                     'TMAX' : 'Maximum Temperature',
                                                     'TMIN' : 'Minimum Temperature',
                                                     'TOBS' : 'Temperature at Time of Observation',
                                                     'TSUN' : 'Total Sunshine',
                                                     'WDMV' : 'Total Wind Movement',
                                                     'WT01' : 'Fog, Ice Fog, or Freezing Fog',
                                                     'WT02' : 'Heavy Fog or Heaving Freezing Fog',
                                                     'WT03' : 'Thunder',
                                                     'WT04' : 'Ice pellets, Sleet, Snow Pellets, or Small Hail',
                                                     'WT05' : 'Hail',
                                                     'WT08' : 'Smoke or Haze',
                                                     'WT09' : 'Blowing or Drifting Snow',
                                                     'WT11' : 'High or Damaging Winds'})

# FILLING NAN VALUES WITH ZEROS
weather_data_df_cleaned = weather_data_df.fillna(0)

# SELECTING ONLY COLUMNS WE WANT AND HAVE CLEANED
weather_data_df_cleansed = weather_data_df_cleaned[["Name",
                                                    "Date",
                                                    "Average Wind Speed",
                                                    "Multiday Precipitation Total",
                                                    "Peak Gust Time",
                                                    "Precipitation",
                                                    "Daily Percent of Possible Sunshine",
                                                    "Snowfall",
                                                    "Snow Depth",
                                                    "Average Temperature",
                                                    "Maximum Temperature",
                                                    "Minimum Temperature",
                                                    "Temperature at Time of Observation",
                                                    "Total Sunshine",
                                                    "Total Wind Movement",
                                                    "Fog, Ice Fog, or Freezing Fog",
                                                    "Heavy Fog or Heaving Freezing Fog",
                                                    "Thunder",
                                                    "Ice pellets, Sleet, Snow Pellets, or Small Hail",
                                                    "Hail",
                                                    "Smoke or Haze",
                                                    "Blowing or Drifting Snow",
                                                    "High or Damaging Winds"]]


New Experience:

In [None]:
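# SPLITTING 'Name' COLUMN TO FIRST WORD ONLY AND PLACING IN NEW COLUMN CALLED 'city_name'
# (copy first so the new column is added to an independent data frame, not a slice)
weather_data_df_cleansed = weather_data_df_cleansed.copy()
weather_data_df_cleansed["city_name"] = weather_data_df_cleansed["Name"].str.split(" ").str[0]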

Question: Do subscribers or daily customers mostly support the bike share systems?

In [None]:
# LOOKING AT MINNEAPOLIS USER TYPE DATA

# GET DISTANCE BETWEEN LATITUDE AND LONGITUDE POINTS FOR MINNEAPOLIS RIDES
from geopy import distance

# pair up start and end coordinates for each ride
m_start_points = zip(m_df["start station latitude"], m_df["start station longitude"])
m_end_points = zip(m_df["end station latitude"], m_df["end station longitude"])

# geodesic distance in miles between the start and end station of each ride
minneapolis_trip_distances = [
    distance.distance(start, end).miles
    for start, end in zip(m_start_points, m_end_points)
]
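
Computing every trip distance one at a time with `geopy` can be slow for a full year of rides. One possible alternative, shown here as a sketch rather than what we ran, is a vectorized haversine formula in NumPy. It assumes a spherical Earth with a mean radius of 3958.8 miles, so results differ slightly from the ellipsoidal geodesic distance that `geopy.distance.distance` computes. The function name `haversine_miles` is hypothetical.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between arrays of points given in degrees.

    Hypothetical helper, not part of geopy or the original notebook.
    """
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))
```

With the column names used above, the four arguments would be `m_df["start station latitude"]`, `m_df["start station longitude"]`, `m_df["end station latitude"]` and `m_df["end station longitude"]`.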


In [None]:
# ADD TRIP DISTANCES COLUMN TO THE MINNEAPOLIS BIKE SHARE DATAFRAME
m_df_clean_m["Distance Traveled (mi)"] = minneapolis_trip_distances

# AVERAGE DISTANCE TRAVELED FOR EACH GROUP OF USER CLASSIFICATIONS IN MINNEAPOLIS FOR ENTIRE 2018
average_annual_distance_by_user_type_m = m_df_clean_m.groupby(["User Type"])["Distance Traveled (mi)"].mean()

# TOTAL DISTANCE TRAVELED BY USER TYPES OVER THE YEAR IN MINNEAPOLIS
total_distance_by_user_type_m = m_df_clean_m.groupby(["User Type"])["Distance Traveled (mi)"].sum()

# COUNT OF HOW MANY RIDES WERE TAKEN BY EACH USER TYPE IN MINNEAPOLIS
# (sort_index() puts the counts in the same alphabetical order that groupby uses)
rides_per_user_type_m = m_df_clean_m["User Type"].value_counts().sort_index()

# USER TYPE LABELS, SORTED TO MATCH THE ORDER OF THE SERIES ABOVE
user_type_list_m = sorted(m_df_clean_m["User Type"].dropna().unique())


In [None]:
# PLOTTING MINNEAPOLIS USER TYPE COMPARISONS
# PLOTTING RIDE COUNT PER USER TYPE IN MINNEAPOLIS
plt.bar(user_type_list_m, rides_per_user_type_m, color = 'black')
plt.title('Number of Rides Taken in Minneapolis, MN by User Type in 2018')
plt.xlabel('User Type')
plt.ylabel('Number of Rides')
plt.savefig("Images/number_rides_user_minneapolis.png")
plt.show()

# plotting subscriber type vs mean distance traveled per trip
plt.bar(user_type_list_m, average_annual_distance_by_user_type_m, color = 'black')
plt.title('Average Trip Distance Traveled by User Type in Minneapolis, MN during 2018')
plt.xlabel('User Type')
plt.ylabel('Distance Traveled (mi)')
plt.savefig("Images/mean_distance_per_user_minneapolis.png")
plt.show()

# # plotting subscriber type vs total distance traveled 
plt.bar(user_type_list_m, total_distance_by_user_type_m, color = 'black')
plt.title('Total Distance Traveled by User Type in Minneapolis, MN during 2018')
plt.xlabel('User Type')
plt.ylabel('Distance Traveled (mi)')
plt.savefig("Images/total_distance_per_user_minneapolis.png")
plt.show()

Plots for all three cities:

![title](Images/mean_distance_per_user_portland.png)
![title](Images/mean_distance_per_user_minneapolis.png)
![title](Images/mean_distance_per_user_boston.png)

![title](Images/number_rides_user_boston.png)
![title](Images/number_rides_user_minneapolis.png)
![title](Images/number_rides_user_portland.png)

![title](Images/total_distance_per_user_boston.png)
![title](Images/total_distance_per_user_minneapolis.png)
![title](Images/total_distance_per_user_portland.png)

Hypothesis: The month of the year (warmer temperatures) affects ride volume.

In [None]:
# COMPARING RIDES BY MONTH ===============================================

# DEFINING ACTIVE MONTHS LIST FOR EACH CITY
active_months_mpls = list(m_df_clean_m["Month"].unique())
active_months_boston = list(b_df_clean_m["Month"].unique())
active_months_portland = list(p_df_clean_m["Month"].unique())

# DEFINING RIDES PER MONTH
monthly_rides_m = m_df_clean_m.groupby("Month", sort = False)["User Type"].count()
monthly_rides_b = b_df_clean_m.groupby("Month", sort = False)["User Type"].count()
monthly_rides_p = p_df_clean_m.groupby("Month", sort = False)["User Type"].count()

# PLOTTING RIDES PER MONTH FOR EACH CITY
# MINNEAPOLIS RIDES PER MONTH BAR GRAPH
plt.bar(active_months_mpls, monthly_rides_m, color = 'black')
plt.title('Number of Rides Taken per Month in Minneapolis, MN in 2018')
plt.xticks(rotation = 45)
plt.xlabel('Month')
plt.ylabel('Number of Rides')
plt.savefig("Images/number_rides_per_month_mpls.png")
plt.show()

# BOSTON RIDES PER MONTH BAR GRAPH
plt.bar(active_months_boston, monthly_rides_b, color = 'black')
plt.title('Number of Rides Taken per Month in Boston, MA in 2018')
plt.xticks(rotation = 45)
plt.xlabel('Month')
plt.ylabel('Number of Rides')
plt.savefig("Images/number_rides_per_month_boston.png")
plt.show()

# PORTLAND RIDES PER MONTH BAR GRAPH
plt.bar(active_months_portland, monthly_rides_p, color = 'black')
plt.title('Number of Rides Taken per Month in Portland, OR in 2018')
plt.xticks(rotation = 45)
plt.xlabel('Month')
plt.ylabel('Number of Rides')
plt.savefig("Images/number_rides_per_month_portland.png")
plt.show()

Plots for Rides per Month: 

![title](Images/number_rides_per_month_boston.png)
![title](Images/number_rides_per_month_portland.png)
![title](Images/number_rides_per_month_mpls.png)


Normality Testing on Monthly Ride Volume:

In [None]:
# TESTING EACH RIDES BY MONTH SET FOR NORMAL DISTRIBUTION USING scipy.stats.normaltest() ======================================
import scipy.stats as sts

# minneapolis normal test
m_normal_test = sts.normaltest(monthly_rides_m)

# boston normal test
b_normal_test = sts.normaltest(monthly_rides_b)

# portland normal test
p_normal_test = sts.normaltest(monthly_rides_p)

# print statement summarizing normal tests for rides per month across the three cities
print(f"""The p-value for Minneapolis' rides per month data is: {m_normal_test[1]}.
The p-value for Boston's rides per month data is: {b_normal_test[1]}.
The p-value for Portland's rides per month data is: {p_normal_test[1]}""")

Checking for ride volume in awful weather: 

In [None]:
# check for ridership in rain
portland_rain = portland_merged.loc[(portland_merged["Precipitation"] > 0) & ((portland_merged['User Type'] == 'Subscriber') | (portland_merged['User Type'] == 'Casual')), :]
minneapolis_rain = minneapolis_merged.loc[(minneapolis_merged["Precipitation"] > 0) & ((minneapolis_merged['User Type'] == 'Subscriber') | (minneapolis_merged['User Type'] == 'Customer')), :]
boston_rain = boston_merged.loc[(boston_merged["Precipitation"] > 0) & ((boston_merged['User Type'] == 'Subscriber') | (boston_merged['User Type'] == 'Customer')), :]

# each row is one ride, so len() counts rides rather than distinct riders
print(f"Number of rides taken in the rain in Portland: {len(portland_rain)}, in Minneapolis: {len(minneapolis_rain)}, and in Boston: {len(boston_rain)}")

# check for ridership in high winds
portland_wind = portland_merged.loc[(portland_merged["High or Damaging Winds"] > 0) & ((portland_merged['User Type'] == 'Subscriber') | (portland_merged['User Type'] == 'Casual')), :]
minneapolis_wind = minneapolis_merged.loc[(minneapolis_merged["High or Damaging Winds"] > 0) & ((minneapolis_merged['User Type'] == 'Subscriber') | (minneapolis_merged['User Type'] == 'Customer')), :]
boston_wind = boston_merged.loc[(boston_merged["High or Damaging Winds"] > 0) & ((boston_merged['User Type'] == 'Subscriber') | (boston_merged['User Type'] == 'Customer')), :]

# each row is one ride, so len() counts rides rather than distinct riders
print(f"Number of rides taken in high winds in Portland: {len(portland_wind)}, in Minneapolis: {len(minneapolis_wind)}, and in Boston: {len(boston_wind)}")
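
The `*_merged` data frames used above combine trip records with daily weather. Their construction is not shown in this notebook, so the following is only a sketch of one way such a frame could be built: both tables are given a calendar-date column and joined on it. All values are invented, and the column names are assumptions modelled on the cleaned names used earlier.

```python
import pandas as pd

# Invented trip records (one row per ride).
trips = pd.DataFrame({
    "User Type": ["Subscriber", "Customer", "Subscriber"],
    "Trip Date": ["2018-06-01 08:15:00", "2018-06-01 17:40:00", "2018-06-02 09:05:00"],
})

# Invented daily weather records (one row per day).
weather = pd.DataFrame({
    "Date": ["2018-06-01", "2018-06-02"],
    "Precipitation": [0.25, 0.0],
})

# Reduce both to a plain calendar date so the join keys match.
trips["Date"] = pd.to_datetime(trips["Trip Date"]).dt.date
weather["Date"] = pd.to_datetime(weather["Date"]).dt.date

# Left join keeps every ride and attaches that day's weather.
merged = trips.merge(weather, on="Date", how="left")

# Rides taken on days with measurable precipitation.
rain_rides = merged.loc[merged["Precipitation"] > 0]
print(f"Rides on rainy days: {len(rain_rides)}")
```

Because the weather table has one row per day, this flags rides taken on a day with precipitation, not necessarily rides taken while it was raining.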

End of Micah's Description