<h1>Capital Bikeshare Analytic Report and Research Proposal</h1>

<p>The purpose of this project is to analyze the effect of weather on Capital Bikeshare ridership.</p>
<p>Data Source: https://www.kaggle.com/marklvl/bike-sharing-dataset</p>

<p>This dataset contains data from <a href='https://www.capitalbikeshare.com/' target='_blank'>Capital Bikeshare</a> in Washington, DC, together with the corresponding weather data for 2011 and 2012.</p>

<p>Capital Bikeshare has two membership types: annual and daily. In the data these appear as <code>registered</code> (annual members) and <code>casual</code> (daily users).</p>

<p>Each row in the dataset represents one day. Columns include the day of the week, month, season, weather situation, temperature, humidity, wind speed, and the numbers of casual rides, registered rides, and total rides.</p>
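<p>Since the total ride count is described as the combination of casual and registered rides, it is cheap to confirm that on load. The sketch below assumes the raw Kaggle column names <code>casual</code>, <code>registered</code> and <code>cnt</code> (before the renaming done later); the helper name <code>check_ride_totals</code> is hypothetical and the toy rows are illustrative, not real data.</p>

```python
import pandas as pd


def check_ride_totals(frame: pd.DataFrame) -> bool:
    """Return True if casual + registered equals cnt on every row (hypothetical helper)."""
    # Column names assumed to follow the raw CSV, before any renaming
    return bool((frame['casual'] + frame['registered'] == frame['cnt']).all())


# Toy rows for illustration only, not real Capital Bikeshare counts
toy = pd.DataFrame({'casual': [10, 20], 'registered': [90, 80], 'cnt': [100, 100]})
print(check_ride_totals(toy))
```

<p>In the notebook this would be called as <code>check_ride_totals(df)</code> right after <code>read_csv</code>.</p>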

<h3>What can we determine from the data?</h3>
<p>From an examination of the data, we chose to address the following questions:</p>
<ol>
    <li>Does season affect bikeshare usage?</li>
    <li>Do ridership trends vary based on type of membership?</li>
    <li>Does weather affect bikeshare usage?</li>
    <li>What would be a valuable direction for further investigation?</li>
</ol>

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind
%matplotlib inline


In [None]:
plt.rcParams['figure.figsize'] = [20.0, 7.0]
plt.rcParams.update({'font.size': 22})

sns.set_style('whitegrid')
sns.set_context('talk')

In [None]:
df = pd.read_csv('../input/bike_sharing_daily.csv')

In [None]:
#view column names
df.columns

In [None]:
df.info()

In [None]:
#view summary stats of numeric variables
df.describe()

In [None]:
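# Pairwise Pearson correlations; numeric_only=True skips the date string column
# (recent pandas versions raise an error on non-numeric columns otherwise)
df.corr(numeric_only=True)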
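<p>The full correlation matrix is wide. To read off which variables move most with total ridership, one option is to pull out the single <code>cnt</code> column and rank it by absolute correlation. This is a minimal sketch; <code>correlations_with</code> is a hypothetical helper and the toy frame is illustrative only.</p>

```python
import pandas as pd


def correlations_with(frame: pd.DataFrame, target: str) -> pd.Series:
    """Correlation of each numeric column with `target`, ordered by absolute strength."""
    # Pearson correlations against the target column, excluding the target itself
    corr = frame.corr(numeric_only=True)[target].drop(target)
    # Rank by magnitude so strong negative relationships are not buried at the bottom
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Toy numbers for illustration only, not real Capital Bikeshare data
toy = pd.DataFrame({'cnt': [1, 2, 3, 4],
                    'temp': [2, 4, 6, 8],
                    'hum': [1, 3, 2, 4]})
print(correlations_with(toy, 'cnt'))
```

<p>On the real data, <code>casual</code> and <code>registered</code> will rank near the top simply because they add up to <code>cnt</code>, so the weather columns are the ones worth reading.</p>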

In [None]:
#rename columns
df = df.rename(columns={'dteday':'datetime',
                        'yr':'year',
                        'mnth':'month',
                        'weathersit':'weather',
                        'hum':'humidity',
                        'cnt':'total_rides'})

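# Store the integer-coded columns as categories: the codes are labels, not
# quantities, so they should be treated as discrete groups rather than numbers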
df['season'] = df['season'].astype('category')
df['year'] = df['year'].astype('category')
df['month'] = df['month'].astype('category')
df['holiday'] = df['holiday'].astype('category')
df['weekday'] = df['weekday'].astype('category')
df['workingday'] = df['workingday'].astype('category')
df['weather'] = df['weather'].astype('category')
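<p>A possible next step, sketched here as an assumption rather than part of the original analysis, is to attach readable names to the categories so plots no longer need hand-written tick labels. The mapping below is the Winter/Spring/Summer/Fall reading of the season codes that this report's plots use; <code>SEASON_LABELS</code> and <code>label_seasons</code> are hypothetical names.</p>

```python
import pandas as pd

# Assumed mapping, matching the season labels used in this report's plots
SEASON_LABELS = {1: 'Winter', 2: 'Spring', 3: 'Summer', 4: 'Fall'}


def label_seasons(season_codes: pd.Series) -> pd.Series:
    """Return the season codes as a categorical Series with readable category names."""
    # rename_categories with a dict maps old category values to new ones;
    # mapping keys that do not occur in the data are ignored
    return season_codes.astype('category').cat.rename_categories(SEASON_LABELS)


print(label_seasons(pd.Series([1, 3, 3, 4])).tolist())
```

<p>Applying this to <code>df['season']</code> itself would break comparisons such as <code>df['season'] == 1</code> in the t-test cell, so it is safest on a copy used only for plotting.</p>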

In [None]:
df.head()

<h2>Does season affect bikeshare usage?</h2>
<h3><font color='green'>Yes.</font></h3>
<p>As the tests and charts below show, season has an impact on bikeshare usage. Winter has the fewest rides and summer the most. For winter versus summer, the <strong>t-statistic of -20.41</strong> (large in magnitude) and the <strong>p-value of 2.12e-62</strong> indicate a statistically significant difference in daily ridership. The difference between spring and fall, however, is not significant: the <strong>t-statistic is only 1.48</strong> and the <strong>p-value is 0.14</strong>.</p>
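<p><code>ttest_ind</code> with its default <code>equal_var=True</code> runs Student's two-sample t-test with a pooled variance. The statistic it reports is the difference in means divided by the standard error of that difference, which is what gives a value such as -20.41 its meaning. A stdlib-only sketch of that calculation (the function name <code>pooled_t</code> is hypothetical):</p>

```python
import math
import statistics


def pooled_t(a: list[float], b: list[float]) -> float:
    """Student's two-sample t-statistic with pooled variance (ttest_ind's default)."""
    n_a, n_b = len(a), len(b)
    # Sample variances (n - 1 denominator), pooled by their degrees of freedom
    pooled_var = ((n_a - 1) * statistics.variance(a) +
                  (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_err


print(pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7]))  # -2.0
```

<p>A negative value only means the first group has the lower mean; the sign flips if the arguments are swapped, and the magnitude is what reflects the size of the difference.</p>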


In [None]:
print('Winter vs Spring')
print(ttest_ind(df.total_rides[df['season'] == 1], df.total_rides[df['season'] == 2]))
print('Winter vs Summer')
print(ttest_ind(df.total_rides[df['season'] == 1], df.total_rides[df['season'] == 3]))
print('Winter vs Fall')
print(ttest_ind(df.total_rides[df['season'] == 1], df.total_rides[df['season'] == 4]))
print('Spring vs Fall')
print(ttest_ind(df.total_rides[df['season'] == 2], df.total_rides[df['season'] == 4]))
print('Spring vs Summer')
print(ttest_ind(df.total_rides[df['season'] == 2], df.total_rides[df['season'] == 3]))
print('Summer vs Fall')
print(ttest_ind(df.total_rides[df['season'] == 3], df.total_rides[df['season'] == 4]))
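<p>The cell above runs six pairwise tests, which raises the chance that at least one comparison looks significant by luck. One common, conservative adjustment (an addition here, not part of the original analysis) is the Bonferroni correction: each p-value is compared against alpha divided by the number of tests. The sketch uses the two p-values quoted in the text; <code>bonferroni</code> is a hypothetical helper.</p>

```python
def bonferroni(p_values: dict[str, float], n_tests: int, alpha: float = 0.05) -> dict[str, bool]:
    """Flag which comparisons remain significant after a Bonferroni correction."""
    # Every p-value is compared against alpha divided by the total number of tests run
    threshold = alpha / n_tests
    return {name: p < threshold for name, p in p_values.items()}


# The two p-values quoted in the text; six pairwise season tests were run in total
reported = {'Winter vs Summer': 2.12e-62, 'Spring vs Fall': 0.14}
print(bonferroni(reported, n_tests=6))
```

<p>Winter versus summer stays significant by a wide margin, and spring versus fall stays non-significant, so the conclusions above do not change under this correction.</p>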

In [None]:
fig, ax = plt.subplots()
sns.barplot(data=df[['season','total_rides']],
            x='season',
            y='total_rides',
            ax=ax)

plt.title('Capital Bikeshare Ridership by Season')
plt.ylabel('Mean Daily Rides')  # barplot draws the mean of total_rides per season
plt.xlabel('Season')

tick_val=[0, 1, 2, 3]
tick_lab=['Winter', 'Spring', 'Summer', 'Fall']
plt.xticks(tick_val, tick_lab)

plt.show()
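<p>seaborn's <code>barplot</code> draws the mean of <code>total_rides</code> for each season (its default estimator) with an error bar, not the sum. A minimal sketch for printing the numbers behind bars like these, assuming the renamed <code>total_rides</code> column; <code>mean_rides_by</code> is a hypothetical helper and the toy frame is illustrative only.</p>

```python
import pandas as pd


def mean_rides_by(frame: pd.DataFrame, column: str) -> pd.DataFrame:
    """Mean, standard deviation and day count of total_rides for each group in `column`."""
    # observed=True drops categories that never occur (relevant for categorical columns)
    return frame.groupby(column, observed=True)['total_rides'].agg(['mean', 'std', 'count'])


# Toy numbers for illustration only
toy = pd.DataFrame({'season': [1, 1, 2], 'total_rides': [10, 20, 40]})
print(mean_rides_by(toy, 'season'))
```

<p>The same call with <code>'month'</code> or <code>'weather'</code> gives the figures behind the other bar and point plots.</p>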

In [None]:
fig, ax = plt.subplots()
sns.barplot(data=df[['month','total_rides']], x='month', y='total_rides', ax=ax)

plt.title('Capital Bikeshare Ridership by Month')
plt.ylabel('Mean Daily Rides')  # barplot draws the mean of total_rides per month
plt.xlabel('Month')

tick_val=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
tick_lab=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
plt.xticks(tick_val, tick_lab)

plt.show()

In [None]:
# Day of the month (1-31), parsed from the 'YYYY-MM-DD' date string
df['day_of_month'] = pd.to_datetime(df['datetime']).dt.day

fig, ax = plt.subplots()
sns.pointplot(data=df[['day_of_month', 'total_rides', 'season']],
              x='day_of_month',
              y='total_rides',
              hue='season',
              ax=ax)

plt.title('Capital Bikeshare Ridership by Day of Month')
plt.ylabel('Mean Daily Rides')
plt.xlabel('Day of Month')

leg_handles = ax.get_legend_handles_labels()[0]
ax.legend(leg_handles, ['Winter', 'Spring', 'Summer', 'Fall'], title='Season', bbox_to_anchor=(1, 1), loc=2)

plt.show()

<h2>Do ridership trends vary based on type of membership?</h2>
<h3><font color='green'>Yes.</font></h3>

<p>As the test and charts below show, registered riders take more trips than casual riders. The <strong>t-statistic of 44.54</strong> indicates that the gap between the two groups' mean daily rides is large relative to the day-to-day variation, and the <strong>very small p-value of 2.54e-274</strong> tells us a difference this large would be extremely unlikely if the two groups had the same mean.</p>
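<p><code>ttest_ind</code> treats the registered and casual counts as two independent samples, but both columns are measured on the same days. A paired test, which works on the per-day differences, is arguably a better fit for that design; it is offered here as an alternative, not as the original analysis. The stdlib sketch below computes the same statistic as <code>scipy.stats.ttest_rel</code>; <code>paired_t</code> is a hypothetical name and the four toy days are illustrative only.</p>

```python
import math
import statistics


def paired_t(x: list[float], y: list[float]) -> float:
    """Paired t-statistic for two series measured on the same days."""
    if len(x) != len(y):
        raise ValueError('paired samples must have the same length')
    # Work with the per-day differences instead of the two columns separately
    diffs = [a - b for a, b in zip(x, y)]
    # Standard error of the mean difference (sample standard deviation, n - 1)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_err


# Toy counts for four days, illustration only
registered = [5, 7, 9, 11]
casual = [1, 2, 3, 4]
print(paired_t(registered, casual))
```

<p>With the notebook's data this would be <code>paired_t(df['registered'].tolist(), df['casual'].tolist())</code>, or <code>ttest_rel(df['registered'], df['casual'])</code> from <code>scipy.stats</code>, which also returns a p-value.</p>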

In [None]:
ttest_ind(df['registered'], df['casual'])

In [None]:
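fig, ax = plt.subplots()
# Distribution of daily ride counts for all riders, casual riders, and registered riders
sns.boxplot(data=df[['total_rides', 'casual', 'registered']], ax=ax)
plt.show()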

In [None]:
fig, ax = plt.subplots()
sns.pointplot(data=df[['month', 'casual', 'registered']],
              x='month',
              y='casual',
              ax=ax,
              color='orange')

sns.pointplot(data=df[['month', 'casual', 'registered']],
              x='month',
              y='registered',
              ax=ax,
              color='green')

tick_val=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
tick_lab=['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
plt.xticks(tick_val, tick_lab)

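plt.title('Casual and Registered Bikeshare Ridership by Month')
plt.ylabel('Mean Daily Rides')
plt.xlabel('Month')

# pointplot does not label single-color series, so build the legend by hand
from matplotlib.lines import Line2D
ax.legend(handles=[Line2D([0], [0], color='orange', marker='o', label='Casual'),
                   Line2D([0], [0], color='green', marker='o', label='Registered')],
          title='Membership')

plt.show()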

<h2>Does weather affect bikeshare usage?</h2>
<h3><font color='green'>Yes.</font></h3>

<p>As the charts below show, the type of weather has a large impact on ridership. There are far fewer rides on days with light snow or light rain and thunderstorms than on days with nicer weather, and this pattern holds across all seasons.</p>

<h5>Types of Weather</h5>
<ol>
    <li>Clear, Few clouds, Partly cloudy</li>
    <li>Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist</li>
    <li>Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds</li>
</ol>
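<p>The comparison across weather types above is visual. To attach a number to it, one option (not used in the original analysis) is a one-way ANOVA via <code>scipy.stats.f_oneway</code>, which tests whether mean daily rides differ across the weather codes. This sketch assumes the renamed <code>weather</code> and <code>total_rides</code> columns; <code>weather_anova</code> is a hypothetical helper and the toy counts are illustrative only.</p>

```python
import pandas as pd
from scipy.stats import f_oneway


def weather_anova(frame: pd.DataFrame):
    """One-way ANOVA testing whether mean daily total_rides differs across weather codes."""
    # One array of daily ride counts per weather code present in the data
    groups = [group['total_rides'].to_numpy()
              for _, group in frame.groupby('weather', observed=True)]
    return f_oneway(*groups)


# Toy counts for illustration only, not real Capital Bikeshare data
toy = pd.DataFrame({'weather': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                    'total_rides': [4000, 4200, 4100, 3500, 3600, 3400, 1000, 1200, 900]})
result = weather_anova(toy)
print(result.statistic, result.pvalue)
```

<p>ANOVA assumes roughly normal groups with similar variances. If the bad-weather group turns out to be small or skewed, the rank-based <code>scipy.stats.kruskal</code> takes the same arguments and may be worth comparing.</p>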

In [None]:
plt.rcParams['figure.figsize'] = [10.0, 10.0]
sns.set_context('talk', font_scale=0.8)

g = sns.FacetGrid(data=df,
               col='season',
               row='weather',hue='season')
g.map(plt.hist,'total_rides')

plt.subplots_adjust(top=0.9)
g.fig.suptitle('Capital Bikeshare Ridership by Weather Type')

g.set_xlabels('Total Rides')
g.set_ylabels('Frequency')

plt.show()

<h2>What would be a valuable direction for further investigation?</h2>

<p>Several questions would be interesting to research further:</p>
<ul>
    <li>Do these trends hold up in newer data? This data comes from 2011 and 2012, so it would be worth checking whether the patterns persist in more recent years.</li>
    <li>Do major events affect usage? Examples include large events such as the presidential inauguration or the Women's March, as well as smaller, more frequent events such as baseball or hockey games.</li>
    <li>How does the weather affect trip duration?</li>
    <li>Which stations are the most popular, and does the weather change this?</li>
    <li>How does bikeshare compare with Uber or Lyft? For example, how does the weather affect each mode of transportation?</li>
</ul>