# Bicycle Sharing in Chicago

<p align="center">
  <img src="https://d21xlh2maitm24.cloudfront.net/chi/DivvyLogo_p_v2.svg?mtime=20170608140727"/>
</p>

Divvy is Chicagoland’s bike share system, run in collaboration with the Chicago Department of Transportation, with 6,000 bikes available at 570+ stations across Chicago and Evanston. Divvy provides residents and visitors with a convenient, fun and affordable transportation option for getting around and exploring Chicago.

Divvy, like other bike share systems, consists of a fleet of specially designed, sturdy and durable bikes that are locked into a network of docking stations throughout the region. The bikes can be unlocked from one station and returned to any other station in the system. People use bike share to explore Chicago, commute to work or school, run errands, get to appointments or social engagements, and more.

Divvy is available for use 24 hours/day, 7 days/week, 365 days/year, and riders have access to all bikes and stations across the system.

**Through this kernel**, we extract some basic statistics about bike sharing in the city of Chicago over the years, such as who is using the service and when demand is at its peak.

### First things first: Get the library imports

In [None]:
import gc
import warnings
warnings.filterwarnings("ignore")

import pandas as pd
import numpy as np
from IPython.display import display, HTML
# Note: in plotly >= 4 this module lives in the separate chart-studio package (chart_studio.plotly)
import plotly.plotly as py
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
import plotly.graph_objs as go
import matplotlib.pyplot as plt
import matplotlib
# Use one large font size across all plot elements
matplotlib.rc('font', size=20)
matplotlib.rc('axes', titlesize=20)
matplotlib.rc('axes', labelsize=20)
matplotlib.rc('xtick', labelsize=20)
matplotlib.rc('ytick', labelsize=20)
matplotlib.rc('legend', fontsize=20)
matplotlib.rc('figure', titlesize=20)
import seaborn as sns

%matplotlib inline

# Handling a large CSV

Since the input data (`data.csv`) is an extremely large CSV file (1.9 GB), I'm going to **randomly sample 1 million rows** from it for the purpose of this kernel.

In [None]:
import subprocess
# Adapted from Olafur's answer at
# https://stackoverflow.com/questions/845058/how-to-get-line-count-cheaply-in-python
def file_len(fname):
    """Count the lines in a file with `wc -l` (requires a Unix-like system)."""
    p = subprocess.Popen(['wc', '-l', fname], stdout=subprocess.PIPE,
                                              stderr=subprocess.PIPE)
    result, err = p.communicate()
    if p.returncode != 0:
        raise IOError(err)
    return int(result.strip().split()[0])

lines = file_len('../input/data.csv')
print('Number of lines in "data.csv" is:', lines)

In [None]:
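# Line 0 is the header, so the data rows are lines 1..lines-1.
# Skip all but 1,000,000 randomly chosen data rows.
skiplines = np.random.choice(np.arange(1, lines), size=lines-1-1000000, replace=False)
skiplines = np.sort(skiplines)
print('lines to skip:', len(skiplines))

data = pd.read_csv("../input/data.csv", skiprows=skiplines)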
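
The sampling step above can also be wrapped into a small reusable function. The following is a minimal sketch, not the code used in this kernel: the helper names `count_lines` and `sample_csv` and the `seed` parameter are my own assumptions. It counts lines in pure Python instead of calling `wc`, so it should also work outside Unix-like systems, and it assumes the CSV has exactly one header line and no quoted fields containing line breaks.

```python
import numpy as np
import pandas as pd


def count_lines(path):
    """Count lines by streaming the file, so it never sits fully in memory."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def sample_csv(path, n_sample, seed=0):
    """Return a random sample of n_sample data rows from a CSV with one header line.

    Hypothetical helper for illustration; not part of pandas.
    """
    n_lines = count_lines(path)   # header + data rows
    n_rows = n_lines - 1
    if n_sample >= n_rows:        # nothing to skip: read everything
        return pd.read_csv(path)
    rng = np.random.default_rng(seed)
    # Line 0 is the header, so candidates for skipping are lines 1..n_lines-1.
    skip = rng.choice(np.arange(1, n_lines), size=n_rows - n_sample, replace=False)
    return pd.read_csv(path, skiprows=sorted(skip.tolist()))
```

With this in place, `sample_csv("../input/data.csv", 1000000)` would play the role of the two cells above. Fixing `seed` makes the sample reproducible between runs.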

Check for any missing values. It is always a good idea to get a sneak peek at your data right from the beginning!

In [None]:
data.sample(5)

In [None]:
data.isnull().sum(0)

**It seems there are no missing values in our randomly sampled population!**
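
When a sample does contain gaps, it can help to see the share of missing values per column next to the raw counts. A rough illustration on a made-up frame (the values below are invented and the column choice is only an assumption, not taken from the Divvy data):

```python
import numpy as np
import pandas as pd

# Toy frame with a few deliberately missing entries (invented values)
toy = pd.DataFrame({
    "tripduration": [7.5, np.nan, 12.0, 3.2],
    "gender": ["Male", "Female", None, "Male"],
    "usertype": ["Subscriber"] * 4,
})

missing = pd.DataFrame({
    "n_missing": toy.isnull().sum(),   # count of missing values per column
    "share": toy.isnull().mean(),      # fraction of rows that are missing
})
print(missing)
```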

In [None]:
# A helper mapping from month numbers to names, to make visualizations more intuitive
num_to_month={
    1:"Jan",
    2:"Feb",
    3:"Mar",
    4:"Apr",
    5:"May",
    6:"June",
    7:"July",
    8:"Aug",
    9:"Sept",
    10:"Oct",
    11:"Nov",
    12:"Dec"
}
data['month'] = data.month.apply(lambda x: num_to_month[x])
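
One possible refinement, sketched here on made-up data rather than on `data`: storing the month names as an ordered categorical keeps them in calendar order when sorting or grouping, instead of alphabetical order. The `month_order` list is my own helper name; it reuses the same abbreviations as the mapping above.

```python
import pandas as pd

month_order = ["Jan", "Feb", "Mar", "Apr", "May", "June",
               "July", "Aug", "Sept", "Oct", "Nov", "Dec"]
# Same mapping as above, built from the ordered list: {1: "Jan", ..., 12: "Dec"}
num_to_month = dict(enumerate(month_order, start=1))

toy = pd.DataFrame({"month": [12, 1, 7, 3]})  # invented month numbers
toy["month"] = pd.Categorical(toy["month"].map(num_to_month),
                              categories=month_order, ordered=True)

# Sorting now follows the calendar, not the alphabet
print(toy.sort_values("month")["month"].tolist())
```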

In [None]:
gc.collect()

## Ridership Over the Last Few Years

We can observe an increasing trend!

In [None]:
pivot = data.pivot_table(index='year', columns='month', values='day', aggfunc=len)
colors = ["#8B8B00", "#8B7E66", "#EE82EE", "#00C78C", 
          "#00E5EE", "#FF6347", "#EED2EE", 
          "#63B8FF", "#00FF7F", "#B9D3EE", 
          "#836FFF", "#7D26CD"]
pivot.loc[:,['Jan','Feb', 'Mar',
            'Apr','May','June',
            'July','Aug','Sept',
            'Oct','Nov','Dec']].plot.bar(stacked=True, figsize=(20,10), color=colors)
plt.xlabel("Years")
plt.ylabel("Ridership")
plt.legend(loc=10)
plt.show()

Apart from the gradually increasing trend, ridership during the peak months of **June, July, Aug & Sept** is significantly higher than in the holiday months of **Dec & Jan**.
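
The year-by-month counts behind the chart can be inspected directly as a table. Below is a small sketch on invented trips that uses `groupby(...).size()` with `unstack` as an alternative to `pivot_table(..., aggfunc=len)`. It relies on the same idea as the cell above: each row is one trip, so counting rows gives ridership.

```python
import pandas as pd

# Invented trips: one row per trip
toy = pd.DataFrame({
    "year":  [2014, 2014, 2014, 2015, 2015, 2015, 2015],
    "month": ["Jan", "July", "July", "Jan", "July", "July", "July"],
})

counts = (toy.groupby(["year", "month"]).size()   # trips per (year, month)
             .unstack(fill_value=0)               # months become columns
             .reindex(columns=["Jan", "July"]))   # fix the column order
yearly = counts.sum(axis=1)                       # total trips per year

print(counts)
print(yearly)
```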

# Moving on to some other basic statistics

There are 2 main types of riders that use the sharing service:

1. Subscriber: Members with an annual pass.
2. Customer: Riders who use the service on a 'daily-dip' basis with a *24 Hour* Pass.

In [None]:
f, ax = plt.subplots(1,2, figsize=(20,7))
colors = ['#66b3ff','#ff9999']
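# Take counts and labels from the same value_counts() result so they stay aligned
gender_counts = data['gender'].value_counts()
pie = ax[0].pie(gender_counts.values,
                labels=gender_counts.index,
                autopct='%1.1f%%', shadow=True, startangle=90, colors=colors)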
count = sns.countplot(x='usertype', data=data, ax=ax[1], color='g', alpha=0.75)
ax[0].set_title("Gender Distribution in Ridership")
ax[1].set_xlabel("Type of Rider")
ax[1].set_ylabel("Ridership")
ax[1].set_title("Type of Customers")

In [None]:
data.usertype.value_counts()

As it turns out, the number of **Customer** and **Dependent** riders is extremely low. It seems most of the riders **prefer annual membership**.
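
To put a number on that imbalance, `value_counts(normalize=True)` returns shares instead of raw counts. A tiny sketch on invented labels (the 80/20 split below is made up and says nothing about the real proportions):

```python
import pandas as pd

# Invented rider labels for illustration only
toy = pd.Series(["Subscriber"] * 8 + ["Customer"] * 2, name="usertype")

shares = toy.value_counts(normalize=True)  # fraction of rows per rider type
print(shares)
```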

# Geolocation Information of Sharing Stations

Using the mapbox service within plotly, we can plot the **649 unique starting stations** given in the dataset in the following manner.

In [None]:
station_info = data[['from_station_name','latitude_start','longitude_start']].drop_duplicates(subset='from_station_name')

In [None]:
station_info.sample(5)

In [None]:
lat_list = list(station_info.latitude_start)
lat_list = [str(i) for i in lat_list]
lon_list = list(station_info.longitude_start)
lon_list = [str(i) for i in lon_list]
names = list(station_info.from_station_name)
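
The plotting code for the map itself is not shown in this kernel, only the embedded result. As a hedged sketch of how such a figure could be assembled, the block below builds a plotly figure as plain dictionaries from made-up stations. The station names, coordinates, marker size and zoom level are all assumptions, and the `open-street-map` style assumes a plotly version that supports token-free tiles (older versions need a Mapbox access token).

```python
import pandas as pd

# Made-up stations for illustration; names and coordinates are not from the dataset
station_info = pd.DataFrame({
    "from_station_name": ["Station A", "Station B", "Station A"],
    "latitude_start": [41.88, 41.90, 41.88],
    "longitude_start": [-87.63, -87.65, -87.63],
}).drop_duplicates(subset="from_station_name")

# One marker per unique station, with the station name as hover text
trace = dict(
    type="scattermapbox",
    lat=station_info["latitude_start"].tolist(),
    lon=station_info["longitude_start"].tolist(),
    text=station_info["from_station_name"].tolist(),
    mode="markers",
    marker=dict(size=8),
)

layout = dict(
    title="Chicago Cycle Sharing Stations",
    hovermode="closest",
    mapbox=dict(
        style="open-street-map",  # assumed token-free tile style
        center=dict(lat=float(station_info["latitude_start"].mean()),
                    lon=float(station_info["longitude_start"].mean())),
        zoom=10,
    ),
)

fig = dict(data=[trace], layout=layout)
# In a notebook, after init_notebook_mode(connected=True): iplot(fig)
```

Building the figure from dictionaries keeps the sketch independent of any particular `plotly.graph_objs` class.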

In [None]:
display(HTML("""
<div>
    <a href="https://plot.ly/~sominw/6/?share_key=y6irxkKqSVolnuF0l4w420" target="_blank" title="Chicago Cycle Sharing Stations" style="display: block; text-align: center;"><img src="https://plot.ly/~sominw/6.png?share_key=y6irxkKqSVolnuF0l4w420" alt="Chicago Cycle Sharing Stations" style="max-width: 100%;width: 600px;"  width="600" onerror="this.onerror=null;this.src='https://plot.ly/404.png';" /></a>
    <script data-plotly="sominw:6" sharekey-plotly="y6irxkKqSVolnuF0l4w420" src="https://plot.ly/embed.js" async></script>
</div>"""))