Divvy is a bike share company in Chicago, Illinois. The company recently expressed a desire to increase the number of bikes in its fleet, which has prompted an in-depth look at the factors that could influence the need for more bikes. Because Divvy serves both customers and subscribers, I propose switching to a solely customer-based system to increase ride times, which would ultimately create a need for more bikes.

To track this, each ride will need to be monitored and recorded. We will monitor 6,000 rides from each group, customers and subscribers, during the peak summer season. This should provide enough data to estimate an accurate average ride duration.

In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
from scipy import stats
%matplotlib inline

In [25]:
divvy = pd.read_csv('/Users/juliantheriot/downloads/divvy-bike-chicago-2018/Divvy_Trips_2018_Q3.csv', encoding='unicode_escape')
divvy.head()

Unnamed: 0,trip_id,start_time,end_time,bikeid,tripduration,from_station_id,from_station_name,to_station_id,to_station_name,usertype,gender,birthyear
0,19244622,2018-07-01 00:00:03,2018-07-01 23:56:11,5429,86168.0,140,Dearborn Pkwy & Delaware Pl,106,State St & Pearson St,Customer,,
1,19244623,2018-07-01 00:00:13,2018-07-01 00:06:39,93,386.0,153,Southport Ave & Wellington Ave,250,Ashland Ave & Wellington Ave,Subscriber,Male,1986.0
2,19244624,2018-07-01 00:00:15,2018-07-01 00:23:26,2461,1391.0,76,Lake Shore Dr & Monroe St,301,Clark St & Schiller St,Subscriber,Female,1987.0
3,19244625,2018-07-01 00:00:25,2018-07-01 00:23:31,2991,1386.0,76,Lake Shore Dr & Monroe St,301,Clark St & Schiller St,Subscriber,Male,1986.0
4,19244626,2018-07-01 00:00:27,2018-07-01 00:11:23,2851,656.0,60,Dayton St & North Ave,166,Ashland Ave & Wrightwood Ave,Subscriber,Male,1961.0


In [26]:
divvy.drop(columns=['trip_id','bikeid','from_station_id','from_station_name','to_station_id','to_station_name'], inplace=True)
divvy.head()

Unnamed: 0,start_time,end_time,tripduration,usertype,gender,birthyear
0,2018-07-01 00:00:03,2018-07-01 23:56:11,86168.0,Customer,,
1,2018-07-01 00:00:13,2018-07-01 00:06:39,386.0,Subscriber,Male,1986.0
2,2018-07-01 00:00:15,2018-07-01 00:23:26,1391.0,Subscriber,Female,1987.0
3,2018-07-01 00:00:25,2018-07-01 00:23:31,1386.0,Subscriber,Male,1986.0
4,2018-07-01 00:00:27,2018-07-01 00:11:23,656.0,Subscriber,Male,1961.0


In [27]:
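def is_numeric_text(x):
    # Note: DataFrame.apply passes each column as a whole Series, so this tests
    # the Series' printed representation rather than the individual values.
    return str(x).isnumeric()
print(divvy.apply(is_numeric_text))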

start_time      False
end_time        False
tripduration    False
usertype        False
gender          False
birthyear       False
dtype: bool


In [28]:
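def is_alpha_text(x):
    # Note: as above, x is a whole column (Series), so str(x) is its printed
    # representation and isalpha() is evaluated once per column.
    return str(x).isalpha()
print(divvy.apply(is_alpha_text))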

start_time      False
end_time        False
tripduration    False
usertype        False
gender          False
birthyear       False
dtype: bool
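
Both checks above print False for every column because each column reaches the function as a single Series. As a minimal sketch of a per-column type check, the snippet below inspects each column's dtype instead. The small `sample` frame is made up for illustration (only the column names are borrowed from the dataset), and `numeric_columns` is a hypothetical name.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical miniature of the trips table; durations are stored as text here.
sample = pd.DataFrame({
    "start_time": ["2018-07-01 00:00:03", "2018-07-01 00:00:13"],
    "tripduration": ["86,168.0", "386.0"],
    "usertype": ["Customer", "Subscriber"],
})

# Map each column name to whether pandas stores it with a numeric dtype.
numeric_columns = {name: is_numeric_dtype(col) for name, col in sample.items()}
print(numeric_columns)
```

A column that reports False here but is expected to hold numbers, such as `tripduration`, is a candidate for explicit conversion.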


In [29]:
print(divvy.isnull().sum())

start_time           0
end_time             0
tripduration         0
usertype             0
gender          294996
birthyear       291580
dtype: int64


In [30]:
divvy.drop(columns=['gender', 'birthyear'], inplace=True)
divvy.head()

Unnamed: 0,start_time,end_time,tripduration,usertype
0,2018-07-01 00:00:03,2018-07-01 23:56:11,86168.0,Customer
1,2018-07-01 00:00:13,2018-07-01 00:06:39,386.0,Subscriber
2,2018-07-01 00:00:15,2018-07-01 00:23:26,1391.0,Subscriber
3,2018-07-01 00:00:25,2018-07-01 00:23:31,1386.0,Subscriber
4,2018-07-01 00:00:27,2018-07-01 00:11:23,656.0,Subscriber


In [31]:
divvy.describe()

Unnamed: 0,start_time,end_time,tripduration,usertype
count,1513570,1513570,1513570.0,1513570
unique,1282162,1230636,15782.0,2
top,2018-07-10 17:33:47,2018-09-14 16:47:38,389.0,Subscriber
freq,10,9,1536.0,1140637


In [32]:
divvy['usertype'].value_counts()

Subscriber    1140637
Customer       372933
Name: usertype, dtype: int64

In [33]:
divvy['tripduration'].value_counts()

389.0        1536
408.0        1508
411.0        1501
417.0        1476
365.0        1476
321.0        1475
378.0        1473
319.0        1471
306.0        1471
358.0        1469
398.0        1468
335.0        1464
315.0        1463
380.0        1462
392.0        1460
357.0        1460
418.0        1459
345.0        1458
351.0        1457
394.0        1456
415.0        1454
386.0        1454
375.0        1453
456.0        1452
446.0        1452
421.0        1450
397.0        1449
399.0        1448
377.0        1448
387.0        1447
             ... 
17,841.0        1
14,026.0        1
368,190.0       1
20,759.0        1
9,470.0         1
22,902.0        1
15,896.0        1
15,039.0        1
43,171.0        1
17,822.0        1
14,271.0        1
211,367.0       1
11,164.0        1
15,960.0        1
12,421.0        1
80,838.0        1
17,011.0        1
178,025.0       1
12,833.0        1
31,719.0        1
17,848.0        1
17,511.0        1
43,881.0        1
57,883.0        1
22,354.0  

In [34]:
divvy.describe()

Unnamed: 0,start_time,end_time,tripduration,usertype
count,1513570,1513570,1513570.0,1513570
unique,1282162,1230636,15782.0,2
top,2018-07-10 17:33:47,2018-09-14 16:47:38,389.0,Subscriber
freq,10,9,1536.0,1140637


In [35]:
# drop() returns a new frame; without reassignment or inplace=True, divvy keeps row 0
divvy.drop([0])

Unnamed: 0,start_time,end_time,tripduration,usertype
1,2018-07-01 00:00:13,2018-07-01 00:06:39,386.0,Subscriber
2,2018-07-01 00:00:15,2018-07-01 00:23:26,1391.0,Subscriber
3,2018-07-01 00:00:25,2018-07-01 00:23:31,1386.0,Subscriber
4,2018-07-01 00:00:27,2018-07-01 00:11:23,656.0,Subscriber
5,2018-07-01 00:00:35,2018-07-01 00:16:09,934.0,Subscriber
6,2018-07-01 00:00:37,2018-07-01 00:10:14,577.0,Customer
7,2018-07-01 00:00:55,2018-07-01 00:09:20,505.0,Customer
8,2018-07-01 00:01:38,2018-07-01 00:25:25,1427.0,Customer
9,2018-07-01 00:01:44,2018-07-01 00:25:25,1421.0,Customer
10,2018-07-01 00:02:03,2018-07-01 00:35:21,1998.0,Customer


In [36]:
divvy['tripduration'] = divvy['tripduration'].str.replace(',', '').astype('float32')
divvy.head()

Unnamed: 0,start_time,end_time,tripduration,usertype
0,2018-07-01 00:00:03,2018-07-01 23:56:11,86168.0,Customer
1,2018-07-01 00:00:13,2018-07-01 00:06:39,386.0,Subscriber
2,2018-07-01 00:00:15,2018-07-01 00:23:26,1391.0,Subscriber
3,2018-07-01 00:00:25,2018-07-01 00:23:31,1386.0,Subscriber
4,2018-07-01 00:00:27,2018-07-01 00:11:23,656.0,Subscriber
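
The cell above strips the thousands separators and casts with `astype`, which raises an error if any value still fails to parse. One possible alternative, sketched here on a made-up Series (the values, including the `"n/a"` entry, are assumptions for illustration), is `pd.to_numeric` with `errors="coerce"`, which turns unparseable entries into NaN so they can be counted.

```python
import pandas as pd

# Hypothetical raw durations as text, including one entry that cannot be parsed.
raw = pd.Series(["386.0", "1,391.0", "86,168.0", "n/a"], name="tripduration")

# Remove thousands separators literally, then convert; bad values become NaN.
cleaned = pd.to_numeric(raw.str.replace(",", "", regex=False), errors="coerce")

# Count how many entries could not be converted.
unparsed = int(cleaned.isna().sum())
print(cleaned.tolist(), unparsed)
```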


In [49]:
# First 6,000 rides of each user type in file order (not a random sample)
cust = divvy[divvy['usertype'] == 'Customer'][:6000]
sub = divvy[divvy['usertype'] == 'Subscriber'][:6000]

In [50]:
cust.mean(numeric_only=True)

tripduration    4756.328613
dtype: float32

In [51]:
sub.mean(numeric_only=True)

tripduration    1018.899475
dtype: float32
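
A difference between two sample means is easier to interpret alongside a significance test and a measure that is robust to very long rides. The sketch below uses `scipy.stats`, which the notebook imports but does not otherwise use, to run Welch's t-test and compare medians. The durations are synthetic log-normal draws, not Divvy data, and the distribution parameters are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, right-skewed stand-ins for trip durations in seconds (not Divvy data).
cust_durations = rng.lognormal(mean=7.5, sigma=0.8, size=6000)
sub_durations = rng.lognormal(mean=6.5, sigma=0.6, size=6000)

# Medians are less sensitive than means to a handful of day-long rentals.
cust_median = np.median(cust_durations)
sub_median = np.median(sub_durations)

# Welch's t-test compares the two means without assuming equal variances.
t_stat, p_value = stats.ttest_ind(cust_durations, sub_durations, equal_var=False)
print(cust_median, sub_median, t_stat, p_value)
```

With the real data, the two arrays would be replaced by `cust['tripduration']` and `sub['tripduration']`.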

With these averages in hand, switching to a customer-only model looks like it could be more lucrative: the sampled customer rides averaged about 4,756 seconds, compared with about 1,019 seconds for subscriber rides. However, to obtain more accurate averages, it would be to the company's advantage not only to take a larger sample from each quarter, but also to break rides down by area.
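
One way to draw a more representative sample is to select rides at random rather than taking the first rows in file order. The following is a minimal sketch under stated assumptions: `trips` is a made-up stand-in for the full table, and `sample_rides` is a hypothetical helper, not part of the analysis above.

```python
import pandas as pd

# Hypothetical stand-in for the full trips table.
trips = pd.DataFrame({
    "usertype": ["Customer"] * 50 + ["Subscriber"] * 150,
    "tripduration": [float(x) for x in range(200)],
})

def sample_rides(df, usertype, n, seed=0):
    """Draw n rides of one user type uniformly at random, without replacement."""
    subset = df[df["usertype"] == usertype]
    return subset.sample(n=n, random_state=seed)

cust_sample = sample_rides(trips, "Customer", 20)
sub_sample = sample_rides(trips, "Subscriber", 20)
print(len(cust_sample), len(sub_sample))
```

Fixing `seed` keeps the draw reproducible between runs; the same helper could be applied to each quarter's file in turn.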