How often do people use Toronto Bike Share together in pairs or groups? Where are they most likely to go? Does this tell us anything about tourist destinations in Toronto? 

Of these pair/group trips, how many are of mixed membership types? This would tell us the types of trips- a local showing a non-local around, two (or more!) locals travelling together, two visitors travelling together.

How far do people tend to bike?

In [1]:
import pandas as pd
from datetime import datetime
from datetime import timedelta
import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv("C:/Data/Projects/Toronto_Bikeshare/2016_Bike_Share_Toronto_Ridership_Q3.csv")
# Set index to be the trip id- is useful later
df.set_index("trip_id",inplace = True)
df = df.sort_index(ascending = True)
df.head(5)

Unnamed: 0_level_0,trip_start_time,trip_stop_time,trip_duration_seconds,from_station_name,to_station_name,user_type
trip_id,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
24008,7-1-16 0:00,7-1-16 0:08,505,College St W / Huron St,Queens Park / Bloor St W,Member
24009,7-1-16 0:00,7-1-16 0:10,603,Wellington St W / Bay St,King St W / Spadina Ave,Member
24010,7-1-16 0:00,7-1-16 0:42,2487,Bay St / Queens Quay W (Ferry Terminal),York St / Queens Quay W,Casual
24011,7-1-16 0:01,7-1-16 0:07,399,Trinity St /Front St E,Princess St / Adelaide St,Member
24012,7-1-16 0:01,7-1-16 0:12,662,Simcoe St / Queen St W,Queen St W / Spadina Ave,Member


In [3]:
def parse_date(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    parsed_date = datetime.strftime(dt, "%m-%d-%y")
    return parsed_date

def to_datetime(data):
    dt = datetime.strptime(data, "%m-%d-%y %H:%M")
    return dt

In [4]:
df["trip_date"] = df["trip_start_time"].apply(parse_date)
df["trip_start_datetime"] = df["trip_start_time"].apply(to_datetime)
df.dropna(inplace = True)
df["trip_duration_seconds"] = df["trip_duration_seconds"].astype(int)

In [5]:
# Logic: Same first and last station, and similar start time- start on the same day and within 60 seconds
same_first_station = (df["from_station_name"] == (df.shift(-1)["from_station_name"]))
same_last_station = (df["to_station_name"] == (df.shift(-1)["to_station_name"]))
similar_start = (df["trip_start_datetime"] <= ((df.shift(-1)["trip_start_datetime"])+timedelta(seconds = 60))) & \
                (df["trip_start_datetime"] <= ((df.shift(-1)["trip_start_datetime"])-timedelta(seconds = 60)))

In [6]:
# Get index of the first trip in the potential pair/group
# Add one to these indices to get all the group trip id's
first_in_group_index = df[same_first_station & same_last_station & similar_start].index
add_to_index = first_in_group_index + 1
# Get all group trips before splitting up by membership type
group_trips = df.ix[add_to_index | first_in_group_index]
group_trips.dropna(inplace =  True)
keep_cols = ["from_station_name", "to_station_name", "user_type", "trip_date", "trip_duration_seconds"]
group_trips = group_trips[keep_cols]

In [7]:
# Look at membership type
# Mixed group
mixed_type_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] != group_trips.shift(-1)["user_type"])].index
mixed_trips = df.ix[(mixed_type_index+1) | mixed_type_index]
mixed_trips_index = mixed_trips.index
# Members only
member_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] == "Member") & 
            (group_trips.shift(-1)["user_type"] == "Member")].index
member_trips = df.ix[(member_index+1) | member_index]
member_trips_index = member_trips.index
# Casual only
casual_index = group_trips[(group_trips["from_station_name"] == group_trips.shift(-1)["from_station_name"]) & 
            (group_trips["to_station_name"] == group_trips.shift(-1)["to_station_name"]) &
            (group_trips["user_type"] == "Casual") & 
            (group_trips.shift(-1)["user_type"] == "Casual")].index
casual_trips = df.ix[(casual_index+1) | casual_index]
casual_trips_index = casual_trips.index

In [8]:
# group_trips dataframe for some reason has some single trips- 
# filter the group_trip df by those that are in the member_trip, casual_trip, or mixed_trip df.
# This gets rid of the single line cases.
group_trips = group_trips.ix[casual_trips_index | member_trips_index | mixed_trips_index]

In [9]:
# Trip numbers
num_group_trips = len(group_trips)
num_mixed_trips = len(mixed_trips)
num_member_trips = len(member_trips)
num_casual_trips = len(casual_trips)
total_trips = len(df)
# Percentages
pct_group_trips = round(100*num_group_trips/total_trips)
pct_mixed_trips = round(100*num_mixed_trips/num_group_trips)
pct_member_trips = round(100*num_member_trips/num_group_trips)
pct_casual_trips = round(100*num_casual_trips/num_group_trips)
# Print summary statistics
print("There were {num_group_trips} group trips during Q3 2016, {pct_group_trips} percent of the total {total_trips} trips.".
     format(num_group_trips = num_group_trips, 
            pct_group_trips = pct_group_trips,
           total_trips = total_trips))
print("Of the {num_group_trips} group trips, {pct_mixed_trips} percent of these ({num_mixed_trips} trips) are of mixed membership type.".
     format(num_group_trips = num_group_trips,
           pct_mixed_trips = pct_mixed_trips,
           num_mixed_trips =  num_mixed_trips))
print("{pct_member_trips} percent ({num_member_trips} trips) are of the membership type.".
     format(pct_member_trips = pct_member_trips,
           num_member_trips =  num_member_trips))
print("{pct_casual_trips} percent or ({num_casual_trips} trips) are of the casual type.".
     format(pct_casual_trips = pct_casual_trips,
           num_casual_trips =  num_casual_trips))

There were 5942 group trips during Q3 2016, 2 percent of the total 367957 trips.
Of the 5942 group trips, 4 percent of these (242 trips) are of mixed membership type.
12 percent (724 trips) are of the membership type.
84 percent or (4977 trips) are of the casual type.


In [10]:
# How long is the average trip?
avg_mixed = round(mixed_trips["trip_duration_seconds"].agg("mean")/60)
avg_member = round(member_trips["trip_duration_seconds"].agg("mean")/60)
avg_casual = round(casual_trips["trip_duration_seconds"].agg("mean")/60)
print("The average mixed trip is {avg_mixed} minutes, \
while the average member trip is {avg_member} minutes. \
The average casual trip is {avg_casual} minutes.".
format(avg_casual = avg_casual, avg_member = avg_member, avg_mixed = avg_mixed))

The average mixed trip is 17 minutes, while the average member trip is 14 minutes. The average casual trip is 32 minutes.


In [11]:
# Where are the most popular starting and ending points? 
def make_trip_df(data, input_col, output_col):
    # Get all the stations
    stations = data[input_col].value_counts().index.tolist()
    # Get number of trips
    values = data[input_col].value_counts().tolist()
    # Create dataframe of just stations and their total number of trips
    # Set index to the station name
    trip_df = pd.DataFrame({"station": stations, output_col: values})
    trip_df.set_index("station", inplace = True)
    return trip_df

mixed_departures = make_trip_df(mixed_trips, "from_station_name", "total_departures")
member_departures = make_trip_df(member_trips, "from_station_name", "total_departures")
casual_departures = make_trip_df(casual_trips, "from_station_name", "total_departures")
mixed_arrivals = make_trip_df(mixed_trips, "to_station_name", "total_arrivals")
member_arrivals = make_trip_df(member_trips, "to_station_name", "total_arrivals")
casual_arrivals = make_trip_df(casual_trips, "to_station_name", "total_arrivals")

In [12]:
from IPython.display import display_html
def display_side_by_side(*args):
    html_str=''
    for df in args:
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)

In [13]:
display_side_by_side(mixed_departures.head(5), mixed_arrivals.head(5))

Unnamed: 0_level_0,total_departures
station,Unnamed: 1_level_1
Elizabeth St / Edward St (Bus Terminal),8
Strachan Ave / Princes' Blvd,8
York St / Queens Quay W,8
Madison Ave / Bloor St W,6
Augusta Ave / Denison Sq,6

Unnamed: 0_level_0,total_arrivals
station,Unnamed: 1_level_1
Queen St W / Ossington Ave,8
Dundas St / Yonge St,8
York St / Queens Quay W,6
161 Bleecker St (South of Wellesley),6
Bay St / Queens Quay W (Ferry Terminal),6


In [14]:
display_side_by_side(casual_departures.head(5), casual_arrivals.head(5))

Unnamed: 0_level_0,total_departures
station,Unnamed: 1_level_1
York St / Queens Quay W,194
Bay St / Queens Quay W (Ferry Terminal),186
HTO Park (Queen's Quay W),154
Bremner Blvd / Rees St,136
Queens Quay W / Lower Simcoe St,130

Unnamed: 0_level_0,total_arrivals
station,Unnamed: 1_level_1
Bay St / Queens Quay W (Ferry Terminal),190
York St / Queens Quay W,159
HTO Park (Queen's Quay W),148
Queens Quay W / Lower Simcoe St,137
Dockside Dr / Queens Quay E (Sugar Beach),113


In [15]:
display_side_by_side(member_departures.head(5), member_arrivals.head(5))

Unnamed: 0_level_0,total_departures
station,Unnamed: 1_level_1
Dockside Dr / Queens Quay E (Sugar Beach),20
King St W / Douro St,14
College St W / Markham St,14
Queen St W / Portland St,14
University Ave / King St W,14

Unnamed: 0_level_0,total_arrivals
station,Unnamed: 1_level_1
Princess St / Adelaide St,20
Beverly St / Dundas St W,20
Euclid Ave / Bloor St W,16
University Ave / Elm St,16
Queen St W / Portland St,14


For mixed trips, Young-Dundas Square, the waterfront, and Queen Street West are all popular. For casual trips, more people seem to enjoy biking along the waterfront. For member trips, it's a bit more residential, which seems to imply that member group trips in the summer are more done by people who live in Toronto and are trying going around with their friend(s) in their neighbourhood, not so much for sightseeing purposes.