The purpose of this notebook is to aggregate the Bike Share Toronto ridership data into four DataFrames, one for each year. The data from Toronto Open Data has been saved as quarterly csvs for 2017, 2018, and 2019, and as monthly csvs for 2020.

I'll start by importing the necessary packages for dataframe manipulation.

In [1]:
import pandas as pd
import numpy as np

# 2017 Data 

I'll read the four quarters of data into four DataFrames with pd.read_csv.

In [2]:
df_2017_1 = pd.read_csv('Desktop/2017_bikeshare/Bikeshare Ridership (2017 Q1).csv')
df_2017_2 = pd.read_csv('Desktop/2017_bikeshare/Bikeshare Ridership (2017 Q2).csv')
df_2017_3 = pd.read_csv('Desktop/2017_bikeshare/Bikeshare Ridership (2017 Q3).csv')
df_2017_4 = pd.read_csv('Desktop/2017_bikeshare/Bikeshare Ridership (2017 Q4).csv')

Let's quickly check out what the data looks like.

In [3]:
df_2017_1.head(5)

Unnamed: 0,trip_id,trip_start_time,trip_stop_time,trip_duration_seconds,from_station_id,from_station_name,to_station_id,to_station_name,user_type
0,712382,1/1/2017 0:00,1/1/2017 0:03,223,7051,Wellesley St E / Yonge St Green P,7089,Church St / Wood St,Member
1,712383,1/1/2017 0:00,1/1/2017 0:05,279,7143,Kendal Ave / Bernard Ave,7154,Bathurst Subway Station,Member
2,712384,1/1/2017 0:05,1/1/2017 0:29,1394,7113,Parliament St / Aberdeen Ave,7199,College St W / Markham St,Member
3,712385,1/1/2017 0:07,1/1/2017 0:21,826,7077,College Park South,7010,King St W / Spadina Ave,Member
4,712386,1/1/2017 0:08,1/1/2017 0:12,279,7079,McGill St / Church St,7047,University Ave / Gerrard St W,Member


Now let's concatenate the data along axis=0 (downwards) using pandas concat.

In [7]:
df_2017 = pd.concat([df_2017_1, df_2017_2], axis=0)

In [10]:
df_2017 = pd.concat([df_2017, df_2017_3], axis=0)

In [12]:
df_2017 = pd.concat([df_2017, df_2017_4], axis=0)
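
The step-by-step concatenation above could also be done in one call, since pd.concat accepts a list of any length. Here's a minimal sketch using tiny made-up frames in place of the real quarterly data (the names `q1` to `q4` and their values are hypothetical). Passing `ignore_index=True` is optional; it renumbers the rows from 0 instead of keeping each quarter's own index, which otherwise repeats.

```python
import pandas as pd

# Hypothetical stand-ins for the four quarterly DataFrames
q1 = pd.DataFrame({"trip_id": [1, 2], "user_type": ["Member", "Casual"]})
q2 = pd.DataFrame({"trip_id": [3], "user_type": ["Member"]})
q3 = pd.DataFrame({"trip_id": [4, 5], "user_type": ["Casual", "Casual"]})
q4 = pd.DataFrame({"trip_id": [6], "user_type": ["Member"]})

# One call stacks all four frames downwards (axis=0);
# ignore_index=True gives the result a fresh 0..n-1 index
combined = pd.concat([q1, q2, q3, q4], axis=0, ignore_index=True)

print(len(combined))            # 6
print(combined.index.tolist())  # [0, 1, 2, 3, 4, 5]
```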

Let's check out the full dataframe now.

In [13]:
df_2017

Unnamed: 0,trip_id,trip_start_time,trip_stop_time,trip_duration_seconds,from_station_id,from_station_name,to_station_id,to_station_name,user_type
0,712382,1/1/2017 0:00,1/1/2017 0:03,223,7051.0,Wellesley St E / Yonge St Green P,7089.0,Church St / Wood St,Member
1,712383,1/1/2017 0:00,1/1/2017 0:05,279,7143.0,Kendal Ave / Bernard Ave,7154.0,Bathurst Subway Station,Member
2,712384,1/1/2017 0:05,1/1/2017 0:29,1394,7113.0,Parliament St / Aberdeen Ave,7199.0,College St W / Markham St,Member
3,712385,1/1/2017 0:07,1/1/2017 0:21,826,7077.0,College Park South,7010.0,King St W / Spadina Ave,Member
4,712386,1/1/2017 0:08,1/1/2017 0:12,279,7079.0,McGill St / Church St,7047.0,University Ave / Gerrard St W,Member
...,...,...,...,...,...,...,...,...,...
363400,2383642,12/31/17 23:46:27,12/31/17 23:46:53,26,,Bloor St / Brunswick Ave,,Bloor St / Brunswick Ave,Casual
363401,2383643,12/31/17 23:47:13,1/01/18 00:11:40,1467,,Bloor St / Brunswick Ave,,HTO Park (Queens Quay W),Casual
363402,2383644,12/31/17 23:47:40,12/31/17 23:57:49,609,,Kendal Ave / Spadina Rd,,Augusta Ave / Denison Sq,Member
363403,2383645,12/31/17 23:49:08,12/31/17 23:49:34,26,,Phoebe St / Spadina Ave,,Phoebe St / Spadina Ave,Member


Now the rides from 2017 have been fully concatenated into a single DataFrame. There were 1,492,369 rides in 2017. Some rows have null values (for example, the station IDs at the end of the year), but we can clean this up later.
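
A quick way to double-check a concatenation like this is to compare the combined row count with the sum of the parts, and to count nulls per column. The sketch below uses two small made-up frames (`early` and `late` are hypothetical names, not the real quarterly data). It also shows that an integer column becomes float once NaN values are mixed in, which may be why the station IDs display as 7051.0 in the combined output.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins: one frame with station IDs, one with them missing
early = pd.DataFrame({"trip_id": [1, 2], "from_station_id": [7051, 7143]})
late = pd.DataFrame({"trip_id": [3], "from_station_id": [np.nan]})
parts = [early, late]

combined = pd.concat(parts, axis=0)

# The combined frame should have exactly as many rows as the parts together
expected_rows = sum(len(part) for part in parts)
rows_match = len(combined) == expected_rows

# Number of null values in each column
null_counts = combined.isna().sum()

print(rows_match)
print(null_counts)
print(combined["from_station_id"].dtype)  # float64, because NaN is a float
```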

Let's save this dataframe to a new csv file.

In [15]:
df_2017.to_csv('Desktop/df_2017.csv')
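
One thing to be aware of when saving: by default to_csv also writes the index as an unnamed first column, and reading such a file back produces an `Unnamed: 0` column like the one in the outputs above. Here's a small sketch of the difference, using an in-memory buffer and a made-up two-row frame instead of the real files; whether to pass `index=False` is a judgment call that depends on whether the index is needed later.

```python
import io

import pandas as pd

# Hypothetical two-row frame standing in for a yearly DataFrame
df = pd.DataFrame({"trip_id": [712382, 712383], "user_type": ["Member", "Member"]})

# Default behaviour: the index is written as an extra, unnamed column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with_index = pd.read_csv(with_index).columns.tolist()

# index=False: only the data columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without_index = pd.read_csv(without_index).columns.tolist()

print(cols_with_index)     # ['Unnamed: 0', 'trip_id', 'user_type']
print(cols_without_index)  # ['trip_id', 'user_type']
```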

---------

# 2018 Data

Let's read the quarterly csvs into DataFrames.

In [18]:
df_2018_1 = pd.read_csv('Desktop/2018_bikeshare/Bike Share Toronto Ridership_Q1 2018.csv')
df_2018_2 = pd.read_csv('Desktop/2018_bikeshare/Bike Share Toronto Ridership_Q2 2018.csv')
df_2018_3 = pd.read_csv('Desktop/2018_bikeshare/Bike Share Toronto Ridership_Q3 2018.csv')
df_2018_4 = pd.read_csv('Desktop/2018_bikeshare/Bike Share Toronto Ridership_Q4 2018.csv')

Quickly checking one out...

In [19]:
df_2018_1.head(5)

Unnamed: 0,trip_id,trip_duration_seconds,from_station_id,trip_start_time,from_station_name,trip_stop_time,to_station_id,to_station_name,user_type
0,2383648,393,7018,1/1/2018 0:47,Bremner Blvd / Rees St,1/1/2018 0:54,7176,Bathurst St / Fort York Blvd,Annual Member
1,2383649,625,7184,1/1/2018 0:52,Ossington Ave / College St,1/1/2018 1:03,7191,Central Tech (Harbord St),Annual Member
2,2383650,233,7235,1/1/2018 0:55,Bay St / College St (West Side) - SMART,1/1/2018 0:59,7021,Bay St / Albert St,Annual Member
3,2383651,1138,7202,1/1/2018 0:57,Queen St W / York St (City Hall),1/1/2018 1:16,7020,Phoebe St / Spadina Ave,Annual Member
4,2383652,703,7004,1/1/2018 1:00,University Ave / Elm St,1/1/2018 1:12,7060,Princess St / Adelaide St E,Annual Member


Now the concatenation...

In [20]:
df_2018 = pd.concat([df_2018_1, df_2018_2], axis=0)

In [22]:
df_2018 = pd.concat([df_2018, df_2018_3], axis=0)

In [23]:
df_2018 = pd.concat([df_2018, df_2018_4], axis=0)

Let's inspect the final DataFrame.

In [24]:
df_2018

Unnamed: 0,trip_id,trip_duration_seconds,from_station_id,trip_start_time,from_station_name,trip_stop_time,to_station_id,to_station_name,user_type
0,2383648,393,7018,1/1/2018 0:47,Bremner Blvd / Rees St,1/1/2018 0:54,7176,Bathurst St / Fort York Blvd,Annual Member
1,2383649,625,7184,1/1/2018 0:52,Ossington Ave / College St,1/1/2018 1:03,7191,Central Tech (Harbord St),Annual Member
2,2383650,233,7235,1/1/2018 0:55,Bay St / College St (West Side) - SMART,1/1/2018 0:59,7021,Bay St / Albert St,Annual Member
3,2383651,1138,7202,1/1/2018 0:57,Queen St W / York St (City Hall),1/1/2018 1:16,7020,Phoebe St / Spadina Ave,Annual Member
4,2383652,703,7004,1/1/2018 1:00,University Ave / Elm St,1/1/2018 1:12,7060,Princess St / Adelaide St E,Annual Member
...,...,...,...,...,...,...,...,...,...
363485,4581273,379,7088,12/31/2018 23:43,Danforth Ave / Coxwell Ave,12/31/2018 23:49,7091,Donlands Station,Annual Member
363486,4581274,306,7030,12/31/2018 23:45,Bay St / Wellesley St W,12/31/2018 23:50,7031,Jarvis St / Isabella St,Annual Member
363487,4581275,340,7020,12/31/2018 23:49,Phoebe St / Spadina Ave,12/31/2018 23:55,7000,Fort York Blvd / Capreol Ct,Annual Member
363488,4581276,1466,7014,12/31/2018 23:52,Sherbourne St / Carlton St (Allan Gardens),1/1/2019 0:17,7269,Toronto Eaton Centre (Yonge St),Annual Member


Now the data from 2018 has been fully concatenated. There were 1,922,955 rides in 2018.

In [25]:
# saving to csv
df_2018.to_csv('Desktop/df_2018.csv')

------------------------

# 2019 Data

In [26]:
# again reading in the csv files
df_2019_1 = pd.read_csv('Desktop/2019_bikeshare/2019-Q1.csv')
df_2019_2 = pd.read_csv('Desktop/2019_bikeshare/2019-Q2.csv')
df_2019_3 = pd.read_csv('Desktop/2019_bikeshare/2019-Q3.csv')
df_2019_4 = pd.read_csv('Desktop/2019_bikeshare/2019-Q4.csv')

In [27]:
# checking out one of the dataframes
df_2019_1.head(5)

Unnamed: 0,Trip Id,Trip Duration,Start Station Id,Start Time,Start Station Name,End Station Id,End Time,End Station Name,Bike Id,User Type
0,4581278,1547.0,7021,01/01/2019 00:08,Bay St / Albert St,7233,01/01/2019 00:33,King / Cowan Ave - SMART,1296,Annual Member
1,4581279,1112.0,7160,01/01/2019 00:10,King St W / Tecumseth St,7051,01/01/2019 00:29,Wellesley St E / Yonge St (Green P),2947,Annual Member
2,4581280,589.0,7055,01/01/2019 00:15,Jarvis St / Carlton St,7013,01/01/2019 00:25,Scott St / The Esplanade,2293,Annual Member
3,4581281,259.0,7012,01/01/2019 00:16,Elizabeth St / Edward St (Bus Terminal),7235,01/01/2019 00:20,Bay St / College St (West Side) - SMART,283,Annual Member
4,4581282,281.0,7041,01/01/2019 00:19,Edward St / Yonge St,7257,01/01/2019 00:24,Dundas St W / St. Patrick St,1799,Annual Member


Concatenating the same way as for the previous years...

In [28]:
df_2019 = pd.concat([df_2019_1, df_2019_2], axis=0)

In [29]:
df_2019 = pd.concat([df_2019, df_2019_3], axis=0)

In [30]:
df_2019 = pd.concat([df_2019, df_2019_4], axis=0)

In [31]:
# inspecting the final DataFrame
df_2019

Unnamed: 0,Trip Id,Trip Duration,Start Station Id,Start Time,Start Station Name,End Station Id,End Time,End Station Name,Bike Id,User Type
0,4581278,1547.0,7021,01/01/2019 00:08,Bay St / Albert St,7233.0,01/01/2019 00:33,King / Cowan Ave - SMART,1296,Annual Member
1,4581279,1112.0,7160,01/01/2019 00:10,King St W / Tecumseth St,7051.0,01/01/2019 00:29,Wellesley St E / Yonge St (Green P),2947,Annual Member
2,4581280,589.0,7055,01/01/2019 00:15,Jarvis St / Carlton St,7013.0,01/01/2019 00:25,Scott St / The Esplanade,2293,Annual Member
3,4581281,259.0,7012,01/01/2019 00:16,Elizabeth St / Edward St (Bus Terminal),7235.0,01/01/2019 00:20,Bay St / College St (West Side) - SMART,283,Annual Member
4,4581282,281.0,7041,01/01/2019 00:19,Edward St / Yonge St,7257.0,01/01/2019 00:24,Dundas St W / St. Patrick St,1799,Annual Member
...,...,...,...,...,...,...,...,...,...,...
468411,7334123,523.0,7098,12/31/2019 23:39,Riverdale Park South (Broadview Ave),7339.0,12/31/2019 23:48,Carlaw Ave / Strathcona Ave,861,Annual Member
468412,7334124,273.0,7044,12/31/2019 23:45,Church St / Alexander St,7273.0,12/31/2019 23:49,Bay St / Charles St - SMART,3776,Annual Member
468413,7334125,1055.0,7100,12/31/2019 23:51,Dundas St E / Regent Park Blvd,7100.0,01/01/2020 00:08,Dundas St E / Regent Park Blvd,2382,Annual Member
468414,7334126,459.0,7470,12/31/2019 23:55,York St / Lake Shore Blvd W,7102.0,01/01/2020 00:03,Nelson St / Duncan St,2800,Annual Member


There were 2,439,517 trips in 2019. Cool! Now I'll save to csv...

In [33]:
df_2019.to_csv('Desktop/df_2019.csv')

----------------

# 2020 Data

The 2020 data was saved as monthly csvs, so I'll have to read in more files this time.

In [34]:
df_2020_1 = pd.read_csv('Desktop/2020_bikeshare/2020-01.csv')
df_2020_2 = pd.read_csv('Desktop/2020_bikeshare/2020-02.csv')
df_2020_3 = pd.read_csv('Desktop/2020_bikeshare/2020-03.csv')
df_2020_4 = pd.read_csv('Desktop/2020_bikeshare/2020-04.csv')
df_2020_5 = pd.read_csv('Desktop/2020_bikeshare/2020-05.csv')
df_2020_6 = pd.read_csv('Desktop/2020_bikeshare/2020-06.csv')
df_2020_7 = pd.read_csv('Desktop/2020_bikeshare/2020-07.csv')
df_2020_8 = pd.read_csv('Desktop/2020_bikeshare/2020-08.csv')
df_2020_9 = pd.read_csv('Desktop/2020_bikeshare/2020-09.csv')
df_2020_10 = pd.read_csv('Desktop/2020_bikeshare/2020-10.csv')
df_2020_11 = pd.read_csv('Desktop/2020_bikeshare/2020-11.csv')
df_2020_12 = pd.read_csv('Desktop/2020_bikeshare/2020-12.csv')
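
With twelve files that follow the same naming pattern, the reads could also be written as a loop. This is only a sketch: it writes twelve tiny made-up csvs to a temporary folder (the folder and the file contents are hypothetical; only the `2020-MM.csv` naming mirrors the real files) and reads them back into a list, which can be passed straight to pd.concat.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp)

    # Create twelve small stand-in files named 2020-01.csv ... 2020-12.csv
    for month in range(1, 13):
        stand_in = pd.DataFrame({"Trip Id": [month], "Trip Duration": [60 * month]})
        stand_in.to_csv(data_dir / f"2020-{month:02d}.csv", index=False)

    # Read all twelve files into a list of DataFrames, in month order
    monthly = [
        pd.read_csv(data_dir / f"2020-{month:02d}.csv") for month in range(1, 13)
    ]

# The whole list can be concatenated in a single call
combined = pd.concat(monthly, axis=0, ignore_index=True)

print(len(monthly))   # 12
print(len(combined))  # 12 (one row per stand-in file)
```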

And now concatenating...

In [35]:
df_2020 = pd.concat([df_2020_1, df_2020_2], axis=0)

In [36]:
df_2020 = pd.concat([df_2020, df_2020_3], axis=0)

In [37]:
df_2020 = pd.concat([df_2020, df_2020_4], axis=0)

In [38]:
df_2020 = pd.concat([df_2020, df_2020_5], axis=0)

In [39]:
df_2020 = pd.concat([df_2020, df_2020_6], axis=0)

In [40]:
df_2020 = pd.concat([df_2020, df_2020_7], axis=0)

In [41]:
df_2020 = pd.concat([df_2020, df_2020_8], axis=0)

In [42]:
df_2020 = pd.concat([df_2020, df_2020_9], axis=0)

In [43]:
df_2020 = pd.concat([df_2020, df_2020_10], axis=0)

In [44]:
df_2020 = pd.concat([df_2020, df_2020_11], axis=0)

In [45]:
df_2020 = pd.concat([df_2020, df_2020_12], axis=0)

Let's check out the final DataFrame.

In [46]:
df_2020

Unnamed: 0,Trip Id,Trip Duration,Start Station Id,Start Time,Start Station Name,End Station Id,End Time,End Station Name,Bike Id,User Type
0,7334128,648,7003,01/01/2020 00:08,Madison Ave / Bloor St W,7271.0,01/01/2020 00:19,Yonge St / Alexander St - SMART,3104,Annual Member
1,7334129,419,7007,01/01/2020 00:10,College St / Huron St,7163.0,01/01/2020 00:17,Yonge St / Wood St,2126,Annual Member
2,7334130,566,7113,01/01/2020 00:13,Parliament St / Aberdeen Ave,7108.0,01/01/2020 00:22,Front St E / Cherry St,4425,Annual Member
3,7334131,1274,7333,01/01/2020 00:17,King St E / Victoria St,7311.0,01/01/2020 00:38,Sherbourne St / Isabella St,4233,Annual Member
4,7334132,906,7009,01/01/2020 00:19,King St E / Jarvis St,7004.0,01/01/2020 00:34,University Ave / Elm St,2341,Casual Member
...,...,...,...,...,...,...,...,...,...,...
95343,10644213,330,7010,12/31/2020 23:52,King St W / Spadina Ave,7216.0,12/31/2020 23:57,Wellington St W / Stafford St,3458.0,Annual Member
95344,10644214,216,7288,12/31/2020 23:54,Humber Bay Shores Park West,7514.0,12/31/2020 23:58,Humber Bay Shores Park / Marine Parade Dr,4085.0,Annual Member
95345,10644215,204,7288,12/31/2020 23:54,Humber Bay Shores Park West,7514.0,12/31/2020 23:58,Humber Bay Shores Park / Marine Parade Dr,3580.0,Annual Member
95346,10644216,1659,7270,12/31/2020 23:56,Church St / Dundas St E - SMART,7270.0,01/01/2021 00:24,Church St / Dundas St E - SMART,5137.0,Annual Member


The data from 2020 is now fully concatenated. There were 2,911,308 rides in 2020!

Let's save this 2020 DataFrame.

In [47]:
df_2020.to_csv('Desktop/df_2020.csv')

--------------