# In this notebook I bring together the yearly bikeshare data into a single DataFrame containing data from 2017-2020.

I'll start by importing our essential DataFrame manipulation packages...

In [3]:
import pandas as pd 
import numpy as np

Let's bring in our bikeshare data CSVs...

In [4]:
df_2017 = pd.read_csv('Desktop/to_bikeshare_project/df_2017.csv')
df_2018 = pd.read_csv('Desktop/to_bikeshare_project/df_2018.csv')
df_2019 = pd.read_csv('Desktop/to_bikeshare_project/df_2019.csv')
df_2020 = pd.read_csv('Desktop/to_bikeshare_project/df_2020.csv')
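The Unnamed: 0 columns dropped further down are what pandas calls an unnamed leading column in a CSV, which is typically a row index written out by an earlier `to_csv()` call. As a sketch (the `read_year` helper and the demo file are hypothetical, not part of this project), the leftover column can be discarded at load time, only when it is actually present, since the 2020 file does not have one:

```python
import os
import tempfile

import pandas as pd


def read_year(path: str) -> pd.DataFrame:
    """Read one yearly CSV, discarding a leftover saved index if present."""
    df = pd.read_csv(path)
    # 'Unnamed: 0' is the name pandas gives a blank first header cell,
    # e.g. an index written by an earlier to_csv(); it holds no trip data.
    if "Unnamed: 0" in df.columns:
        df = df.drop(columns="Unnamed: 0")
    return df


# Demo on a tiny throwaway file saved with its index (the to_csv default).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "df_demo.csv")
    pd.DataFrame(
        {"trip_id": [712382, 712383], "user_type": ["Member", "Casual"]}
    ).to_csv(path)
    demo = read_year(path)

print(list(demo.columns))  # ['trip_id', 'user_type']
```

With the same relative paths as above, all four years could then be loaded with something like `frames = {year: read_year(f'Desktop/to_bikeshare_project/df_{year}.csv') for year in range(2017, 2021)}`.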

I'll start by taking a look at the 2017 DataFrame.

In [5]:
df_2017.head(5)

,Unnamed: 0,trip_id,trip_start_time,trip_stop_time,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,0,712382,2017-01-01 00:00:00,2017-01-01 00:03:00,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,1,712383,2017-01-01 00:00:00,2017-01-01 00:05:00,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,2,712384,2017-01-01 00:05:00,2017-01-01 00:29:00,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,3,712385,2017-01-01 00:07:00,2017-01-01 00:21:00,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,4,712386,2017-01-01 00:08:00,2017-01-01 00:12:00,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12


We can drop trip_id, which isn't needed for the analysis, as well as the extra index column (Unnamed: 0).

In [9]:
df_2017.drop(['trip_id', 'Unnamed: 0'], axis=1, inplace=True)

Let's look at the unique values in the user_type column, since I want to make sure they are labelled the same way across all the DataFrames before combining them.

In [11]:
df_2017['user_type'].unique()

array(['Member', 'Casual'], dtype=object)

In [10]:
df_2017.head()

,trip_start_time,trip_stop_time,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,2017-01-01 00:00:00,2017-01-01 00:03:00,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,2017-01-01 00:00:00,2017-01-01 00:05:00,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,2017-01-01 00:05:00,2017-01-01 00:29:00,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,2017-01-01 00:07:00,2017-01-01 00:21:00,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,2017-01-01 00:08:00,2017-01-01 00:12:00,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12


Looks good. Now let's look at the 2018 DataFrame.

In [6]:
df_2018.head(5)

,Unnamed: 0,trip_id,trip_duration_seconds,from_station_id,trip_start_time,from_station_name,trip_stop_time,to_station_id,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,0,2383648,393,7018,2018-01-01 00:47:00,Bremner Blvd / Rees St,2018-01-01 00:54:00,7176,Bathurst St / Fort York Blvd,Annual Member,01/01/2018,00:47,01/01/2018,00:54
1,1,2383649,625,7184,2018-01-01 00:52:00,Ossington Ave / College St,2018-01-01 01:03:00,7191,Central Tech (Harbord St),Annual Member,01/01/2018,00:52,01/01/2018,01:03
2,2,2383650,233,7235,2018-01-01 00:55:00,Bay St / College St (West Side) - SMART,2018-01-01 00:59:00,7021,Bay St / Albert St,Annual Member,01/01/2018,00:55,01/01/2018,00:59
3,3,2383651,1138,7202,2018-01-01 00:57:00,Queen St W / York St (City Hall),2018-01-01 01:16:00,7020,Phoebe St / Spadina Ave,Annual Member,01/01/2018,00:57,01/01/2018,01:16
4,4,2383652,703,7004,2018-01-01 01:00:00,University Ave / Elm St,2018-01-01 01:12:00,7060,Princess St / Adelaide St E,Annual Member,01/01/2018,01:00,01/01/2018,01:12


We can drop the extra index, trip_id, and the station_id numbers since we already have the station names.

In [18]:
df_2018.drop(['Unnamed: 0', 'trip_id', 'from_station_id', 'to_station_id'], axis=1, inplace=True)

Let's look at the unique values in user_type.

In [19]:
df_2018['user_type'].unique()

array(['Annual Member', 'Casual Member'], dtype=object)

We can see that the labels differ slightly from 2017, so I'll modify them to match the 2017 DataFrame.

In [20]:
df_2018['user_type'] = df_2018['user_type'].replace({'Annual Member': 'Member', 'Casual Member': 'Casual'})
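A note on the choice of `replace`: it only changes the values listed in the dictionary and passes everything else through untouched, whereas `Series.map` with the same dictionary turns any unlisted value into `NaN`. Because `replace` is silent about unexpected labels, a quick check against the expected set afterwards is cheap insurance. A minimal sketch on made-up values (the `'Unknown'` label and `USER_TYPE_MAP` name are hypothetical):

```python
import pandas as pd

USER_TYPE_MAP = {"Annual Member": "Member", "Casual Member": "Casual"}
raw = pd.Series(["Annual Member", "Casual Member", "Unknown"])

replaced = raw.replace(USER_TYPE_MAP)  # unmapped 'Unknown' is kept as-is
mapped = raw.map(USER_TYPE_MAP)        # unmapped 'Unknown' becomes NaN

# Flag any label that is not one of the two expected categories.
unexpected = set(replaced.unique()) - {"Member", "Casual"}

print(replaced.tolist())       # ['Member', 'Casual', 'Unknown']
print(mapped.isna().tolist())  # [False, False, True]
print(unexpected)              # {'Unknown'}
```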

In [21]:
df_2018.head(5)

,trip_duration_seconds,trip_start_time,from_station_name,trip_stop_time,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,393,2018-01-01 00:47:00,Bremner Blvd / Rees St,2018-01-01 00:54:00,Bathurst St / Fort York Blvd,Member,01/01/2018,00:47,01/01/2018,00:54
1,625,2018-01-01 00:52:00,Ossington Ave / College St,2018-01-01 01:03:00,Central Tech (Harbord St),Member,01/01/2018,00:52,01/01/2018,01:03
2,233,2018-01-01 00:55:00,Bay St / College St (West Side) - SMART,2018-01-01 00:59:00,Bay St / Albert St,Member,01/01/2018,00:55,01/01/2018,00:59
3,1138,2018-01-01 00:57:00,Queen St W / York St (City Hall),2018-01-01 01:16:00,Phoebe St / Spadina Ave,Member,01/01/2018,00:57,01/01/2018,01:16
4,703,2018-01-01 01:00:00,University Ave / Elm St,2018-01-01 01:12:00,Princess St / Adelaide St E,Member,01/01/2018,01:00,01/01/2018,01:12


Looks good! Now let's move onto the 2019 data...

In [7]:
df_2019.head(5)

,Unnamed: 0,Trip Id,Trip Duration,Start Station Id,Start Time,Start Station Name,End Station Id,End Time,End Station Name,Bike Id,User Type,start_date,start_time,stop_date,stop_time
0,0,4581278,1547.0,7021,2019-01-01 00:08:00,Bay St / Albert St,7233.0,2019-01-01 00:33:00,King / Cowan Ave - SMART,1296,Annual Member,01/01/2019,00:08,01/01/2019,00:33
1,1,4581279,1112.0,7160,2019-01-01 00:10:00,King St W / Tecumseth St,7051.0,2019-01-01 00:29:00,Wellesley St E / Yonge St (Green P),2947,Annual Member,01/01/2019,00:10,01/01/2019,00:29
2,2,4581280,589.0,7055,2019-01-01 00:15:00,Jarvis St / Carlton St,7013.0,2019-01-01 00:25:00,Scott St / The Esplanade,2293,Annual Member,01/01/2019,00:15,01/01/2019,00:25
3,3,4581281,259.0,7012,2019-01-01 00:16:00,Elizabeth St / Edward St (Bus Terminal),7235.0,2019-01-01 00:20:00,Bay St / College St (West Side) - SMART,283,Annual Member,01/01/2019,00:16,01/01/2019,00:20
4,4,4581282,281.0,7041,2019-01-01 00:19:00,Edward St / Yonge St,7257.0,2019-01-01 00:24:00,Dundas St W / St. Patrick St,1799,Annual Member,01/01/2019,00:19,01/01/2019,00:24


Again I can drop some columns here: the same ones as in 2018, plus Bike Id, which I won't be using in my analysis.

In [22]:
df_2019.drop(['Unnamed: 0', 'Trip Id', 'Start Station Id', 'End Station Id', 'Bike Id'], axis=1, inplace=True)

Let's also modify the names of the values in the user type column to match the other DataFrames.

In [23]:
df_2019['User Type'] = df_2019['User Type'].replace({'Annual Member': 'Member', 'Casual Member': 'Casual'})

Finally, I'll rename the columns so that they match the column names from the previous DataFrames.

In [24]:
df_2019 = df_2019.rename(columns={'Trip Duration': 'trip_duration_seconds', 'Start Time': 'trip_start_time', 'Start Station Name': 'from_station_name', 'End Time': 'trip_stop_time', 'End Station Name': 'to_station_name', 'User Type': 'user_type'})

In [30]:
df_2019.head(5)

,trip_duration_seconds,trip_start_time,from_station_name,trip_stop_time,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,1547.0,2019-01-01 00:08:00,Bay St / Albert St,2019-01-01 00:33:00,King / Cowan Ave - SMART,Member,01/01/2019,00:08,01/01/2019,00:33
1,1112.0,2019-01-01 00:10:00,King St W / Tecumseth St,2019-01-01 00:29:00,Wellesley St E / Yonge St (Green P),Member,01/01/2019,00:10,01/01/2019,00:29
2,589.0,2019-01-01 00:15:00,Jarvis St / Carlton St,2019-01-01 00:25:00,Scott St / The Esplanade,Member,01/01/2019,00:15,01/01/2019,00:25
3,259.0,2019-01-01 00:16:00,Elizabeth St / Edward St (Bus Terminal),2019-01-01 00:20:00,Bay St / College St (West Side) - SMART,Member,01/01/2019,00:16,01/01/2019,00:20
4,281.0,2019-01-01 00:19:00,Edward St / Yonge St,2019-01-01 00:24:00,Dundas St W / St. Patrick St,Member,01/01/2019,00:19,01/01/2019,00:24


Looks good! But I notice that trip_duration_seconds is a float, so I'll quickly convert it to an int.

In [31]:
df_2019['trip_duration_seconds'] = df_2019['trip_duration_seconds'].astype(int)
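`astype(int)` worked here, but it has two edge cases worth knowing: it truncates any fractional part toward zero rather than rounding, and it raises an error if the column contains `NaN`. If missing durations ever turn up, one option is to round first and cast to pandas' nullable `Int64` dtype, which keeps missing values as `<NA>`. A sketch on made-up values:

```python
import numpy as np
import pandas as pd

durations = pd.Series([1547.0, 1112.9, np.nan])

# A plain int cast fails when NaN is present (pandas raises a ValueError subclass).
try:
    durations.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# On the non-missing values, astype(int) truncates: 1112.9 -> 1112.
truncated = durations.dropna().astype(int).tolist()  # [1547, 1112]

# Rounding, then the nullable Int64 dtype, keeps the missing value as <NA>.
nullable = durations.round().astype("Int64")

print(cast_failed, truncated)
print(nullable)
```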

In [32]:
df_2019.head(5)

,trip_duration_seconds,trip_start_time,from_station_name,trip_stop_time,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,1547,2019-01-01 00:08:00,Bay St / Albert St,2019-01-01 00:33:00,King / Cowan Ave - SMART,Member,01/01/2019,00:08,01/01/2019,00:33
1,1112,2019-01-01 00:10:00,King St W / Tecumseth St,2019-01-01 00:29:00,Wellesley St E / Yonge St (Green P),Member,01/01/2019,00:10,01/01/2019,00:29
2,589,2019-01-01 00:15:00,Jarvis St / Carlton St,2019-01-01 00:25:00,Scott St / The Esplanade,Member,01/01/2019,00:15,01/01/2019,00:25
3,259,2019-01-01 00:16:00,Elizabeth St / Edward St (Bus Terminal),2019-01-01 00:20:00,Bay St / College St (West Side) - SMART,Member,01/01/2019,00:16,01/01/2019,00:20
4,281,2019-01-01 00:19:00,Edward St / Yonge St,2019-01-01 00:24:00,Dundas St W / St. Patrick St,Member,01/01/2019,00:19,01/01/2019,00:24


Great! Now we can move onto the 2020 data...

In [34]:
df_2020.head(5)

,Trip Duration,Start Time,Start Station Name,End Time,End Station Name,User Type,start_date,start_time,stop_date,stop_time
0,648,2020-01-01 00:08:00,Madison Ave / Bloor St W,2020-01-01 00:19:00,Yonge St / Alexander St - SMART,Annual Member,01/01/2020,00:08,01/01/2020,00:19
1,419,2020-01-01 00:10:00,College St / Huron St,2020-01-01 00:17:00,Yonge St / Wood St,Annual Member,01/01/2020,00:10,01/01/2020,00:17
2,566,2020-01-01 00:13:00,Parliament St / Aberdeen Ave,2020-01-01 00:22:00,Front St E / Cherry St,Annual Member,01/01/2020,00:13,01/01/2020,00:22
3,1274,2020-01-01 00:17:00,King St E / Victoria St,2020-01-01 00:38:00,Sherbourne St / Isabella St,Annual Member,01/01/2020,00:17,01/01/2020,00:38
4,906,2020-01-01 00:19:00,King St E / Jarvis St,2020-01-01 00:34:00,University Ave / Elm St,Casual Member,01/01/2020,00:19,01/01/2020,00:34


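Let's drop the unnecessary columns... The 2020 file doesn't appear to contain the extra index, Trip Id, station ID or Bike Id columns at all (none of them show up in the output above), and this cell was never executed. Run as written, drop would raise a KeyError for the missing labels, so I've added errors='ignore' to make it safe to run either way.

In [None]:
df_2020.drop(['Unnamed: 0', 'Trip Id', 'Start Station Id', 'End Station Id', 'Bike Id'], axis=1, inplace=True, errors='ignore')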

Let's change the User Type values to make sure they match. 

In [35]:
df_2020['User Type'] = df_2020['User Type'].replace({'Annual Member': 'Member', 'Casual Member': 'Casual'})

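Let's rename the columns so that they match as well. Note that the raw 2020 trip duration header contains two spaces ('Trip  Duration'); the output above collapses them to one, but the rename key has to match the header exactly.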

In [36]:
df_2020 = df_2020.rename(columns={'Trip  Duration': 'trip_duration_seconds', 'Start Time': 'trip_start_time', 'Start Station Name': 'from_station_name', 'End Time': 'trip_stop_time', 'End Station Name': 'to_station_name', 'User Type': 'user_type'})
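The 2019 and 2020 cleanup steps are nearly identical, so they could be folded into a single helper. This is a hypothetical refactor rather than the code that was run above: `standardize`, `ID_COLUMNS`, `RENAME_MAP` and `USER_TYPE_MAP` are my own names. `errors="ignore"` skips any ID columns a given year lacks, and the rename map lists both the single- and double-space spellings of the duration header.

```python
import pandas as pd

ID_COLUMNS = ["Unnamed: 0", "Trip Id", "Start Station Id", "End Station Id", "Bike Id"]
RENAME_MAP = {
    "Trip Duration": "trip_duration_seconds",   # 2019 spelling
    "Trip  Duration": "trip_duration_seconds",  # 2020 spelling (two spaces)
    "Start Time": "trip_start_time",
    "Start Station Name": "from_station_name",
    "End Time": "trip_stop_time",
    "End Station Name": "to_station_name",
    "User Type": "user_type",
}
USER_TYPE_MAP = {"Annual Member": "Member", "Casual Member": "Casual"}


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new frame with 2017/2018-style column names and user_type labels."""
    out = df.drop(columns=ID_COLUMNS, errors="ignore").rename(columns=RENAME_MAP)
    out["user_type"] = out["user_type"].replace(USER_TYPE_MAP)
    return out


# Tiny stand-ins: 2019 has the ID columns, 2020 has none plus the double-space header.
demo_2019 = pd.DataFrame({"Unnamed: 0": [0], "Trip Id": [4581278], "Trip Duration": [1547.0],
                          "Bike Id": [1296], "User Type": ["Annual Member"]})
demo_2020 = pd.DataFrame({"Trip  Duration": [906], "User Type": ["Casual Member"]})

clean_2019 = standardize(demo_2019)
clean_2020 = standardize(demo_2020)
print(clean_2019)
print(clean_2020)
```

On the real data this would be `df_2019 = standardize(df_2019)` and likewise for 2020, with the float-to-int cast for 2019 still done separately.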

In [37]:
df_2020.head(5)

,trip_duration_seconds,trip_start_time,from_station_name,trip_stop_time,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,648,2020-01-01 00:08:00,Madison Ave / Bloor St W,2020-01-01 00:19:00,Yonge St / Alexander St - SMART,Member,01/01/2020,00:08,01/01/2020,00:19
1,419,2020-01-01 00:10:00,College St / Huron St,2020-01-01 00:17:00,Yonge St / Wood St,Member,01/01/2020,00:10,01/01/2020,00:17
2,566,2020-01-01 00:13:00,Parliament St / Aberdeen Ave,2020-01-01 00:22:00,Front St E / Cherry St,Member,01/01/2020,00:13,01/01/2020,00:22
3,1274,2020-01-01 00:17:00,King St E / Victoria St,2020-01-01 00:38:00,Sherbourne St / Isabella St,Member,01/01/2020,00:17,01/01/2020,00:38
4,906,2020-01-01 00:19:00,King St E / Jarvis St,2020-01-01 00:34:00,University Ave / Elm St,Casual,01/01/2020,00:19,01/01/2020,00:34


Looks good!

We no longer need trip_start_time or trip_stop_time in any of the DataFrames, since the same information is already split into separate start_date/start_time and stop_date/stop_time columns.
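If a full timestamp is needed again later (for example to sort trips or group them by hour), the split columns can be recombined. The sample rows show day-first dates such as 31/12/2018, so spelling out the format string avoids any month/day ambiguity. A sketch using date and time values copied from the outputs in this notebook:

```python
import pandas as pd

trips = pd.DataFrame({
    "start_date": ["31/12/2018", "01/01/2019"],
    "start_time": ["23:52", "00:08"],
})

# Dates are day/month/year, so give the format explicitly instead of letting pandas guess.
trips["start_datetime"] = pd.to_datetime(
    trips["start_date"] + " " + trips["start_time"], format="%d/%m/%Y %H:%M"
)
print(trips)
```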

In [38]:
df_2017.drop(['trip_stop_time', 'trip_start_time'], axis=1, inplace=True)
df_2018.drop(['trip_stop_time', 'trip_start_time'], axis=1, inplace=True)
df_2019.drop(['trip_stop_time', 'trip_start_time'], axis=1, inplace=True)
df_2020.drop(['trip_stop_time', 'trip_start_time'], axis=1, inplace=True)
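Before stacking, it helps to confirm that all four frames now share exactly the same column names. Both `DataFrame.append` and `pd.concat` line up columns by name, so a single header that escaped renaming does not raise an error; it quietly adds an extra column padded with `NaN`. A sketch with toy frames (the `check_same_columns` helper is hypothetical):

```python
import pandas as pd


def check_same_columns(frames: dict) -> None:
    """Raise ValueError if any frame's column set differs from the first frame's."""
    names = list(frames)
    reference = set(frames[names[0]].columns)
    for name in names[1:]:
        cols = set(frames[name].columns)
        if cols != reference:
            raise ValueError(
                f"{name}: missing {sorted(reference - cols)}, extra {sorted(cols - reference)}"
            )


a = pd.DataFrame({"trip_duration_seconds": [223], "user_type": ["Member"]})
b = pd.DataFrame({"Trip Duration": [393], "user_type": ["Member"]})  # rename step forgotten

# concat does not complain; it just adds a NaN-padded column for the stray name.
stacked = pd.concat([a, b])
print(sorted(stacked.columns))

mismatch_message = None
try:
    check_same_columns({"2017": a, "2018": b})
except ValueError as err:
    mismatch_message = str(err)
print(mismatch_message)
```

On the real data the call would be `check_same_columns({'2017': df_2017, '2018': df_2018, '2019': df_2019, '2020': df_2020})`.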

Now let's start appending the dataframes together...

In [40]:
df_2017_2018 = pd.concat([df_2017, df_2018])  # DataFrame.append was deprecated in pandas 1.4 and removed in 2.0

In [42]:
df_2017_2018

,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12
...,...,...,...,...,...,...,...,...
1922950,379,Danforth Ave / Coxwell Ave,Donlands Station,Member,31/12/2018,23:43,31/12/2018,23:49
1922951,306,Bay St / Wellesley St W,Jarvis St / Isabella St,Member,31/12/2018,23:45,31/12/2018,23:50
1922952,340,Phoebe St / Spadina Ave,Fort York Blvd / Capreol Ct,Member,31/12/2018,23:49,31/12/2018,23:55
1922953,1466,Sherbourne St / Carlton St (Allan Gardens),Toronto Eaton Centre (Yonge St),Member,31/12/2018,23:52,01/01/2019,00:17


Looks good! Now let's append on 2019...

In [43]:
df_17_18_19 = pd.concat([df_2017_2018, df_2019])  # pd.concat replaces the removed DataFrame.append

In [44]:
df_17_18_19

,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12
...,...,...,...,...,...,...,...,...
2439042,523,Riverdale Park South (Broadview Ave),Carlaw Ave / Strathcona Ave,Member,31/12/2019,23:39,31/12/2019,23:48
2439043,273,Church St / Alexander St,Bay St / Charles St - SMART,Member,31/12/2019,23:45,31/12/2019,23:49
2439044,1055,Dundas St E / Regent Park Blvd,Dundas St E / Regent Park Blvd,Member,31/12/2019,23:51,01/01/2020,00:08
2439045,459,York St / Lake Shore Blvd W,Nelson St / Duncan St,Member,31/12/2019,23:55,01/01/2020,00:03


Looks good! Now let's append on the 2020 data for our final dataframe.

In [45]:
bikeshare_data = pd.concat([df_17_18_19, df_2020])  # pd.concat replaces the removed DataFrame.append
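Chaining three pairwise steps builds an intermediate DataFrame each time. As an alternative sketch (not what was run here), a single `pd.concat` over a list does the same stacking in one call, and `ignore_index=True` numbers the rows 0 to n-1 directly, so no separate index reset is needed afterwards. Toy frames stand in for the four years:

```python
import pandas as pd

# Stand-ins for df_2017 ... df_2020, each with its own 0-based index.
years = [
    pd.DataFrame({"trip_duration_seconds": [223, 279]}),
    pd.DataFrame({"trip_duration_seconds": [393]}),
    pd.DataFrame({"trip_duration_seconds": [1547, 1112]}),
    pd.DataFrame({"trip_duration_seconds": [648]}),
]

# Pairwise, as in the cells above: each year keeps its original row labels.
chained = pd.concat([pd.concat([pd.concat([years[0], years[1]]), years[2]]), years[3]])

# One call over the whole list, renumbering the rows as it goes.
one_pass = pd.concat(years, ignore_index=True)

print(chained.index.tolist())   # [0, 1, 0, 0, 1, 0]
print(one_pass.index.tolist())  # [0, 1, 2, 3, 4, 5]
```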

In [46]:
bikeshare_data

,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12
...,...,...,...,...,...,...,...,...
2908162,330,King St W / Spadina Ave,Wellington St W / Stafford St,Member,31/12/2020,23:52,31/12/2020,23:57
2908163,216,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
2908164,204,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
2908165,1659,Church St / Dundas St E - SMART,Church St / Dundas St E - SMART,Member,31/12/2020,23:56,01/01/2021,00:24


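It looks good apart from the index: each year kept its own row labels, so the index restarts at 0 for every year (the last row above is labelled 2908165, while the combined data has 8,762,536 rows). I'll reset the index so the labels run in order.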

In [48]:
bikeshare_data.reset_index(inplace=True)
bikeshare_data

,index,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,0,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,1,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,2,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,3,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,4,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12
...,...,...,...,...,...,...,...,...,...
8762532,2908162,330,King St W / Spadina Ave,Wellington St W / Stafford St,Member,31/12/2020,23:52,31/12/2020,23:57
8762533,2908163,216,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
8762534,2908164,204,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
8762535,2908165,1659,Church St / Dundas St E - SMART,Church St / Dundas St E - SMART,Member,31/12/2020,23:56,01/01/2021,00:24


In [49]:
bikeshare_data.drop('index', axis=1, inplace=True)
bikeshare_data

,trip_duration_seconds,from_station_name,to_station_name,user_type,start_date,start_time,stop_date,stop_time
0,223,Wellesley St E / Yonge St Green P,Church St / Wood St,Member,01/01/2017,00:00,01/01/2017,00:03
1,279,Kendal Ave / Bernard Ave,Bathurst Subway Station,Member,01/01/2017,00:00,01/01/2017,00:05
2,1394,Parliament St / Aberdeen Ave,College St W / Markham St,Member,01/01/2017,00:05,01/01/2017,00:29
3,826,College Park South,King St W / Spadina Ave,Member,01/01/2017,00:07,01/01/2017,00:21
4,279,McGill St / Church St,University Ave / Gerrard St W,Member,01/01/2017,00:08,01/01/2017,00:12
...,...,...,...,...,...,...,...,...
8762532,330,King St W / Spadina Ave,Wellington St W / Stafford St,Member,31/12/2020,23:52,31/12/2020,23:57
8762533,216,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
8762534,204,Humber Bay Shores Park West,Humber Bay Shores Park / Marine Parade Dr,Member,31/12/2020,23:54,31/12/2020,23:58
8762535,1659,Church St / Dundas St E - SMART,Church St / Dundas St E - SMART,Member,31/12/2020,23:56,01/01/2021,00:24
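The reset-then-drop pair above can be collapsed into a single call: `reset_index(drop=True)` discards the old labels instead of saving them in an `index` column. A small sketch:

```python
import pandas as pd

stacked = pd.concat([
    pd.DataFrame({"user_type": ["Member", "Member"]}),
    pd.DataFrame({"user_type": ["Casual"]}),
])  # index is [0, 1, 0]

# Two steps, as in the cells above: move old labels into 'index', then drop it.
two_step = stacked.reset_index().drop(columns="index")

# One step: throw the old labels away directly.
one_step = stacked.reset_index(drop=True)

print(one_step.index.tolist())    # [0, 1, 2]
print(one_step.equals(two_step))  # True
```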


Great, the data is now ready for visual analysis in Tableau. I'll finish by saving it to CSV.

In [50]:
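# index=False keeps the row index out of the file, so it won't reappear as an 'Unnamed: 0' column when reloaded
bikeshare_data.to_csv('Desktop/to_bikeshare_project/bikeshare_data.csv', index=False)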
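As a final sanity check, the saved file can be read back and compared with the in-memory frame. With the default `to_csv` settings the index is written as an extra first column and comes back as `Unnamed: 0`; with `index=False` the reloaded frame has exactly the original shape. A sketch on temporary files standing in for the real output path:

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({
    "trip_duration_seconds": [223, 1659],
    "user_type": ["Member", "Member"],
})

with tempfile.TemporaryDirectory() as tmp:
    with_index = os.path.join(tmp, "with_index.csv")
    without_index = os.path.join(tmp, "without_index.csv")
    frame.to_csv(with_index)                  # default: index written as first column
    frame.to_csv(without_index, index=False)  # index left out
    reloaded_with = pd.read_csv(with_index)
    reloaded_without = pd.read_csv(without_index)

print(reloaded_with.columns.tolist())  # ['Unnamed: 0', 'trip_duration_seconds', 'user_type']
print(reloaded_without.shape == frame.shape)  # True
```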