# Cyclistic Full Year Analysis
### author: Gordon Lam
### date: 27/12/2001

This analysis is Case Study 1 of the Google Data Analytics Certificate capstone project. 
In this case study, I use the Divvy dataset to answer the business task: "How do annual members and casual riders use Cyclistic bikes differently?"

### Business Objectives
- What is our goal?
    - To maximize the number of annual members
- What is our problem?
    - How do annual members and casual riders differ from each other?
    - Why would casual riders buy an annual membership?
    - __How can Cyclistic use digital media to influence casual riders to become annual members?__
- Who are our main stakeholders?
    - Lily Moreno (our director of marketing and manager)
    - Cyclistic Executive Team (responsible for deciding whether to approve the recommended marketing program)


### Data
The Divvy dataset is hosted on Amazon Web Services (amazonaws) and is organized by date, with one file per month. The data is reliable and well organized because it comes directly from the company. It is also comprehensive, current, and cited.
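Because the files are organized by date, the list of monthly files can be generated rather than typed by hand. The sketch below assumes the `YYYYMM-divvy-tripdata.csv` naming seen in the two files loaded later in this notebook. The helper name `monthly_filenames` and its parameters are hypothetical, and the start month and number of months are only example values.

```python
def monthly_filenames(start_year, start_month, n_months):
    """Return n_months consecutive names like '202112-divvy-tripdata.csv'.

    Assumes the YYYYMM-divvy-tripdata.csv pattern used by the files in this notebook.
    """
    names = []
    year, month = start_year, start_month
    for _ in range(n_months):
        names.append(f"{year}{month:02d}-divvy-tripdata.csv")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


print(monthly_filenames(2021, 12, 2))
# ['202112-divvy-tripdata.csv', '202201-divvy-tripdata.csv']
```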

The dataset is secure: the data are tokenized so that riders' personal information is protected.

### Process
- Created a column called "ride_length" by subtracting the "started_at" column from the "ended_at" column
- Created a column called "day_of_week" giving the day of the week for each ride's date
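The two steps above can be sketched in pandas as follows. This is a minimal illustration, not necessarily how the columns were originally built. It assumes that "day_of_week" is taken from "started_at" and that it follows the 1 = Sunday, 7 = Saturday convention from the data dictionary. The two sample rides are copied from the December 2021 file shown in this notebook.

```python
import pandas as pd

# Two sample rides copied from the December 2021 file shown in this notebook.
rides = pd.DataFrame({
    "ride_id": ["46F8167220E4431F", "73A77762838B32FD"],
    "started_at": ["2021-12-07 15:06:07", "2021-12-11 03:43:29"],
    "ended_at": ["2021-12-07 15:13:42", "2021-12-11 04:10:23"],
})

# read_csv loads timestamps as strings, so convert them before subtracting.
for col in ("started_at", "ended_at"):
    rides[col] = pd.to_datetime(rides[col], format="%Y-%m-%d %H:%M:%S")

# ride_length = ended_at - started_at (a Timedelta per ride)
rides["ride_length"] = rides["ended_at"] - rides["started_at"]

# pandas numbers days Monday=0 .. Sunday=6; shift to Sunday=1 .. Saturday=7.
# Assumption: the day is taken from started_at.
rides["day_of_week"] = (rides["started_at"].dt.dayofweek + 1) % 7 + 1

print(rides[["ride_id", "ride_length", "day_of_week"]])
```

The first ride lasts 7 minutes 35 seconds and starts on a Tuesday (3). The second lasts 26 minutes 54 seconds and starts on a Saturday (7).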

In [2]:
import pandas as pd
import numpy as np
import os
from pathlib import Path 

In [3]:
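# Base folder holding the monthly trip files, relative to the working directory
filepath = os.getcwd() + "/divvy-tripdata/"
csvpath = "csv/"    # subfolder with the .csv files
xlsxpath = "xlsx/"  # subfolder with the .xlsx files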

In [4]:
df = pd.read_csv(filepath + csvpath + "202112-divvy-tripdata.csv")
df.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,46F8167220E4431F,electric_bike,2021-12-07 15:06:07,2021-12-07 15:13:42,Laflin St & Cullerton St,13307,Morgan St & Polk St,TA1307000130,41.854833,-87.66366,41.871969,-87.650965,member
1,73A77762838B32FD,electric_bike,2021-12-11 03:43:29,2021-12-11 04:10:23,LaSalle Dr & Huron St,KP1705001026,Clarendon Ave & Leland Ave,TA1307000119,41.894405,-87.632331,41.967968,-87.650001,casual
2,4CF42452054F59C5,electric_bike,2021-12-15 23:10:28,2021-12-15 23:23:14,Halsted St & North Branch St,KA1504000117,Broadway & Barry Ave,13137,41.899357,-87.648522,41.937582,-87.644098,member
3,3278BA87BF698339,classic_bike,2021-12-26 16:16:10,2021-12-26 16:30:53,Halsted St & North Branch St,KA1504000117,LaSalle Dr & Huron St,KP1705001026,41.89939,-87.648545,41.894877,-87.632326,member
4,6FF54232576A3B73,electric_bike,2021-12-30 11:31:05,2021-12-30 11:51:21,Leavitt St & Chicago Ave,18058,Clark St & Drummond Pl,TA1307000142,41.895579,-87.682024,41.931248,-87.644336,member


## Understand the variables

In [7]:
# Understand the variables
variables = pd.DataFrame(columns=['Variable', 'Number of Unique values', 'values'])

for i, var in enumerate(df.columns):
    variables.loc[i] = [var, df[var].nunique(), df[var].unique().tolist()]
    
variables

Unnamed: 0,Variable,Number of Unique values,values
0,ride_id,247540,"[46F8167220E4431F, 73A77762838B32FD, 4CF424520..."
1,rideable_type,3,"[electric_bike, classic_bike, docked_bike]"
2,started_at,228845,"[2021-12-07 15:06:07, 2021-12-11 03:43:29, 202..."
3,ended_at,228657,"[2021-12-07 15:13:42, 2021-12-11 04:10:23, 202..."
4,start_station_name,818,"[Laflin St & Cullerton St, LaSalle Dr & Huron ..."
5,start_station_id,816,"[13307, KP1705001026, KA1504000117, 18058, SL-..."
6,end_station_name,800,"[Morgan St & Polk St, Clarendon Ave & Leland A..."
7,end_station_id,798,"[TA1307000130, TA1307000119, 13137, KP17050010..."
8,start_lat,78650,"[41.854833, 41.894405166666665, 41.89935716666..."
9,start_lng,78373,"[-87.66366033333334, -87.632331, -87.648521833..."


### Data dictionary
|variable           |class    |description 
|:--------          |:-----   |:-----------
|ride_id            |String   |ID of the ride
|rideable_type      |String   |"electric_bike", "classic_bike", "docked_bike"
|started_at         |Datetime |Start time of the ride in YYYY-MM-DD HH:MM:SS
|ended_at           |Datetime |End time of the ride in YYYY-MM-DD HH:MM:SS
|start_station_name |String   |Name of the starting station
|start_station_id   |String   |ID of the starting station
|end_station_name   |String   |Name of the ending station
|end_station_id     |String   |ID of the ending station
|start_lat          |Float    |Latitude of the starting station
|start_lng          |Float    |Longitude of the starting station
|end_lat            |Float    |Latitude of the ending station
|end_lng            |Float    |Longitude of the ending station
|member_casual      |String   |"member", "casual"
|ride_length        |Time     |Ride duration in HH:MM:SS
|day_of_week        |Integer  |Day of the week, 1-7 (1 = Sunday, 7 = Saturday)
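The categorical entries in the dictionary can double as a quick sanity check on the data. The sketch below is an illustration only: the helper `unexpected_values` and the `ALLOWED` mapping are hypothetical names, and the sample frame is made up, with a deliberately misspelled bike type to show what the check reports.

```python
import pandas as pd

# Allowed values taken from the data dictionary above.
ALLOWED = {
    "rideable_type": {"electric_bike", "classic_bike", "docked_bike"},
    "member_casual": {"member", "casual"},
}


def unexpected_values(frame, allowed=ALLOWED):
    """Map each checked column to the set of values not listed in the dictionary."""
    report = {}
    for col, ok in allowed.items():
        seen = set(frame[col].dropna().unique())
        report[col] = seen - ok
    return report


# Made-up sample rows; "eletric_bike" is an intentional misspelling.
sample = pd.DataFrame({
    "rideable_type": ["electric_bike", "classic_bike", "eletric_bike"],
    "member_casual": ["member", "casual", "member"],
})

print(unexpected_values(sample))
# {'rideable_type': {'eletric_bike'}, 'member_casual': set()}
```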

In [4]:
df1 = pd.read_csv(filepath + csvpath + "202201-divvy-tripdata.csv")
df1.head()

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,C2F7DD78E82EC875,electric_bike,2022-01-13 11:59:47,2022-01-13 12:02:44,Glenwood Ave & Touhy Ave,525,Clark St & Touhy Ave,RP-007,42.0128,-87.665906,42.01256,-87.674367,casual
1,A6CF8980A652D272,electric_bike,2022-01-10 08:41:56,2022-01-10 08:46:17,Glenwood Ave & Touhy Ave,525,Clark St & Touhy Ave,RP-007,42.012763,-87.665967,42.01256,-87.674367,casual
2,BD0F91DFF741C66D,classic_bike,2022-01-25 04:53:40,2022-01-25 04:58:01,Sheffield Ave & Fullerton Ave,TA1306000016,Greenview Ave & Fullerton Ave,TA1307000001,41.925602,-87.653708,41.92533,-87.6658,member
3,CBB80ED419105406,classic_bike,2022-01-04 00:18:04,2022-01-04 00:33:00,Clark St & Bryn Mawr Ave,KA1504000151,Paulina St & Montrose Ave,TA1309000021,41.983593,-87.669154,41.961507,-87.671387,casual
4,DDC963BFDDA51EEA,classic_bike,2022-01-20 01:31:10,2022-01-20 01:37:12,Michigan Ave & Jackson Blvd,TA1309000002,State St & Randolph St,TA1305000029,41.87785,-87.62408,41.884621,-87.627834,member


In [5]:
result = pd.concat([df, df1], ignore_index = True)
result

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,46F8167220E4431F,electric_bike,2021-12-07 15:06:07,2021-12-07 15:13:42,Laflin St & Cullerton St,13307,Morgan St & Polk St,TA1307000130,41.854833,-87.663660,41.871969,-87.650965,member
1,73A77762838B32FD,electric_bike,2021-12-11 03:43:29,2021-12-11 04:10:23,LaSalle Dr & Huron St,KP1705001026,Clarendon Ave & Leland Ave,TA1307000119,41.894405,-87.632331,41.967968,-87.650001,casual
2,4CF42452054F59C5,electric_bike,2021-12-15 23:10:28,2021-12-15 23:23:14,Halsted St & North Branch St,KA1504000117,Broadway & Barry Ave,13137,41.899357,-87.648522,41.937582,-87.644098,member
3,3278BA87BF698339,classic_bike,2021-12-26 16:16:10,2021-12-26 16:30:53,Halsted St & North Branch St,KA1504000117,LaSalle Dr & Huron St,KP1705001026,41.899390,-87.648545,41.894877,-87.632326,member
4,6FF54232576A3B73,electric_bike,2021-12-30 11:31:05,2021-12-30 11:51:21,Leavitt St & Chicago Ave,18058,Clark St & Drummond Pl,TA1307000142,41.895579,-87.682024,41.931248,-87.644336,member
...,...,...,...,...,...,...,...,...,...,...,...,...,...
351305,8788DA3EDE8FD8AB,electric_bike,2022-01-18 12:36:48,2022-01-18 12:46:19,Clinton St & Washington Blvd,WL-012,,,41.883436,-87.641391,41.890000,-87.620000,casual
351306,C6C3B64FDC827D8C,electric_bike,2022-01-27 11:00:06,2022-01-27 11:02:40,Racine Ave & Randolph St,13155,,,41.884158,-87.656977,41.880000,-87.650000,casual
351307,CA281AE7D8B06F5A,electric_bike,2022-01-10 16:14:51,2022-01-10 16:20:58,Broadway & Waveland Ave,13325,Clark St & Grace St,TA1307000127,41.949066,-87.648611,41.950780,-87.659172,casual
351308,44E348991862319B,electric_bike,2022-01-19 13:22:11,2022-01-19 13:24:27,Racine Ave & Randolph St,13155,,,41.884005,-87.657031,41.880000,-87.660000,casual


In [27]:
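def mergeCsv(filepath):
    """Read every CSV file under <filepath>csv and stack them into one DataFrame."""
    df_list = []
    csvpath = filepath + "csv"
    # Match only .csv files so stray files in the folder are not parsed.
    # glob() does not guarantee an order, so row order follows the file system.
    files = Path(csvpath).glob("*.csv")
    for file in files:
        df = pd.read_csv(file)
        df_list.append(df)
    # ignore_index=True renumbers the rows 0..n-1 across all files
    result = pd.concat(df_list, ignore_index = True)
    return result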
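If a reproducible row order matters, one possible variation on the function above is to sort the files by name before reading them. This is a sketch under my own assumptions rather than the approach used in this notebook. The name `merge_csv_sorted` is hypothetical, and the demo writes tiny made-up files to a temporary folder instead of touching the real data.

```python
import tempfile
from pathlib import Path

import pandas as pd


def merge_csv_sorted(csv_dir):
    """Concatenate all *.csv files in csv_dir, in file-name order."""
    files = sorted(Path(csv_dir).glob("*.csv"))
    if not files:
        raise FileNotFoundError(f"no CSV files found in {csv_dir}")
    frames = [pd.read_csv(f) for f in files]
    return pd.concat(frames, ignore_index=True)


# Demo on a temporary folder with two tiny made-up files and one non-CSV file.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "202201-divvy-tripdata.csv").write_text("ride_id,member_casual\nB,casual\n")
    Path(tmp, "202112-divvy-tripdata.csv").write_text("ride_id,member_casual\nA,member\n")
    Path(tmp, "notes.txt").write_text("not a csv\n")
    merged = merge_csv_sorted(tmp)

print(merged["ride_id"].tolist())
# ['A', 'B']
```

Because the file names start with YYYYMM, sorting by name also sorts the months chronologically.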

In [28]:
result = mergeCsv(filepath)
result

Unnamed: 0,ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id,end_station_name,end_station_id,start_lat,start_lng,end_lat,end_lng,member_casual
0,550CF7EFEAE0C618,electric_bike,2022-08-07 21:34:15,2022-08-07 21:41:46,,,,,41.930000,-87.690000,41.940000,-87.720000,casual
1,DAD198F405F9C5F5,electric_bike,2022-08-08 14:39:21,2022-08-08 14:53:23,,,,,41.890000,-87.640000,41.920000,-87.640000,casual
2,E6F2BC47B65CB7FD,electric_bike,2022-08-08 15:29:50,2022-08-08 15:40:34,,,,,41.970000,-87.690000,41.970000,-87.660000,casual
3,F597830181C2E13C,electric_bike,2022-08-08 02:43:50,2022-08-08 02:58:53,,,,,41.940000,-87.650000,41.970000,-87.690000,casual
4,0CE689BB4E313E8D,electric_bike,2022-08-07 20:24:06,2022-08-07 20:29:58,,,,,41.850000,-87.650000,41.840000,-87.660000,casual
...,...,...,...,...,...,...,...,...,...,...,...,...,...
5733446,C5A123D7BF8D350A,electric_bike,2022-04-22 15:54:11,2022-04-22 16:20:59,Streeter Dr & Grand Ave,13022,California Ave & North Ave,13258,41.892296,-87.612198,41.910475,-87.696894,member
5733447,F7FCC7C26D8D137D,electric_bike,2022-04-21 20:18:17,2022-04-21 20:46:45,Streeter Dr & Grand Ave,13022,California Ave & North Ave,13258,41.892295,-87.612323,41.910475,-87.696894,member
5733448,43D351300A40000A,classic_bike,2022-04-21 16:46:02,2022-04-21 17:15:05,Franklin St & Monroe St,TA1309000007,St. Clair St & Erie St,13016,41.880317,-87.635185,41.894345,-87.622798,member
5733449,1618BFEEA7B566EF,electric_bike,2022-04-16 13:19:44,2022-04-16 13:37:31,Ashland Ave & Blackhawk St,13224,Southport Ave & Waveland Ave,13235,41.907094,-87.667217,41.948150,-87.663940,casual
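Several rows in the merged output above have empty station fields. A quick way to see how common that is would be to count missing values per column. The sketch below runs on three rows reduced to four columns, with values copied from the outputs in this notebook. On the real data the same `isna().sum()` call would be applied to `result`.

```python
import io

import pandas as pd

# Three rides from the outputs above, reduced to four columns for brevity.
csv_text = """ride_id,start_station_name,end_station_name,member_casual
8788DA3EDE8FD8AB,Clinton St & Washington Blvd,,casual
CA281AE7D8B06F5A,Broadway & Waveland Ave,Clark St & Grace St,casual
550CF7EFEAE0C618,,,casual
"""

sample = pd.read_csv(io.StringIO(csv_text))

# Empty CSV fields are read as NaN, so isna() flags them.
missing = sample.isna().sum()
print(missing.to_dict())
# {'ride_id': 0, 'start_station_name': 1, 'end_station_name': 2, 'member_casual': 0}
```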
