## Introduction

This project was created as part of the Google Data Analytics Professional Certificate: Course 8 - The Capstone Project. It is also my first data analytics project. I hope you like it!

## Scenario

I work as a junior data analyst in the marketing analyst team at Cyclistic, a bike-sharing company in Chicago. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Customers who purchase single-ride or full-day passes are referred to as casual riders whereas customers who purchase annual memberships are Cyclistic members. Thus, the director of marketing believes that the future success of the company depends on maximizing the number of annual memberships. 

## Business Task
> **Analyze Cyclistic's historical bike trip data to identify how annual members and casual riders use Cyclistic bikes differently, and use those trends to help convert casual riders into annual members.**

## A description of the Data Sources
The Cyclistic historical bike trip datasets have been provided by Motivate International Inc. under the following [license](https://www.divvybikes.com/data-license-agreement).

The datasets have been provided through a website link. They contain data about historical bike trips in the city of Chicago from 2013 up to July 2021. The datasets and tables are organized yearly, half-yearly, quarterly or monthly. Each table contains 13 fields which store data about each trip taken.

**As per the project guidelines, all further analysis has been conducted on tables containing data for the period of 12 months (April 2020 - March 2021).**

The following issues were noticed after combining the required datasets into one table:
* The date and time columns ‘started_at’ and ‘ended_at’ and the numeric columns ‘start_station_id’ and ‘end_station_id’ are stored as strings.
* Null or missing values spotted throughout the datasets.
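
Both issues can be checked directly in pandas. The snippet below is a minimal sketch on a small made-up table (`sample` and its values are illustrative, not taken from the real files); the same two calls could be run on the combined DataFrame.

```python
import pandas as pd

# Toy stand-in for the combined table (hypothetical values for illustration only).
sample = pd.DataFrame({
    "started_at": ["2020-04-26 17:45:14", "2020-04-17 17:08:54", "2020-04-01 17:54:13"],
    "ended_at": ["2020-04-26 18:12:03", "2020-04-17 17:17:03", "2020-04-01 18:08:36"],
    "start_station_id": ["86", "503", None],
    "end_station_id": ["152", "499", "255"],
})

# Text columns are reported with a generic text dtype (object or str,
# depending on the pandas version), not as datetime or numeric.
print(sample.dtypes)

# Number of missing values in each column.
null_counts = sample.isna().sum()
print(null_counts)
```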

## Data Cleaning / Manipulation Procedure using **Python**



In [None]:
# Importing Pandas
import pandas as pd

# Merging 12 months (April 2020 - March 2021) of Cyclistic bike data into one DataFrame
originaldf = pd.concat(map(pd.read_csv, 
                   [r"../input/cyclistic/202004-divvy-tripdata/202004-divvy-tripdata.csv", 
                    r"../input/cyclistic/202005-divvy-tripdata/202005-divvy-tripdata.csv",
                    r"../input/cyclistic/202006-divvy-tripdata/202006-divvy-tripdata.csv",
                    r"../input/cyclistic/202007-divvy-tripdata/202007-divvy-tripdata.csv",
                    r"../input/cyclistic/202008-divvy-tripdata/202008-divvy-tripdata.csv",
                    r"../input/cyclistic/202009-divvy-tripdata/202009-divvy-tripdata.csv",
                    r"../input/cyclistic/202010-divvy-tripdata/202010-divvy-tripdata.csv",
                    r"../input/cyclistic/202011-divvy-tripdata/202011-divvy-tripdata.csv",
                    r"../input/cyclistic/202012-divvy-tripdata/202012-divvy-tripdata.csv",
                    r"../input/cyclistic/202101-divvy-tripdata/202101-divvy-tripdata.csv",
                    r"../input/cyclistic/202102-divvy-tripdata/202102-divvy-tripdata.csv",
                    r"../input/cyclistic/202103-divvy-tripdata/202103-divvy-tripdata.csv"]
                      ), ignore_index = True)

# Removing rows with null values
df = pd.DataFrame(originaldf.dropna())

# Converting String data type into Datetime data type
df['started_at']= pd.to_datetime(df['started_at'])
df['ended_at']= pd.to_datetime(df['ended_at'])

# Converting String data type into Float data type
# (errors='coerce' turns any station ID that is not a valid number into NaN)
df['start_station_id'] = pd.to_numeric(df['start_station_id'],errors='coerce')
df['end_station_id'] = pd.to_numeric(df['end_station_id'],errors='coerce')

# Creating a new column 'total_time' to find the total time taken for each trip
df['total_time'] = df['ended_at'] - df['started_at']

# Dropping rows with negative 'total_time'
df = df[df.total_time >= pd.Timedelta(0)]

# Creating a new column 'day' to find the day of the week each trip took place on
df['day'] = df['started_at'].dt.day_name()

# Creating new columns 'year', 'month' and 'date' by extracting each element from the 'started_at' column
df['year'] = df['started_at'].dt.year
df['month'] = df['started_at'].dt.month
df['date'] = df['started_at'].dt.day
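
The twelve file paths passed to `pd.concat` in the cell above share one naming pattern, so the list could also be generated instead of typed out. This is only a sketch, assuming the same `../input/cyclistic/` folder layout; `months` and `paths` are illustrative names.

```python
import pandas as pd

# One Period per month of the analysis window (April 2020 - March 2021).
months = pd.period_range("2020-04", "2021-03", freq="M")

# Build each path from the year-month stamp, e.g. "202004".
paths = [
    "../input/cyclistic/{0}-divvy-tripdata/{0}-divvy-tripdata.csv".format(m.strftime("%Y%m"))
    for m in months
]

print(len(paths))
print(paths[0])
```

The resulting list could then be passed to `pd.concat(map(pd.read_csv, paths), ignore_index=True)`.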

## Analysis Procedure using **Python**

In [None]:
# Counting the trips made by Members and by Casual riders
df.member_casual.value_counts()

In [None]:
# Finding the average ride-time per rider type
df.groupby('member_casual')['total_time'].mean(numeric_only = False)

In [None]:
# Finding the number of trips per day of the week for each rider type
df.groupby('member_casual')['day'].value_counts()

In [None]:
# Finding the number of trips per month for each rider type
df.groupby('member_casual')['month'].value_counts()

In [None]:
# Finding the number of trips per bike type for each rider type
df.groupby('member_casual')['rideable_type'].value_counts()

## Presentation of Analysis using Tableau

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization1.PNG", width= 800, height=400)

**Insights:** 

The above pie chart gives a general overview of how the total number of trips is split between Member and Casual riders. Of all trips that took place from April 2020 to March 2021, 58.95% were made by Members and the remaining 41.05% by Casual riders. This shows that **Members, as a group, use bike-sharing services more frequently** than Casual riders.
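
Percentages like these can be computed in pandas with `value_counts(normalize=True)`. The sketch below uses a five-row made-up table (`trips` is illustrative), so its numbers differ from the real ones.

```python
import pandas as pd

# Toy trips table (hypothetical values for illustration only).
trips = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "member", "casual"],
})

# Share of all trips made by each rider type, in percent.
share = trips["member_casual"].value_counts(normalize=True).mul(100).round(2)
print(share)
```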

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization2.PNG", width=700, height=300)

**Insights:** 

The above chart shows the distribution of Member and Casual trips over the days of the week. For **Casual riders**, the maximum number of trips occurs on **weekends, i.e., Saturday and Sunday.** This indicates that most Casual riders use bike-sharing services for leisure purposes such as holidays and outings.

On the other hand, the number of trips is more or less consistent across the week for **Member riders**, indicating that most Members use bike-sharing services for daily purposes such as going to work or university, or for other daily errands. For Members, **Saturday** shows the maximum number of trips, followed by **Wednesday**.
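
`value_counts()` sorts by count, which makes weekday patterns harder to read. One way to get a table in calendar order is sketched below on made-up data; `trips`, `day_order` and `by_day` are illustrative names.

```python
import pandas as pd

# Toy trips table (hypothetical values for illustration only).
trips = pd.DataFrame({
    "member_casual": ["casual", "casual", "member", "member", "member", "casual"],
    "started_at": pd.to_datetime([
        "2020-08-01 10:00",  # Saturday
        "2020-08-02 11:00",  # Sunday
        "2020-08-03 08:00",  # Monday
        "2020-08-05 08:30",  # Wednesday
        "2020-08-08 09:00",  # Saturday
        "2020-08-08 15:00",  # Saturday
    ]),
})
trips["day"] = trips["started_at"].dt.day_name()

day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

# Rows: day of the week in calendar order; columns: rider type; values: trip counts.
by_day = pd.crosstab(trips["day"], trips["member_casual"]).reindex(day_order, fill_value=0)
print(by_day)
```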

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization3.PNG", width= 800, height=400)

**Insights:**

The above graph shows a similar trend for both Member and Casual riders. The number of trips starts increasing between May and June, peaks in August, then decreases towards October and November. The lowest number of trips in both cases is found in February. This could indicate a general **decrease in the number of trips during colder months**.
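
Because the analysis window runs from April 2020 to March 2021, grouping on the month number alone would list January to March before April. A possible alternative, sketched here on made-up data, is to group on a year-month period so the rows stay in chronological order; `trips`, `year_month` and `monthly` are illustrative names.

```python
import pandas as pd

# Toy trips table (hypothetical values for illustration only).
trips = pd.DataFrame({
    "member_casual": ["casual", "member", "member", "casual", "member"],
    "started_at": pd.to_datetime([
        "2020-04-10", "2020-08-15", "2020-08-20", "2020-08-21", "2021-02-03",
    ]),
})

# Year-month period, e.g. 2020-08, keeps months from different years apart.
trips["year_month"] = trips["started_at"].dt.to_period("M")

# Rows: year-month in chronological order; columns: rider type; values: trip counts.
monthly = trips.groupby(["year_month", "member_casual"]).size().unstack(fill_value=0)
print(monthly)
```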

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization4.PNG", width= 700, height=300)

**Insights:**

The above chart shows a clear trend for both Member and Casual riders. The **docked bike is the most popular** choice for all riders, being used for over **75%** of all trips during the given period. It is followed by electric bikes and, lastly, classic bikes.
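
The share of each bike type within a rider type can be obtained with `pd.crosstab(..., normalize="index")`. This is a sketch on made-up data (`trips` is illustrative), so the percentages are not the real ones.

```python
import pandas as pd

# Toy trips table (hypothetical values for illustration only).
trips = pd.DataFrame({
    "member_casual": ["member", "member", "member", "member",
                      "casual", "casual", "casual", "casual"],
    "rideable_type": ["docked_bike", "docked_bike", "docked_bike", "electric_bike",
                      "docked_bike", "docked_bike", "classic_bike", "electric_bike"],
})

# normalize="index" makes every row (rider type) sum to 1; multiply to get percent.
bike_share = pd.crosstab(
    trips["member_casual"], trips["rideable_type"], normalize="index"
).mul(100)
print(bike_share)
```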

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization5.PNG", width= 700, height=300)

**Insights:**

The above chart shows a 24-hour distribution of the start times of Member and Casual trips. The number of trips **peaks at about 5 PM UST for both categories**, indicating an increase in bike usage during evenings. For Members, there is also a slight spike in the number of trips around 5 AM UST, which might indicate an increase in bike usage in the early morning by Members who go to work or school around this time.
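
The cleaning code earlier does not create an hour column, so the following is only a sketch of how an hourly breakdown could be derived from `started_at` in pandas. The data is made up, `hour`, `hourly` and `peak_hour` are illustrative names, and the hours are read exactly as stored, with no timezone conversion.

```python
import pandas as pd

# Toy trips table (hypothetical values for illustration only).
trips = pd.DataFrame({
    "member_casual": ["member", "member", "member", "casual", "casual", "casual"],
    "started_at": pd.to_datetime([
        "2020-08-03 08:05", "2020-08-03 17:10", "2020-08-04 17:40",
        "2020-08-01 17:20", "2020-08-02 17:55", "2020-08-02 13:00",
    ]),
})

# Hour of the day (0-23) in which each trip started.
trips["hour"] = trips["started_at"].dt.hour

# Rows: rider type; columns: start hour; values: trip counts.
hourly = pd.crosstab(trips["member_casual"], trips["hour"])

# Hour with the most trip starts for each rider type.
peak_hour = hourly.idxmax(axis=1)
print(peak_hour)
```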

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization7.PNG", width= 900, height=300)

In [None]:
import os
from IPython.display import Image
Image(filename="../input/cyclistic-insights/Visualization8.PNG", width= 900, height=300)

**Insights:**

The above two charts give a geographical representation of the top 10 most frequently travelled routes used by Member and Casual riders in the city of Chicago. Both Member and Casual riders mostly travel **short distances** from one station to another, usually located in the same area, e.g., Burnham Harbor to Burnham Harbor. However, different routes are popular for the two rider types.

**In case of Casual riders, the most frequently used route is Streeter Dr. & Grand Ave. to Streeter Dr. & Grand Ave. whereas for Member riders, it is MLK Jr. Dr. & 29th St. to State St. & 33rd St.** 

On the other hand, **only one route is popular amongst both Member and Casual riders, which is Theatre on the Lake to Theatre on the Lake.**
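
A route ranking of this kind can be reproduced in pandas by counting trips per start and end station pair. The sketch below assumes the `start_station_name` and `end_station_name` columns and uses made-up trip counts; `routes` and `top_routes` are illustrative names.

```python
import pandas as pd

# Toy trips table (hypothetical counts for illustration only).
trips = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "member", "member"],
    "start_station_name": [
        "Streeter Dr. & Grand Ave.", "Streeter Dr. & Grand Ave.", "Burnham Harbor",
        "MLK Jr. Dr. & 29th St.", "MLK Jr. Dr. & 29th St.", "Theatre on the Lake",
    ],
    "end_station_name": [
        "Streeter Dr. & Grand Ave.", "Streeter Dr. & Grand Ave.", "Burnham Harbor",
        "State St. & 33rd St.", "State St. & 33rd St.", "Theatre on the Lake",
    ],
})

# Count trips per (rider type, start station, end station) and sort by count.
routes = (
    trips.groupby(["member_casual", "start_station_name", "end_station_name"])
    .size()
    .rename("trips")
    .reset_index()
    .sort_values("trips", ascending=False)
)

# Keep at most the 10 busiest routes for each rider type.
top_routes = routes.groupby("member_casual").head(10)
print(top_routes)
```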

### **Average Ride time per User**
>  #####  Casual: 45 minutes 27 seconds

>  #####  Member: 15 minutes 56 seconds

**Insights:**

It can be seen that trips taken by Casual riders are about 3 times longer than trips taken by Members. This shows that **Casual riders usually use bike-sharing services for longer trips** as compared to Members.
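
The "about 3 times" figure can be checked from the two averages reported above. A small sketch using pandas `Timedelta` arithmetic; `casual_mean`, `member_mean` and `ratio` are illustrative names.

```python
import pandas as pd

# Average ride times as reported in this section.
casual_mean = pd.Timedelta(minutes=45, seconds=27)
member_mean = pd.Timedelta(minutes=15, seconds=56)

# Dividing one Timedelta by another gives a plain number.
ratio = casual_mean / member_mean
print(round(ratio, 2))  # 2.85
```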

## Top Recommendations 
#### (Based on Insights)

1. **Provide incentives like discounts, sales, etc. for annual membership rates during warmer months (May - October).**
   
   Objective - Most trips take place during warmer months as per the data given above. Thus, providing summer discounts on the annual membership rates may motivate Casual riders to purchase annual memberships during these months and boost conversions.
   
2. **Create holiday packages which incentivize trips that take place on weekends, start after 4 PM (Happy Hours), or last longer than 30 minutes.**

   Objective - The above data shows that most Casual riders make bike trips on weekends (Saturday and Sunday), and that most trips are made around 5 PM throughout the week. It also shows that trips made by Casual riders tend to be about 3 times longer than those made by Members. Thus, creating new holiday packages specifically for Casual riders who usually head out for leisure trips during holidays, in the evenings, or for longer periods of time may motivate them to purchase annual memberships for these packages.
   
3. **Target the above promotions mainly at stations which are a part of the most travelled routes for Casual riders.**  
   
   Objective - Based on the above data, the stations on the routes most popular with Casual riders are likely to see the largest number of Casual riders. Promoting the offers at these stations can therefore increase awareness amongst Casual riders, which may prompt them to consider buying an annual membership.
