# **CYCLISTIC CAPSTONE PROJECT**

**1.	About the company**

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. Bikes can be unlocked from one station and returned to any other station in the system at any time. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use the bikes to commute to work each day.


Cyclistic has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. This flexibility has helped Cyclistic grow its customer base and reach over time. Customers who purchase single-ride or full-day passes are referred to as casual riders; customers who purchase annual memberships are Cyclistic members.


**2.	Business Task**

Cyclistic currently has a customer base of casual riders and members, and the company believes it can grow by converting casual riders into annual members rather than by targeting all-new customers. The business task is to **understand how casual riders and annual members use Cyclistic bikes differently, and to apply those insights to a strategy for increasing annual memberships.**


**Key stakeholders of this study are** the Cyclistic executive team, who will decide whether to approve the recommended marketing program; Lily Moreno, director of marketing; and the Cyclistic marketing analytics team, a team of analysts responsible for analyzing and reporting data that guides marketing strategy.



**3.	Description of all data sources used**

The data source used for this case study is [Cyclistic Tripdata (202101 - 202112)](https://divvy-tripdata.s3.amazonaws.com/index.html). The dataset was made available by Motivate International Inc. The repository contains bike ride details from 2013 to 2022 (July 2022 being the most recent month), published monthly and, for some periods, quarterly. It includes information about bike type, start and end time, start and end station, latitude, longitude, and customer type.

The data is easily accessible: it is publicly available and can be copied, but data-privacy rules prohibit using riders’ personally identifiable information. The dataset is large, which reduces (but does not by itself rule out) the risk of sampling bias. Its integrity and credibility are also sound, as it is first-party data internal to the company. The data collected from the company’s website covers a one-year period, from January 2021 to December 2021. Each CSV file lists riders’ trip records by ride ID, which is the primary key of the data.


**4.	Documentation of data cleaning/manipulation**

Python, with its data libraries (pandas and NumPy) and visualization packages (Matplotlib and seaborn), was used to clean, manipulate, and analyze the dataset under review and to visualize the results of the analysis.


**5.	Analyze and share**

In [None]:
# seaborn and Matplotlib are the visualization tools, so install seaborn first
%pip install seaborn

In [None]:
# Import all packages used in the analysis
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
#importing the datasets using the read_csv function
tripdata_2021_01 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202101-divvy-tripdata.csv")
tripdata_2021_02 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202102-divvy-tripdata.csv")
tripdata_2021_03 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202103-divvy-tripdata.csv")
tripdata_2021_04 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202104-divvy-tripdata.csv")
tripdata_2021_05 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202105-divvy-tripdata.csv")
tripdata_2021_06 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202106-divvy-tripdata.csv")
tripdata_2021_07 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202107-divvy-tripdata.csv")
tripdata_2021_08 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202108-divvy-tripdata.csv")
tripdata_2021_09 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202109-divvy-tripdata.csv")
tripdata_2021_10 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202110-divvy-tripdata.csv")
tripdata_2021_11 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202111-divvy-tripdata.csv")
tripdata_2021_12 = pd.read_csv(r"C:\Users\HP\Desktop\Divvy_trip_data\202112-divvy-tripdata.csv")

In [None]:
# Confirm the import worked by inspecting one of the monthly datasets
tripdata_2021_04.info()

![image.png](attachment:cf61bcd3-82a1-46d6-8341-5cf98d2a78b8.png)

In [None]:
# Combine all twelve monthly datasets into one DataFrame; ignore_index=True builds a fresh 0..n-1 index instead of repeating each month's row labels
bike_data = pd.concat([tripdata_2021_01, tripdata_2021_02, tripdata_2021_03, tripdata_2021_04, tripdata_2021_05, tripdata_2021_06, tripdata_2021_07, tripdata_2021_08, tripdata_2021_09, tripdata_2021_10, tripdata_2021_11, tripdata_2021_12], ignore_index=True)
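The twelve `read_csv` calls and the `concat` above can also be written as a loop. The sketch below assumes the monthly files follow the `YYYYMM-divvy-tripdata.csv` naming seen in the notebook's paths and sit in one folder; `load_monthly_trips` is a hypothetical helper, not part of the original notebook, and the demo files are made up.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_monthly_trips(folder, year, months=range(1, 13)):
    """Hypothetical helper: read one CSV per month and stack them into one DataFrame.

    Assumes files are named like '202101-divvy-tripdata.csv' inside `folder`.
    """
    frames = []
    for month in months:
        path = Path(folder) / f"{year}{month:02d}-divvy-tripdata.csv"
        frames.append(pd.read_csv(path))
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# Self-contained demo: write two tiny made-up monthly files, then load them
with tempfile.TemporaryDirectory() as tmp:
    for m in (1, 2):
        pd.DataFrame({"ride_id": [f"r{m}a", f"r{m}b"]}).to_csv(
            Path(tmp) / f"2021{m:02d}-divvy-tripdata.csv", index=False
        )
    demo = load_monthly_trips(tmp, 2021, months=[1, 2])

print(demo)
```

On the notebook's folder the call would be `load_monthly_trips(r"C:\Users\HP\Desktop\Divvy_trip_data", 2021)`, which avoids twelve separately named variables.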

In [None]:
# Preview the first five rows of the combined data
bike_data.head()

![image.png](attachment:420b4338-d69e-4354-b60a-495bc3f6cca2.png)

In [None]:
# Check each column's data type and non-null count
bike_data.info()

![image.png](attachment:f0246d5b-e160-4fe6-afa9-aa03a56a509c.png)

In [None]:
#checking for null values
bike_data.isna().sum()

![image.png](attachment:d18237a4-58da-4eea-a4ef-aacc5dc2e60e.png)

In [None]:
# Get the number of rows and columns, to judge whether the missing values are large enough to matter before dropping them
bike_data.shape

![image.png](attachment:7d54bcbf-655a-4f12-8ba0-8a0235ff0dbe.png)

In [None]:
# Percentage of missing values per column (null count divided by the total number of rows)
bike_data.isna().mean() * 100

![image.png](attachment:3daa9c7f-9d5e-4096-bf86-8229cb9c24dd.png)

In [None]:
# Every column has less than 15% missing values, so drop the rows that contain nulls and proceed with the analysis
bike_data.dropna(axis=0, inplace=True)
bike_data.isna().sum()

![image.png](attachment:c8a3b4fb-1952-43da-acbb-2998c1e44e21.png)
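The 15% threshold above is checked column by column, but `dropna` removes any row that has a null in *any* column, so the share of rows dropped can exceed every single column's null percentage. A quick way to measure the actual row loss is `isna().any(axis=1).mean()`, shown here on a small made-up frame:

```python
import pandas as pd

# Made-up data: each column is 25% null, but the nulls sit in different rows
df = pd.DataFrame({
    "start_station_name": ["A", None, "C", "D"],
    "end_station_name": ["W", "X", None, "Z"],
})

per_column_pct = df.isna().mean() * 100                # null % per column
rows_dropped_pct = df.isna().any(axis=1).mean() * 100  # % of rows dropna() would remove

print(per_column_pct)
print(rows_dropped_pct)
```

Here each column is only 25% null, yet `dropna` would remove 50% of the rows. Running the same expression on `bike_data` before the `dropna` call would give the real share of rides removed.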

In [None]:
#checking for duplicate values
bike_data[bike_data.duplicated()]

![image.png](attachment:7a875391-93e8-4c29-8b20-f9cbdc1e46d0.png)
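Section 3 describes the ride ID as the primary key. `duplicated()` above only flags rows that are identical in every column, so it may also be worth confirming that the key itself is unique. This sketch assumes the key column is named `ride_id`, as in the public Divvy trip files; the rows are made up.

```python
import pandas as pd

# Made-up rows: the third repeats a ride_id but differs in another column
trips = pd.DataFrame({
    "ride_id": ["A1", "B2", "A1"],
    "member_casual": ["member", "casual", "casual"],
})

full_row_dupes = trips.duplicated().sum()        # rows identical in every column
key_dupes = trips["ride_id"].duplicated().sum()  # repeated primary-key values
key_is_unique = trips["ride_id"].is_unique

print(full_row_dupes, key_dupes, key_is_unique)  # 0 1 False
```

A full-row check can report zero duplicates while the key check still finds a conflict.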

In [None]:
# Convert the start and end times from text to datetime
bike_data['started_at'] = pd.to_datetime(bike_data['started_at'], dayfirst = True)
bike_data['ended_at'] = pd.to_datetime(bike_data['ended_at'], dayfirst = True)
bike_data.info()

![image.png](attachment:ef09a626-2471-4022-8b68-76f2c3d66f56.png)

In [None]:
# Derive the start hour, weekday name, and month number from each ride's start time
bike_data['Hour'] = bike_data['started_at'].dt.hour
bike_data['Day'] = bike_data['started_at'].dt.day_name()
bike_data['Month'] = bike_data['started_at'].dt.month

bike_data.head()

![image.png](attachment:fead09f9-8bdf-4431-8f0d-4edeb2d62648.png)

In [None]:
# Count rides by rider type (each row is one ride, not one unique rider)
bike_data.member_casual.value_counts()

![image.png](attachment:204fa8cb-4ae9-48df-8e34-3e01637cee09.png)

In [None]:
# Count rides per day of the week for each rider type
bike_data.groupby('member_casual')['Day'].value_counts()

![image.png](attachment:c83cfe24-5155-4595-93ca-9cf490c12da1.png)

In [None]:
# Ride length = end time minus start time, i.e. the time each rider spent travelling between stations
bike_data['Ride_length'] = bike_data['ended_at'] - bike_data['started_at']

# Convert the ride length (a timedelta) to minutes, rounded to two decimal places
from datetime import timedelta
bike_data['Ride_time'] = (bike_data['Ride_length'] / timedelta(minutes=1)).round(2)

bike_data.head()

![image.png](attachment:c6dc18f1-ade9-4947-93dd-dcfa34b9e120.png)
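Because `Ride_time` comes from a subtraction, any ride whose recorded end time is at or before its start time yields zero or negative minutes, which would pull the averages down. A minimal check on made-up timestamps computes minutes with `dt.total_seconds()` (equivalent to dividing by `timedelta(minutes=1)`) and counts the non-positive values; whether to drop such rows is left as an analyst's decision.

```python
import pandas as pd

rides = pd.DataFrame({
    "started_at": pd.to_datetime(["2021-07-01 10:00:00", "2021-07-01 12:00:00", "2021-07-01 15:00:00"]),
    "ended_at": pd.to_datetime(["2021-07-01 10:30:00", "2021-07-01 11:58:00", "2021-07-01 15:00:00"]),
})

# Ride duration in minutes, rounded to two decimals as in the notebook
rides["Ride_time"] = ((rides["ended_at"] - rides["started_at"]).dt.total_seconds() / 60).round(2)

# Rides that end before (or exactly when) they start are likely data errors
non_positive = (rides["Ride_time"] <= 0).sum()
print(rides["Ride_time"].tolist())  # [30.0, -2.0, 0.0]
print(non_positive)                 # 2
```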

In [None]:
# Latitude and longitude differences (in degrees) between each ride's start and end points
bike_data['Lat'] = bike_data['end_lat'] - bike_data['start_lat']
bike_data['Lng'] = bike_data['end_lng'] - bike_data['start_lng']

In [None]:
# Approximate straight-line distance in km, treating 1 degree of latitude or longitude as 111 km
bike_data['Distance'] = np.sqrt(bike_data['Lat'] ** 2 + bike_data['Lng'] ** 2) * 111
bike_data['Distance'].head()

![image.png](attachment:dc1effdb-0f86-4543-95c5-6be1ab0b9238.png)
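The `* 111` conversion treats one degree of longitude as 111 km, which only holds at the equator; at Chicago's latitude (about 41.9° N) a degree of longitude is roughly 83 km, so east-west trips come out too long. A swapped-in alternative is the haversine great-circle formula, vectorized with NumPy. This is a sketch, not the method behind the charts in this notebook; the 6371 km value is the standard mean Earth radius, and the trip is made up.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in km between points given in decimal degrees."""
    lat1, lng1, lat2, lng2 = map(np.radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


# Made-up trip: 0.01 degrees due east at Chicago's latitude
trip = pd.DataFrame({"start_lat": [41.88], "start_lng": [-87.63],
                     "end_lat": [41.88], "end_lng": [-87.62]})

hav = haversine_km(trip["start_lat"], trip["start_lng"], trip["end_lat"], trip["end_lng"])
flat = np.sqrt((trip["end_lat"] - trip["start_lat"]) ** 2
               + (trip["end_lng"] - trip["start_lng"]) ** 2) * 111

print(round(hav.iloc[0], 3), round(flat.iloc[0], 3))  # roughly 0.83 km vs 1.11 km
```

Applied to the notebook, this would be `haversine_km(bike_data['start_lat'], bike_data['start_lng'], bike_data['end_lat'], bike_data['end_lng'])`. Both approaches measure straight-line distance between the start and end points, not the route actually ridden.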

In [None]:
bike_data.head()

![image.png](attachment:b3569d28-2887-4255-83d1-20fa6cdf544f.png)

In [None]:
# Month is stored as a number; map it to a month name (1 -> January, ..., 12 -> December)
month = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June', 7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}
bike_data['Month_Name'] = bike_data['Month'].map(month)
bike_data['Month_Name'].head()

![image.png](attachment:498061fd-e952-43f8-a006-8aecb650901d.png)
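`value_counts` and plots on a plain text `Month_Name` column order months by frequency or alphabetically rather than by calendar. One option, sketched on made-up dates, is to derive the name with `dt.month_name()` and store it as an ordered categorical so that sorting follows the calendar.

```python
import calendar

import pandas as pd

starts = pd.Series(pd.to_datetime(["2021-03-05", "2021-01-10", "2021-12-24", "2021-01-31"]))

# Ordered categorical with calendar order: January < February < ... < December
month_type = pd.CategoricalDtype(categories=list(calendar.month_name)[1:], ordered=True)
month_name = starts.dt.month_name().astype(month_type)

# sort_index() on a categorical index follows category (calendar) order, not alphabetical
counts = month_name.value_counts().sort_index()
print(counts[counts > 0])
```

The printed counts come out as January (2), March (1), December (1), in calendar order. Storing `bike_data['Month_Name']` this way would let the monthly plots follow calendar order without an explicit `order=` list.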

In [None]:
bike_data.head()

![image.png](attachment:87df2f30-8058-4884-a1b4-ced04e4ecd5c.png)

In [None]:
# Count rides per month for each rider type
bike_data.groupby('member_casual')['Month_Name'].value_counts()

![image.png](attachment:ae580427-7044-48a9-9e24-37fea76fafe9.png)
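The monthly counts can be condensed into one number per rider type: the share of rides taken in summer. The sketch assumes summer means June to August (an assumption for illustration, not a definition from the source) and uses made-up rows with the notebook's `member_casual` and `Month` columns.

```python
import pandas as pd

rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "casual", "member", "member", "member", "member"],
    "Month": [7, 8, 7, 1, 7, 3, 10, 12],
})

SUMMER_MONTHS = {6, 7, 8}  # assumption: June to August

# Percentage of each rider type's rides that start in a summer month
summer_share = (
    rides.assign(is_summer=rides["Month"].isin(SUMMER_MONTHS))
         .groupby("member_casual")["is_summer"]
         .mean()
         .mul(100)
)
print(summer_share)
```

On this toy data 75% of casual rides and 25% of member rides fall in summer; on `bike_data` the same expression would show how much more seasonal casual riding is.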

In [None]:
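# Average ride time (minutes) by rider type; select the column first so text columns are not averaged
bike_data.groupby('member_casual')['Ride_time'].mean()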

![image.png](attachment:a6cc1969-e345-4d4b-81ab-68182cb5e93d.png)
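Ride times tend to be right-skewed, so a few very long rentals can pull the mean up. Reporting the median alongside the mean helps check whether the gap between casual riders and members holds for a typical ride. The sketch uses made-up ride times; on the notebook's data the equivalent call would be `bike_data.groupby('member_casual')['Ride_time'].agg(['mean', 'median'])`.

```python
import pandas as pd

rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "member", "member"],
    "Ride_time": [10.0, 12.0, 200.0, 9.0, 11.0, 13.0],
})

# Mean and median ride time (minutes) per rider type
summary = rides.groupby("member_casual")["Ride_time"].agg(["mean", "median"])
print(summary)
```

Here one 200-minute ride lifts the casual mean to 74 minutes while the casual median stays at 12, close to the members' 11.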

In [None]:
# Average ride time (minutes) per weekday, all riders combined
average_day_ride = bike_data.groupby('Day')['Ride_time'].mean()
average_day_ride

![image.png](attachment:886a8e0b-fbcd-4560-921a-2bd2b41cb035.png)

In [None]:
# Average ride time (minutes) per weekday, split by rider type (casual vs. member)
avg_user_day = bike_data.groupby(['Day', 'member_casual']).agg({'Ride_time':['mean']})
avg_user_day

![image.png](attachment:6e6e4340-f426-469d-b3ec-196dfe53e13b.png)

In [None]:
# Total number of rides per weekday and rider type (actual counts, to complement the averages above)
user_daytrip = bike_data.groupby(['Day','member_casual']).size().to_frame('ride_count')
user_daytrip

![image.png](attachment:94a4c293-7f89-4253-9fe5-139b854cf2c0.png)
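The conclusions state that casual rides concentrate on Saturday and Sunday. A compact way to test that is a cross-tabulation of `Day` against `member_casual`, normalized within each rider type and summed over the weekend rows. The rows below are made up.

```python
import pandas as pd

rides = pd.DataFrame({
    "Day": ["Saturday", "Sunday", "Monday", "Saturday", "Tuesday", "Wednesday", "Sunday", "Friday"],
    "member_casual": ["casual", "casual", "casual", "casual", "member", "member", "member", "member"],
})

# Share of each rider type's rides falling on each weekday (each column sums to 1)
day_share = pd.crosstab(rides["Day"], rides["member_casual"], normalize="columns")

# Percentage of each rider type's rides taken on the weekend
weekend_share = day_share.loc[["Saturday", "Sunday"]].sum() * 100
print(weekend_share)
```

In this toy data 75% of casual rides but only 25% of member rides fall on the weekend.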

In [None]:
%pip install plotly

In [None]:
# plotly libraries
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
from plotly.subplots import make_subplots

In [None]:
# Number of rides by rider type
sns.set_style('whitegrid')
plt.figure(figsize=(6,6))
sns.countplot(x='member_casual', hue='member_casual', data=bike_data, palette='magma_r')

![image.png](attachment:b297cfce-f136-4d0f-b31f-f4f11e0c4ef8.png)

In [None]:
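# Number of rides per weekday, by rider type, with bars ordered Monday to Sunday
weekday_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
plt.figure(figsize=(8,6))
sns.countplot(x='Day', hue='member_casual', data=bike_data, order=weekday_order, palette='winter')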
plt.tight_layout()

![image.png](attachment:6faa0721-45a6-4e89-b2e1-dd21c59c0129.png)

In [None]:
# Number of rides per hour of the day, by rider type
plt.figure(figsize=(8,6))
sns.countplot(x='Hour', hue='member_casual', data=bike_data, palette='cividis')
plt.tight_layout()

![image.png](attachment:d81cbdb5-a701-487d-a984-e599f62afa7f.png)
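Section 1 notes that about 30% of users commute, so it can help to summarize the hourly chart numerically: the busiest start hour per rider type indicates whether members peak at typical commuting times. A sketch with `groupby`, `size`, and `idxmax` on made-up rows:

```python
import pandas as pd

rides = pd.DataFrame({
    "member_casual": ["member", "member", "member", "member", "casual", "casual", "casual"],
    "Hour": [8, 17, 17, 17, 15, 15, 17],
})

# Rides per (rider type, hour), one column per hour, zeros where no rides occurred
hourly = rides.groupby(["member_casual", "Hour"]).size().unstack(fill_value=0)

# Hour with the most rides for each rider type
peak_hour = hourly.idxmax(axis=1)
print(peak_hour)
```

In this toy data members peak at 17:00 and casual riders at 15:00. Note that `idxmax` returns the first hour when two hours tie.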

In [None]:
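# Number of rides per month, by rider type, with bars in calendar order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
plt.figure(figsize=(8,6))
sns.countplot(x='Month_Name', hue='member_casual', data=bike_data, order=month_order, palette='ocean_r')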

![image.png](attachment:346eb68b-4f04-4895-a5b2-5b7a72636863.png)

In [None]:
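# Mean distance (km) per ride by rider type; seaborn's barplot shows the mean with a confidence-interval error bar
plt.figure(figsize=(8,6))
sns.barplot(x='member_casual', y='Distance', data=bike_data, palette='coolwarm')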

![image.png](attachment:79b19e91-d4ee-4bcb-95df-f01fdfdacb46.png)

In [None]:
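# Mean distance (km) per ride by month and rider type, with months in calendar order
month_order = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
plt.figure(figsize=(8,6))
sns.barplot(x='Month_Name', y='Distance', hue='member_casual', data=bike_data, order=month_order, palette='Pastel1')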

![image.png](attachment:21e5b7b3-1c4f-4241-a8e1-c77c3bc4ce6c.png)

In [None]:
# Mean ride time (minutes) per ride by rider type
plt.figure(figsize=(8,6))
sns.barplot(x='member_casual', y='Ride_time', data=bike_data, palette='Paired')

![image.png](attachment:4e54c139-71e1-4a1f-97b3-faa7ee28bb0f.png)

In [None]:
# Number of rides by bike type and rider type
plt.figure(figsize=(6,6))
sns.countplot(x='rideable_type', hue='member_casual', data=bike_data, palette='bwr')

![image.png](attachment:918edffc-3942-4f16-a873-a25386fc68be.png)
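Because members take more rides overall, raw counts per bike type are hard to compare across rider types. Normalizing within each rider type shows bike-type preference directly. The `rideable_type` values and rows below are made up for illustration.

```python
import pandas as pd

rides = pd.DataFrame({
    "rideable_type": ["classic_bike", "electric_bike", "docked_bike", "classic_bike",
                      "classic_bike", "classic_bike", "electric_bike", "classic_bike"],
    "member_casual": ["casual", "casual", "casual", "casual",
                      "member", "member", "member", "member"],
})

# Percentage of each rider type's rides by bike type (each row sums to 100)
bike_mix = pd.crosstab(rides["member_casual"], rides["rideable_type"], normalize="index") * 100
print(bike_mix.round(1))
```

In this toy data half of casual rides use classic bikes versus three quarters of member rides, a difference that raw counts would blur.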

![Sheet 1 (9).png](attachment:070ac90a-24e1-4173-8cdd-4d2cf6105cd9.png)

![Sheet 1 (8).png](attachment:e0e2f2bf-172f-4e10-996a-df926cf185fe.png)

![Sheet 1 (7).png](attachment:9d3d4f3e-9767-4580-9d5c-7f82f58d593d.png)

**6.	Conclusions and Recommendations**

Casual riders had a longer average ride time than members, which is promising for Cyclistic’s plan to expand revenue by converting casual riders into members. One way to do this is to offer coupons on annual membership based on ride time and distance covered; because casual riders currently ride for longer than members, such rewards could encourage them to subscribe.


It was also observed that the largest number of casual rides occurred on Saturday and Sunday. This suggests that most casual riders ride for leisure rather than to commute. Cyclistic can target casual riders with an email newsletter or push notification introducing a weekend-only membership, discounted relative to the annual membership.

A discounted weekday plan should also be introduced. This would encourage casual riders to use the bikes more often on weekdays and, over time, entice them to adopt the annual membership plan.


The analysis also showed that ride volume peaked in July and August, which are summer months. Cyclistic can focus on this period and maximize profit by introducing a summer-only membership for casual riders who may be hesitant to commit to a full-year plan.


Streeter Dr Avenue was the start station with the highest number of rides by casual riders; the area around it includes parks, recreational centers, and the beach, which supports the view that most casual riders ride for leisure. To engage casual riders at other stations, a tour-the-city campaign could be organized around at least 52 suggested routes, one per week, each starting at a currently less-used start station. Bikes would be rented at these stations for a token rental fee, with the idea of touring the whole city within a year as the selling point; the rental fee could be benchmarked against the annual subscription price, discounted on a weekly basis.
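The start-station ranking can also be reproduced in pandas. This sketch assumes the start-station column is named `start_station_name`, as in the public Divvy trip files, and uses made-up station names rather than real results.

```python
import pandas as pd

rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "casual", "member"],
    "start_station_name": ["Station A", "Station B", "Station A", "Station A", "Station C", "Station B"],
})

# Ten most frequent start stations among casual rides only
top_casual_starts = (
    rides.loc[rides["member_casual"] == "casual", "start_station_name"]
         .value_counts()
         .head(10)
)
print(top_casual_starts)
```

The tail of the same ranking (for example `.tail(10)` on the full `value_counts()`) would list the less-used stations where the proposed routes could start.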


A monthly or weekly membership plan should also be introduced. This would help casual riders get used to the idea of membership, and as they become comfortable with the weekly or monthly plan, they may eventually move to the annual plan.


A customer appreciation incentive, such as 10% off the annual membership renewal fee for returning riders who want to enjoy Cyclistic’s offers long term, could also be offered.
