## Case Study: How Does a Bike-Share Navigate Speedy Success?
This case study by Abdul-Samed is a capstone project completed in partial fulfilment of the Google Data Analytics Professional Certificate.

## Introduction
Welcome to the Cyclistic bike-share analysis case study! In this case study, you will perform many real-world tasks of a junior data analyst. You will work for a fictional company, Cyclistic, and meet different characters and team members. In order to answer the key business questions, you will follow the steps of the data analysis process: **ask, prepare, process, analyze, share, and act.** Along the way, the Case Study Roadmap tables — including guiding questions and key tasks — will help you stay on the right path.
By the end of this lesson, you will have a portfolio-ready case study. Download the packet and reference the details of this case study anytime. Then, when you begin your job hunt, your case study will be a tangible way to demonstrate your knowledge and skills to potential employers.

## Scenario
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

## ASK PHASE
Three questions will guide the future marketing program:

1. How do annual members and casual riders use Cyclistic bikes differently?
2. Why would casual riders buy Cyclistic annual memberships?
3. How can Cyclistic use digital media to influence casual riders to become members?

### Business Task
Our objective is to analyse how annual members and casual riders use Cyclistic bikes differently, why casual riders would buy Cyclistic annual memberships, and how Cyclistic can use digital media to influence casual riders to become members.

## PREPARE PHASE
* Data cleaning and manipulation
* Observe and become familiar with the data
* Check for null and missing values
* Perform sanity checks
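
The null and sanity checks listed above could look roughly like the sketch below. The `sample` frame and its values are made up for illustration; in the notebook the same calls would be run on each quarterly DataFrame after loading.

```python
import numpy as np
import pandas as pd

# Tiny made-up sample standing in for one quarterly trips file.
sample = pd.DataFrame({
    "ride_id": ["A1", "A2", "A3"],
    "started_at": ["2020-01-01 08:00:00", "2020-01-01 09:00:00", None],
    "end_station_name": ["HQ QR", np.nan, "Station B"],
})

# Count missing values in every column.
missing_per_column = sample.isna().sum()

# Sanity check: each ride should have its own ID.
duplicate_ids = int(sample["ride_id"].duplicated().sum())

print(missing_per_column)
print("duplicate ride ids:", duplicate_ids)
```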

In [None]:
#import necessary packages
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
#loading the datasets

q1_2020 = pd.read_csv('../input/cyclistic-12-months-dataset/Divvy_Trips_2020_Q1/Divvy_Trips_2020_Q1.csv')
q4_2019 = pd.read_csv('../input/cyclistic-12-months-dataset/Divvy_Trips_2019_Q4/Divvy_Trips_2019_Q4.csv')
q3_2019 = pd.read_csv('../input/cyclistic-12-months-dataset/Divvy_Trips_2019_Q3/Divvy_Trips_2019_Q3.csv')
q2_2019 = pd.read_csv('../input/cyclistic-12-months-dataset/Divvy_Trips_2019_Q2.csv/Divvy_Trips_2019_Q2.csv')




#q3_2019


In [None]:
q1_2020.head()

## PROCESS PHASE
1. Check the data for errors. 
2. Choose your tools. 
3. Transform the data so you can work with it effectively. 
4. Document the cleaning process.
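
One way to check the data for errors before combining files is to compare column names between quarters. The sketch below uses two made-up one-row frames that mimic the older and newer header styles; the frame names are hypothetical.

```python
import pandas as pd

# Made-up one-row frames that mimic the older and newer column naming styles.
old_style = pd.DataFrame({
    "trip_id": [1],
    "start_time": ["2019-10-01 00:01:39"],
    "usertype": ["Subscriber"],
})
new_style = pd.DataFrame({
    "ride_id": ["X1"],
    "started_at": ["2020-01-21 20:06:59"],
    "member_casual": ["member"],
})

# Columns that appear in one frame but not in the other need renaming.
only_in_old = sorted(set(old_style.columns) - set(new_style.columns))
only_in_new = sorted(set(new_style.columns) - set(old_style.columns))

print("only in old:", only_in_old)
print("only in new:", only_in_new)
```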

In [None]:
 

#renaming to be consistent with q1_2020

q3_2019.rename(columns=
               {"trip_id": "ride_id",
                "bikeid" : "rideable_type",
               "start_time": "started_at",
               "end_time": "ended_at",
               "to_station_id" : "end_station_id",
               "from_station_id":"start_station_id",
               "from_station_name" :"start_station_name",
               "to_station_name" :"end_station_name",
               "usertype":"member_casual"},inplace=True)

q4_2019.rename(columns=
               {"trip_id": "ride_id",
                "bikeid" : "rideable_type",
               "start_time": "started_at",
               "end_time": "ended_at",
               "to_station_id" : "end_station_id",
               "from_station_id":"start_station_id",
               "from_station_name" :"start_station_name",
               "to_station_name" :"end_station_name",
               "usertype":"member_casual"},inplace=True)

 


q2_2019.rename(columns=
               {"01 - Rental Details Rental ID": "ride_id",
                "01 - Rental Details Bike ID" : "rideable_type",
               "01 - Rental Details Local Start Time": "started_at",
               "01 - Rental Details Local End Time": "ended_at",
               "03 - Rental Start Station ID" : "start_station_id",
               "02 - Rental End Station ID":"end_station_id",
               "03 - Rental Start Station Name" :"start_station_name",
               "02 - Rental End Station Name" :"end_station_name",
               "User Type":"member_casual"},inplace=True)


In [None]:
#convert ride_id and rideable_type to strings so they match the q1_2020 column types
q3_2019['ride_id']=q3_2019['ride_id'].astype('str')
q3_2019['rideable_type']=q3_2019['rideable_type'].astype('str')
q4_2019['ride_id']=q4_2019['ride_id'].astype('str')
q4_2019['rideable_type']=q4_2019['rideable_type'].astype('str')
q2_2019['ride_id']=q2_2019['ride_id'].astype('str')
q2_2019['rideable_type']=q2_2019['rideable_type'].astype('str')

In [None]:
q3_2019.info()
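
Since the q3_2019 and q4_2019 renames use the same mapping, it could be defined once and reused. This is only a sketch: `COLUMN_MAP_2019`, `demo_q3` and `demo_q4` are hypothetical names, and the demo frames carry just three made-up columns.

```python
import pandas as pd

# The mapping used for the 2019 Q3 and Q4 files, written once.
COLUMN_MAP_2019 = {
    "trip_id": "ride_id",
    "bikeid": "rideable_type",
    "start_time": "started_at",
    "end_time": "ended_at",
    "from_station_id": "start_station_id",
    "from_station_name": "start_station_name",
    "to_station_id": "end_station_id",
    "to_station_name": "end_station_name",
    "usertype": "member_casual",
}

# Made-up stand-ins for the two quarterly frames.
demo_q3 = pd.DataFrame({"trip_id": [1], "bikeid": [10], "usertype": ["Customer"]})
demo_q4 = pd.DataFrame({"trip_id": [2], "bikeid": [11], "usertype": ["Subscriber"]})

# rename() returns a new frame; keys missing from a frame are ignored by default.
renamed = [frame.rename(columns=COLUMN_MAP_2019) for frame in (demo_q3, demo_q4)]

print(renamed[0].columns.tolist())
```

Returning new frames instead of using `inplace=True` leaves the originals untouched, which makes it easier to re-run a cell without side effects.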

In [None]:
#the next step is to stack the four quarterly datasets into one DataFrame named all_trips
all_trips = pd.concat([q2_2019, q3_2019, q4_2019, q1_2020], ignore_index=True)
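
After combining the quarters, a quick check that no rows were lost or duplicated can be worth adding. The sketch below uses two made-up frames, and `pd.concat` is used here only to build the demo; the same two checks apply to `all_trips` however it was assembled.

```python
import pandas as pd

# Made-up pieces standing in for quarterly files.
part_a = pd.DataFrame({"ride_id": ["1", "2"], "member_casual": ["Subscriber", "Customer"]})
part_b = pd.DataFrame({"ride_id": ["3"], "member_casual": ["member"]})

combined = pd.concat([part_a, part_b], ignore_index=True)

# The combined frame should hold exactly the rows of its parts.
expected_rows = len(part_a) + len(part_b)
rows_match = len(combined) == expected_rows

# Ride IDs should still be unique after stacking.
ids_unique = combined["ride_id"].is_unique

print(rows_match, ids_unique)
```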

In [None]:
all_trips.head()

#### Deleting columns that we don't need

In [None]:
#since we don't need these columns, we use del to drop them
del all_trips['01 - Rental Details Duration In Seconds Uncapped']
del all_trips['Member Gender']
del all_trips['05 - Member Details Member Birthday Year']
del all_trips['start_lat']
del all_trips['start_lng']
del all_trips['end_lng']
del all_trips['end_lat']
del all_trips['birthyear']
del all_trips['tripduration']
del all_trips['gender']

In [None]:
all_trips.head()

In [None]:
#in member_casual we need to reassign the entries Customer & Subscriber to casual or member,
#while keeping the existing casual and member values
all_trips['member_casual'] = all_trips['member_casual'].map({
    'casual':'casual',
    'member':'member',
    'Subscriber':'member',
    'Customer':'casual'
})
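
One thing to keep in mind with `Series.map` and a dictionary is that any value missing from the dictionary becomes `NaN`. A small check like the sketch below can confirm nothing was silently dropped; the `"Dependent"` label is a hypothetical unexpected value added for illustration.

```python
import pandas as pd

# Made-up labels, including one hypothetical value the mapping does not cover.
labels = pd.Series(["Subscriber", "Customer", "member", "casual", "Dependent"])

mapping = {
    "casual": "casual",
    "member": "member",
    "Subscriber": "member",
    "Customer": "casual",
}

mapped = labels.map(mapping)

# Original labels that ended up as NaN after mapping.
unmapped = labels[mapped.isna()].unique().tolist()

print(mapped.tolist())
print("unmapped labels:", unmapped)
```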

In [None]:
all_trips.head()

### Let's break the date into month, day and year

In [None]:
#let's start by creating a new column 'Date'
all_trips['Date'] = pd.DatetimeIndex(all_trips['started_at']).strftime("%d %b %Y")

all_trips['Month'] = pd.DatetimeIndex(all_trips['started_at']).strftime("%b")
all_trips['Day'] = pd.DatetimeIndex(all_trips['started_at']).strftime("%d")
all_trips['Year'] = pd.DatetimeIndex(all_trips['started_at']).strftime("%Y")
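
As an alternative sketch, the timestamp column can be parsed once with `pd.to_datetime` and the parts pulled from the `.dt` accessor, which avoids rebuilding a `DatetimeIndex` for every new column. The `trips` frame and its two timestamps are made up.

```python
import pandas as pd

# Made-up start times.
trips = pd.DataFrame({"started_at": ["2019-08-03 14:05:00", "2020-01-15 07:30:00"]})

# Parse the strings once, then reuse the parsed Series.
started = pd.to_datetime(trips["started_at"])

trips["Month"] = started.dt.strftime("%b")
trips["Day"] = started.dt.strftime("%d")
trips["Year"] = started.dt.strftime("%Y")
trips["day_of_week"] = started.dt.day_name()

print(trips)
```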


In [None]:
all_trips.head()

## Calculate the ride length in seconds

In [None]:
#first convert started_at & ended_at to datetime values
all_trips['started_at'] = pd.DatetimeIndex(all_trips['started_at'])
all_trips['ended_at'] = pd.DatetimeIndex(all_trips['ended_at'])

#then subtract to get the duration of each ride
all_trips['ride_length'] = all_trips["ended_at"] - all_trips["started_at"]

#then convert the duration (a timedelta) to seconds
all_trips['ride_length']=all_trips['ride_length']/np.timedelta64(1,'s')
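
The same calculation can also be written with `.dt.total_seconds()`, which reads a little more directly than dividing by `np.timedelta64(1,'s')`. A minimal sketch on made-up timestamps, where the second ride deliberately ends before it starts:

```python
import pandas as pd

# Made-up rides; the second one has an end time earlier than its start time.
trips = pd.DataFrame({
    "started_at": pd.to_datetime(["2020-01-01 08:00:00", "2020-01-01 09:00:00"]),
    "ended_at": pd.to_datetime(["2020-01-01 08:10:30", "2020-01-01 08:59:00"]),
})

# Subtracting two datetime columns gives a timedelta; total_seconds() returns floats.
trips["ride_length"] = (trips["ended_at"] - trips["started_at"]).dt.total_seconds()

print(trips["ride_length"].tolist())
```

Negative values such as the second one are the kind of record the next cleaning step is meant to catch.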

In [None]:
all_trips.head()

### Deleting rides with negative length and rides ending at HQ QR

In [None]:
#remove rides with a negative ride_length and test rides that end at HQ QR
#first collect the index labels of the rows to remove
rows_to_drop=all_trips[(all_trips['ride_length']<0)| (all_trips['end_station_name']=='HQ QR')].index

#then drop those rows from the dataset
all_trips.drop(rows_to_drop,inplace=True)
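
The same filter can be expressed as a boolean mask, which also makes it easy to report how many rows were removed. This is a sketch on made-up rows; `Station A` and `Station B` are placeholder names.

```python
import pandas as pd

# Made-up rides: one valid, one with a negative length, one ending at HQ QR.
trips = pd.DataFrame({
    "ride_length": [630.0, -60.0, 120.0],
    "end_station_name": ["Station A", "Station B", "HQ QR"],
})

# True for every row that should be discarded.
is_bad = (trips["ride_length"] < 0) | (trips["end_station_name"] == "HQ QR")

removed = int(is_bad.sum())
clean = trips[~is_bad].reset_index(drop=True)

print("rows removed:", removed)
print(clean)
```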

## ANALYSE PHASE
After processing the data, the next phase is to analyse it:
1. Aggregate your data so it’s useful and accessible. 
2. Organize and format your data. 
3. Perform calculations.
4. Identify trends and relationships

In [None]:
all_trips.describe()

In [None]:
all_trips.pivot_table(values="ride_length", index=['member_casual'],aggfunc="max")

In [None]:
all_trips.pivot_table(values="ride_length", index=['member_casual'],aggfunc="mean")
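
The separate pivot tables for maximum and mean can also be produced in one pass with `groupby` and `agg`. The sketch below adds a count and a median as well, since the median is less sensitive to a few very long rides; the numbers are made up.

```python
import pandas as pd

# Made-up ride lengths in seconds.
trips = pd.DataFrame({
    "member_casual": ["member", "member", "casual", "casual"],
    "ride_length": [600.0, 900.0, 1800.0, 3000.0],
})

# One row per rider type, one column per statistic.
summary = trips.groupby("member_casual")["ride_length"].agg(["count", "mean", "median", "max"])

print(summary)
```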

## Station Analysis on ride length

In [None]:
station_avg = all_trips.pivot_table(values="ride_length",    index=['start_station_name'],aggfunc="count").sort_values('ride_length', ascending=False).head(10)
station_avg

In [None]:
station_avg = all_trips.pivot_table(values="ride_length",    index=['start_station_name'],aggfunc="count").sort_values('ride_length', ascending=False).tail(10)
station_avg

In [None]:
bike_avg = all_trips.pivot_table(values="ride_length",    index=['end_station_name'],aggfunc="mean").sort_values('ride_length', ascending=False)
bike_avg

### Average ride length by rider type and year

In [None]:
subs=all_trips.pivot_table(values="ride_length", index=['member_casual','Year'],aggfunc="mean").sort_values('ride_length', ascending=False)
subs

## SHARE PHASE
1. Determine the best way to share your findings. 
2. Create effective data visualizations. 
3. Present your findings.


In [None]:
import matplotlib.pyplot as plt 
#plt.rc('font', size=14)

### Let's see the monthly average ride length

In [None]:
#let's see the monthly average ride length
monthly_ride=all_trips.pivot_table(values="ride_length", index=['Month'],aggfunc="mean").sort_values('ride_length', ascending=False)
monthly_ride
monthly_ride.plot(kind='bar');
plt.xticks(rotation=90);

The graph shows that **August** is the month with the highest average ride length, followed by
**July, March, September, February, June**, and so on.
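
The chart above is sorted by ride length. To look for seasonal patterns, it can help to put the months in calendar order instead. A minimal sketch, assuming the `Month` column holds three-letter abbreviations as produced by `strftime("%b")`; the data is made up.

```python
import pandas as pd

month_order = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up rides.
trips = pd.DataFrame({
    "Month": ["Aug", "Jan", "Aug", "Mar"],
    "ride_length": [1200.0, 600.0, 1800.0, 900.0],
})

# Average per month, then reorder by calendar position and drop months with no rides.
monthly = trips.groupby("Month")["ride_length"].mean().reindex(month_order).dropna()

print(monthly)
```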

### The monthly average ride length by rider type

In [None]:
#let's see the monthly average ride length by rider type (member or casual)
monthly_sub_ride=all_trips.pivot_table(values="ride_length", index=['Month','member_casual'],aggfunc="mean").sort_values('ride_length', ascending=False)
monthly_sub_ride
monthly_sub_ride.plot(kind='bar');
plt.xticks(rotation=90);

The graph shows that for **casual** riders the highest average ride length is in **January**, while for **member** riders it is in **July**.
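
To compare the two rider types month by month, one option is to put `member_casual` in the columns of the pivot table instead of the index. The sketch below uses made-up numbers chosen only to illustrate the shape of the result.

```python
import pandas as pd

# Made-up rides.
trips = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jul", "Jul"],
    "member_casual": ["casual", "member", "casual", "member"],
    "ride_length": [4000.0, 700.0, 3000.0, 900.0],
})

# One row per month, one column per rider type.
wide = trips.pivot_table(values="ride_length", index="Month",
                         columns="member_casual", aggfunc="mean")

# Month with the highest average ride length for each rider type.
peak_month = wide.idxmax()

print(wide)
print(peak_month)
```

Calling `.plot(kind='bar')` on a table shaped like `wide` draws one bar per rider type side by side for each month.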

### Weekday average ride_length

In [None]:
# let's convert the Date strings back to datetime values
all_trips['Date'] = pd.DatetimeIndex(all_trips['Date'])


#create a new column day_of_week to store the names of day of the week
all_trips['day_of_week']= all_trips['Date'].dt.day_name()

In [None]:
weekday_ride=all_trips.pivot_table(values="ride_length",    index=['day_of_week'],aggfunc="mean").sort_values('ride_length', ascending=False)
weekday_ride.plot(kind='bar')

The graph shows that average ride length is highest on **Sunday**, followed by **Saturday**.

### Weekday average ride_length by rider type (casual & member)

In [None]:
weekly_ride=all_trips.pivot_table(values="ride_length",    index=['day_of_week','member_casual'],aggfunc="mean").sort_values('ride_length', ascending=False)
weekly_ride.plot(kind='bar')

According to the graph, **casual** riders have their longest average rides on Fridays, Wednesdays and Thursdays, whereas **member** riders have their longest average rides on weekends, followed by Mondays.

It is worth noting that the previous graph, which shows **Saturdays and Sundays** with the highest average ride length overall, is most likely driven by **member** riders. This is evident from the graph above, where weekends have the highest average ride length for **member** riders. For **casual** riders, **Friday** is the day with the longest average rides, which explains why **Friday** ranks **3rd** for average ride length in the previous graph.

In [None]:
weekly_count=all_trips.pivot_table(values="ride_id",    index=['day_of_week'],aggfunc="count").sort_values('ride_id', ascending=False)
weekly_count.plot(kind='bar')

The graph shows that **Tuesday** and **Wednesday** are the days on which the most rides are taken.

In [None]:
weekly_count=all_trips.pivot_table(values="ride_id",    index=['day_of_week','member_casual'],aggfunc="count").sort_values('ride_id', ascending=False)
weekly_count.plot(kind='bar')

According to the graph, **member** riders take the most rides on Tuesday and the fewest on Saturday and Sunday.
**Casual** riders show the opposite pattern: they take the most rides on Saturday and Sunday and fewer on Tuesday.
This is a sharp contrast between **member** and **casual** riders.
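
Because members and casual riders take very different total numbers of rides, comparing shares rather than raw counts can make this contrast easier to read. The sketch below uses `pd.crosstab` with `normalize="columns"` on made-up rides, so each column sums to 1.

```python
import pandas as pd

# Made-up rides.
trips = pd.DataFrame({
    "day_of_week": ["Tuesday", "Tuesday", "Tuesday", "Saturday",
                    "Saturday", "Saturday", "Saturday", "Tuesday"],
    "member_casual": ["member", "member", "member", "casual",
                      "casual", "casual", "member", "casual"],
})

# Share of each rider type's rides that fall on each day.
share = pd.crosstab(trips["day_of_week"], trips["member_casual"], normalize="columns")

print(share)
```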

## ACT PHASE

* What is your final conclusion based on your analysis?
* How could your team and business apply your insights?
* What next steps would you or your stakeholders take based on your findings?
* Is there additional data you could use to expand on your findings?

#### 1. How do annual members and casual riders use Cyclistic bikes differently?

From the analysis and visualizations, we were able to differentiate between casual and member riders. **Casual** riders have their longest average rides on Fridays, Wednesdays and Thursdays, whereas **member** riders have their longest average rides on weekends, followed by Mondays. In general, riders spend more time riding on weekends.
The monthly analysis shows that **August** is the month with the highest average ride length, followed by July, March, September, February and June.
Furthermore, when split by rider type, the highest average ride length for casual riders is in **January**, while for member riders it is in **July**.

#### 2. Why would casual riders buy Cyclistic annual memberships? 
The daily data can help convert casual riders into member riders. Casual riders spend more time riding, so we should design attractive packages and discounts for casual riders who are willing to become members. This would help convince them to buy an annual membership in order to enjoy the package and discount.


#### 3. How can Cyclistic use digital media to influence casual riders to become members?
Use the monthly data to plan for seasonal periods. Because we know which months have the highest average ride length, we know when to run campaigns to attract potential members. Also, since average ride length increases on weekends, advertisements and promotions should run on weekends.