# **Google Data Analytics Capstone Project: How Does a Bike-Share Navigate Speedy Success?**

![image.png](attachment:0ae808d0-d987-4529-8471-b3efa59dbb24.png)

## Introduction

The Cyclistic case study is part of the Google Data Analytics Professional Certificate. In this case study, I play the role of a junior data analyst at Cyclistic and follow the steps of the data analysis process to answer key business questions.

I am a junior data analyst on the marketing analytics team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company's future success depends on maximizing the number of annual memberships. The team therefore wants to understand how casual riders and annual members use Cyclistic bikes differently. My task is to review and analyze 12 months of trip data, draw insights from it, and present them to the team with clear visualizations that support my recommendations.


## About the Company

In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime. Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. 

One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members. Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Lily Moreno, the director of marketing, believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members.

She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs. Moreno has set a clear goal: Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics. Moreno and her team are interested in analyzing the Cyclistic historical bike trip data to identify trends.


In this case study, I draw on the six phases of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.
I downloaded the datasets from the link below.

[The previous 12 months of Cyclistic trip data](https://divvy-tripdata.s3.amazonaws.com/index.html)

## Phase 1 – ASK 

Three questions will guide the future marketing program: 
1.	How do annual members and casual riders use Cyclistic bikes differently? 
2.	Why would casual riders buy Cyclistic annual memberships? 
3.	How can Cyclistic use digital media to influence casual riders to become members?

**1.1 Business Task**

Analyze the trip data to understand how casual riders and annual members use Cyclistic bikes differently, and identify trends that can inform Cyclistic's marketing strategy.

**1.2 Key Stakeholders**

* 	Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
* 	Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
* 	Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals as well as how you, as a junior data analyst, can help Cyclistic achieve them.
* 	Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

**1.3 A clear statement of the business task**

My role is to analyze Cyclistic's trip data, identify usage trends by rider type that can guide marketing, and share them with the key stakeholders. The visualizations should be specific and clear so that every stakeholder can understand the findings.


## Phase 2 – PREPARE

**Guiding Questions**

**Q1: Where is your data located?**

A1: The data is available at the link given above. I downloaded the 12 monthly files covering January 2022 through December 2022.
The prepared data set is also available on my personal Kaggle account.

**Q2: How is the data organized?**

A2: Many tools can be used to organize data; I chose the R programming language. In R, I combined the 12 monthly files into a single data frame, added the columns needed for the analysis, and removed the columns that were not useful for it.
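
The combining step might look like the sketch below. It is not the preprocessing script from the repository: the `combine_trips()` helper, the file names, and the two tiny stand-in files are assumptions made so that the example runs on its own.

```r
library(dplyr)

# Hypothetical helper: read every CSV in a folder and stack the rows
# into a single data frame.
combine_trips <- function(dir) {
  files <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
  files %>%
    lapply(read.csv, stringsAsFactors = FALSE) %>%
    bind_rows()
}

# Two tiny stand-in "monthly" files (made-up rows, not real trip data).
tmp <- tempfile("trips_demo")
dir.create(tmp)
write.csv(data.frame(ride_id = c("a1", "a2"),
                     member_casual = c("casual", "member")),
          file.path(tmp, "202201-demo.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(tmp, "202202-demo.csv"), row.names = FALSE)

all_trips <- combine_trips(tmp)
nrow(all_trips)
```

With the real monthly files in one folder, the same helper would be pointed at that folder instead of the temporary one.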

**Q3: Are there issues with bias or credibility in this data? Does your data ROCCC?** 

A3: Yes, the data appears to be ROCCC: reliable, original, comprehensive, current, and cited. It consists of trip records from Cyclistic's own riders and is made available under license by Motivate International Inc., so there do not seem to be any issues with bias or credibility.

**Q4: How are you addressing licensing, privacy, security, and accessibility?**

A4: It is important to adhere to any licensing restrictions when using publicly available data. As mentioned in the prompt, the data is made available under a specific license, and users should review and comply with any requirements or restrictions specified by the license. To address privacy and security concerns, any personally identifiable information should be removed or obscured from the data. Additionally, it is important to ensure that the data is accessible and usable by anyone who needs to work with it.

**Q5: How did you verify the data’s integrity?**

A5: Verifying integrity means checking the data for inconsistencies and errors: missing or incorrect values, outliers or unusual values, and, where possible, disagreement with other sources. I ran these checks with simple and fast methods in R, identified missing values and incorrect data types, fixed them, and confirmed that the remaining data is consistent and ready for analysis.
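
A minimal sketch of such checks follows. The rows are made up, and the cleaning rules (one row per `ride_id`, no missing rider type, positive ride length) are assumptions for illustration; the rules actually applied are in the preprocessing script.

```r
library(dplyr)

# Made-up rows standing in for the combined trip data.
trips <- data.frame(
  ride_id       = c("a1", "a2", "a2", "a3", "a4"),
  member_casual = c("casual", "member", "member", NA, "casual"),
  ride_length   = c(12.5, 8.0, 8.0, 15.2, -3.0),  # assumed to be minutes
  stringsAsFactors = FALSE
)

# Count missing values per column.
colSums(is.na(trips))

# Keep one row per ride_id, then drop rows with a missing rider type
# and rides with a non-positive duration.
clean <- trips %>%
  distinct(ride_id, .keep_all = TRUE) %>%
  filter(!is.na(member_casual), ride_length > 0)

nrow(clean)
```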

**Q6: How does it help you answer your question?**

A6: The Cyclistic trip data helps answer how annual members and casual riders use the bike-share program differently. Identifying the usage patterns of each group gave me answers that bear directly on the marketing strategy questions.

**Q7: Are there any problems with the data?**

A7: Yes. Some values were missing and some columns had incorrect data types. I fixed both in R so that the data was suitable for analysis.


I won't share the data preprocessing code here, but you can find it in my [GitHub](https://github.com/fatihilhan42/Cyclistic_Bike_Share_Data_Analysis/blob/main/Analysis/Data%20preprocessing%20and%20cleaning.R) repo.


## Phase 3 – PROCESS
**Guiding Questions:**

**Q1: What tools are you choosing and why?**

A1: I used the R programming language to process the data and prepare it for analysis, because its syntax is simple and it is quick to work with. I then visualized the prepared data in Tableau. This let me develop my skills in both tools.

**Q2: Have you ensured your data’s integrity?**

A2: Yes. I combined the 12 monthly files into a single data frame, so the data is ready for a full-year analysis.

**Q3:  Have you documented your cleaning process so you can review and share those results?**

A3: Yes, I documented the cleaning operations and the analysis process in R Markdown. The R Markdown document is available in my GitHub account.


## Phase 4 – ANALYZE

**Q1: How should you organize your data to perform analysis on it?**

A1: The ride data had to be grouped by user type, hour, time of day, day of the week, month, and season, so I organized it that way. As the R Markdown document mentioned in the Process phase shows, I created the time columns and classified each ride by them. I also derived the ride length from the start and end times of each ride, which made it possible to examine average ride lengths as well as total ride counts per user type.
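
The derived columns could be built roughly as in the sketch below. The timestamps are made up, and the column names `started_at` and `ended_at` are assumptions. The Night (0–5) and Morning (6–11) boundaries and the meteorological seasons agree with the counts reported later in this notebook; the split between Afternoon and Evening at 18:00 is an assumption.

```r
library(dplyr)
library(lubridate)

# Made-up start/end timestamps standing in for the real trip columns.
trips <- data.frame(
  started_at = ymd_hms(c("2022-01-15 08:30:00", "2022-07-02 14:05:00",
                         "2022-10-20 19:45:00", "2022-04-03 02:10:00")),
  ended_at   = ymd_hms(c("2022-01-15 08:42:00", "2022-07-02 14:35:00",
                         "2022-10-20 20:00:00", "2022-04-03 02:25:00"))
)

day_names <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")

trips <- trips %>%
  mutate(
    # Ride length in minutes, from the end and start times.
    ride_length = as.numeric(difftime(ended_at, started_at, units = "mins")),
    hour        = hour(started_at),
    time_of_day = case_when(
      hour <= 5  ~ "Night",      # 0-5
      hour <= 11 ~ "Morning",    # 6-11
      hour <= 17 ~ "Afternoon",  # 12-17 (assumed cutoff)
      TRUE       ~ "Evening"     # 18-23 (assumed cutoff)
    ),
    # Fixed English labels, so the result does not depend on the session locale.
    day_of_week = day_names[wday(started_at, week_start = 1)],
    season = case_when(
      month(started_at) %in% 3:5  ~ "Spring",
      month(started_at) %in% 6:8  ~ "Summer",
      month(started_at) %in% 9:11 ~ "Fall",
      TRUE                        ~ "Winter"
    ),
    month = month.name[month(started_at)]
  )

trips
```

Indexing a fixed vector of day names, rather than asking for locale-dependent labels, keeps the weekday column in English on any machine.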

Now let's look at the analysis done in R. The insights from it feed directly into the presentation of the project.

First, load the necessary libraries:




In [1]:
#load libraries 
library(tidyverse) #calculations
library(lubridate) #dates 
library(hms) #time
library(data.table) #exporting data frame


── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.4.0      ✔ purrr   1.0.1 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.5.0 
✔ readr   2.1.3      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Loading required package: timechange



Now we load the data set we prepared earlier:

In [2]:
Cyclistic_Tableau <- read.csv("/kaggle/input/cyclistic-bike-share-capstone-project/cyclistic_data.csv") 

Now let's look at the results of the analysis step by step.

**Total Rides**

In [3]:
#Total number of rides
nrow(Cyclistic_Tableau)

The data frame contains a total of 5,651,522 rides, which took place between January 2022 and December 2022.

**Total Rides by User Type**

In [4]:
#member type 
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  count(member_casual)

member_casual,n
<chr>,<int>
casual,2313357
member,3338165


Of the total 5,651,522 rides, 2,313,357 were made by casual riders and 3,338,165 by annual members.

**Total Rides per Bike Type**

In [5]:
#Member types and total rides by bike type.
Cyclistic_Tableau %>% 
  group_by(member_casual, rideable_type) %>% 
  count(rideable_type)

member_casual,rideable_type,n
<chr>,<chr>,<int>
casual,classic_bike,887708
casual,docked_bike,174690
casual,electric_bike,1250959
member,classic_bike,1705809
member,electric_bike,1632356


Grouped by bike type, casual riders most often chose electric bikes (1,250,959 rides), while annual members most often chose classic bikes (1,705,809 rides).

Across both user types, electric bikes account for about 51% of all rides and classic bikes for about 46%. Docked bikes, the remaining 3%, were used only by casual riders.
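
The shares per bike type can be recomputed from the counts in the output above. This is a small base R sketch; the data frame is typed in by hand from that output rather than read from the data set.

```r
# Ride counts copied from the output above.
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member"),
  rideable_type = c("classic_bike", "docked_bike", "electric_bike",
                    "classic_bike", "electric_bike"),
  n = c(887708, 174690, 1250959, 1705809, 1632356)
)

# Total rides per bike type and their share of all rides, in percent.
by_type <- aggregate(n ~ rideable_type, data = rides, FUN = sum)
by_type$share_pct <- round(100 * by_type$n / sum(by_type$n), 1)
by_type
```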

**Riding by the Hours**

In [6]:
#Riding by the hours
#total rides by member type
Cyclistic_Tableau %>% 
  group_by(member_casual) %>%
  count(hour) %>%
  print(n=48)

#total rides 
Cyclistic_Tableau %>% 
  count(hour) %>%
  print() 

# A tibble: 48 × 3
# Groups:   member_casual [2]
   member_casual  hour      n
   <chr>         <int>  <int>
 1 casual            0  46245
 2 casual            1  29949
 3 casual            2  18546
 4 casual            3  11024
 5 casual            4   7549
 6 casual            5  12375
 7 casual            6  29344
 8 casual            7  51384
 9 casual            8  69590
10 casual            9  71919
11 casual           10  92664
12 casual           11 120980
13 casual           12 143712
14 casual           13 14…

When we examine the rides by hour of the day, the number of rides rises sharply in the afternoon. More rides took place in the afternoon and evening hours than in any other part of the day.

Broken down by user type, both annual members and casual riders ride most in the afternoon and evening hours.

**Riding by Time of Day**

In [7]:
# Riding by Day

# -morning-

#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(time_of_day == "Morning") %>% 
  count(time_of_day)

#Total Rides
Cyclistic_Tableau %>%
  filter(time_of_day == "Morning") %>% 
  count(time_of_day)

# -afternoon- 

#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(time_of_day == "Afternoon") %>% 
  count(time_of_day)

#Total Rides
Cyclistic_Tableau %>%
  filter(time_of_day == "Afternoon") %>% 
  count(time_of_day)

# -evening- 
#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(time_of_day == "Evening") %>% 
  count(time_of_day)

#Total Rides
Cyclistic_Tableau %>%
  filter(time_of_day == "Evening") %>% 
  count(time_of_day)

# -night-
#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(time_of_day == "Night") %>% 
  count(time_of_day)

#Total Rides
Cyclistic_Tableau %>%
  filter(time_of_day == "Night") %>% 
  count(time_of_day)

# -all times of day- 
#Total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  count(time_of_day)

#number of rides
Cyclistic_Tableau %>%
  group_by(time_of_day) %>%
  count(time_of_day)

member_casual,time_of_day,n
<chr>,<chr>,<int>
casual,Morning,435881
member,Morning,908578


time_of_day,n
<chr>,<int>
Morning,1344459


member_casual,time_of_day,n
<chr>,<chr>,<int>
casual,Afternoon,1046967
member,Afternoon,1418179


time_of_day,n
<chr>,<int>
Afternoon,2465146


member_casual,time_of_day,n
<chr>,<chr>,<int>
casual,Evening,704821
member,Evening,891274


time_of_day,n
<chr>,<int>
Evening,1596095


member_casual,time_of_day,n
<chr>,<chr>,<int>
casual,Night,125688
member,Night,120134


time_of_day,n
<chr>,<int>
Night,245822


member_casual,time_of_day,n
<chr>,<chr>,<int>
casual,Afternoon,1046967
casual,Evening,704821
casual,Morning,435881
casual,Night,125688
member,Afternoon,1418179
member,Evening,891274
member,Morning,908578
member,Night,120134


time_of_day,n
<chr>,<int>
Afternoon,2465146
Evening,1596095
Morning,1344459
Night,245822


The rides were also grouped into four periods of the day. Both annual members and casual riders made the largest share of their rides in the afternoon. Overall, the evening has the second-highest share and the night the lowest; among annual members, however, the morning (908,578 rides) slightly exceeds the evening (891,274 rides).
Notably, casual riders made a larger share of their rides at night than annual members did (about 5.4% versus 3.6%).
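
The comparison between rider types is easier to read as shares within each group than as raw counts. The sketch below recomputes those shares from the counts in the output above; the data frame is typed in by hand.

```r
library(dplyr)

# Ride counts per rider type and time of day, copied from the output above.
counts <- data.frame(
  member_casual = rep(c("casual", "member"), each = 4),
  time_of_day   = rep(c("Afternoon", "Evening", "Morning", "Night"), times = 2),
  n = c(1046967, 704821, 435881, 125688,
        1418179, 891274, 908578, 120134)
)

# Share of each rider type's own rides that falls in each period, in percent.
shares <- counts %>%
  group_by(member_casual) %>%
  mutate(share_pct = round(100 * n / sum(n), 1)) %>%
  ungroup()

shares
```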


**Riding by the Weekday**

In [8]:
# -Day of The Week-

#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  count(day_of_week)

#total rides
Cyclistic_Tableau %>%
  count(day_of_week)

member_casual,day_of_week,n
<chr>,<chr>,<int>
casual,Wednesday,273409
casual,Friday,333433
casual,Saturday,471395
casual,Sunday,387400
casual,Monday,276660
casual,Thursday,308287
casual,Tuesday,262773
member,Wednesday,522662
member,Friday,466030
member,Saturday,442210


day_of_week,n
<chr>,<int>
Wednesday,796071
Friday,799463
Saturday,913605
Sunday,773643
Monday,748955
Thursday,839485
Tuesday,780300


By day of the week, Saturday is the busiest day (913,605 rides), followed by Thursday (839,485 rides); Sunday (773,643 rides) is among the quietest. The weekend peak comes from casual riders, whose two busiest days are Saturday and Sunday. A likely reason is that they ride for leisure on their days off.
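
To see where each day stands, the weekday totals can be ranked from busiest to quietest. The sketch below uses the totals from the output above, typed in by hand with the day names written in English.

```r
# Total rides per weekday, copied from the output above.
by_day <- data.frame(
  day_of_week = c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"),
  n = c(748955, 780300, 796071, 839485, 799463, 913605, 773643)
)

# Rank the days from busiest to quietest.
ranked <- by_day[order(-by_day$n), ]
ranked$rank <- seq_len(nrow(ranked))
ranked
```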

**Rides by the Month**

In [9]:
# -Day of The Month-
#Total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  count(day) %>%
  print(n = 62)

#total rides 
Cyclistic_Tableau %>%
  count(day) %>%
  print()

#- MONTH-
#total rides by member
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  count(month) %>%
  print(n=24)
  
Cyclistic_Tableau %>%
  count(month)

# A tibble: 62 × 3
# Groups:   member_casual [2]
   member_casual   day      n
   <chr>         <int>  <int>
 1 casual            1  71886
 2 casual            2  78217
 3 casual            3  81424
 4 casual            4  73178
 5 casual            5  79460
 6 casual            6  64425
 7 casual            7  63212
 8 casual            8  59072
 9 casual            9  91439
10 casual           10  93244
11 casual           11  70116
12 casual           12  71958
13 casual           13  77613
14 casual           14  79717
…

month,n
<chr>,<int>
April,369887
August,783749
December,181313
February,115085
January,103509
July,821176
June,766835
March,283207
May,632933
November,336909


**Rides by The Seasons**

In [10]:
# --SEASONS- 

# -Spring-

#total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(season == "Spring") %>%
  count(season)

#Total rides 
Cyclistic_Tableau %>%
  filter(season == "Spring") %>%
  count(season)

# -summer- 

#Total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(season == "Summer") %>%
  count(season)


#Total rides 
Cyclistic_Tableau %>%
  filter(season == "Summer") %>%
  count(season)

# -Fall-

#Total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(season == "Fall") %>%
  count(season)

Cyclistic_Tableau %>%
  filter(season == "Fall") %>%
  count(season)

# -Winter-

#Total rides by member type
Cyclistic_Tableau %>%
  group_by(member_casual) %>%
  filter(season == "Winter") %>%
  count(season)

Cyclistic_Tableau %>%
  filter(season == "Winter") %>%
  count(season)

# - All seasons- 

#Total rides by member type

Cyclistic_Tableau %>%
  group_by(season, member_casual) %>%
  count(season)

#total rides
Cyclistic_Tableau %>%
  group_by(season) %>%
  count(season)

member_casual,season,n
<chr>,<chr>,<int>
casual,Spring,494624
member,Spring,791403


season,n
<chr>,<int>
Spring,1286027


member_casual,season,n
<chr>,<chr>,<int>
casual,Summer,1129807
member,Summer,1241953


season,n
<chr>,<int>
Summer,2371760


member_casual,season,n
<chr>,<chr>,<int>
casual,Fall,604487
member,Fall,989341


season,n
<chr>,<int>
Fall,1593828


member_casual,season,n
<chr>,<chr>,<int>
casual,Winter,84439
member,Winter,315468


season,n
<chr>,<int>
Winter,399907


season,member_casual,n
<chr>,<chr>,<int>
Fall,casual,604487
Fall,member,989341
Spring,casual,494624
Spring,member,791403
Summer,casual,1129807
Summer,member,1241953
Winter,casual,84439
Winter,member,315468


season,n
<chr>,<int>
Fall,1593828
Spring,1286027
Summer,2371760
Winter,399907
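
The season totals can be cross-checked against the monthly counts. The sketch below assumes meteorological seasons (December to February as winter, and so on), which is an assumption about how the `season` column was built. September and October are cut off in the monthly output above, so they are left as `NA` and the Fall total cannot be checked this way.

```r
# Monthly ride counts copied from the output above (NA where the output is cut off).
monthly <- c(January = 103509, February = 115085, March = 283207,
             April = 369887, May = 632933, June = 766835,
             July = 821176, August = 783749, September = NA,
             October = NA, November = 336909, December = 181313)

# Assumed season definition: meteorological seasons.
season_of <- c(
  December = "Winter", January = "Winter", February = "Winter",
  March = "Spring", April = "Spring", May = "Spring",
  June = "Summer", July = "Summer", August = "Summer",
  September = "Fall", October = "Fall", November = "Fall"
)

# Sum the monthly counts within each season.
season_totals <- tapply(monthly, season_of[names(monthly)], sum)
season_totals
```

Under this assumption, the Spring, Summer, and Winter sums reproduce the season totals shown above.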


**Analysis of Average Riding Times**

**Average Ride Length**

In [11]:
# -Average Ride Length-

#average of ride_length
Cyclistic_avgRide <- mean(Cyclistic_Tableau$ride_length)
print(Cyclistic_avgRide)

# -Member Type- 
#average ride_length
Cyclistic_Tableau %>% group_by(member_casual) %>% 
  summarise_at(vars(ride_length), list(time = mean))

# -Type of Bike-
Cyclistic_Tableau %>% group_by(member_casual, rideable_type) %>% 
  summarise_at(vars(ride_length), list(time = mean))

Cyclistic_Tableau %>% group_by(rideable_type) %>% 
  summarise_at(vars(ride_length), list(time = mean))


# -HOUR-
# average ride_length by member type
Cyclistic_Tableau %>% group_by(hour,member_casual) %>% 
  summarise_at(vars(ride_length), list(time = mean)) %>% 
  print(n=48)

#average ride_length
Cyclistic_Tableau %>% group_by(hour) %>% 
  summarise_at(vars(ride_length), list(time = mean)) %>% 
  print(n=24)

# -Time Of Day- 

# -morning-
#average ride length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>% 
  filter(time_of_day == "Morning") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length
Cyclistic_Tableau %>% filter(time_of_day == "Morning") %>%
  summarise_at(vars(ride_length), list(time=mean))

#- afternoon-
#average ride length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>% 
  filter(time_of_day == "Afternoon") %>%
  summarise_at(vars(ride_length), list(time=mean))


#average ride length
Cyclistic_Tableau %>% filter(time_of_day == "Afternoon") %>%
  summarise_at(vars(ride_length), list(time=mean))


# -evening- 
# average ride length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>% 
  filter(time_of_day == "Evening") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length
Cyclistic_Tableau %>% filter(time_of_day == "Evening") %>%
  summarise_at(vars(ride_length), list(time=mean))


# -Night-
# average ride length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>% 
  filter(time_of_day == "Night") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length
Cyclistic_Tableau %>% filter(time_of_day == "Night") %>%
  summarise_at(vars(ride_length), list(time=mean))



# -All times of day-
# average ride length by member type
Cyclistic_Tableau %>% group_by(time_of_day,member_casual) %>% 
  summarise_at(vars(ride_length), list(time=mean))

#average ride length
Cyclistic_Tableau %>% group_by(time_of_day) %>% 
  summarise_at(vars(ride_length), list(time=mean))

# --DAY OF THE WEEK- 
#average ride_length by member type
Cyclistic_Tableau %>% group_by(member_casual, day_of_week) %>% 
  summarise_at(vars(ride_length), list(time=mean))

#average ride length
Cyclistic_Tableau %>% group_by(day_of_week) %>% 
  summarise_at(vars(ride_length), list(time=mean))


# -DAY OF THE MONTH-
#average ride_length by member type
Cyclistic_Tableau %>% group_by(day,member_casual) %>% 
  summarise_at(vars(ride_length), list(time=mean)) %>%
  print(n=62)

#average ride_length
Cyclistic_Tableau %>% group_by(day) %>% 
  summarise_at(vars(ride_length), list(time=mean)) %>%
  print(n=31)


# -MONTH-
#average ride_length by member type
Cyclistic_Tableau %>% group_by(month,member_casual) %>% 
  summarise_at(vars(ride_length), list(time=mean)) %>%
  print(n=24)

#average ride_length
Cyclistic_Tableau %>% group_by(month) %>% 
  summarise_at(vars(ride_length), list(time=mean)) %>%
  print(n=12)  


# -SEASON-

# -SPRING-

#average ride_length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>%
  filter(season == "Spring") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length 
Cyclistic_Tableau %>% 
  filter(season == "Spring") %>%
  summarise_at(vars(ride_length), list(time=mean))

# -SUMMER- 
#average ride_length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>%
  filter(season == "Summer") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length 
Cyclistic_Tableau %>% 
  filter(season == "Summer") %>%
  summarise_at(vars(ride_length), list(time=mean))

# -Fall-
#average ride_length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>%
  filter(season == "Fall") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length 
Cyclistic_Tableau %>% 
  filter(season == "Fall") %>%
  summarise_at(vars(ride_length), list(time=mean))

# -WINTER- 
#average ride_length by member type
Cyclistic_Tableau %>% group_by(member_casual) %>%
  filter(season == "Winter") %>%
  summarise_at(vars(ride_length), list(time=mean))

#average ride length 
Cyclistic_Tableau %>% 
  filter(season == "Winter") %>%
  summarise_at(vars(ride_length), list(time=mean))


# -ALL SEASONS- 
#average ride_length by member type
Cyclistic_Tableau %>% group_by(season, member_casual) %>%
  summarise_at(vars(ride_length), list(time=mean))

Cyclistic_Tableau %>% group_by(season) %>%
  summarise_at(vars(ride_length), list(time=mean))

[1] 16.35967


member_casual,time
<chr>,<dbl>
casual,22.01987
member,12.43714


member_casual,rideable_type,time
<chr>,<chr>,<dbl>
casual,classic_bike,24.56478
casual,docked_bike,50.75654
casual,electric_bike,16.20101
member,classic_bike,13.34589
member,electric_bike,11.48749


rideable_type,time
<chr>,<dbl>
classic_bike,17.18589
docked_bike,50.75654
electric_bike,13.53251


# A tibble: 48 × 3
# Groups:   hour [24]
    hour member_casual  time
   <int> <chr>         <dbl>
 1     0 casual         19.1
 2     0 member         12.1
 3     1 casual         21.2
 4     1 member         12.1
 5     2 casual         20.0
 6     2 member         12.2
 7     3 casual         19.3
 8     3 member         12.1
 9     4 casual         17.4
10     4 member         12.4
11     5 casual         15.2
12     5 member         10.5
13     6 casual         15.8
14     6 member         10.9
15     7 casual         14.6
16     7 member         11.5
17     8 casual         16.7
18     8 member         11.3
19     9 casual         21.5
20     9 member         11.4
21    10 casual         25.6
…

member_casual,time
<chr>,<dbl>
casual,21.63723
member,11.682


time
<dbl>
14.90954


member_casual,time
<chr>,<dbl>
casual,23.75701
member,12.8081


time
<dbl>
17.45819


member_casual,time
<chr>,<dbl>
casual,20.16829
member,12.71229


time
<dbl>
16.00479


member_casual,time
<chr>,<dbl>
casual,19.25968
member,11.72779


time
<dbl>
15.57882


time_of_day,member_casual,time
<chr>,<chr>,<dbl>
Afternoon,casual,23.75701
Afternoon,member,12.8081
Evening,casual,20.16829
Evening,member,12.71229
Morning,casual,21.63723
Morning,member,11.682
Night,casual,19.25968
Night,member,11.72779


time_of_day,time
<chr>,<dbl>
Afternoon,17.45819
Evening,16.00479
Morning,14.90954
Night,15.57882


member_casual,day_of_week,time
<chr>,<chr>,<dbl>
casual,Wednesday,19.02958
casual,Friday,20.56129
casual,Saturday,24.6813
casual,Sunday,25.1331
casual,Monday,22.66091
casual,Thursday,19.70175
casual,Tuesday,19.66251
member,Wednesday,11.85962
member,Friday,12.25084
member,Saturday,13.82131


day_of_week,time
<chr>,<dbl>
Wednesday,14.32213
Friday,15.71689
Saturday,19.42476
Sunday,19.43685
Monday,15.95644
Thursday,14.84872
Tuesday,14.46989


# A tibble: 62 × 3
# Groups:   day [31]
     day member_casual  time
   <int> <chr>         <dbl>
 1     1 casual         20.6
 2     1 member         12.1
 3     2 casual         22.3
 4     2 member         12.4
 5     3 casual         23.0
 6     3 member         12.4
 7     4 casual         22.2
 8     4 member         12.5
 9     5 casual         23.8
10     5 member         12.6
11     6 casual         21.1
12     6 member         11.9
13     7 casual         19.8
14     7 member         12.1
15     8 casual         19.3
16     8 member         11.8
17     9 casual         23.0
18     9 member         13.0
19    10 casual         23.4
20    10 member         13.0
21    11 casual         21.9
…

member_casual,time
<chr>,<dbl>
casual,25.06613
member,12.24621


time
<dbl>
17.17693


member_casual,time
<chr>,<dbl>
casual,22.75981
member,13.40906


time
<dbl>
17.86336


member_casual,time
<chr>,<dbl>
casual,18.7216
member,11.84989


time
<dbl>
14.45611


member_casual,time
<chr>,<dbl>
casual,17.88681
member,10.93148


time
<dbl>
12.40008


season,member_casual,time
<chr>,<chr>,<dbl>
Fall,casual,18.7216
Fall,member,11.84989
Spring,casual,25.06613
Spring,member,12.24621
Summer,casual,22.75981
Summer,member,13.40906
Winter,casual,17.88681
Winter,member,10.93148


season,time
<chr>,<dbl>
Fall,14.45611
Spring,17.17693
Summer,17.86336
Winter,12.40008
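
The overall average and the per-group averages above are linked: the overall mean is the mean of the group means weighted by the number of rides in each group. The sketch below checks this with the reported figures, typed in by hand. Treating `ride_length` as minutes is an assumption; the outputs do not state the unit.

```r
# Ride counts and mean ride lengths copied from the outputs above.
n_rides  <- c(casual = 2313357, member = 3338165)
mean_len <- c(casual = 22.01987, member = 12.43714)  # assumed to be minutes

# The overall mean is the ride-count-weighted mean of the group means.
overall <- weighted.mean(mean_len, w = n_rides)
round(overall, 3)

# How many times longer the average casual ride is than the average member ride.
round(mean_len[["casual"]] / mean_len[["member"]], 2)
```

The weighted mean agrees with the overall average of 16.35967 printed at the top of the output, and the average casual ride is roughly 1.8 times as long as the average member ride.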


## Phase 5 - SHARE

After the analysis, visualizing the results and sharing them with stakeholders is an important step. I built the charts in Tableau and aimed to make them understandable to everyone. You can view them on my [Tableau](https://public.tableau.com/app/profile/fatih.ilhan/viz/GoogleDataAnalyticsCapstoneStudyCase1/Averageridingtimesperseasonbymembertype) profile.

**Total Number of Rides**

The data frame contains a total of 5,651,522 rides, which took place between January 2022 and December 2022.

**Total Rides by User Type**

Of the total 5,651,522 rides, 2,313,357 were made by casual riders and 3,338,165 by annual members.


![image.png](attachment:8e1ed44d-e34f-4799-ac72-725c78379c53.png)

**Total Rides per Bike Type**

Grouped by bike type, casual riders most often chose electric bikes (1,250,959 rides), while annual members most often chose classic bikes (1,705,809 rides).

![image.png](attachment:0361f400-7fef-4be9-8e9f-fc5af40e5416.png)

Across both user types, electric bikes account for about 51% of all rides and classic bikes for about 46%. Docked bikes, the remaining 3%, were used only by casual riders.

**Usage Of Bicycle Types According to Seasons**

Across the seasons, electric bikes are generally used more often than classic bikes, although the difference is very small in summer. For both bike types, usage is lowest in winter.

![image.png](attachment:9d0a6e3e-657e-412f-a433-6cc0aebf9d61.png)

**Riding by the Hours**

When rides are examined by hour of the day, the number of rides rises sharply in the afternoon. Ride counts in the afternoon and evening hours are higher than in the other periods of the day.

![image.png](attachment:e2d0ff55-a4b3-4f63-9b71-d06ac344c256.png)

Broken down by user type, both annual members and casual riders ride most in the afternoon and evening hours.

![image.png](attachment:18583862-1291-4f56-9924-42bb5778addb.png)

**Riding by Time of Day**

Rider behavior was also examined by period of the day. For both annual members and casual riders, the afternoon accounts for the highest percentage of rides, followed by the evening; the night has the lowest.
Notably, casual riders take a higher percentage of their rides at night than annual members do.


![image.png](attachment:1992a9a8-0a91-4b42-a1d5-9e94530c6cab.png)
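
The time-of-day breakdown above can be reproduced by bucketing each ride's start hour into a period and computing the share of rides per period within each rider type. The Python sketch below illustrates the idea; the period boundaries, the `started_at` column name and the sample rows are all assumptions made for illustration, not the definitions used in the original analysis.

```python
from collections import Counter
from datetime import datetime


def time_of_day(hour: int) -> str:
    """Map an hour (0-23) to a period of the day (assumed boundaries)."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Afternoon"
    if 18 <= hour < 22:
        return "Evening"
    return "Night"


# Hypothetical sample rows: (started_at, member_casual)
rides = [
    ("2022-07-02 08:15:00", "member"),
    ("2022-07-02 13:40:00", "casual"),
    ("2022-07-02 17:05:00", "member"),
    ("2022-07-02 19:30:00", "casual"),
    ("2022-07-02 23:50:00", "casual"),
]

# Count rides per (rider type, period of day).
counts = Counter(
    (rider, time_of_day(datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S").hour))
    for started_at, rider in rides
)

# Percentage of each rider type's rides that fall in each period.
totals = Counter(rider for _, rider in rides)
shares = {key: round(100 * n / totals[key[0]], 1) for key, n in counts.items()}

for (rider, period), pct in sorted(shares.items()):
    print(f"{rider:<7}{period:<10}{pct}%")
```

Computing the share within each rider type, rather than over all rides, is what makes it possible to compare the night-time behavior of casual riders and annual members even though the two groups differ in size.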

**Riding by the Weekday**

By day of the week, the highest number of rides takes place on weekends. The likely reason is that riders are off work on weekends and choose to tour by bike as a leisure activity.


![image.png](attachment:cd5896f6-dc17-4ff8-adfa-1e5ae2eb2d68.png)

**Riding by Weekday and Member Type**

Split by member type, casual riders reach their highest number of rides on Saturdays, while annual members peak on Thursdays. Annual members took more rides than casual riders on every day except Saturday.


![image.png](attachment:b56a6bf9-3859-4a81-b0c1-22abf6fcec50.png)
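
One way to find the busiest weekday per member type is to derive the weekday from each ride's start timestamp and count rides per (member type, weekday) pair. The following is a minimal Python sketch under stated assumptions: the `started_at` column name and the sample rows are hypothetical and only illustrate the counting logic.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def weekday_name(started_at: str) -> str:
    """Return the English weekday name for a 'YYYY-MM-DD HH:MM:SS' timestamp."""
    parsed = datetime.strptime(started_at, "%Y-%m-%d %H:%M:%S")
    return WEEKDAYS[parsed.weekday()]  # weekday(): Monday is 0


# Hypothetical sample rows: (started_at, member_casual)
rides = [
    ("2022-07-02 10:00:00", "casual"),  # a Saturday
    ("2022-07-02 15:30:00", "casual"),  # a Saturday
    ("2022-07-07 08:05:00", "member"),  # a Thursday
    ("2022-07-07 17:45:00", "member"),  # a Thursday
    ("2022-07-04 09:00:00", "member"),  # a Monday
]

counts = Counter((rider, weekday_name(ts)) for ts, rider in rides)

# Keep the weekday with the highest ride count for each rider type.
busiest = {}
for (rider, day), n in counts.items():
    if n > busiest.get(rider, ("", 0))[1]:
        busiest[rider] = (day, n)

print(busiest)
```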

**Rides by the Month**

Grouped by month, the highest share of rides occurs in July, with 14.53% of the total. August and June follow with 13.87% and 13.57%, respectively. December, January and February have the lowest shares.


![image.png](attachment:e1629908-9ff7-4957-a590-7f5f1db8eae7.png)

Grouping these data by member type gives the following:

![image.png](attachment:aa1759d3-f939-41e6-865a-900392015d3b.png)

As the line graph above shows, the monthly ride totals of the two membership types follow quite similar patterns: both take more rides in the summer months, and both drop considerably in the winter months.
Given this pattern, it makes sense to analyze the rides by season as well.


**Rides by The Seasons**

This section examines rides by season, giving a bigger-picture view of the monthly analysis above.


![image.png](attachment:8364a8bc-2b8a-4fdb-943f-9c9a9ec891e6.png)

In the graph above, summer has the most rides, while fall and spring have similar ride counts. The sharp drop in winter suggests that riders avoid bicycles when weather and road conditions deteriorate.

![image.png](attachment:292e0f5e-7688-447d-aecf-78c72f5f2b9e.png)

In the pie chart above, summer stands out with 41.97% of all rides, and fall ranks second with 28.20%. Spring is third, and winter, as noted above, has the fewest rides.

![image.png](attachment:7b6effdc-8d23-4729-bf11-c60bab96ffe2.png)

The chart above compares the total number of rides per season by member type. For both member types, ride counts peak in the summer and spring seasons.

**Analysis of Average Riding Times**

**Average Ride Length**

Ride length is obtained by subtracting each ride's start time from its end time, and the average ride length is the mean of these values. Across all rides, the average ride length is 16.36 minutes.

By user type, casual riders average 22 minutes per ride and annual members average 12.4 minutes.


![image.png](attachment:9a998017-53af-47cb-9dc5-20f97b718190.png)
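
The ride-length definition above can be sketched in a few lines of Python: subtract the start timestamp from the end timestamp, convert to minutes, then average overall and per user type. The `started_at` and `ended_at` column names and the sample rows are assumptions made for illustration; they are not the study's data.

```python
from datetime import datetime
from statistics import mean

FMT = "%Y-%m-%d %H:%M:%S"


def ride_length_minutes(started_at: str, ended_at: str) -> float:
    """Ride length in minutes: end time minus start time."""
    delta = datetime.strptime(ended_at, FMT) - datetime.strptime(started_at, FMT)
    return delta.total_seconds() / 60


# Hypothetical sample rows: (started_at, ended_at, member_casual)
rides = [
    ("2022-06-01 08:00:00", "2022-06-01 08:12:00", "member"),
    ("2022-06-01 17:30:00", "2022-06-01 17:40:00", "member"),
    ("2022-06-04 14:00:00", "2022-06-04 14:25:00", "casual"),
    ("2022-06-05 11:10:00", "2022-06-05 11:31:00", "casual"),
]

lengths = [(rider, ride_length_minutes(start, end)) for start, end, rider in rides]

# Average over all rides, then separately for each user type.
overall_mean = mean(length for _, length in lengths)
mean_by_type = {
    rider: mean(length for r, length in lengths if r == rider)
    for rider in {r for r, _ in lengths}
}

print(f"overall: {overall_mean:.2f} min")
for rider, avg in sorted(mean_by_type.items()):
    print(f"{rider}: {avg:.2f} min")
```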

**Average ride length by the Bike Type**

By bike type, docked bikes have the highest average ride time at 50.76 minutes, compared with 17.19 minutes for classic bikes and 13.53 minutes for electric bikes.


![image.png](attachment:069ae97e-31cc-4711-b143-9c46a689a331.png)

**Average ride length bike type by the Member Type**

Broken down by both user type and bike type, casual riders have their longest rides on docked bikes: all docked-bike rides belong to casual riders, with an average of 50.76 minutes. On classic bikes, casual riders average 24.56 minutes per ride, compared with 13.35 minutes for annual members. On electric bikes, casual riders average 16.20 minutes, while annual members average 11.49 minutes.


![image.png](attachment:eae90e1c-8f0f-4b67-ab42-ac562e7f236e.png)

**Average riding times according to time periods of the day**

By user type and period of the day, casual riders average 23.76 minutes per ride in the afternoon, compared with 12.81 minutes for annual members. After the afternoon, casual riders have their next-highest average ride time in the morning, whereas annual members have theirs in the evening and night hours.

![image.png](attachment:7e6ba854-750d-4958-bdcf-1db63922f949.png)

**Average riding times by user types and time periods of the day**

Broken down by day of the week, both casual riders and annual members have their highest average ride times on weekends. Casual riders have higher averages in general, while annual members average between 11 and 13 minutes per ride.


![image.png](attachment:7af99693-5c08-4897-a09c-28e1acdd4b53.png)

**Average riding times of member types by days**



![image.png](attachment:be4f34ca-7926-4f16-8ce1-d28e7d9f71bc.png)

**Average riding times by Day**

By day of the week, Sunday and Saturday have the highest average ride times. Monday, the first working day, follows the weekend days with an average of 15.96 minutes.


![image.png](attachment:56461b02-c8be-435c-a1b0-2308d282edb4.png)

**Average driving times of user types by Month**

By month and user type, casual riders have their highest average ride times in March, April, May and June, peaking in March at 25.78 minutes. Annual members have their highest average in June, at 13.68 minutes.


![image.png](attachment:b1eb2ed6-cc56-4b29-bebe-8afac3def852.png)

**Average riding times by Month**

By month, May, June and July have the highest average ride times. December has the lowest, and average ride times are clearly lower throughout the winter.


![image.png](attachment:5d46b082-bd2a-4f93-89f8-be07f4c84912.png)

**Average riding times by Season**

By season, summer has the highest average ride time at 17.86 minutes, followed by spring at 17.18 minutes. The lowest average, 12.40 minutes, is in winter.


![image.png](attachment:225fa773-3cb0-4e45-a390-0914042edefb.png)

**Average Riding Times Per Season By Member Type**

By season and member type, casual riders have their highest average ride time in spring (25.07 minutes), followed by summer at 22.76 minutes; their lowest, 17.89 minutes, is in winter. Annual members have their highest average in summer, at 13.41 minutes, and their lowest in winter, at 10.93 minutes.


![image.png](attachment:d3ba8112-cd6d-4313-8d2f-6e7a176bba1f.png)
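
A per-season, per-member-type average like the one above can be computed by mapping each ride's month to a season and then averaging ride lengths within each (season, member type) group. The Python sketch below assumes meteorological seasons (December to February as winter, and so on), which may differ from the season definition used in the original analysis; the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Assumed season definition: meteorological seasons.
SEASON_BY_MONTH = {
    12: "Winter", 1: "Winter", 2: "Winter",
    3: "Spring", 4: "Spring", 5: "Spring",
    6: "Summer", 7: "Summer", 8: "Summer",
    9: "Fall", 10: "Fall", 11: "Fall",
}

# Hypothetical sample rows: (month, member_casual, ride length in minutes)
rides = [
    (1, "casual", 18.0), (2, "casual", 16.0), (1, "member", 11.0),
    (4, "casual", 26.0), (5, "member", 12.0),
    (7, "casual", 22.0), (7, "member", 13.0), (8, "member", 14.0),
    (10, "casual", 19.0), (11, "member", 12.0),
]

# Collect ride lengths per (season, member type) group.
groups = defaultdict(list)
for month, rider, ride_length in rides:
    groups[(SEASON_BY_MONTH[month], rider)].append(ride_length)

mean_by_group = {key: mean(values) for key, values in groups.items()}

for (season, rider), avg in sorted(mean_by_group.items()):
    print(f"{season:<8}{rider:<8}{avg:.2f}")
```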

More charts and analyses could be added, but the ones above cover what this study needs. We can therefore move on to the next section, which discusses how these findings bear on the business problem.
I compiled the results of these analyses and visualizations into a presentation file to share with my team. The presentation file can be accessed via this **link**.

## Phase 6 – ACT
**Final Conclusion and Recommendation**

Based on my analysis and visualizations, my answers to the three questions that will guide the future marketing program are as follows:

**Q1: How do annual members and casual riders use Cyclistic bikes differently?**
**A1:**
* Annual members use the Cyclistic bike-share service frequently and consistently, in terms of both total rides and ride time. This suggests that they choose bicycles over other vehicles for routine trips, for reasons such as cost, environmental pollution and traffic.

* Casual riders, on the other hand, do not use the bikes as frequently or consistently, so it is understandable that they have not switched to the annual membership plan. They ride irregularly compared with annual members, taking long rides on weekends that we think are for fun and relaxation.

* Annual members most likely opt for the annual plan because their daily and weekly routines are fixed. School, work and other activities at set times push them to use a particular means of transportation in a planned way, and an annual membership is a reasonable fit for this planned use.

* In short, these usage differences tell us about riders' tendencies and behaviors, and they vary with how each group perceives and uses the service.

**Q2: Why would casual riders buy Cyclistic annual memberships?**
**A2:** Based on my analysis and visualizations, my suggestions to the marketing team for converting casual riders into annual members are as follows:

* Advertising and marketing aimed at moving casual riders to the annual plan should focus on a few specific themes: the cost of cycling, personal health, environmental health, traffic and so on. The campaign should stress that cycling offers more benefits in these areas than other means of transport.

* The analysis suggests that annual members use the service because they tend to stick to planned routines. Mobile applications could therefore be developed to encourage casual riders to plan their trips, and advertisements could encourage users to build routines at certain times of the day and use Cyclistic bikes to travel to them.

* Time and money matter a great deal to people today. The switch to the annual plan could therefore be encouraged by lowering the cost of annual membership and by offering special campaigns, sweepstakes, coupons and discounts to people who switch. Through partnerships with other businesses (restaurants, cafes, bookstores, entertainment centers, etc.), a system could be developed in which users spend points, earned according to the number and frequency of their rides, at these businesses.

* A system for tracking ride time and ride counts would be very useful. An app for smart devices could connect to other apps such as Google Fit and Apple Health, and daily ride goals could give users the sense that they are doing an activity that protects their health.

**Q3: How can Cyclistic use digital media to influence casual riders to become members?**
**A3:**
* The impact of social media on marketing and campaigns is undeniable, so promotional campaigns can be run on platforms such as YouTube, Facebook, Twitter and TikTok. Community and activity groups can be created on these platforms so that members can share their experiences. Existing members can sign up for group events, such as group rides and trips to particular destinations, which can then be shared on social media accounts. Planned activities like these can build effective communication among current users and, by being visible on social media, attract new users.

* Agreements can be made with well-known social media personalities; their posts about Cyclistic on their own accounts could significantly increase the number of members.

* On video platforms such as YouTube and TikTok in particular, agreements with influencers who run travel channels could be very effective: they can use Cyclistic bikes on their trips and run coupon and discount giveaways for their viewers. Group trips to historical, cultural and tourist sites can also be organized through these channels and shared on social media to attract the attention of more users.


[GitHub](https://github.com/fatihilhan42/Cyclistic_Bike_Share_Data_Analysis) link to the project.

[Tableau](https://public.tableau.com/app/profile/fatih.ilhan/viz/GoogleDataAnalyticsCapstoneStudyCase1/Averageridingtimesperseasonbymembertype) Visualizations.

[DataSets](https://drive.google.com/drive/folders/1dV3BcpJ8ZPjaqanqDLpy1SczEyFffc2L?usp=share_link)

You can check out my updates and posts on my [LinkedIn](https://www.linkedin.com/in/fatih-ilhan/) account.