## Introduction
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, the director of marketing believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, the team believes there is a very good chance to convert casual riders into members. Casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.


## Business Task
Design marketing strategies aimed at converting casual riders into annual members. To do that, we first need to understand how annual members and casual riders use Cyclistic bikes differently, which means analyzing Cyclistic’s historical bike trip data to identify trends.

In-Scope:

1. Summary of business task
2. Description of data sources used
3. Documentation of data cleaning and manipulation processes
4. Summary of analysis
5. Relevant data visualization and key findings
6. High-level recommendations

Out-of-Scope:

1. Insights on why casual riders would buy Cyclistic annual memberships
2. Roadmap for using digital media to influence casual riders to become members

## Prepare
The data for this analysis comes from a dataset made available by Motivate International Inc. under this [license](https://ride.divvybikes.com/data-license-agreement) and can be found [here](https://divvy-tripdata.s3.amazonaws.com/index.html). The previous 12 months (November 2020 to October 2021) of Cyclistic trip data were downloaded to answer the business questions.

The analysis is conducted in RStudio.

In [2]:
# Load packages (run install.packages("tidyverse") first if it is not installed)
library("tidyverse")
library("dplyr")

The files for the 12-month analysis period were then imported into RStudio as data frames.

In [5]:
# Load data sets for trip data for 12 months (Nov 2020 to Oct 2021)
Nov_2020 <- read_csv("../input/cyclistic-datasets/202011-divvy-tripdata.csv",show_col_types = FALSE)
Dec_2020 <- read_csv("../input/cyclistic-datasets/202012-divvy-tripdata.csv",show_col_types = FALSE)
Jan_2021 <- read_csv("../input/cyclistic-datasets/202101-divvy-tripdata.csv",show_col_types = FALSE)
Feb_2021 <- read_csv("../input/cyclistic-datasets/202102-divvy-tripdata.csv",show_col_types = FALSE)
Mar_2021 <- read_csv("../input/cyclistic-datasets/202103-divvy-tripdata.csv",show_col_types = FALSE)
Apr_2021 <- read_csv("../input/cyclistic-datasets/202104-divvy-tripdata.csv",show_col_types = FALSE)
May_2021 <- read_csv("../input/cyclistic-datasets/202105-divvy-tripdata.csv",show_col_types = FALSE)
Jun_2021 <- read_csv("../input/cyclistic-datasets/202106-divvy-tripdata.csv",show_col_types = FALSE)
Jul_2021 <- read_csv("../input/cyclistic-datasets/202107-divvy-tripdata.csv",show_col_types = FALSE)
Aug_2021 <- read_csv("../input/cyclistic-datasets/202108-divvy-tripdata.csv",show_col_types = FALSE)
Sep_2021 <- read_csv("../input/cyclistic-datasets/202109-divvy-tripdata.csv",show_col_types = FALSE)
Oct_2021 <- read_csv("../input/cyclistic-datasets/202110-divvy-tripdata.csv",show_col_types = FALSE)
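
The twelve `read_csv()` calls above follow one naming pattern, so they could also be generated from the file names. The following is a minimal sketch, not the approach used in this notebook: it writes two tiny made-up CSV files to a temporary folder and reads every file matching `*-divvy-tripdata.csv` into one data frame. The folder, the toy rows, and the helper name `read_trip_files()` are assumptions for illustration.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read every monthly trip file in a folder into one data frame
read_trip_files <- function(folder) {
  files <- list.files(folder, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
  files %>%
    lapply(read_csv, show_col_types = FALSE) %>%
    bind_rows()
}

# Toy stand-ins for two monthly files (made-up rows, not real trip data)
demo_dir <- file.path(tempdir(), "cyclistic-demo")
dir.create(demo_dir, showWarnings = FALSE)
write_csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "202011-divvy-tripdata.csv"))
write_csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "202012-divvy-tripdata.csv"))

all_trips <- read_trip_files(demo_dir)
nrow(all_trips)
```

Note that `bind_rows()` is stricter than `rbind()` about column types: if a column is parsed as numeric in one month and as character in another, it stops with an error, and a `col_types` specification in `read_csv()` may be needed.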

Once the data frames had been imported, the head() and colnames() functions were used on a few of them to confirm that the import worked correctly.

In [6]:
# View data frames
head(Nov_2020)
colnames(Nov_2020)

head(Apr_2021)
colnames(Apr_2021)

head(Oct_2021)
colnames(Oct_2021)

On inspection, the imported data frames share the same structure (i.e., column names and data types). They are then merged into a single data frame.

In [7]:
# Combine data for all months into one large data frame
dframe <- rbind(Nov_2020, Dec_2020, Jan_2021, Feb_2021, Mar_2021, Apr_2021,
                May_2021, Jun_2021, Jul_2021, Aug_2021, Sep_2021, Oct_2021)
bike_trip_data <- dframe
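
The structural check before merging was done by eye on a few of the twelve data frames. A programmatic comparison could cover all of them. The sketch below uses small made-up data frames (`month_a`, `month_b`, `month_c`) and a hypothetical helper `same_structure()`; it compares column names and column classes against the first data frame in the list.

```r
# Hypothetical helper: TRUE if every data frame has the same column names and classes
same_structure <- function(frames) {
  describe <- function(df) sapply(df, function(col) class(col)[1])
  reference <- describe(frames[[1]])
  all(vapply(frames, function(df) identical(describe(df), reference), logical(1)))
}

# Made-up rows for illustration only
month_a <- data.frame(ride_id = "A1", start_station_id = "TA1305", stringsAsFactors = FALSE)
month_b <- data.frame(ride_id = "B1", start_station_id = "KA1503", stringsAsFactors = FALSE)
month_c <- data.frame(ride_id = "C1", start_station_id = 123)  # numeric id: differs

same_structure(list(month_a, month_b))           # TRUE
same_structure(list(month_a, month_b, month_c))  # FALSE
```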

In [8]:
str(bike_trip_data)

In [9]:
colnames(bike_trip_data)

In [10]:
head(bike_trip_data)

## Process

In [11]:
# Data cleaning
bike_trip_data <- distinct(bike_trip_data) # remove duplicate data entries

In [12]:
# Rename poorly named columns
bike_trip_data <- rename(bike_trip_data, customer_type = member_casual)
bike_trip_data <- rename(bike_trip_data, bike_type = rideable_type)

In [13]:
# Add derived columns to the data frame
bike_trip_data$date <- as.Date(bike_trip_data$started_at) # add a date column
bike_trip_data$month <- format(as.Date(bike_trip_data$started_at), "%b %Y") # add month-year column
bike_trip_data$day <- format(as.Date(bike_trip_data$date), "%d") # add day-of-month column
bike_trip_data$year <- format(as.Date(bike_trip_data$date), "%Y") # add year column
bike_trip_data$day_of_week <- format(as.Date(bike_trip_data$date), "%A") # add day-of-week column
bike_trip_data$time <- format(bike_trip_data$started_at, format = "%H:%M") # add start time (HH:MM) as character
bike_trip_data$time <- as.POSIXct(bike_trip_data$time, format = "%H:%M") # convert to date-time so it can be plotted on a time axis
bike_trip_data$ride_length <- as.double(difftime(bike_trip_data$ended_at, bike_trip_data$started_at, units = "mins")) # length of each ride in minutes (units set explicitly)

bike_trip_data <- bike_trip_data[bike_trip_data$ride_length >= 1, ] # remove rides with negative duration or shorter than 1 minute
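
When no `units` argument is given, `difftime()` chooses the units itself (seconds, minutes, hours, or days, depending on the smallest difference in the vector), so a fixed conversion factor is only safe when the units are pinned. A minimal sketch with two made-up timestamps shows the behaviour:

```r
# Made-up start and end times for two rides
start <- as.POSIXct(c("2021-06-01 08:00:00", "2021-06-01 09:00:00"), tz = "UTC")
end   <- as.POSIXct(c("2021-06-01 08:12:00", "2021-06-01 09:45:00"), tz = "UTC")

# Automatic choice: the shortest ride is 12 minutes, so the units become minutes
auto_units <- units(difftime(end, start))
auto_units  # "mins"

# Pinning the units makes the result independent of the data
ride_length <- as.double(difftime(end, start, units = "mins"))
ride_length  # 12 45
```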


In [14]:
head(bike_trip_data)

In [15]:
# Create data frame with the columns needed for analysis
bike_data <- bike_trip_data %>% 
  select(bike_type, customer_type, started_at, date, month, day, year, day_of_week, time, ride_length)

bike_data <- drop_na(bike_data) # remove rows with missing values (the result must be assigned to take effect)

head(bike_data)

In [16]:
# Set the display order for day of the week and month
bike_data$day_of_week <- factor(bike_data$day_of_week, levels=c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", 
                                                                "Saturday", "Sunday"))
bike_data$month <- factor(bike_data$month, levels=c("Nov 2020", "Dec 2020", "Jan 2021", "Feb 2021", "Mar 2021", "Apr 2021", 
                                                    "May 2021", "Jun 2021", "Jul 2021", "Aug 2021", "Sep 2021", "Oct 2021"))

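Without explicit levels, character day names sort alphabetically (Friday, Monday, Saturday, ...) in tables and on plot axes. A small sketch with made-up values shows the effect of the `levels` argument:

```r
days <- c("Sunday", "Monday", "Friday", "Monday")

# Default: levels are sorted alphabetically
default_levels <- levels(factor(days))
default_levels  # "Friday" "Monday" "Sunday"

# Explicit levels fix the calendar order used in tables and plot axes
week_order <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
ordered_days <- factor(days, levels = week_order)
table(ordered_days)
```

Values that are not listed in `levels` become `NA`, so the level strings have to match the output of `format()` exactly, and that output depends on the session locale.
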
## Analyze
Since the business task is focused on understanding how annual members and casual riders use Cyclistic bikes differently, our analysis compares data for members vs. casual riders.

In [17]:
summary(bike_data$ride_length) # summary stats for overall ride length

In [18]:
table(bike_data$customer_type) # show total number of members and casual riders

In [19]:
aggregate(ride_length ~ customer_type, bike_data, mean) # average length of rides by customer type (mins)

In [20]:
aggregate(ride_length ~ customer_type, bike_data, median) # median length of rides by customer type (mins)

In [21]:
aggregate(ride_length ~ customer_type, bike_data, sum) # total length of rides by customer type (mins)

In [22]:
aggregate(ride_length ~ customer_type + day_of_week, bike_data, mean) # summary of average length of rides by customer type and day of the week

In [23]:
# Compare summary stat for ridership based on customer type and broken down by day of the week
bike_data %>% 
  group_by(customer_type, day_of_week) %>% 
  summarise(total_rides = n(), average_ride_length = mean(ride_length), .groups = 'drop') %>% 
  arrange(day_of_week)

In [26]:
# Analyze data related to stations. Create data frame with station information 
station_data <- bike_trip_data %>% 
  select(start_station_name, end_station_name, start_lat, start_lng, end_lat, end_lng, customer_type, ride_length)

station_data <- station_data[!(is.na(station_data$start_station_name) | is.na(station_data$end_station_name)),] # remove any rows with NA station names
station_data <- station_data[!(station_data$start_station_name == "" | station_data$end_station_name == ""),] # remove any rows with empty station names

str(station_data)
head(station_data)

In [None]:
# Create data frame for plotting location of stations on map
geo_data <- bind_rows(data.frame("stations" = station_data$start_station_name,
                                "longitude" = station_data$start_lng,
                                "latitude" = station_data$start_lat),
                     data.frame("stations" = station_data$end_station_name,
                                "longitude" = station_data$end_lng,
                                "latitude" = station_data$end_lat))

geo_data$stations <- trimws(geo_data$stations) # strip leading/trailing whitespace from station names
station_loc <- distinct(geo_data)
str(station_loc)
head(station_loc)
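
If the recorded coordinates for a station differ slightly from ride to ride, `distinct()` keeps one row per unique combination of name and coordinates rather than one row per station. One possible alternative, shown here on made-up data and not part of the original analysis, is to average the coordinates per station:

```r
library(dplyr)

# Made-up rows: the same station recorded with slightly different coordinates
geo_demo <- data.frame(
  stations  = c("Station A", "Station A", "Station B"),
  longitude = c(-87.6120, -87.6122, -87.6200),
  latitude  = c(41.8920, 41.8922, 41.8800)
)

nrow(distinct(geo_demo))  # 3 rows: Station A appears twice

# One row per station, using the mean of its recorded coordinates
station_loc_demo <- geo_demo %>%
  group_by(stations) %>%
  summarise(longitude = mean(longitude), latitude = mean(latitude), .groups = "drop")

nrow(station_loc_demo)  # 2 rows: one per station
```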

In [27]:
# Top 10 popular start stations for member customers
top_10_start_station_members <- station_data %>%
  filter(customer_type == 'member') %>%
  group_by(start_station_name) %>%
  summarise(station_count = n()) %>%
  arrange(desc(station_count)) %>%
  slice(1:10)

In [28]:
# Top 10 popular start stations for casual customers
top_10_start_station_casual <- station_data %>%
  filter(customer_type == 'casual') %>%
  group_by(start_station_name) %>%
  summarise(station_count = n()) %>%
  arrange(desc(station_count)) %>%
  slice(1:10)
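
The two top-10 blocks differ only in the customer type, so they could share one function. A sketch on made-up rows, assuming a hypothetical helper name `top_start_stations()`:

```r
library(dplyr)

# Hypothetical helper: the n most frequent start stations for one customer type
top_start_stations <- function(data, type, n = 10) {
  data %>%
    filter(customer_type == type) %>%
    count(start_station_name, name = "station_count", sort = TRUE) %>%
    slice_head(n = n)
}

# Made-up rides for illustration only
station_demo <- data.frame(
  start_station_name = c("Station A", "Station A", "Station B", "Station C", "Station A"),
  customer_type      = c("casual", "casual", "casual", "member", "member")
)

top_casual_demo <- top_start_stations(station_demo, "casual", n = 1)
top_casual_demo
```

As with `slice(1:10)`, ties at the cut-off are broken by row order, so a station with the same count as the tenth entry can be left out.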

## Share
#### Data visualization

In [29]:
library(ggplot2)

In [33]:
# Figure 1: Total Rides for Each Day of the Week
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, day_of_week) %>% 
  summarise(total_number_of_rides = n(), .groups = 'drop') %>%
  ggplot(aes(x = day_of_week, y = total_number_of_rides, fill = customer_type)) + geom_col(position = "dodge") + 
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  labs(x = 'Day of Week', y = 'Total Number of Rides', title = 'Total Rides for Each Day of the Week', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))
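
The bar charts in this section repeat the same colour scale and theme settings. A possible way to define them once is sketched below on made-up counts; the names `customer_colors` and `shared_style` are assumptions for illustration. Naming the colour values also ties each colour to a customer type explicitly instead of relying on the order of the levels.

```r
library(ggplot2)

# Named values tie each colour to a customer type explicitly
customer_colors <- c(casual = "#DC3220", member = "#005AB5")

# Layers shared by the bar charts, collected in a list that can be added with `+`
shared_style <- list(
  scale_y_continuous(labels = scales::comma),
  scale_fill_manual(values = customer_colors),
  theme(text = element_text(size = 18), plot.title = element_text(hjust = 0.5))
)

# Made-up counts for illustration only
weekday_demo <- data.frame(
  day_of_week = factor(c("Monday", "Monday", "Saturday", "Saturday"),
                       levels = c("Monday", "Saturday")),
  customer_type = c("casual", "member", "casual", "member"),
  total_number_of_rides = c(1200, 3400, 4100, 2900)
)

p <- ggplot(weekday_demo, aes(x = day_of_week, y = total_number_of_rides, fill = customer_type)) +
  geom_col(position = "dodge") +
  shared_style +
  labs(x = "Day of Week", y = "Total Number of Rides", fill = "Type of Customer")
```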

In [34]:
# Figure 2: Breakdown of Ridership by Month
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, month) %>% 
  summarise(total_number_of_rides = n(),.groups = 'drop') %>% 
  ggplot(aes(x = month, y = total_number_of_rides, fill = customer_type)) + geom_col(position = "dodge") + 
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(x = 'Month', y = 'Total Number of Rides', title = 'Total Rides per Month', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [35]:
# Figure 3: Average Ride Length for Each Day of the Week
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, day_of_week) %>% 
  summarise(average_ride_length = mean(ride_length), .groups = 'drop') %>% 
  ggplot(aes(x = day_of_week, y = average_ride_length, fill = customer_type)) + geom_col(position = "dodge") + 
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  labs(x = 'Day of Week', y = 'Average Length of Rides (mins)', title = 'Average Length of Rides for Each Day of the Week', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [36]:
# Figure 4: Average Ride Length by Month
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, month) %>% 
  summarise(average_ride_length = mean(ride_length), .groups = 'drop') %>% 
  ggplot(aes(x = month, y = average_ride_length, fill = customer_type)) + geom_col(position = "dodge") + 
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  theme(axis.text.x = element_text(angle = 45)) +
  labs(x = 'Month', y = 'Average Length of Rides (mins)', title = 'Average Length of Rides per Month', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [37]:
# Figure 5: Ridership Breakdown by Bike Type
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type) %>%
  ggplot(aes(x = bike_type, fill = customer_type)) + geom_bar(position = "dodge") +
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  labs(x = 'Type of Bike', y = 'Total Number of Rentals', title = 'Breakdown of Ridership by Bike Type', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [38]:
# Figure 6: Ridership breakdown by Bike Type across Months
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, month, bike_type) %>% 
  summarise(total_number_of_rides = n(), .groups = 'drop') %>% 
  ggplot(aes(x = month, y = total_number_of_rides, fill = customer_type)) + geom_col(position = "dodge") + 
  scale_y_continuous(labels = scales::comma) + scale_fill_manual(values = c("#DC3220", "#005AB5")) +
  facet_grid(customer_type ~ bike_type) + theme(axis.text.x = element_text(angle = 90)) + 
  labs(x = 'Month', y = 'Total Number of Rides', title = 'Total Number of Bike Rentals per Month', fill = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [39]:
# Figure 7: Bike ridership throughout the day
options(repr.plot.width = 12, repr.plot.height = 12)
bike_data %>% 
  group_by(customer_type, time) %>% 
  summarise(total_number_of_rides = n(), .groups = 'drop') %>% 
  ggplot(aes(x = time, y = total_number_of_rides, color = customer_type, group = customer_type)) + geom_line() + 
  scale_y_continuous(labels = scales::comma) + scale_x_datetime(date_breaks = "1 hour", date_labels = "%H:%M", expand = c(0,0)) +
  scale_color_manual(values = c("#DC3220", "#005AB5")) + theme(axis.text.x = element_text(angle = 45)) + 
  labs(x = 'Time', y = 'Total Number of Rides', title = 'Bike Rentals Distributed Throughout the Day', color = "Type of Customer") +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))
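
Figure 7 counts rides for each minute of the day. For a coarser view, rides could instead be counted per hour. A sketch on made-up start times, not part of the original analysis:

```r
library(dplyr)

# Made-up start times for illustration only
hour_demo <- data.frame(
  customer_type = c("member", "member", "casual", "casual"),
  started_at = as.POSIXct(c("2021-06-01 08:05:00", "2021-06-01 08:40:00",
                            "2021-06-05 14:10:00", "2021-06-05 15:55:00"), tz = "UTC")
)

# Extract the hour of day (0-23) and count rides per customer type and hour
hourly_demo <- hour_demo %>%
  mutate(hour = as.integer(format(started_at, "%H"))) %>%
  count(customer_type, hour, name = "total_number_of_rides")

hourly_demo
```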

In [40]:
# Figure 8: Top 10 start stations for member riders
ggplot(top_10_start_station_members) + 
  geom_col(aes(x = reorder(start_station_name, station_count), y = station_count), fill = "#0096FF") + 
  scale_y_continuous(labels = scales::comma) + coord_flip() + theme_minimal() +
  labs(x = "", y = 'Number of Rides', title = 'Top 10 Start Stations for Member Customers') +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

In [41]:
# Figure 9: Top 10 start stations for casual riders
ggplot(top_10_start_station_casual) + 
  geom_col(aes(x = reorder(start_station_name, station_count), y = station_count), fill = "#F08080") + 
  scale_y_continuous(labels = scales::comma) + coord_flip() + theme_minimal() +
  labs(x = "", y = 'Number of Rides', title = 'Top 10 Start Stations for Casual Customers') +
  theme(text = element_text(size=18), plot.title = element_text(hjust = 0.5))

## Act
#### Key Findings
• Casual riders ride nearly 50% longer than members on average.

• Annual members mainly use the bikes for their commutes, as their usage peaks on weekdays during rush hour.

• Casual riders use the bikes more for leisure, based on the peak usage in summer months and on weekends.

• Casual riders do not use the service during the winter months as much as annual members.

• Annual members mainly use classic bikes and rarely use docked bikes, but casual riders are more open to riding all kinds of bikes.

• The average ride duration is higher for casual riders on every day of the week.

• Both members and casual riders preferred docked bikes, while the classic bike is the least popular bike type.

• Streeter Dr & Grand Ave, Lake Shore Dr & Monroe St, and Millennium Park are casual riders’ top three start stations.

• Casual riders ride more during the weekends.
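
The "nearly 50% longer" comparison is a relative difference between the two group means. With made-up ride lengths (not the results of this analysis), the calculation looks like this:

```r
# Made-up ride lengths in minutes, for illustration only
length_demo <- data.frame(
  customer_type = c("casual", "casual", "member", "member"),
  ride_length   = c(20, 40, 15, 25)
)

means <- aggregate(ride_length ~ customer_type, length_demo, mean)
casual_mean <- means$ride_length[means$customer_type == "casual"]  # 30
member_mean <- means$ride_length[means$customer_type == "member"]  # 20

# Relative difference: how much longer casual rides are, as a percentage of member rides
pct_longer <- (casual_mean - member_mean) / member_mean * 100
pct_longer  # 50
```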

#### Recommendations
• Offer incentives or rewards for reaching membership milestones to encourage casual riders to become members.

• Offer occasional membership discounts to new riders on summer and holiday weekends.

• Partner with local businesses near the stations most used by casual riders, targeting 1) local casual riders and 2) frequent visitors (commuters) to those businesses.

• Run promotions for annual memberships during the winter months to boost sales and convert casual riders into annual members.

• Decrease the price of single-ride and full-day passes Monday through Friday to bolster casual ridership during the work week.

• Increase the price of single-ride and full-day passes on Saturday and Sunday to encourage casual riders to convert to annual membership.