{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "# Capstone - Case Study 1: How does a bike-share navigate speedy success?\n\n## Introduction\n\n### Scenario\n\nIn this case study, I play the role of a junior data analyst on the marketing analytics team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, my team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, my team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve our recommendations, so they must be backed up with compelling data insights and professional data visualizations."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "### Background\n\n**Cyclistic Overview**\n\nCyclistic is a bike-share program with more than 5,800 bicycles and 600 docking stations in Chicago. It offers a variety of bikes, including reclining bikes, hand tricycles, and cargo bikes, making it inclusive for people with disabilities. While most riders use traditional bikes, 8% use assistive options. About 30% of users ride for commuting, while the rest ride for leisure.\n\n**Company History**\n\nCyclistic launched in 2016 and has grown to a fleet of 5,824 bicycles geotracked and locked into a network of 692 stations across Chicago. Riders can unlock bikes from one station and return them to any other in the system.\n\n**Marketing Strategy**\n\nCyclistic has relied on building general awareness with flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Finance analysts have concluded that annual members are more profitable than casual riders. Therefore, the goal is to convert casual riders into annual members.\n\n**Objective**\n\nMoreno, the director of marketing, has set a clear goal: Design marketing strategies to convert casual riders into annual members. The marketing analyst team needs to analyze Cyclistic’s historical bike trip data to identify trends and make data-driven recommendations."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## Data Analysis Process\n\nThis case study follows the 6 steps of the Data Analysis process: ASK, PREPARE, PROCESS, ANALYZE, SHARE, and ACT. R and RStudio are utilized for data analysis due to the large dataset size."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 1. ASK\n\n**Main Objective**\n\nTo understand how casual riders and annual members use Cyclistic bikes differently.\n\n**Business Tasks**\n\n- Identify the business task: What attracts casual riders to become annual members?\n- Consider key stakeholders:\n  - Director of Marketing, Moreno: Responsible for developing campaigns and initiatives.\n  - Executive Team: Will approve the recommended marketing program.\n  - Analytics Team: Collects, analyzes, and reports data for the marketing strategy.\n\n**Deliverables**\n\n- A clear statement of the business task: Identify key factors that attract riders to become annual members.\n- Problem Statement: How do annual members and casual riders use Cyclistic bikes differently?\n- Insights for Business Decisions: Identify differences to define and design a marketing campaign to attract more members and increase profits."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 2. PREPARE\n\n**Key Tasks**\n\n- Download and store data appropriately.\n- Identify how it’s organized.\n- Sort and filter the data.\n- Determine the credibility of the data.\n\n**Guiding Questions**\n\n- **Credibility and Bias:** The data is reliable, original, comprehensive, current, and cited, provided by Lyft Bikes and Scooters, LLC.\n- **Licensing, Privacy, Security, Accessibility:** The data is open and maintained by Motivate International Inc., following the Data License Agreement on [Divvy Bikes](https://divvybikes.com/data-license-agreement).\n- **Data Integrity:** The data was examined and verified for consistency in columns and data types.\n- **Relevance:** The data helps analyze both annual members and casual riders, providing insights into their characteristics and bike usage.\n\n**Data Source**\n\nCyclistic’s historical data from 2013 to 2024, available [here](https://divvy-tripdata.s3.amazonaws.com/index.html)."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## Data Organization\n\nThe data consists of CSV files organized by quarter from 2013 to 2019 and by month from 2020 to 2024. The analysis focuses on data from 2023, with 12 files named `YYYYMM-divvy-tripdata.csv`.\n\n**Columns:**\n\n- `ride_id`: Ride identifier\n- `rideable_type`: Type of bike\n- `started_at`: Start time\n- `ended_at`: End time\n- `start_station_id`, `start_station_name`, `start_lat`, `start_lng`: Start station details\n- `end_station_id`, `end_station_name`, `end_lat`, `end_lng`: End station details\n- `member_casual`: Rider type (`member` for annual members, `casual` for casual riders)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## PREPARE: step by step\n\nFirst, we install and load the packages this analysis needs: tidyverse, janitor, and lubridate."
    },
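    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": "# Install only the packages that are not already available\npackages <- c('tidyverse', 'janitor', 'lubridate')\nto_install <- setdiff(packages, rownames(installed.packages()))\nif (length(to_install) > 0) install.packages(to_install)\n\n# Load the packages\nlibrary(tidyverse)\nlibrary(janitor)\nlibrary(lubridate)"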
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Next, we load the downloaded data. The files come as .zip archives, so they must be extracted first. Because I chose the 2023 dataset, this yields 12 .csv files, one per month of trip data.\nWe use `read.csv()` from the utils package to import the CSV files into RStudio. The file paths passed to `read.csv()` are relative to the working directory, so we check it first."
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": "# Get working directory\ngetwd()\n\n# The paths below are relative to the working directory, so do not call\n# setwd(\"./input-data/\") first: they would then resolve to ./input-data/input-data/\n\n# Read data files\nm01 <- read.csv(\"./input-data/202301-divvy-tripdata.csv\")\nm02 <- read.csv(\"./input-data/202302-divvy-tripdata.csv\")\nm03 <- read.csv(\"./input-data/202303-divvy-tripdata.csv\")\nm04 <- read.csv(\"./input-data/202304-divvy-tripdata.csv\")\nm05 <- read.csv(\"./input-data/202305-divvy-tripdata.csv\")\nm06 <- read.csv(\"./input-data/202306-divvy-tripdata.csv\")\nm07 <- read.csv(\"./input-data/202307-divvy-tripdata.csv\")\nm08 <- read.csv(\"./input-data/202308-divvy-tripdata.csv\")\nm09 <- read.csv(\"./input-data/202309-divvy-tripdata.csv\")\nm10 <- read.csv(\"./input-data/202310-divvy-tripdata.csv\")\nm11 <- read.csv(\"./input-data/202311-divvy-tripdata.csv\")\nm12 <- read.csv(\"./input-data/202312-divvy-tripdata.csv\")"
    },
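    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "The twelve `read.csv()` calls above can also be expressed as a loop over file names. The cell below is a minimal sketch of that idea, not the method used for the results in this notebook. `read_monthly_files()` is a hypothetical helper, and it assumes the files sit in `./input-data/` relative to the working directory and follow the `YYYYMM-divvy-tripdata.csv` naming pattern."
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": "library(purrr)\n\n# Hypothetical helper: read every monthly file for `year` found in `dir`\n# and return a named list of data frames (one element per file).\nread_monthly_files <- function(dir = \"./input-data\", year = \"2023\") {\n  pattern <- paste0(\"^\", year, \"[0-9]{2}-divvy-tripdata[.]csv$\")\n  paths <- list.files(dir, pattern = pattern, full.names = TRUE)\n  set_names(map(paths, read.csv), basename(paths))\n}\n\nmonthly <- read_monthly_files()\nlength(monthly)  # number of files found; a full year should give 12"
    },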
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Examine datasets: all datasets have the same column names & data types."
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": "str(m01)\nstr(m02)\nstr(m03)\nstr(m04)\nstr(m05)\nstr(m06)\nstr(m07)\nstr(m08)\nstr(m09)\nstr(m10)\nstr(m11)\nstr(m12)"
    },
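    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Comparing twelve `str()` printouts by eye is error-prone. As a possible alternative, `compare_df_cols_same()` from janitor reports whether a set of data frames can be row-bound without column type conflicts. The sketch below runs on two small made-up data frames (`a` and `b` are illustrative, not trip data); the commented line shows how the same call could be applied to `m01` through `m12`."
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": "library(janitor)\n\n# Toy data frames standing in for two monthly files (made-up values)\na <- data.frame(ride_id = \"A1\", start_lat = 41.9, stringsAsFactors = FALSE)\nb <- data.frame(ride_id = \"B1\", start_lat = 41.8, stringsAsFactors = FALSE)\n\n# Returns a single logical: can the inputs be bound without type conflicts?\ncompare_df_cols_same(a, b)\n\n# Applied to the monthly data (requires m01 ... m12 from the earlier cell):\n# compare_df_cols_same(m01, m02, m03, m04, m05, m06, m07, m08, m09, m10, m11, m12)"
    },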
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "All 12 datasets have 13 columns with the same data types:\n- `ride_id`           : character\n- `rideable_type`     : character\n- `started_at`        : character\n- `ended_at`          : character\n- `start_station_name`: character\n- `start_station_id`  : character\n- `end_station_name`  : character\n- `end_station_id`    : character\n- `start_lat`         : numeric\n- `start_lng`         : numeric\n- `end_lat`           : numeric\n- `end_lng`           : numeric\n- `member_casual`     : character\n\nMerge the 12 datasets into one data frame."
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [],
      "source": "# Combine all data frames into one\ncyclistic_data <- bind_rows(m01, m02, m03, m04, m05, m06, m07, m08, m09, m10, m11, m12)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Clean and Transform Data:"
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": "# Clean column names\ncyclistic_data <- clean_names(cyclistic_data)\n\n# Convert date columns to datetime\ncyclistic_data <- cyclistic_data %>%\n  mutate(started_at = ymd_hms(started_at),\n         ended_at = ymd_hms(ended_at))\n\n# Calculate ride length (in minutes) and day of the week\ncyclistic_data <- cyclistic_data %>%\n  mutate(ride_length = as.numeric(difftime(ended_at, started_at, units = \"mins\")),\n         day_of_week = wday(started_at, label = TRUE))\n\n# Remove rows with missing, zero, or negative ride lengths\ncyclistic_data <- cyclistic_data %>%\n  filter(!is.na(ride_length) & ride_length > 0)\n\n# Examine the cleaned data\nglimpse(cyclistic_data)"
    },
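    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "It can help to quantify what the cleaning step guards against. The following sketch uses a hypothetical helper, `quality_summary()`, to count duplicated `ride_id` values and missing or non-positive ride lengths. It is demonstrated on a made-up three-row tibble; the same function could be run on any data frame that has `ride_id` and `ride_length` columns, such as `cyclistic_data`."
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": "library(dplyr)\n\n# Hypothetical helper: basic data-quality counts for a trips data frame\nquality_summary <- function(trips) {\n  trips %>%\n    summarise(\n      rows = n(),\n      duplicate_ride_ids = sum(duplicated(ride_id)),\n      non_positive_lengths = sum(ride_length <= 0, na.rm = TRUE),\n      missing_lengths = sum(is.na(ride_length))\n    )\n}\n\n# Made-up example rows (not real trip data)\ntoy <- tibble(\n  ride_id = c(\"r1\", \"r2\", \"r2\"),\n  ride_length = c(12.5, -3, NA)\n)\nquality_summary(toy)"
    },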
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 3. PROCESS\n\n**Key Tasks**\n\n- Check the data for errors.\n- Choose tools.\n- Transform the data so you can work with it effectively.\n- Document the cleaning process."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 4. ANALYZE\n\n**Key Tasks**\n\n- Aggregate the data.\n- Organize and format the data.\n- Perform calculations.\n- Identify trends and relationships.\n\n**Analysis Steps:**\n\nDescriptive Analysis:"
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [],
      "source": "avg_ride_length <- cyclistic_data %>%\n  group_by(member_casual) %>%\n  summarise(mean_ride_length = mean(ride_length),\n            median_ride_length = median(ride_length),\n            max_ride_length = max(ride_length),\n            min_ride_length = min(ride_length))\n\nprint(avg_ride_length)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Ride Count by Day of the Week:"
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": "ride_count_by_day <- cyclistic_data %>%\n  group_by(member_casual, day_of_week) %>%\n  summarise(number_of_rides = n(),\n            average_ride_length = mean(ride_length)) %>%\n  arrange(member_casual, day_of_week)\n\nprint(ride_count_by_day)"
    },
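    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "The weekday table above can be condensed into one number per rider type: the share of rides taken on a Saturday or Sunday. The cell below is a rough sketch using a hypothetical helper, `weekend_share()`. It assumes the day labels are the English abbreviations `Sat` and `Sun` (what `wday(label = TRUE)` produces in an English locale), and the demonstration counts are made up."
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": "library(dplyr)\n\n# Hypothetical helper: share of rides on Saturday or Sunday, per rider type.\n# Expects columns `member_casual`, `day_of_week` and `number_of_rides`.\nweekend_share <- function(counts) {\n  counts %>%\n    group_by(member_casual) %>%\n    summarise(\n      weekend_share = sum(number_of_rides[day_of_week %in% c(\"Sat\", \"Sun\")]) /\n        sum(number_of_rides),\n      .groups = \"drop\"\n    )\n}\n\n# Made-up counts (not real results)\ntoy_counts <- tibble(\n  member_casual = c(\"casual\", \"casual\", \"member\", \"member\"),\n  day_of_week = c(\"Sat\", \"Mon\", \"Sat\", \"Mon\"),\n  number_of_rides = c(60, 40, 25, 75)\n)\nweekend_share(toy_counts)\n\n# With the notebook's data: weekend_share(ride_count_by_day)"
    },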
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "Start and End Station Usage:"
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [],
      "source": "station_usage <- cyclistic_data %>%\n  group_by(member_casual, start_station_name, end_station_name) %>%\n  summarise(number_of_rides = n(),\n            average_ride_length = mean(ride_length)) %>%\n  arrange(desc(number_of_rides))\n\nprint(head(station_usage, 20))"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 5. SHARE\n\n**Key Tasks**\n\n- Determine the best way to share findings.\n- Create effective data visualizations.\n- Present findings.\n- Ensure work is accessible.\n\n**Visualizations:**\n\n1. Average Ride Length by Member Type:"
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [],
      "source": "ggplot(avg_ride_length, aes(x = member_casual, y = mean_ride_length, fill = member_casual)) +\n  geom_bar(stat = \"identity\", position = \"dodge\") +\n  labs(title = \"Average Ride Length by Member Type\",\n       x = \"Member Type\",\n       y = \"Average Ride Length (minutes)\") +\n  theme_minimal()"
    },
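    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "An image file like the one embedded in the next cell could be produced by saving the plot with `ggsave()`. This is a minimal sketch, not a record of how the original figure was exported: the `width`, `height` and `dpi` values are assumptions, and the two-row data frame is made up so that the cell runs on its own."
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": "library(ggplot2)\n\n# Made-up values standing in for `avg_ride_length` (not real results)\ntoy_avg <- data.frame(\n  member_casual = c(\"casual\", \"member\"),\n  mean_ride_length = c(20, 10)\n)\n\np <- ggplot(toy_avg, aes(x = member_casual, y = mean_ride_length, fill = member_casual)) +\n  geom_col() +\n  labs(title = \"Average Ride Length by Member Type\",\n       x = \"Member Type\",\n       y = \"Average Ride Length (minutes)\") +\n  theme_minimal()\n\n# Written to a temporary file here; in this notebook the target would be\n# \"./case-study-1_average-ride-length-by-member-type.png\"\nout_file <- file.path(tempdir(), \"average-ride-length-example.png\")\nggsave(out_file, plot = p, width = 8, height = 5, dpi = 150)"
    },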
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "![Average Ride Length by Member Type](./case-study-1_average-ride-length-by-member-type.png)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "2. Ride Count by Day of the Week:"
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {},
      "outputs": [],
      "source": "ggplot(ride_count_by_day, aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +\n  geom_bar(stat = \"identity\", position = \"dodge\") +\n  labs(title = \"Ride Count by Day of the Week\",\n       x = \"Day of the Week\",\n       y = \"Number of Rides\") +\n  theme_minimal()"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "![Ride Count by Day of the Week](./case-study-1_ride-count-by-day-of-the-week.png)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "3. Popular Start and End Stations:"
    },
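    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {},
      "outputs": [],
      "source": "# Aggregate the start-end pairs in station_usage to start-station totals,\n# dropping rides with no recorded start station\ntop_stations <- station_usage %>%\n  filter(member_casual %in% c(\"member\", \"casual\"), start_station_name != \"\") %>%\n  group_by(member_casual, start_station_name) %>%\n  summarise(number_of_rides = sum(number_of_rides), .groups = \"drop\") %>%\n  group_by(member_casual) %>%\n  slice_max(number_of_rides, n = 10) %>%\n  ungroup()\n\nggplot(top_stations, aes(x = reorder(start_station_name, number_of_rides), y = number_of_rides, fill = member_casual)) +\n  geom_bar(stat = \"identity\", position = \"dodge\") +\n  coord_flip() +\n  labs(title = \"Top Start Stations by Member Type\",\n       x = \"Start Station\",\n       y = \"Number of Rides\") +\n  theme_minimal()"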
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "![Popular Start and End Stations](./case-study-1_popular-start-n-end-stations-by-member-type.png)"
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "**Sharing Findings:**\n\n1. **Average Ride Length by Member Type:**\n   - Casual riders have a significantly longer average ride length (28.3 minutes) compared to annual members (12.5 minutes). This suggests casual riders may be using the bikes for leisure or longer trips, while annual members likely use them for shorter, more frequent trips such as commuting.\n\n2. **Ride Count by Day of the Week:**\n   - Casual riders have higher ride counts on weekends, especially Saturdays and Sundays. In contrast, annual members have a more consistent ride count throughout the week, with slight increases on weekdays, particularly Tuesdays and Wednesdays.\n\n3. **Popular Start and End Stations by Member Type:**\n   - Popular start and end stations vary significantly between casual riders and annual members. Stations like \"DuSable Lake Shore Dr & Monroe St\" and \"Streeter Dr & Grand Ave\" are highly frequented by casual riders, while annual members show a more distributed usage across various stations."
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": "## 6. ACT\n\n**Key Tasks**\n\n- Prepare a comprehensive report with all findings, insights, and visualizations.\n- Create a presentation to share with key stakeholders, ensuring data is accessible and understandable.\n- Use insights gained to make actionable recommendations for the marketing strategy aimed at converting casual riders into annual members.\n\n**Recommendations and Actions:**\n\n1. **Targeted Marketing Campaigns:**\n   - **Leisure Focus:** Since casual riders tend to have longer ride durations and higher usage on weekends, create marketing campaigns focused on leisure activities. Highlight benefits such as weekend ride packages, scenic routes, and leisure ride events.\n   - **Commute Focus:** For annual members who primarily use bikes for commuting, emphasize the convenience and cost savings of an annual membership. Promote benefits such as faster commute times, dedicated bike lanes, and easy access to docking stations near business districts.\n\n2. **Station Optimization:**\n   - **Casual Rider Stations:** Enhance and promote stations popular among casual riders, such as \"DuSable Lake Shore Dr & Monroe St\" and \"Streeter Dr & Grand Ave.\" Ensure these stations are well-maintained and have ample bikes available on weekends.\n   - **Annual Member Stations:** Optimize stations used by annual members for daily commutes. Provide amenities such as quick bike check-outs and returns, well-lit areas, and proximity to public transit options.\n\n3. **Membership Incentives:**\n   - Offer incentives for casual riders to become annual members. Examples include:\n     - Discounted annual memberships after a certain number of single rides.\n     - Free trials of annual membership benefits.\n     - Special promotions during peak riding seasons.\n\n4. **Community Engagement:**\n   - Organize community events and rides to engage both casual riders and annual members. Events such as community bike rides, maintenance workshops, and social gatherings can help foster a sense of community and loyalty.\n\n**Implementation Plan:**\n\n1. **Timeline:**\n   - Develop a detailed timeline for the implementation of marketing campaigns, station optimizations, and membership incentives.\n   - Assign responsibilities to team members for each task and set clear deadlines.\n\n2. **Budget:**\n   - Allocate a budget for marketing campaigns, station improvements, and membership incentive programs.\n   - Track spending and ensure initiatives remain cost-effective.\n\n3. **Monitoring and Evaluation:**\n   - Set up key performance indicators (KPIs) to monitor the effectiveness of the implemented strategies.\n   - Regularly review data to assess the impact on membership conversions and make adjustments as needed.\n\n4. **Feedback Loop:**\n   - Collect feedback from riders through surveys and social media to continuously improve the bike-share program.\n   - Use feedback to refine marketing messages, improve station amenities, and enhance the overall rider experience."
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "R",
      "language": "R",
      "name": "ir"
    },
    "language_info": {
      "codemirror_mode": "r",
      "file_extension": ".r",
      "mimetype": "text/x-r-source",
      "name": "R",
      "pygments_lexer": "r",
      "version": "4.0.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 5
}
