# 🚲 **Public Health Connection** 
 
While this analysis primarily focuses on uncovering usage patterns in Cyclistic’s bike-share program, it also touches on a broader impact. Promoting cycling as a mode of transport contributes to healthier lifestyles, reduced pollution, and improved well-being, aligning with the **United Nations Sustainable Development Goal 3: Good Health and Well-being**.

📄 [Read the full SDG 3 impact documentation here](https://drive.google.com/file/d/1uia1uQjhZMEUg8L3Lvc2kxY5PuOZDBce/view?usp=sharing)


# Introduction

Cyclistic, a bike-share program in Chicago, has many casual riders who use the service occasionally but do not subscribe to annual memberships. Annual members are more profitable and tend to ride more regularly, yet casual riders still significantly outnumber them. This gap represents a missed opportunity for both business growth and public health improvement.

# Scope

This project analyzes 12 months of trip data from Cyclistic’s bike-share program to understand how casual riders differ from annual members in ride frequency, duration, and timing. The findings will help design strategies to increase membership.

# Objectives

- Analyze and compare usage patterns between casual and annual riders.  
- Identify trends useful for targeted marketing strategies.  
- Recommend three realistic actions to convert casual users into regular members.

This approach supports promoting consistent bike use and aligns with sustainable transport and public health goals.


# Setup: Installing and Loading Necessary R Packages

In [None]:
# Install packages (if needed)
# install.packages("tidyverse")
# install.packages("lubridate")

# Load libraries
library(tidyverse)  # includes dplyr, ggplot2, readr, tidyr, etc.
library(lubridate)  # for date/time manipulation


This section loads the essential R packages used throughout the analysis: tidyverse, which bundles popular data manipulation and visualization packages such as dplyr and ggplot2, and lubridate, which handles date and time data efficiently.

# Load and Inspect Data

In [None]:
# Example of reading data (update the path as necessary)
rides <- read_csv("/kaggle/input/cyclistic-case-study/Divvy_Trips_2020_SQL_R_Q1.csv")

# Inspect the first few rows
head(rides)

# View the dataset in a spreadsheet-like viewer (if your R environment supports it)
View(rides)

# Quick data structure overview
glimpse(rides)

# Show column names
colnames(rides)

The Cyclistic bike-share data file was read into R using read_csv(). The dataset contains detailed information about each trip, including ride ID, bike type, start and end times, station names and locations, user type (member or casual), and ride duration. The first few rows were previewed with head(), the entire dataset was viewed interactively with View(), column names were examined using colnames(), and the structure and variable types were summarized using glimpse(). These steps were done to understand the data and identify any potential issues before proceeding with the analysis.
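
The inspection steps above can be complemented by a few quick data-quality counts. The following is a minimal sketch, not part of the original analysis: it uses a small invented data frame (the column names ride_id, member_casual, and ride_length are assumed to match the dataset, and the values are made up) to count missing values per column and flag repeated ride IDs.

```r
# Toy stand-in for the real `rides` data; all values are invented for illustration
sample_rides <- data.frame(
  ride_id       = c("A1", "A2", "A2", "A4"),
  member_casual = c("member", "casual", "casual", NA),
  ride_length   = c("00:10:30", "01:15:00", "01:15:00", "00:05:00")
)

# Number of missing values in each column
na_counts <- colSums(is.na(sample_rides))

# Number of rows whose ride_id repeats an earlier row
n_duplicates <- sum(duplicated(sample_rides$ride_id))

print(na_counts)
print(n_duplicates)
```

Running the same two checks on the full dataset would show whether any cleaning is needed before the transformations that follow.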

# Exploratory Data Analysis of Cyclistic Bike-Share Data
An exploratory data analysis was conducted on the Cyclistic bike-share dataset to identify trends and patterns in rider behavior. The dataset was first cleaned and prepared, including converting ride durations and organizing days of the week. Various visualizations were then created to explore how ride duration, frequency, and user type (member or casual) differed across time and categories. The results were intended to provide actionable insights for decision-making and strategy development.

The analysis proceeded in the following steps:

# 1. Data Preparation

* Converting ride_length to numeric minutes.
* Reordering day_of_week as a factor with proper labels and order.


In [None]:
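# Convert ride_length from HH:MM:SS to minutes.
# as.character() keeps this working if read_csv() already parsed the column as a time value.
rides$ride_length <- period_to_seconds(hms(as.character(rides$ride_length))) / 60

# Recode day_of_week (1 = Sunday ... 7 = Saturday) as an ordered factor with weekday names
rides <- rides %>%
    mutate(
        day_of_week = factor(day_of_week,
                             levels = 1:7,
                             labels = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"),
                             ordered = TRUE)
    )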

Descriptive statistics were initially derived to establish key metrics such as average ride duration and most common ride days. The ride_length column was converted from HH:MM:SS to minutes, and day_of_week was transformed from numeric to weekday names using R functions. These transformations facilitated clear analysis and visualization.
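
To make the two transformations concrete, here is a small worked example on invented inputs (the values are illustrative and not taken from the dataset). It assumes, as in the preparation step, that ride_length is stored as HH:MM:SS text and that day_of_week is coded 1 to 7 starting on Sunday.

```r
library(lubridate)

# Illustrative inputs, not taken from the dataset
ride_length_raw <- c("00:10:30", "01:15:00")
day_of_week_raw <- c(1, 7)

# HH:MM:SS -> Period -> seconds -> minutes
ride_minutes <- period_to_seconds(hms(ride_length_raw)) / 60

# 1 = Sunday ... 7 = Saturday, matching the preparation step
day_labels <- c("Sunday", "Monday", "Tuesday", "Wednesday",
                "Thursday", "Friday", "Saturday")
day_factor <- factor(day_of_week_raw, levels = 1:7,
                     labels = day_labels, ordered = TRUE)

print(ride_minutes)              # 10.5 75.0
print(as.character(day_factor))  # "Sunday" "Saturday"
```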

# 2. Ride Duration Distribution by Rider Type
Histogram of ride lengths split by member_casual.

In [None]:
ggplot(rides, aes(x = ride_length, fill = member_casual)) +
    geom_histogram(binwidth = 5, position = "identity", alpha = 0.6) +
    xlim(0, 100) +
    labs(title = "Distribution of Ride Duration by Rider Type",
         x = "Ride Length (minutes)",
         y = "Number of Rides",
         fill = "Rider Type") +
    scale_fill_manual(values = c("member" = "#1f77b4", "casual" = "#ff7f0e")) +
    theme_minimal()

Histogram visualizations revealed distinct usage patterns: casual riders had significantly longer average ride durations (around 1–1.5 hours) compared to members (typically 10–15 minutes). This suggests that casual users use the bikes for leisure, while members ride for shorter, utilitarian purposes.
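
One caveat when reading the histogram: xlim(0, 100) drops rides outside the 0 to 100 minute window before the bins are counted, so very long rides are not shown. The sketch below, using invented ride lengths, shows one way to check how many rides such a window would exclude; the same logic could be applied to rides$ride_length.

```r
# Invented ride lengths in minutes, for illustration only
ride_length <- c(8, 12, 15, 45, 90, 130, 240, -2)

# Rides that fall outside the 0-100 minute window used by xlim()
outside <- ride_length < 0 | ride_length > 100

n_outside     <- sum(outside, na.rm = TRUE)
share_outside <- mean(outside, na.rm = TRUE)

print(n_outside)      # 3
print(share_outside)  # 0.375
```

If the excluded share is large for one rider type, the plotted distribution will understate that group's typical ride length.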

# 3. Number of Rides by Day of Week and Rider Type
Bar plot showing count of rides per day, grouped by rider type.

In [None]:
ggplot(rides, aes(x = day_of_week, fill = member_casual)) +
    geom_bar(position = "dodge") +
    labs(title = "Number of Rides by Day of Week and Rider Type",
         x = "Day of Week",
         y = "Number of Rides",
         fill = "Rider Type") +
    theme_minimal() +
    scale_fill_manual(values = c("member" = "#1f77b4", "casual" = "#ff7f0e"))

Bar charts and pivot tables indicated that members rode most frequently during weekdays, especially on Mondays, Tuesdays, and Wednesdays, consistent with commuting behavior. Casual riders, however, showed increased activity on weekends, particularly Sundays, aligning with recreational use.
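
The pivot-table view mentioned above can be reproduced in R. This is a hedged sketch on a small invented tibble rather than the author's actual table: it counts rides per day and rider type with dplyr, then spreads rider types into columns with tidyr. The name n_rides is introduced here for illustration.

```r
library(dplyr)
library(tidyr)

# Invented rows for illustration; the real data would come from `rides`
sample_rides <- tibble(
  day_of_week   = c("Sunday", "Sunday", "Monday", "Monday", "Monday", "Sunday"),
  member_casual = c("casual", "casual", "member", "member", "casual", "member")
)

# One row per day, one column per rider type, cells hold ride counts
ride_counts <- sample_rides %>%
  count(day_of_week, member_casual, name = "n_rides") %>%
  pivot_wider(names_from = member_casual,
              values_from = n_rides,
              values_fill = 0)

print(ride_counts)
```

With the ordered day_of_week factor from the preparation step, the rows would appear in weekday order rather than alphabetically.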

# 4. Average Ride Length by Day of Week and Rider Type
Line plot showing how average ride length changes across the week for each rider type.

In [None]:
avg_ride_length <- rides %>%
    group_by(day_of_week, member_casual) %>%
    summarise(avg_ride_length = mean(ride_length, na.rm = TRUE), .groups = "drop")

my_colors <- c("member" = "#1f77b4", "casual" = "#ff7f0e")

ggplot(avg_ride_length, aes(x = day_of_week, y = avg_ride_length, color = member_casual, group = member_casual)) +
    geom_line(linewidth = 1.2) +  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
    geom_point(size = 3) +
    scale_color_manual(values = my_colors) +
    labs(title = "Average Ride Length by Day of Week and Rider Type",
         x = "Day of Week",
         y = "Average Ride Length (minutes)",
         color = "Rider Type") +
    theme_minimal()

Line plots demonstrated that casual riders had varying ride durations across the week (30–40 minutes on average, peaking on Sundays), whereas members maintained a steady duration (10–15 minutes). These trends confirm the contrasting nature of member and casual usage.

# 5. Ride Duration Summary by Rider Type
Boxplot comparing ride duration distributions for members vs casual riders.

In [None]:
ggplot(rides, aes(x = member_casual, y = ride_length, fill = member_casual)) +
    geom_boxplot(outlier.alpha = 0.2) +
    scale_fill_manual(values = my_colors) +
    coord_cartesian(ylim = c(0, 100)) +
    labs(title = "Ride Duration by Rider Type",
         x = "Rider Type",
         y = "Ride Length (minutes)",
         fill = "Rider Type") +
    theme_minimal()

A boxplot comparing ride durations by rider type emphasized the disparity: casual riders showed high variability, with medians between 50 and 75 minutes, while members showed consistency, with medians around 10–15 minutes. This further highlights the difference in ride purpose and behavior between the two groups.
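
The quantities a boxplot draws (median, quartiles, and interquartile range) can also be computed directly, which makes the comparison easier to quote. The sketch below uses invented ride lengths, so its numbers are illustrative only and do not reproduce the results reported above; the summary column names are assumptions introduced for this example.

```r
library(dplyr)

# Invented ride lengths in minutes, for illustration only
sample_rides <- tibble(
  member_casual = c("member", "member", "member", "casual", "casual", "casual"),
  ride_length   = c(8, 12, 14, 20, 60, 95)
)

# Median, quartiles, and IQR of ride length for each rider type
duration_summary <- sample_rides %>%
  group_by(member_casual) %>%
  summarise(
    median_minutes = median(ride_length, na.rm = TRUE),
    q1_minutes     = quantile(ride_length, 0.25, na.rm = TRUE, names = FALSE),
    q3_minutes     = quantile(ride_length, 0.75, na.rm = TRUE, names = FALSE),
    iqr_minutes    = IQR(ride_length, na.rm = TRUE),
    .groups = "drop"
  )

print(duration_summary)
```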

# Conclusion
The twelve-month exploration of Cyclistic’s trip records achieved the stated objectives. Usage patterns of casual and annual riders were analyzed in depth, revealing a clear behavioral divide: casual riders favor weekend outings of an hour or more, whereas members concentrate on short weekday journeys that rarely exceed fifteen minutes. These findings supplied the trends required for targeted marketing: evidence that leisure-centered weekend promotions will resonate with casual users, while commuter incentives and loyalty rewards will reinforce weekday engagement among members.

Three concrete actions emerged from the analysis. First, weekend leisure campaigns, framed around scenic routes and group experiences, can speak directly to casual riders’ preferences and encourage membership upgrades. Second, commuter reward programs (points, fare credits, or workplace partnerships) can strengthen member retention and attract riders who value predictability and cost savings. Third, integrating health-tracking features such as “Cycle for 30” daily challenges transforms the service into a preventive health companion, aligning Cyclistic with Sustainable Development Goal 3 and differentiating the brand in a crowded mobility market.

By connecting rider behavior to health and sustainability outcomes, the study demonstrates that data-driven insight can do more than explain the past; it can shape strategies that expand membership, foster healthier communities, and support a cleaner urban environment. Cyclistic now holds a practical roadmap for converting occasional users into committed members, one that balances business growth with public health impact and positions the program as a model for active, sustainable transport.