# Cyclistic Bike-Share Analysis - Project Summary

## Project Overview

This project analyzes 4.86 million bike rides from 2024 to understand behavioral differences between annual members and casual riders at Cyclistic, a bike-share company in Chicago.

**Business Goal:** Design marketing strategies to convert casual riders into annual members.

---

## Tools & Technologies

| Tool | Purpose |
|------|---------|
| Python (Pandas, NumPy) | Data cleaning and analysis |
| Matplotlib, Seaborn | Data visualization |
| MySQL | Data storage and SQL queries |
| Jupyter Notebook | Documentation and code execution |
| Anthropic Claude (AI Assistant) | Learning support and workflow guidance |

---

## Project Structure
```
Cyclistic-Bike-Share-Analysis/
│
├── data/
│   ├── raw/                 (original 12 monthly CSV files)
│   └── processed/           (cleaned and processed data)
│
├── notebooks/
│   ├── 01_data_preparation.ipynb      (ASK and PREPARE phases)
│   ├── 02_data_cleaning.ipynb         (PROCESS phase)
│   ├── 03_exploratory_analysis.ipynb  (ANALYZE phase)
│   ├── 04_visualizations.ipynb        (SHARE phase)
│   ├── 05_mysql_integration.ipynb     (MySQL integration)
│   ├── 06_key_findings.ipynb          (Findings and recommendations)
│   └── 07_project_summary.ipynb       (This file)
│
├── sql/
│   └── queries.sql                    (10 analytical SQL queries)
│
├── visualizations/
│   ├── 01_member_distribution.png
│   ├── 02_duration_comparison.png
│   ├── 03_hourly_patterns.png
│   ├── 04_daily_patterns.png
│   ├── 05_monthly_trends.png
│   ├── 06_weekday_weekend_comparison.png
│   └── 07_bike_type_preferences.png
│
└── README.md
```

---

## Data Journey

### Original Data
- 12 monthly CSV files (January - December 2024)
- Source: Divvy Bikes (public data)
- Total records: 5,860,568

### Cleaning Steps
1. Combined 12 monthly files into one dataset
2. Removed 211 duplicate ride IDs
3. Fixed a `ride_length` calculation error
4. Removed invalid rides (negative, zero, or longer than 24 hours)
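
The steps above can be sketched in pandas roughly as follows. This is a minimal illustration on two tiny made-up frames, not the notebook's actual code: the column names `ride_id`, `started_at`, and `ended_at` are assumed to follow the public Divvy schema, the helper name `clean_rides` is hypothetical, and recomputing `ride_length` from the two timestamps is only one plausible way the calculation error could have been fixed.

```python
import pandas as pd


def clean_rides(frames):
    """Combine monthly frames, drop duplicate ride IDs, keep rides in (0, 24h]."""
    # Step 1: combine the monthly files into one dataset.
    rides = pd.concat(frames, ignore_index=True)

    # Step 2: keep only the first occurrence of each ride ID.
    rides = rides.drop_duplicates(subset="ride_id", keep="first")

    # Step 3 (assumed fix): recompute ride_length in minutes from the timestamps.
    started = pd.to_datetime(rides["started_at"])
    ended = pd.to_datetime(rides["ended_at"])
    rides = rides.assign(ride_length=(ended - started).dt.total_seconds() / 60)

    # Step 4: drop negative, zero, and over-24-hour rides.
    valid = (rides["ride_length"] > 0) & (rides["ride_length"] <= 24 * 60)
    return rides.loc[valid].reset_index(drop=True)


# Two tiny stand-ins for monthly CSV files (made-up rows, not project data).
jan = pd.DataFrame({
    "ride_id": ["A1", "A2", "A2"],  # A2 is duplicated
    "started_at": ["2024-01-05 08:00:00", "2024-01-05 09:00:00", "2024-01-05 09:00:00"],
    "ended_at": ["2024-01-05 08:10:00", "2024-01-05 09:30:00", "2024-01-05 09:30:00"],
})
feb = pd.DataFrame({
    "ride_id": ["B1", "B2", "B3"],  # negative, zero, and 48-hour rides
    "started_at": ["2024-02-01 10:00:00", "2024-02-01 11:00:00", "2024-02-01 12:00:00"],
    "ended_at": ["2024-02-01 09:55:00", "2024-02-01 11:00:00", "2024-02-03 12:00:00"],
})

cleaned = clean_rides([jan, feb])
print(cleaned[["ride_id", "ride_length"]])
```

In the real project the list of frames would presumably come from reading the 12 CSV files in `data/raw/` with `pd.read_csv`.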

### Final Clean Dataset
- Total records: 4,859,019
- Records removed: 1,001,549 (17.1%)
- Data retained: 82.9%
- Average ride duration: 9.68 minutes
- Median ride duration: 8.52 minutes
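
The removal and retention percentages follow directly from the two record counts. A quick check in plain Python, using only the numbers reported in this summary:

```python
raw_records = 5_860_568      # total records before cleaning
clean_records = 4_859_019    # total records after cleaning

removed = raw_records - clean_records
removed_pct = round(100 * removed / raw_records, 1)
retained_pct = round(100 * clean_records / raw_records, 1)

print(removed, removed_pct, retained_pct)
```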

---

## Key Findings

### 1. Members are Commuters, Casual Riders are Leisure Users

| Metric | Casual Riders | Annual Members |
|--------|--------------|----------------|
| Total Rides | 1,584,360 (32.6%) | 3,274,659 (67.4%) |
| Avg Ride Duration | 10.61 min | 9.23 min |
| Peak Day | Saturday | Wednesday |
| Peak Hour | 5:00 PM | 5:00 PM |
| Weekday Rides | 65.1% | 76.7% |
| Weekend Rides | 34.9% | 23.3% |
| Summer Rides (May-September) | 68.8% | 56.6% |
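
A table like the one above could be produced with a single pandas `groupby`. The sketch below is illustrative only: it runs on a handful of made-up rows, assumes the Divvy-style columns `member_casual` and `started_at` plus a precomputed `ride_length` in minutes, treats May-September as "summer", and uses a hypothetical helper name, `rider_profile`.

```python
import pandas as pd


def rider_profile(rides):
    """Summarize ride behavior per rider type (one row per group)."""
    started = pd.to_datetime(rides["started_at"])
    rides = rides.assign(
        day_name=started.dt.day_name(),
        hour=started.dt.hour,
        is_weekend=started.dt.dayofweek >= 5,       # Saturday=5, Sunday=6
        is_summer=started.dt.month.between(5, 9),   # May-September
    )
    grouped = rides.groupby("member_casual")
    return pd.DataFrame({
        "total_rides": grouped.size(),
        "avg_duration_min": grouped["ride_length"].mean().round(2),
        "peak_day": grouped["day_name"].agg(lambda s: s.mode().iloc[0]),
        "peak_hour": grouped["hour"].agg(lambda s: s.mode().iloc[0]),
        "weekend_pct": (grouped["is_weekend"].mean() * 100).round(1),
        "summer_pct": (grouped["is_summer"].mean() * 100).round(1),
    })


# Made-up rows for illustration only (not project data).
rides = pd.DataFrame({
    "member_casual": ["casual", "casual", "casual", "member", "member", "member", "member"],
    "started_at": [
        "2024-07-06 14:00:00",  # Saturday
        "2024-07-13 14:00:00",  # Saturday
        "2024-01-10 17:00:00",  # Wednesday
        "2024-07-10 08:00:00",  # Wednesday
        "2024-01-17 08:00:00",  # Wednesday
        "2024-03-05 17:00:00",  # Tuesday
        "2024-08-03 11:00:00",  # Saturday
    ],
    "ride_length": [12.0, 14.0, 7.0, 9.0, 8.0, 10.0, 11.0],
})

profile = rider_profile(rides)
print(profile)
```

Weekday share is simply `100 - weekend_pct`, so it is not computed separately here.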

### 2. Biggest Behavioral Differences
- Members show clear **commute patterns** (morning and evening peaks)
- Casual riders show **leisure patterns** (afternoon/evening only)
- Casual riders ride **longer** but **less frequently**
- Casual riders' usage is **highly seasonal** (68.8% of their rides fall in May-September)
- Members ride **more consistently** year-round (56.6% of their rides fall in May-September)

### 3. Bike Preferences
- Both groups prefer electric bikes (52-55%)
- Casual riders use electric scooters more (4.59% vs 1.66%)
- Casual riders take longer rides on classic bikes (11.67 min on average)
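
Shares and average durations by bike type can be computed with `pd.crosstab` and a two-level `groupby`. This is a sketch on made-up rows: the column name `rideable_type` and its values are assumed to follow the Divvy schema, and the numbers below are not the project's results.

```python
import pandas as pd

# Made-up rows for illustration only (not project data).
rides = pd.DataFrame({
    "member_casual": ["casual"] * 4 + ["member"] * 4,
    "rideable_type": [
        "electric_bike", "electric_bike", "classic_bike", "electric_scooter",
        "electric_bike", "electric_bike", "classic_bike", "classic_bike",
    ],
    "ride_length": [9.0, 11.0, 14.0, 6.0, 8.0, 10.0, 9.0, 7.0],
})

# Percentage of each rider type's rides taken on each bike type (rows sum to 100).
share_pct = (
    pd.crosstab(rides["member_casual"], rides["rideable_type"], normalize="index") * 100
).round(2)

# Average ride duration (minutes) per rider type and bike type.
avg_duration = (
    rides.groupby(["member_casual", "rideable_type"])["ride_length"].mean().round(2)
)

print(share_pct)
print(avg_duration)
```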

---

## Top 3 Recommendations

### Recommendation 1: Weekend-to-Weekday Campaign
- **Target:** Casual riders who ride on weekends
- **Strategy:** "Commute Challenge" - encourage casual riders to try biking to work
- **Offer:** Free first month membership trial
- **Why:** 34.9% of casual rides are on weekends, showing regular engagement

### Recommendation 2: Seasonal Membership Tier
- **Target:** Casual riders who only ride in summer
- **Strategy:** Create a reduced-price "Summer Membership" (May-September)
- **Offer:** Special perks like priority bike access and no unlock fees
- **Why:** 68.8% of casual rides happen May-September

### Recommendation 3: In-App Targeted Notifications
- **Target:** Casual riders during peak hours (4-6 PM)
- **Strategy:** Send notifications showing potential membership savings
- **Offer:** Real-time cost comparison: pay-per-ride vs membership
- **Why:** Casual riders already peak at 4-6 PM; they just need to see the value proposition
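
The cost comparison behind such a notification is simple break-even arithmetic. The sketch below is hypothetical throughout: the function names are made up, and the two prices are placeholder values chosen for easy arithmetic, not actual Cyclistic or Divvy pricing.

```python
import math


def breakeven_rides(annual_fee, per_ride_cost):
    """Fewest rides per year at which a membership costs no more than paying per ride."""
    return math.ceil(annual_fee / per_ride_cost)


def membership_savings(rides_per_year, annual_fee, per_ride_cost):
    """Positive result: the rider would have saved this much with a membership."""
    return rides_per_year * per_ride_cost - annual_fee


# Placeholder prices (assumptions for illustration, NOT real pricing).
ANNUAL_FEE = 120.0
PER_RIDE_COST = 4.0

print(breakeven_rides(ANNUAL_FEE, PER_RIDE_COST))          # rides needed to break even
print(membership_savings(50, ANNUAL_FEE, PER_RIDE_COST))   # savings for a 50-ride year
```

A notification could then show a casual rider how far their own ride count is from the break-even point.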

---

## Visualizations

All charts are saved in the `visualizations/` folder:

1. **Member Distribution** - Shows the 67/33 split between members and casual riders
2. **Duration Comparison** - Casual riders' rides are about 15% longer on average (10.61 vs 9.23 min)
3. **Hourly Patterns** - Clear commute peaks for members vs leisure patterns for casual riders
4. **Daily Patterns** - Weekday dominance for members, heavier weekend usage for casual riders
5. **Monthly Trends** - Seasonal patterns showing summer peaks
6. **Weekday/Weekend Comparison** - Side-by-side weekday vs weekend analysis
7. **Bike Type Preferences** - Equipment preferences by rider type
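
The 15% figure in the Duration Comparison chart comes from the two average durations in the Key Findings table. A quick check in plain Python:

```python
casual_avg_min = 10.61   # average casual ride duration (minutes)
member_avg_min = 9.23    # average member ride duration (minutes)

# Relative difference, expressed as a percentage of the member average.
pct_longer = round(100 * (casual_avg_min - member_avg_min) / member_avg_min, 1)

print(pct_longer)
```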

---

## What I Learned

1. **Data cleaning matters** - Found and fixed a critical `ride_length` calculation error
   that would have skewed the entire analysis
2. **Stratified sampling** - Learned to maintain data proportions when reducing dataset
   size for visualization tools
3. **Business context** - Kept the analysis focused on a clear business question
   throughout the entire process
4. **Storytelling with data** - Created visualizations that tell a clear narrative
   about user behavior differences
5. **SQL for validation** - Used MySQL queries to cross-validate findings from
   Python analysis, ensuring accuracy
6. **Modern workflow** - Used AI assistants (such as Anthropic's Claude) as a
   learning and problem-solving tool during the analysis, much as professionals
   use them to accelerate data workflows, while still thinking critically about
   and validating every output
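
The stratified sampling mentioned in item 2 can be sketched with pandas' `DataFrameGroupBy.sample`. This is an assumption about the approach rather than the notebook's actual code: the helper name `stratified_sample` is hypothetical, stratifying by `member_casual` is a guess at the relevant grouping, and the frame below is synthetic (built to mirror the 67.4% / 32.6% split).

```python
import pandas as pd


def stratified_sample(rides, strata_col, frac, seed=42):
    """Sample the same fraction from every stratum so group proportions are preserved."""
    return rides.groupby(strata_col).sample(frac=frac, random_state=seed)


# Synthetic frame: 10,000 rows with a 67.4% / 32.6% member/casual split.
rides = pd.DataFrame({
    "ride_id": range(10_000),
    "member_casual": ["member"] * 6_740 + ["casual"] * 3_260,
})

sample = stratified_sample(rides, "member_casual", frac=0.1)
counts = sample["member_casual"].value_counts()
print(counts)
```

Because each group is sampled at the same rate, the 10% sample keeps the same member/casual proportions as the full frame, which a plain random sample would only approximate.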
   
---

*Project completed as part of the Google Data Analytics Professional Certificate*\
*Analyst: MITADRU DEB*\
*Date: 30/01/2026*