<center>
    <img src="https://rockborne.com/wp-content/uploads/2021/07/LandingPage-Header-RED-CENTRE.jpg" width="900" alt="logo"  />
</center>

# Bikeshare Systems Analysis Project

## Business Context

You are a data analyst working for a consultancy that advises bike-sharing operators on optimising their operations and expanding their networks. Your client portfolio includes five major bike-sharing systems across different cities: Baywheels, Bluebikes, Capital Bikeshare, Divvy Bikes, and Santander.

Your clients face several strategic challenges:
- Understanding seasonal demand patterns to optimise fleet size
- Identifying peak usage times for efficient bike redistribution
- Benchmarking performance against competitors
- Planning network expansion based on usage patterns
- Improving user experience through data-driven insights

You have been provided with historical trip data (2016-2019) containing over 81 million records. Your task is to analyse this data and provide actionable insights to support strategic decision-making.

---

## Initial Data Analysis

Before beginning your analysis, conduct a thorough assessment of the available data:

- Examine each provider's dataset structure
- Create a column availability matrix across all providers
- Document data quality issues and inconsistencies
- Identify which providers capture demographic information
- Note any naming convention variations between datasets

This assessment will inform your data integration strategy in later stages.
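
As a starting point, the column availability matrix could be sketched as below. The column lists are hypothetical placeholders, not the real headers; replace them with the headers read from each provider's files. The `normalise` helper is also an assumption about how naming variations might be reconciled.

```python
# Hypothetical column lists: replace with the headers read from each provider's files.
provider_columns = {
    "Baywheels": ["duration_sec", "start_time", "end_time", "member_birth_year"],
    "Capital Bikeshare": ["Duration", "Start date", "End date", "Member type"],
    "Santander": ["Duration", "Start Date", "End Date"],
}


def normalise(name: str) -> str:
    """Lower-case and drop separators so 'Start date' and 'start_date' compare equal."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def availability_matrix(columns_by_provider):
    """Map each normalised column name to {provider: True/False}."""
    normalised = {
        provider: {normalise(c) for c in cols}
        for provider, cols in columns_by_provider.items()
    }
    all_columns = sorted(set().union(*normalised.values()))
    return {
        col: {provider: col in cols for provider, cols in normalised.items()}
        for col in all_columns
    }


matrix = availability_matrix(provider_columns)
providers = list(provider_columns)

# Print a simple text table: one row per column, one field per provider.
print("column".ljust(22) + "".join(p.ljust(20) for p in providers))
for col, row in matrix.items():
    print(col.ljust(22) + "".join(("yes" if row[p] else "-").ljust(20) for p in providers))
```

Normalising names only catches differences in case and separators. Columns that mean the same thing under different names (for example `duration_sec` versus `Duration`) still need a manual mapping.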

---

## Level 1: Single Market Analysis

### Business Scenario

One of your clients (Capital Bikeshare) wants to understand its operational patterns before expanding to new areas. It needs insight into current performance to serve as a baseline against which future growth can be measured.

**Your Task**: Analyse Capital Bikeshare's operations to establish baseline performance metrics.

### Data Quality & Preparation

Assess and clean the data to ensure reliable analysis:
- Identify missing, invalid, or inconsistent data
- Handle outliers and data anomalies
- Create calculated fields for time-based analysis
- Document your data quality findings and cleaning decisions
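
The cleaning steps above can be sketched as follows. The field names, the timestamp format, and the 60-second and 24-hour duration thresholds are all assumptions for illustration; choose thresholds that you can justify from the data and record them in your findings.

```python
from datetime import datetime

# Hypothetical raw rows: real column names and timestamp formats vary by provider.
raw_trips = [
    {"start": "2018-07-04 08:15:00", "end": "2018-07-04 08:40:00"},
    {"start": "2018-07-04 09:00:00", "end": "2018-07-04 08:59:00"},  # ends before it starts
    {"start": "2018-07-07 14:00:00", "end": "2018-07-09 14:00:00"},  # lasts 48 hours
    {"start": "", "end": "2018-07-08 10:00:00"},                     # missing start time
]

FMT = "%Y-%m-%d %H:%M:%S"
MIN_SECONDS = 60          # assumed lower bound for a genuine trip
MAX_SECONDS = 24 * 3600   # assumed upper bound for a genuine trip
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def clean_trip(row):
    """Return (enriched_row, None) for a valid trip, or (None, rejection_reason)."""
    try:
        start = datetime.strptime(row["start"], FMT)
        end = datetime.strptime(row["end"], FMT)
    except (KeyError, ValueError):
        return None, "unparseable_timestamp"
    duration = (end - start).total_seconds()
    if duration < MIN_SECONDS:
        return None, "too_short_or_negative"
    if duration > MAX_SECONDS:
        return None, "too_long"
    enriched = {
        "start": start,
        "end": end,
        "duration_min": duration / 60,
        "hour": start.hour,
        "weekday": DAY_NAMES[start.weekday()],
        "month": start.month,
        "is_weekend": start.weekday() >= 5,
    }
    return enriched, None


clean, rejected = [], {}
for row in raw_trips:
    trip, reason = clean_trip(row)
    if trip is not None:
        clean.append(trip)
    else:
        rejected[reason] = rejected.get(reason, 0) + 1

print(f"kept {len(clean)} trips; rejected: {rejected}")
```

Counting rejections by reason, rather than silently dropping rows, gives you the numbers needed to document each cleaning decision.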

### Business Questions to Answer

1. **Demand Forecasting**: What are the seasonal patterns in bike usage? Which months experience peak and trough demand? What implications does this have for fleet sizing?

2. **Operational Efficiency**: What are the peak usage hours? How do weekday patterns differ from weekends? When should the company schedule maintenance and rebalancing operations?

3. **Network Utilisation**: How many stations are in the network? Which stations have the highest utilisation? Are there signs of capacity constraints at popular stations?

4. **Trip Characteristics**: What is the average trip duration? How does trip length vary by time of day and day of week?

5. **User Behaviour**: What proportion of trips are made by subscribers versus casual users? Do these segments show different usage patterns?

**Document your findings in your notebook with supporting visualisations and strategic recommendations.**
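
For the operational efficiency question, one possible approach is to count trips by hour separately for weekdays and weekends. This is a minimal sketch using made-up start times and an assumed timestamp format; at full data volume you would likely run the same aggregation in your dataframe or SQL tool of choice.

```python
from collections import Counter
from datetime import datetime

# Hypothetical start times: 4 and 5 March 2019 are weekdays, 9 and 10 March are a weekend.
start_times = [
    "2019-03-04 08:05:00", "2019-03-04 08:40:00", "2019-03-04 17:30:00",
    "2019-03-05 08:10:00", "2019-03-09 13:20:00", "2019-03-09 13:45:00",
    "2019-03-10 11:00:00",
]


def hourly_profile(timestamps):
    """Count trips per start hour, split into weekday and weekend segments."""
    profile = {"weekday": Counter(), "weekend": Counter()}
    for ts in timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        segment = "weekend" if dt.weekday() >= 5 else "weekday"
        profile[segment][dt.hour] += 1
    return profile


profile = hourly_profile(start_times)

# The busiest start hour in each segment.
peak_hours = {segment: counts.most_common(1)[0][0] for segment, counts in profile.items()}
print(peak_hours)
```

The same hourly counts also show the quietest hours, which are natural candidates for maintenance and rebalancing windows.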

---

## Level 2: Competitive Intelligence

### Business Scenario

Your consultancy needs to provide a comparative analysis across all five bike-sharing systems. Clients want to understand how they perform relative to competitors and identify best practices from market leaders.

**Your Task**: Integrate data from all providers and conduct cross-market comparative analysis.

### Data Integration Challenge

Each provider's data has different structures, naming conventions, and available attributes:
- Design a unified schema that accommodates all providers
- Develop a standardisation strategy for inconsistent column names
- Handle provider-specific attributes appropriately
- Create a transformation framework that can process all datasets
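
One way to approach the standardisation is a per-provider column map feeding a single transformation function. The sketch below is illustrative only: the unified column names, the source column names in `COLUMN_MAPS`, and the user type labels are assumptions, not the providers' actual schemas.

```python
# Assumed unified schema: adjust to whatever your own design specifies.
UNIFIED_COLUMNS = ["provider", "start_time", "end_time",
                   "start_station_id", "end_station_id", "user_type"]

# Hypothetical source-to-unified column maps, one per provider.
COLUMN_MAPS = {
    "Capital Bikeshare": {
        "Start date": "start_time",
        "End date": "end_time",
        "Start station number": "start_station_id",
        "End station number": "end_station_id",
        "Member type": "user_type",
    },
    "Divvy Bikes": {
        "starttime": "start_time",
        "stoptime": "end_time",
        "from_station_id": "start_station_id",
        "to_station_id": "end_station_id",
        "usertype": "user_type",
    },
}

# Assumed label variants, collapsed into two unified segments.
USER_TYPE_MAP = {"member": "subscriber", "subscriber": "subscriber",
                 "casual": "casual", "customer": "casual"}


def to_unified(provider, row):
    """Rename a source row into the unified schema; keep unmapped fields under 'extras'."""
    mapping = COLUMN_MAPS[provider]
    out = {col: None for col in UNIFIED_COLUMNS}
    out["provider"] = provider
    extras = {}
    for source_name, value in row.items():
        target = mapping.get(source_name)
        if target in out:
            out[target] = value
        else:
            extras[source_name] = value  # provider-specific attribute
    if out["user_type"] is not None:
        key = str(out["user_type"]).strip().lower()
        out["user_type"] = USER_TYPE_MAP.get(key, "unknown")
    out["extras"] = extras
    return out


example = to_unified(
    "Capital Bikeshare",
    {"Start date": "2018-01-01 00:05:06", "Member type": "Member", "Bike number": "W00001"},
)
print(example)
```

Keeping the maps as data rather than code means a new provider can be added by supplying one more dictionary, and attributes missing from a provider appear as `None` instead of causing errors.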

### Business Questions to Answer

1. **Market Growth**: How has ridership evolved year-over-year for each system? Which markets are growing fastest? Which are stagnating?

2. **Seasonal Resilience**: Do all markets show similar seasonal patterns, or are some more resilient to weather variations? What factors might explain differences?

3. **Network Efficiency**: How do the systems compare in terms of trips per station? Which operators achieve the highest utilisation rates?

4. **Market Size**: What is the relative scale of each operation (number of stations, total trips)? How does network size correlate with total ridership?

5. **Growth Rates**: Calculate year-over-year growth percentages. Which markets offer the best growth opportunities?

6. **User Mix**: How does the subscriber-to-casual ratio vary across markets? What might this indicate about market maturity?

**Provide a competitive benchmarking report highlighting each operator's strengths and areas for improvement.**
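
The year-over-year growth calculation can be sketched as below. The trip counts are invented for illustration and the provider labels are placeholders. Growth is only reported where the previous calendar year is present, so a gap in a provider's data does not produce a misleading figure.

```python
# Hypothetical annual trip totals per provider (not real figures).
annual_trips = {
    "Provider A": {2016: 1000, 2017: 1200, 2018: 1500, 2019: 1500},
    "Provider B": {2017: 800, 2018: 600},
}


def yoy_growth(trips_by_year):
    """Return {year: percentage change versus the previous year}."""
    growth = {}
    for year in sorted(trips_by_year):
        previous = trips_by_year.get(year - 1)
        if previous:  # skip the first year, missing years, and zero baselines
            growth[year] = (trips_by_year[year] - previous) * 100 / previous
    return growth


growth_by_provider = {name: yoy_growth(totals) for name, totals in annual_trips.items()}

for name, growth in growth_by_provider.items():
    for year, pct in growth.items():
        print(f"{name} {year}: {pct:+.1f}%")
```

When comparing providers, check that each year is a complete year of data first; a partial first or last year will distort the growth rate.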

---

## Level 3: Deep Dive Analytics

### Business Scenario

Clients are now interested in more sophisticated analyses to inform pricing strategies, network expansion, and targeted marketing campaigns. They need granular insights into trip patterns, user segments, and geographic performance.

**Your Task**: Enrich the data with geographic information and conduct advanced segmentation analysis.

### Data Enrichment

- Join trip data with station location information
- Calculate straight-line (great-circle) trip distances from the start and end station coordinates, as a proxy for the distance actually travelled
- Create multi-dimensional aggregations for deeper insights
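
A common way to derive distance from station coordinates is the haversine formula, which gives the great-circle distance between two points. The sketch below assumes a hypothetical station lookup and hypothetical field names; the coordinates are illustrative only. The result is a straight-line distance, so it understates the route actually ridden.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

# Hypothetical station lookup: station_id -> (latitude, longitude) in degrees.
stations = {"S1": (38.9000, -77.0400), "S2": (38.9100, -77.0300)}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def enrich_trip(trip, station_lookup):
    """Join a trip to its station coordinates and add distance and average speed."""
    start = station_lookup.get(trip["start_station_id"])
    end = station_lookup.get(trip["end_station_id"])
    if start is None or end is None:
        # Station missing from the lookup: keep the trip but leave the new fields empty.
        return {**trip, "distance_km": None, "speed_kmh": None}
    distance = haversine_km(*start, *end)
    hours = trip["duration_sec"] / 3600
    speed = distance / hours if hours > 0 else None
    return {**trip, "distance_km": distance, "speed_kmh": speed}


trip = enrich_trip(
    {"start_station_id": "S1", "end_station_id": "S2", "duration_sec": 600},
    stations,
)
print(f"{trip['distance_km']:.2f} km at {trip['speed_kmh']:.1f} km/h")
```

Trips that start and end at the same station get a distance of zero, so consider excluding round trips from any average speed calculation.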

### Business Questions to Answer

1. **Distance vs Duration**: What is the relationship between trip distance and duration? What is the average speed? Are there route inefficiencies?

2. **Trip Economics**: How do average trip duration and distance vary by provider and year? Are trips getting shorter or longer over time?

3. **User Segmentation**: How do subscribers differ from casual users in terms of trip duration, distance, and timing? What does this reveal about their use cases?

4. **Demographic Insights** (where available): How do usage patterns vary by age group and gender? Which demographics represent the largest opportunity for growth?

5. **Popular Routes**: What are the most common origin-destination pairs? Are popular routes symmetric (bidirectional) or unidirectional? What does this suggest about commuting patterns?

6. **Geographic Hotspots**: Which stations serve as major hubs? Which geographic areas show the highest demand? Where are the gaps in the network?

7. **Temporal Segmentation**: How do peak hours differ between subscribers and casual users? Does this confirm hypotheses about commuting versus leisure usage?

**Deliver an insights report with recommendations for pricing, marketing, and network expansion strategies.**
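
For the popular routes question, origin-destination pairs can be counted and then compared against their reverse direction. In this sketch the station labels are placeholders and `symmetry_ratio` is a hypothetical measure chosen for illustration: 1.0 means equal flow in both directions and 0.0 means flow in one direction only.

```python
from collections import Counter

# Hypothetical (origin, destination) station pairs, one tuple per trip.
trips = [
    ("A", "B"), ("A", "B"), ("A", "B"),
    ("B", "A"), ("B", "A"),
    ("A", "C"),
    ("C", "C"),  # round trip starting and ending at the same station
]

pair_counts = Counter(trips)


def symmetry_ratio(origin, destination, counts):
    """Smaller directional flow divided by the larger; None if the pair never occurs."""
    forward = counts[(origin, destination)]
    backward = counts[(destination, origin)]
    if max(forward, backward) == 0:
        return None
    return min(forward, backward) / max(forward, backward)


# Most common directed routes, each with its symmetry ratio.
for (origin, destination), n in pair_counts.most_common(3):
    ratio = symmetry_ratio(origin, destination, pair_counts)
    print(f"{origin} -> {destination}: {n} trips, symmetry {ratio:.2f}")
```

Round trips always score 1.0 under this measure, so it is worth reporting them separately from station-to-station routes.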

---

## Level 4: Data Warehouse for Strategic Reporting

### Business Scenario

Your consultancy's leadership team needs a standardised reporting framework that can be refreshed regularly and accessed by multiple clients through Business Intelligence dashboards. The current analysis workflow is ad-hoc and difficult to maintain.

**Your Task**: Design and implement a dimensional data model (star schema) that supports consistent, efficient reporting across all strategic questions.

### Requirements Analysis

Identify the reporting and analytical needs:
- What business questions must the data warehouse answer?
- What time granularity is required?
- What dimensions of analysis are most important?
- What key performance indicators should be easily accessible?

### Star Schema Design

Design a dimensional model including:

**Fact Table**:
- Define the grain (level of detail for each record)
- Identify measurable metrics (duration, distance, etc.)
- Determine necessary foreign keys to dimension tables

**Dimension Tables**:
- **Time**: Support for daily, weekly, monthly, and seasonal analysis
- **Station**: Geographic and operational station attributes
- **User**: User type and demographic information
- **Provider**: Bike-sharing system information
- **Bike**: Individual bike tracking (if relevant)

**Design Considerations**:
- How will you handle surrogate keys?
- Which dimensions change over time (slowly changing dimensions)?
- What aggregation tables would improve query performance?
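
As one illustration of these choices, a Time dimension at daily grain could be generated as below. The attribute names are assumptions, the integer `YYYYMMDD` key is one common convention rather than a requirement, and the season mapping assumes meteorological seasons in the northern hemisphere.

```python
from datetime import date, timedelta

# Assumed mapping: meteorological seasons, northern hemisphere.
SEASONS = {12: "winter", 1: "winter", 2: "winter",
           3: "spring", 4: "spring", 5: "spring",
           6: "summer", 7: "summer", 8: "summer",
           9: "autumn", 10: "autumn", 11: "autumn"}


def build_dim_date(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        iso_year, iso_week, iso_weekday = current.isocalendar()
        rows.append({
            "date_key": int(current.strftime("%Y%m%d")),  # integer key, e.g. 20160101
            "full_date": current.isoformat(),
            "year": current.year,
            "quarter": (current.month - 1) // 3 + 1,
            "month": current.month,
            "iso_year": iso_year,   # ISO weeks can belong to the adjacent year
            "iso_week": iso_week,
            "weekday": iso_weekday,  # 1 = Monday ... 7 = Sunday
            "is_weekend": iso_weekday >= 6,
            "season": SEASONS[current.month],
        })
        current += timedelta(days=1)
    return rows


# Cover the full period of the trip data.
dim_date = build_dim_date(date(2016, 1, 1), date(2019, 12, 31))
print(len(dim_date), dim_date[0])
```

Building the dimension from the calendar rather than from the trip data guarantees a row for every day, including days with no trips.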

### Implementation

- Create all dimension tables with appropriate attributes and hierarchies
- Build the fact table with proper relationships
- Validate referential integrity
- Create pre-aggregated summary tables for common queries
- Prepare datasets optimised for BI tool consumption
- Document the schema with data dictionary and entity-relationship diagram
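
A minimal sketch of the implementation steps, using an in-memory SQLite database so that it runs anywhere. The table names, column names, and sample rows are all hypothetical, and the grain is assumed to be one row per trip; your own design may differ, and a production warehouse would likely use a different engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default

conn.executescript("""
CREATE TABLE dim_provider (
    provider_key  INTEGER PRIMARY KEY,
    provider_name TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE dim_user (
    user_key  INTEGER PRIMARY KEY,
    user_type TEXT NOT NULL
);
CREATE TABLE dim_station (
    station_key       INTEGER PRIMARY KEY,
    provider_key      INTEGER NOT NULL REFERENCES dim_provider(provider_key),
    source_station_id TEXT NOT NULL,
    station_name      TEXT
);
-- Assumed grain: one row per trip.
CREATE TABLE fact_trip (
    trip_key          INTEGER PRIMARY KEY,
    date_key          INTEGER NOT NULL REFERENCES dim_date(date_key),
    provider_key      INTEGER NOT NULL REFERENCES dim_provider(provider_key),
    start_station_key INTEGER NOT NULL REFERENCES dim_station(station_key),
    end_station_key   INTEGER NOT NULL REFERENCES dim_station(station_key),
    user_key          INTEGER NOT NULL REFERENCES dim_user(user_key),
    duration_sec      INTEGER,
    distance_km       REAL
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO dim_provider VALUES (1, 'Capital Bikeshare')")
conn.execute("INSERT INTO dim_date VALUES (20180704, '2018-07-04', 2018)")
conn.execute("INSERT INTO dim_user VALUES (1, 'subscriber')")
conn.executemany("INSERT INTO dim_station VALUES (?, ?, ?, ?)",
                 [(1, 1, 'ST-001', 'Station A'), (2, 1, 'ST-002', 'Station B')])
conn.execute("INSERT INTO fact_trip VALUES (1, 20180704, 1, 1, 2, 1, 1500, 1.4)")

# A fact row pointing at a non-existent station should be rejected.
try:
    conn.execute("INSERT INTO fact_trip VALUES (2, 20180704, 1, 1, 99, 1, 600, 0.8)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Example reporting query: trips and average duration per provider and year.
summary = conn.execute("""
    SELECT p.provider_name, d.year, COUNT(*) AS trips, AVG(f.duration_sec) AS avg_duration_sec
    FROM fact_trip AS f
    JOIN dim_provider AS p ON p.provider_key = f.provider_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.provider_name, d.year
""").fetchall()
print(orphan_rejected, summary)
```

Declaring the foreign keys lets the database reject orphaned fact rows at load time. If facts are bulk-loaded with enforcement switched off, an equivalent check can be run afterwards by joining the fact table to each dimension and counting unmatched keys.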

### Business Questions Your Data Warehouse Must Support

1. **Trend Analysis**: Track ridership trends across any time period (daily, weekly, monthly, yearly)

2. **Geographic Performance**: Analyse station and route performance across different geographic levels

3. **Provider Benchmarking**: Compare providers across all key metrics

4. **User Analytics**: Segment and analyse user behaviour across multiple dimensions

5. **Operational Metrics**: Monitor fleet utilisation, trip characteristics, and network efficiency

6. **Growth Analysis**: Track year-over-year growth and identify emerging patterns

**Deliver a production-ready data warehouse with comprehensive documentation suitable for ongoing business intelligence reporting.**


**Examples of dashboards:**

- https://public.tableau.com/app/profile/felix.austin/viz/BikeshareDashboard_17607164661020/Dashboard1
- https://public.tableau.com/app/profile/sasha.witter/viz/BikesharingTheheadlines/Dashboard?publish=yes
- https://public.tableau.com/app/profile/sam.boughton/vizzes