<a href="https://colab.research.google.com/github/sohanmahamuni/EDA_Projects/blob/main/Ford_Bike_Sharing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Ford Bike Sharing EDA



##### **Project Type**    - EDA

# **Project Summary -**

The increasing demand for sustainable and efficient transportation in urban areas has fueled the growth of bike-sharing systems around the world. This project analyzes data from Ford GoBike, a prominent bike-sharing service in the San Francisco Bay Area. The dataset covers trips taken in January 2018 and includes trip durations, station locations, user demographics, and ride timestamps, giving an in-depth look at user behavior, trip patterns, and temporal trends. The project explores, cleans, and visualizes the dataset using the UBM structure (Univariate, Bivariate, Multivariate) to generate actionable insights and support business decision-making.

# **GitHub Link -**

https://github.com/sohanmahamuni

# **Problem Statement**


The Ford GoBike system needs to understand customer behavior, trip patterns, and usage trends to improve service delivery, enhance user satisfaction, and increase overall ridership. The challenge lies in uncovering meaningful insights from a large, raw dataset that can support data-driven decision-making. The goal is to identify the key factors that influence bike-sharing usage and uncover patterns that can guide strategic business decisions.

#### **Define Your Business Objective?**

* Increase customer retention and acquisition through data-driven strategies.

* Optimize bike placement, redistribution, and availability during peak hours.

* Understand the preferences of various user segments to tailor marketing efforts.

* Identify areas or time periods with potential for promotional campaigns or expansion.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following the "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('201801-fordgobike-tripdata.csv')
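
The guidelines above ask for exception handling, so the load step can be wrapped in a small helper. This is a minimal sketch, not part of the original notebook: `load_trips` is a hypothetical name, and the only assumption is that the CSV sits at the path passed in.

```python
from pathlib import Path

import pandas as pd


def load_trips(csv_path):
    """Read the trip CSV, raising a clear error if the file is missing or empty."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Trip data not found: {path}")
    try:
        return pd.read_csv(path)
    except pd.errors.EmptyDataError as exc:
        # A zero-byte file has no header row for pandas to parse
        raise ValueError(f"Trip data file is empty: {path}") from exc


# Usage in this notebook would look like:
# df = load_trips('201801-fordgobike-tripdata.csv')
```

Failing early with a specific message keeps the rest of the notebook from running on a missing or empty file.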

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values = df.isnull().sum().sort_values(ascending=False)
missing_values[missing_values > 0]

In [None]:
# Visualizing the missing values
msno.matrix(df)
plt.title("Missing Values Matrix")
plt.show()
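
Raw null counts are easier to judge as a share of all rows. The sketch below shows one way to compute that percentage; the `sample` frame is made-up illustration data, not the real trip file, and the same two lines would apply to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the trip data (values are invented for illustration)
sample = pd.DataFrame({
    "duration_sec": [600, 420, 900, 300],
    "member_gender": ["Male", None, "Female", None],
    "member_birth_year": [1985.0, np.nan, 1990.0, 1978.0],
})

# isnull().mean() gives the fraction of missing rows per column
missing_pct = (sample.isnull().mean() * 100).round(1)
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```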

### What did you know about your dataset?

Answer Here

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns.tolist()

In [None]:
# Dataset Describe
df.describe()

### Variables Description

Answer Here

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
unique_values = df.nunique().sort_values(ascending=False)
unique_values

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Write your code to make your dataset analysis ready.

# 1. Convert Start and End Time to datetime

df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])

# 2. Create trip_duration_minutes column

df['trip_duration_minutes'] = df['duration_sec'] / 60


# 3. Extract temporal features (month, day of week, hour)

df['month'] = df['start_time'].dt.month
df['day_of_week'] = df['start_time'].dt.day_name()
df['hour'] = df['start_time'].dt.hour

# Clean up invalid or extreme data

# Remove trips with 0 or negative duration
df = df[df['trip_duration_minutes'] > 0]

# Optional: Remove extremely long trips (outliers)
df = df[df['trip_duration_minutes'] < 300]  # trips longer than 5 hours are rare


# Check for missing values

print("\nMissing values per column:")
print(df.isnull().sum())

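# Fill missing values (assign the result back; inplace fillna on a selected column may not update df in recent pandas)
df = df.copy()  # work on an explicit copy of the filtered rows
df['member_gender'] = df['member_gender'].fillna('Unknown')
df['member_birth_year'] = df['member_birth_year'].fillna(df['member_birth_year'].median())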

# Final check

print("\nDataset shape after cleaning:", df.shape)
print("Columns available:", df.columns.tolist())

### What all manipulations have you done and insights you found?

**Manipulations:**

* Converted start_time and end_time to datetime format.

* Added trip_duration_minutes for better interpretability.

* Extracted month, day_of_week, and hour for temporal analysis.

* Filtered out invalid trips (zero or negative durations).

* Removed extremely long trips (> 300 minutes) as outliers.

* Filled missing member_birth_year values with the median birth year and missing member_gender values with the placeholder 'Unknown'.
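
The manipulations listed above can be collected into one function so the cleaning step is repeatable. This is a minimal sketch assuming the column names used in this notebook; `clean_trips`, `max_minutes`, and the toy rows are hypothetical and only there for illustration.

```python
import numpy as np
import pandas as pd


def clean_trips(raw, max_minutes=300):
    """Return a cleaned copy of the trip data without modifying the input."""
    out = raw.copy()
    out["start_time"] = pd.to_datetime(out["start_time"])
    out["end_time"] = pd.to_datetime(out["end_time"])
    out["trip_duration_minutes"] = out["duration_sec"] / 60
    out["day_of_week"] = out["start_time"].dt.day_name()
    out["hour"] = out["start_time"].dt.hour
    # Keep positive durations below the outlier cut-off
    keep = (out["trip_duration_minutes"] > 0) & (out["trip_duration_minutes"] < max_minutes)
    out = out.loc[keep].copy()
    # Assign the filled columns back instead of relying on inplace=True
    out["member_gender"] = out["member_gender"].fillna("Unknown")
    out["member_birth_year"] = out["member_birth_year"].fillna(out["member_birth_year"].median())
    return out


# Invented rows: one valid trip, one zero-length, one over 5 hours, one with missing demographics
toy = pd.DataFrame({
    "duration_sec": [600, 0, 30000, 480],
    "start_time": ["2018-01-31 08:05:00", "2018-01-30 09:00:00",
                   "2018-01-29 10:00:00", "2018-01-27 17:30:00"],
    "end_time": ["2018-01-31 08:15:00", "2018-01-30 09:00:00",
                 "2018-01-29 18:20:00", "2018-01-27 17:38:00"],
    "member_gender": ["Male", "Female", None, None],
    "member_birth_year": [1985.0, 1990.0, np.nan, np.nan],
})
cleaned = clean_trips(toy)
print(cleaned[["trip_duration_minutes", "day_of_week", "hour", "member_gender"]])
```

Returning a new frame keeps the raw data intact, which makes it easier to re-run the cleaning with a different `max_minutes` threshold.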

**Initial Insights:**

* The dataset includes rich time-based features. It has 94,563 records after cleaning.

* The time-based features (month, day of week, hour) are now ready for temporal analysis.

* Most trips are under 1 hour; extremely long trips are rare and likely outliers.

* Patterns by hour, day of week, and user type are explored in the next steps. Because the file covers a single month, monthly or seasonal patterns cannot be assessed here.

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

# Univariate Analysis

#### Chart - 1

In [None]:
# Chart - 1 visualization code
# 1. Distribution of Trip Duration
sns.set(style="whitegrid")
plt.figure(figsize=(10,5))
sns.histplot(df['trip_duration_minutes'], bins=100, color='skyblue')
plt.title('Distribution of Trip Durations')
plt.xlabel('Duration (minutes)')
plt.ylabel('Count')
plt.xlim(0, 60)  # Focus on realistic durations
plt.show()

##### 1. Why did you pick the specific chart?

A histogram is the most effective way to visualize the distribution of a single continuous variable. In this case, trip duration is a key metric in understanding how long users typically ride bikes.

##### 2. What is/are the insight(s) found from the chart?

* Most bike trips last between 5 and 15 minutes, with a sharp peak around 7–8 minutes.

* The distribution is right-skewed, meaning there are fewer trips with very long durations.

* The frequency of trips decreases significantly after the 15-minute mark.

* Very long trips (over 30–40 minutes) are rare, possibly indicating outliers or different user intentions (e.g., leisure instead of commuting).
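
The visual reading of the histogram can be backed by a few numbers: quantiles, the share of trips in the 5 to 15 minute band, and the skewness. The sketch below uses synthetic log-normal durations as a stand-in (an assumption made for illustration, not the real data); with the notebook's data, `durations` would be `df['trip_duration_minutes']`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Synthetic right-skewed durations in minutes; NOT the Ford GoBike data
durations = pd.Series(
    rng.lognormal(mean=2.2, sigma=0.6, size=5000),
    name="trip_duration_minutes",
)

summary = durations.quantile([0.25, 0.5, 0.75, 0.95])
share_5_to_15 = durations.between(5, 15).mean()  # fraction of trips in the band
skewness = durations.skew()                      # positive means a long right tail

print(summary.round(1))
print(f"Share of trips lasting 5-15 minutes: {share_5_to_15:.1%}")
print(f"Skewness: {skewness:.2f}")
```

A mean above the median together with positive skewness is the numeric counterpart of the right-skewed shape described above.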

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* These findings suggest that short-duration trips dominate usage, likely due to commuting or quick errands, which can inform fleet rebalancing and availability strategies during peak hours.

* Pricing models can be optimized to encourage shorter trips and discourage longer durations that tie up bikes for extended periods.

* Additionally, maintenance scheduling can be better aligned with the average wear-and-tear based on typical trip durations.

No negative growth insights, but understanding rare long-duration trips can help flag abnormal use cases or improve fraud detection systems.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
# 2. Count of User Type
sns.countplot(data=df, x='user_type', palette='Set2')
plt.title('User Type Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is ideal for comparing categorical variables like user types. It clearly visualizes the proportion of each category, helping identify which group dominates system usage. Since user type is central to business strategy (subscriber vs. casual user), understanding this split is critical.

##### 2. What is/are the insight(s) found from the chart?

* The number of Subscribers is significantly higher than Customers.

* Subscribers outnumber Customers by nearly a 7:1 ratio.

* This indicates that most users prefer long-term membership plans rather than occasional rentals.
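
A ratio like the one quoted above can be read directly from `value_counts` rather than estimated from bar heights. The counts below are invented for illustration and do not reproduce the dataset's actual split.

```python
import pandas as pd

# Made-up sample: 6 Subscriber trips and 2 Customer trips
user_type = pd.Series(["Subscriber"] * 6 + ["Customer"] * 2, name="user_type")

counts = user_type.value_counts()
shares = user_type.value_counts(normalize=True)  # fractions that sum to 1
ratio = counts["Subscriber"] / counts["Customer"]

print(shares.round(3))
print(f"Subscribers per Customer: {ratio:.1f}")
```

Note that this counts trips, not distinct riders, since each row in the trip file is one ride.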

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

* The dominance of subscribers suggests the business is successful at retaining customers long-term, which is generally more cost-effective than acquiring new customers frequently.

* This insight can encourage the company to enhance subscriber loyalty programs and offer tiered subscription options.

* However, the low number of casual customers may indicate a missed opportunity in attracting one-time or tourist riders. Targeted marketing campaigns or flexible pricing for casual users could help diversify the user base.

No direct insight suggesting negative growth, but the imbalance hints at an untapped market segment.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
# 3. Gender Distribution
sns.countplot(data=df, x='member_gender', palette='coolwarm')
plt.title('Gender Distribution')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart is effective for visualizing the distribution of categorical variables like gender. This format helps quickly identify imbalances in user demographics, which is valuable for targeting marketing and improving inclusivity in service design.

##### 2. What is/are the insight(s) found from the chart?

* The majority of users are Male, followed by Female users.

* There is a relatively high number of Unknown gender entries, which indicates missing or uncollected data.

* Users identifying as Other make up a very small proportion.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* These insights help the company understand its core demographic and can guide targeted campaigns—for example, promoting the service in ways that appeal to female riders or gender minorities.

* The large number of “Unknown” entries highlights a data quality issue, which can be addressed to improve future analysis and personalization.

**Insights that lead to negative growth:**

No direct sign of negative growth, but:

* Over-reliance on a single gender group (e.g., males) may limit market reach.

* By identifying underrepresented groups, the company can take steps to broaden its appeal—potentially increasing ridership and revenue.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='day_of_week', order=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'])
plt.title('Trips per Day of the Week')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart effectively shows how trip volumes vary by day, helping to identify daily usage patterns. It’s ideal for comparing categorical data like weekdays, making it easy to spot peak and low activity days.

##### 2. What is/are the insight(s) found from the chart?

* Weekdays have higher trip counts than weekends, with Tuesday and Wednesday being the most active days.

* Saturday and Sunday see significantly fewer trips, suggesting that bike usage is mainly for commuting or weekday routines, rather than leisure.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Knowing peak usage days allows for better resource allocation, such as increasing bike availability and maintenance schedules on high-demand days.

* This insight can inform weekday-based promotions or pricing strategies to maintain or boost usage.

**Insights that lead to negative growth:**

* The dip on weekends might signal untapped potential for recreational users. Marketing campaigns, special weekend offers, or leisure-focused routes could help diversify the user base and boost weekend ridership.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='hour', palette='viridis')
plt.title('Trips by Hour of Day')
plt.xlabel('Hour')
plt.show()

##### 1. Why did you pick the specific chart?

A bar chart was chosen to clearly show how bike usage is distributed across the 24-hour day. It’s the most effective way to highlight peak and off-peak hours at a glance and helps in spotting user behavior patterns.

##### 2. What is/are the insight(s) found from the chart?

* There are two major peaks: a morning peak at 8 AM, aligning with work commute hours, and an evening peak at 5 PM, corresponding with end-of-work commutes.

* Very few trips occur during late night and early morning hours (12 AM to 5 AM).

* Moderate and consistent usage is seen from 10 AM to 3 PM, possibly by students, tourists, or flexible workers.
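
Peak hours and the late-night share can also be extracted programmatically. The following is a small sketch on invented hour values; in the notebook the input would be `df['hour']`.

```python
import pandas as pd

# Invented start hours for ten trips
hours = pd.Series([8, 8, 8, 17, 17, 17, 17, 12, 9, 2], name="hour")

# Count trips per hour and fill hours with no trips with 0
trips_per_hour = hours.value_counts().reindex(range(24), fill_value=0)

top_hours = trips_per_hour.nlargest(2).index.tolist()
# 12 AM to 5 AM corresponds to start hours 0 through 4
night_share = hours.between(0, 4).mean()

print("Busiest hours:", top_hours)
print(f"Share of trips starting between 12 AM and 5 AM: {night_share:.0%}")
```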

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* The clear commuter peaks suggest that bike availability and maintenance efforts should be intensified around rush hours to meet demand and ensure customer satisfaction.

* Dynamic pricing can be introduced based on usage hours, encouraging off-peak usage.

* For growth, targeted promotions can be designed to boost midday or evening non-commute usage, possibly appealing to tourists or casual riders.

# Bivariate Analysis

#### Chart - 6

In [None]:
# Chart - 6 visualization code
sns.boxplot(data=df, x='user_type', y='trip_duration_minutes', palette='pastel')
plt.title('Trip Duration by User Type')
plt.ylim(0, 60)
plt.show()

##### 1. Why did you pick the specific chart?

A box plot is ideal for showing the distribution of trip durations across different user types. It helps highlight central tendencies (medians), variability (interquartile range), and presence of outliers, all of which are crucial for understanding usage behavior.

##### 2. What is/are the insight(s) found from the chart?

* Customers (casual users) tend to have longer trip durations, with a higher median and wider spread.

* Subscribers (regular users) have shorter, more consistent trips, suggesting regular, purpose-driven use (like commuting).

* There are significant outliers in both groups, particularly among subscribers, possibly due to unusual or recreational rides.
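
The medians and spreads a box plot displays can be tabulated with a grouped aggregation. This is a sketch on made-up durations; `iqr` is a small helper defined here, not a pandas built-in.

```python
import pandas as pd

# Made-up durations in minutes for two user types
toy = pd.DataFrame({
    "user_type": ["Subscriber"] * 4 + ["Customer"] * 4,
    "trip_duration_minutes": [6, 8, 10, 12, 10, 20, 30, 40],
})


def iqr(series):
    """Interquartile range: the height of the box in a box plot."""
    return series.quantile(0.75) - series.quantile(0.25)


stats = toy.groupby("user_type")["trip_duration_minutes"].agg(median="median", iqr=iqr)
print(stats)
```

Comparing `median` and `iqr` per group puts numbers on the "higher median and wider spread" reading of the plot.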

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

These insights can be used to personalize offerings:

* Create tailored packages for customers, focusing on leisure riding, sightseeing routes, or tourist bundles.

* For subscribers, ensure availability during commute times and offer time-efficient features like quick checkouts or route optimization.

Pricing strategies can also differ:

* Encourage customers to subscribe by highlighting cost-effectiveness for frequent or shorter rides.


#### Chart - 7

In [None]:
# Chart - 7 visualization code
sns.boxplot(data=df, x='member_gender', y='trip_duration_minutes', palette='Set3')
plt.title('Trip Duration by Gender')
plt.ylim(0, 60)
plt.xlabel('Gender')
plt.ylabel('Trip Duration (minutes)')
plt.show()

##### 1. Why did you pick the specific chart?

A box plot is ideal for comparing trip duration distributions across multiple gender categories. It clearly highlights central tendencies, variability, and outliers, enabling us to detect behavioral differences among gender identities.

##### 2. What is/are the insight(s) found from the chart?

* Riders marked as “Unknown” gender have the highest median and widest range of trip durations.

* Male riders have the shortest median trip duration, followed by Female, and then Other.

* There are significant outliers across all gender categories, especially among males.

* The data suggests that users with “Unknown” gender may be tourists or occasional riders, possibly overlapping with the “Customer” user type observed in Chart 6.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Shorter durations among male riders could imply regular, utilitarian use, aligning with commuter-targeted services.

* Improving gender-inclusive features (e.g., personalized onboarding or interface options) might improve sign-up and engagement rates for the "Other" and "Unknown" groups.

**Insights that lead to negative growth:**

* Potential negative impact may come from not understanding or engaging with underrepresented gender groups effectively. Without tailored engagement, there's a risk of lost conversion or loyalty opportunities.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
sns.lineplot(data=df, x='hour', y='trip_duration_minutes', estimator='mean')
plt.title('Avg Trip Duration per Hour')
plt.show()

##### 1. Why did you pick the specific chart?

A line plot with a confidence interval is perfect for visualizing how the average trip duration varies across different hours of the day. This chart captures both central tendency and variability, revealing behavioral trends in how long people ride bikes at different times.

##### 2. What is/are the insight(s) found from the chart?

* Early morning hours (around 2 AM–4 AM) have spikes in average trip duration, with high variability — possibly due to leisure or non-commuting users.

* After 5 AM, trip durations drop and remain relatively stable, especially during typical commuting hours (8–10 AM and 5–7 PM).

* A modest rise is observed between 12 PM and 2 PM, likely corresponding to lunch-hour trips.

* The lowest durations occur between 5 AM–7 AM, which aligns with a surge in trip count (as seen in Chart 5) — likely due to short, purpose-driven commutes.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* The high-duration, low-volume hours (late night/early morning) present a niche marketing opportunity — possibly for night-tour promotions or targeting shift workers.

* The midday bump in duration could inspire “lunch break ride” campaigns.

* Understanding that trips are shorter during commute peaks helps in fleet management — ensuring rapid turnover and bike availability during those windows.

* Recognizing when users spend more time riding can also help in planning rest stops, scenic routes, or premium pricing strategies for extended-use hours.

**Insights that lead to negative growth:**

* The average trip duration is high during late-night hours (12 AM – 5 AM), but demand is low, indicating poor resource utilization and operational inefficiency.

* There's high variability in trip durations during early morning hours, which makes planning difficult and increases unpredictability in service.

* The longer trips during off-peak hours do not translate to higher profitability due to minimal demand, which can hurt overall revenue.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
sns.countplot(data=df, x='bike_share_for_all_trip', hue='user_type')
plt.title('Bike Share Usage by User Type')
plt.show()

##### 1. Why did you pick the specific chart?

This bar chart was chosen to explore the relationship between user type (Customer vs Subscriber) and participation in the Bike Share for All program, a key initiative aimed at improving transportation equity. It allows us to clearly compare how different user groups engage with the program.

##### 2. What is/are the insight(s) found from the chart?

* The vast majority of users, both Customers and Subscribers, are not enrolled in the Bike Share for All program.

* Among those who do use the program, only Subscribers are represented — no Customers are part of it.

* This suggests that Subscribers are more likely to be aware of or have access to the program than casual users.
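
The counts behind a grouped count plot can be checked with a cross-tabulation. The rows below are invented; the column names match the ones used in the chart code.

```python
import pandas as pd

# Invented trips: only one Subscriber trip is part of the program
toy = pd.DataFrame({
    "user_type": ["Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"],
    "bike_share_for_all_trip": ["Yes", "No", "No", "No", "No"],
})

# Raw counts per user type and program flag
table = pd.crosstab(toy["user_type"], toy["bike_share_for_all_trip"])
# Row-normalized shares: what fraction of each user type's trips use the program
share = pd.crosstab(toy["user_type"], toy["bike_share_for_all_trip"], normalize="index")

print(table)
print(share.round(2))
```

A zero in the Customer row of the "Yes" column is what "no Customers are part of it" would look like in the table.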

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Targeted outreach opportunity: The data shows a huge gap in program awareness or accessibility, especially among Customers. Targeting casual riders with marketing or onboarding for the Bike Share for All program could help increase ridership and build loyalty.

* Improved user retention: Encouraging more Subscribers to enroll could boost retention, as participation in subsidized programs often correlates with higher usage over time.

**Insights that lead to negative growth:**

* Underutilization of the program: The current low usage of the Bike Share for All program implies either a lack of awareness, accessibility issues, or eligibility constraints — all of which represent missed growth opportunities.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(10,5))
sns.countplot(data=df, x='day_of_week', hue='user_type', order=['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday'])
plt.title('User Type Usage Across Weekdays')
plt.show()

##### 1. Why did you pick the specific chart?

This grouped bar chart was chosen because it effectively compares the count of trips across weekdays, segmented by user type (Customer vs Subscriber). It provides a clear view of how user behavior changes depending on the day, which is critical for operations planning, marketing, and resource allocation.

##### 2. What is/are the insight(s) found from the chart?

* Subscribers dominate weekday usage, especially from Tuesday to Thursday, suggesting heavy commuter activity during workdays.

* Customer usage remains relatively low and consistent during the week, but increases noticeably on weekends, likely due to recreational use.

* The sharp decline in Subscriber usage on weekends reinforces the idea that their trips are more work-related, while Customers ride more for leisure.
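
The weekday versus weekend contrast can be summarized as one number per user type: the share of that group's trips that fall on a weekend. This is a sketch on invented rows; `is_weekend` is a helper column added here for illustration.

```python
import pandas as pd

# Invented trips for two user types
toy = pd.DataFrame({
    "user_type": ["Subscriber"] * 4 + ["Customer"] * 4,
    "day_of_week": ["Monday", "Tuesday", "Wednesday", "Saturday",
                    "Saturday", "Sunday", "Friday", "Sunday"],
})

toy["is_weekend"] = toy["day_of_week"].isin(["Saturday", "Sunday"])
# The mean of a boolean column is the fraction of True values
weekend_share = toy.groupby("user_type")["is_weekend"].mean()

print(weekend_share)
```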

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Scheduling & fleet optimization: More bikes can be placed in high-demand areas during weekdays for Subscribers, while on weekends, availability can shift to parks, tourist areas, or waterfronts to support Customer needs.

* Targeted promotions: Offer weekend discounts or events aimed at Subscribers to encourage off-work engagement. Launch weekday commuter bundles for Customers who may be converted into Subscribers.

* Marketing segmentation: Different engagement strategies can be developed for each group based on when they use the service.

**Insights that lead to negative growth:**

* Heavy reliance on Subscribers during the week could be risky if there are disruptions in commuting trends (e.g., remote work policies). Diversifying usage with more Customer engagement can help mitigate this.

# Multivariate Analysis

#### Chart - 11

In [None]:
# Chart - 11 visualization code
# 11. Trip Duration vs Hour vs User Type
plt.figure(figsize=(12,6))
sns.pointplot(data=df, x='hour', y='trip_duration_minutes', hue='user_type')
plt.title('Avg Trip Duration by Hour for Each User Type')
plt.show()

##### 1. Why did you pick the specific chart?

A point plot draws a line through the mean trip duration at each hour of the day, split by user type (Customer and Subscriber), with error bars showing the uncertainty around each mean. It reveals behavioral patterns specific to each user type and highlights usage trends that could inform time-based service optimization.

##### 2. What is/are the insight(s) found from the chart?

* Customers consistently take longer trips than Subscribers throughout the day. Their average trip duration peaks sharply in the early morning hours (between 2 AM and 4 AM), likely due to low traffic and potentially more leisurely or recreational usage.

* Subscribers show much lower and stable trip durations, usually staying below 15 minutes, suggesting more functional use cases like commuting.

* There’s a clear behavioral distinction between both groups — Subscribers are time-efficient, while Customers may treat the service as a casual or tourism activity.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* The stable and short-duration pattern of Subscribers highlights their reliability and utility-focused usage — encouraging the company to promote subscriptions and provide perks for commuter-friendly features.

* The longer trips by Customers during off-peak hours indicate an opportunity to introduce flexible pricing or tourist packages that cater to casual riders.

* These insights could help refine fleet availability and maintenance schedules to match usage intensity by hour and user type.

**Insights that lead to negative growth:**

* The spike in longer customer rides during early morning may lead to bike availability issues or increased maintenance due to prolonged usage, especially if not tracked and managed.

* Without tailored offerings for casual riders, the company may miss revenue opportunities from this segment.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
# 12. Gender + User Type + Count
plt.figure(figsize=(8,5))
sns.countplot(data=df, x='member_gender', hue='user_type')
plt.title('User Type Distribution across Genders')
plt.show()

##### 1. Why did you pick the specific chart?

This grouped bar chart was chosen to visually compare the distribution of different user types (Customer vs Subscriber) across various gender categories. It's effective in highlighting disparities or trends in gender-based participation and can guide marketing or user experience decisions.

##### 2. What is/are the insight(s) found from the chart?

* The majority of subscribers are male, followed by a significantly lower number of female subscribers.

* Customers are more evenly distributed across the gender categories compared to subscribers, especially noticeable in the 'Unknown' and 'Other' genders.

* The 'Unknown' gender has a relatively high number of Customers, suggesting that customers may be less likely to share gender info than subscribers.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Targeted campaigns can be developed to attract underrepresented groups like females or non-binary individuals to become subscribers.

* Product or service improvements (e.g., safety features, route suggestions) can be tailored to make the service more appealing to these groups.

* For users with “Unknown” gender, improving onboarding or optional profile fields may help gather better demographic data for more effective targeting.

**Insights that lead to negative growth:**

* The gender gap among subscribers indicates potential accessibility, inclusivity, or perception issues that could limit user base expansion if not addressed.

* If marketing and service features remain skewed toward the dominant demographic (males), the platform risks alienating other user segments, reducing diversity and long-term growth.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
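# 13. Trip Count by Day of Week and Hour
plt.figure(figsize=(10,5))
day_order = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
# Reindex so rows follow calendar order instead of alphabetical order
pivot = df.groupby(['day_of_week', 'hour']).size().unstack(fill_value=0).reindex(day_order)
sns.heatmap(pivot, cmap='YlGnBu')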
plt.title('Trips by Day of Week and Hour')
plt.show()

##### 1. Why did you pick the specific chart?

This is a heatmap, chosen for its ability to show patterns across two categorical dimensions (day of the week and hour of the day). It allows us to quickly spot when trip frequency is highest or lowest without clutter.

##### 2. What is/are the insight(s) found from the chart?

* Weekday rush hours (around 8 AM and 5–6 PM) have the highest number of trips — especially Tuesday to Thursday.

* Trip frequency drops significantly on weekends, with a more even spread throughout the day rather than sharp peaks.

* Mondays and Fridays show a slight dip compared to mid-week, likely due to flexible work patterns.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

**The insights that can help creating a positive business impact in the following ways:**

* Optimize fleet allocation: Bikes can be strategically distributed to key stations before morning and evening rush hours during weekdays.

* Targeted promotions: Weekend usage is lower — this is a great opportunity to run discounts or campaigns to boost leisure rides on Saturdays and Sundays.

* Maintenance scheduling: Conduct maintenance during low-activity hours (e.g., late night or early morning) to minimize service disruptions.

**Insights that lead to negative growth:**

* Failing to meet weekday rush demand could result in user frustration due to unavailability.

* Not addressing under-utilized weekends could lead to wasted potential for revenue, especially from casual riders or tourists.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10, 6))
numeric_df = df[['duration_sec', 'trip_duration_minutes', 'member_birth_year', 'hour', 'month']]
corr = numeric_df.corr()
sns.heatmap(corr, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Heatmap of Numerical Features')
plt.show()


##### 1. Why did you pick the specific chart?

A correlation heatmap is the best visual tool to quickly identify the strength and direction of linear relationships between numerical features in a dataset.

##### 2. What is/are the insight(s) found from the chart?

* duration_sec and trip_duration_minutes are perfectly correlated (1.0), as expected, since one is the other divided by 60.

* There's almost no correlation between trip duration and member_birth_year or hour.

* hour and member_birth_year show a very slight positive correlation (~0.05), which is too weak to infer a relationship.

* Because the dataset covers only January 2018, month should take a single value, so its correlations are undefined and carry no information.

* This confirms that trip duration has no meaningful linear relationship with rider age or hour of day.
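
One caveat when reading the heatmap: the project summary states that the file covers January 2018 only, so `month` would be the same for every row, and a constant column has no defined Pearson correlation. The sketch below, on invented values, shows how pandas reports this and one way to drop constant columns before computing correlations.

```python
import pandas as pd

# Invented values; month is constant, as it would be in a single-month file
toy = pd.DataFrame({
    "trip_duration_minutes": [8.0, 12.5, 6.0, 30.0, 9.5],
    "hour": [8, 17, 9, 14, 18],
    "month": [1, 1, 1, 1, 1],
})

corr = toy.corr()
print(corr)  # entries involving month come out as NaN

# Drop columns with a single unique value before correlating
constant_cols = [col for col in toy.columns if toy[col].nunique() <= 1]
informative_corr = toy.drop(columns=constant_cols).corr()
print(informative_corr)
```

A NaN cell in the heatmap therefore means "undefined", not "zero correlation".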

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df[['trip_duration_minutes', 'member_birth_year', 'hour', 'month']], diag_kind='kde')
plt.suptitle('Pair Plot of Key Numeric Variables', y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

Pair plots show scatterplots and distributions across multiple variables in one view, revealing trends, distributions, and outliers.

##### 2. What is/are the insight(s) found from the chart?

* Most variables show weak relationships, but outliers in trip_duration_minutes are evident.

* You may see that older members tend to take shorter trips on average, but it's a weak trend.
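
A weak age trend is hard to judge from a scatter panel, so one option is to bin riders by age and compare median durations. This is a sketch under stated assumptions: age is approximated as 2018 minus birth year, the band edges are arbitrary, and the rows are invented.

```python
import pandas as pd

# Invented riders; not the real data
toy = pd.DataFrame({
    "member_birth_year": [1995, 1990, 1980, 1975, 1965, 1950],
    "trip_duration_minutes": [14.0, 10.0, 9.0, 11.0, 8.0, 7.0],
})

# Approximate age in the year the trips were taken
toy["age"] = 2018 - toy["member_birth_year"]
# Right-inclusive bins: (17, 30], (30, 45], (45, 60], (60, 100]
toy["age_band"] = pd.cut(
    toy["age"],
    bins=[17, 30, 45, 60, 100],
    labels=["18-30", "31-45", "46-60", "61+"],
)

median_by_band = toy.groupby("age_band", observed=True)["trip_duration_minutes"].median()
print(median_by_band)
```

If birth years were imputed with the median, as in the wrangling step, the band containing the median age would be inflated, which is worth keeping in mind when reading such a table.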

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.



* Focus on Subscribers: Since subscribers form the majority, create loyalty programs to retain them and offer added benefits during peak hours.

* Weekend Promotions for Casual Users: Target casual riders with special weekend offers, especially in tourist-heavy areas.

* Optimize Bike Availability: Deploy more bikes during commute hours (8 AM & 5 PM) and ensure proper distribution at start stations used heavily by working professionals.

* Gender-Inclusive Campaigns: Develop campaigns that address the needs and safety concerns of female riders to encourage more participation.

* Improve Trip Duration Experience: Provide real-time data on trip times and expected traffic to improve ride planning, especially for longer trips on weekends.

# **Conclusion**

This analysis revealed meaningful trends in trip durations, user types, demographics, and time-based behaviors. Ford GoBike can utilize these insights to boost ridership, better allocate resources, and personalize marketing. With strategic implementation—especially targeting commute hours and weekend behavior—the business can grow sustainably and serve a wider range of users more effectively. Continuous monitoring and seasonal expansion of the dataset will allow for even deeper insights in future analyses.



### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***