Ford-Go-Bike
# **Project Name**    - Ford GoBike Trip Data Analysis



##### **Project Type**    - EDA
##### **Contribution**    - Individual/Team
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**





# **Project Summary -**

Project Summary: Bike-Sharing Data Analysis

This project involves an exploratory data analysis (EDA) of a bike-sharing dataset to uncover patterns in user behavior, identify peak usage times, and provide actionable insights for business optimization. The main goal is to understand how different user types engage with the service and how ride patterns vary across time and demographics, aiding data-driven decision-making.

We began by cleaning the dataset, handling missing values, and creating new features such as the rider's age. Visualizations played a key role in deriving insights from the data.

The first chart analyzed the user type distribution as a pie chart. It showed that 88.3% of trips were taken by Subscribers, while only 11.7% were taken by Customers. This large difference shows that the service's usage is dominated by long-term subscribers. This insight is useful for customer retention strategies and service personalization. It points to a strong base of loyal users and suggests that marketing efforts might be most effective if targeted toward converting casual users into long-term subscribers.

The second chart, a bar graph showing the number of trips by hour of the day, revealed clear peak hours during morning (7–9 AM) and evening (4–6 PM) commute times. This indicates that users primarily rely on the service for daily commuting. Recognizing this trend helps in optimizing bike distribution, station maintenance, and staff allocation to ensure high availability during these crucial hours. The chart also highlighted low activity during off-peak hours (midnight to 5 AM), which could lead to bikes sitting idle, representing an opportunity for alternative uses during these hours such as promotions or dynamic pricing to increase utilization.

A third chart visualized the age distribution of riders, using a histogram created from users' birth years. This analysis showed that the majority of users are in their 20s to 40s, suggesting that the service is popular among young adults and working professionals. This demographic insight can help in designing user engagement strategies, such as mobile app features or partnership promotions that appeal to this age group.

Together, these insights contribute to a positive business impact. Understanding user types allows targeted marketing. Peak hour identification supports operational efficiency, while age demographics assist in tailoring services and advertisements.

However, the analysis also revealed areas of concern indicating negative growth potential. Low usage during off-peak hours suggests underutilization of assets, leading to inefficiencies. Idle bikes during these hours represent missed revenue opportunities and increased maintenance costs. Addressing this issue might involve strategies like offering discounts or incentives for rides during off-peak times.

In conclusion, the EDA provided a comprehensive understanding of user behaviors, time-based usage trends, and demographic patterns. These insights can guide marketing, operations, and strategic planning, ultimately enhancing user satisfaction and business growth. The project highlights the value of data visualization and analysis in identifying both strengths and growth areas in a data-driven transportation service.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The bike-sharing company is seeking to optimize its operations and improve user engagement by better understanding how customers use the service. However, it currently lacks clear insights into rider behavior, usage patterns across different times of day, and user demographics. Without these insights, it is difficult to make informed decisions about resource allocation, targeted marketing, and service improvements.

This project aims to analyze user data to uncover patterns in trip times, user types, and age demographics. The goal is to identify peak usage hours, determine the distribution of subscribers versus casual users, and understand the age profile of riders. These findings will help the company enhance service availability, reduce idle resources, and design more effective marketing and customer retention strategies.



#### **Define Your Business Objective?**

1. Understand user behavior and trip patterns to optimize service usage.
2. Identify factors that influence trip duration to improve customer experience.
3. Increase the subscriber base by targeting and converting casual riders.
4. Support data-driven decisions for marketing, operations, and resource planning.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception handling, production-grade code, and deployment-ready code will be a plus. Those students will be awarded some additional credits.

     The additional credits will give an advantage over other students during Star Student selection.

     [ Note: - Deployment-ready code is defined as: the whole .ipynb notebook should be executable in one go without a single error logged. ]

3.   Each and every logic should have proper comments.
4.   You may add as many charts as you want. Make sure that the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
*   Will the gained insights help creating a positive business impact? Are there any insights that lead to negative growth? Justify with specific reason.

5.   You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the visualization in a structured way while following the "UBM" rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime as dt

### Dataset Loading

In [None]:
# Load Dataset
df=pd.read_csv('/content/201801-fordgobike-tripdata.csv.zip')
df.head()

### Dataset First View

In [None]:
# Dataset First Look
df.head()

In [None]:
# Dataset Rows & Columns count
rows, columns = df.shape
print(f'Number of rows: {rows}')
print(f'Number of columns: {columns}')

### Dataset Information

In [None]:
# Dataset info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
duplicate_rows = df.duplicated()
print(f'Number of duplicate rows: {duplicate_rows.sum()}')


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
missing_values=df.isnull().sum()
print(missing_values)

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 7))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis', yticklabels=False)
plt.title("Missing Values Heatmap")
plt.show()

### What did you know about your dataset?

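The dataset is a single table of Ford GoBike trips from January 2018, in which each row is one trip. The columns record the trip duration in seconds, the start and end timestamps, the start and end stations (ID, name, latitude, and longitude), the bike ID, the user type (Subscriber or Customer), and the rider's birth year. The missing-value count and heatmap above show which columns contain nulls; rows with missing station information are dropped during data wrangling.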

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns.tolist())


In [None]:
# Dataset Describe
df.describe()

### Variables Description

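*   duration_sec: trip duration in seconds.
*   start_time, end_time: timestamps of the start and end of the trip (converted to datetime during wrangling).
*   start_station_id, start_station_name, start_station_latitude, start_station_longitude: ID, name, and coordinates of the station where the trip started.
*   end_station_id, end_station_name, end_station_latitude, end_station_longitude: ID, name, and coordinates of the station where the trip ended.
*   bike_id: identifier of the bike used for the trip.
*   user_type: Subscriber (member) or Customer (casual rider).
*   member_birth_year: the rider's birth year, used to derive age.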

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable
unique_values=df.nunique()
print(unique_values)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert the trip timestamps from strings to datetime so hour and month can be extracted
df['start_time'] = pd.to_datetime(df['start_time'])
df['end_time'] = pd.to_datetime(df['end_time'])


In [None]:

# Drop trips with missing station IDs or names; df_cleaned is the frame used by the charts below
df_cleaned = df.dropna(subset=['start_station_id', 'end_station_id',
                               'start_station_name', 'end_station_name']).copy()

In [None]:
# Trip duration in minutes
df['duration_minutes'] = df['duration_sec'] / 60

In [None]:
# Only keep trips up to 60 minutes (.copy() makes the filtered frame independent of the original)
df = df[df['duration_minutes'] <= 60].copy()

In [None]:
# Reset index after filtering
df.reset_index(drop=True, inplace=True)

In [None]:
print('Cleaned dataset shape:')
print(df.shape)

### What all manipulations have you done and insights you found?

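Loaded the dataset with pandas (NumPy, Matplotlib, and seaborn are used for analysis and plotting).

Checked for missing values and visualized them with a heatmap. Trips with missing start or end station IDs or names were dropped; the result, df_cleaned, is the frame used by the charts below.

Checked for duplicate rows.

Converted start_time and end_time to datetime.

Added a duration_minutes column and kept only trips of 60 minutes or less in df (df_cleaned keeps all durations).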
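The wrangling steps above can be gathered into one function so that every chart works from the same cleaned frame. This is a minimal sketch, assuming the column names used in this notebook; the function name `clean_trips`, its parameters, and the toy rows are illustrative, and unlike the notebook (which only counts duplicates) it also drops exact duplicate rows.

```python
import pandas as pd


def clean_trips(raw: pd.DataFrame, max_minutes: float = 60, ref_year: int = 2018) -> pd.DataFrame:
    """Return a cleaned copy of a trip table (hypothetical helper; the input is not modified)."""
    df = raw.copy()
    # Parse timestamps so hour and month can be extracted later
    df['start_time'] = pd.to_datetime(df['start_time'])
    df['end_time'] = pd.to_datetime(df['end_time'])
    # Drop trips with missing station information, then exact duplicate rows
    df = df.dropna(subset=['start_station_id', 'end_station_id',
                           'start_station_name', 'end_station_name'])
    df = df.drop_duplicates()
    # Derived features: duration in minutes and rider age (NaN where birth year is missing)
    df = df.assign(duration_minutes=df['duration_sec'] / 60,
                   age=ref_year - df['member_birth_year'])
    # Keep trips up to the duration cap and renumber the rows from 0
    return df[df['duration_minutes'] <= max_minutes].reset_index(drop=True)


# Toy rows: a duplicated trip, a 66.7-minute trip, and a trip with no start station
toy = pd.DataFrame({
    'duration_sec': [600, 600, 4000, 300],
    'start_time': ['2018-01-01 08:00:00', '2018-01-01 08:00:00',
                   '2018-01-02 17:00:00', '2018-01-03 09:00:00'],
    'end_time': ['2018-01-01 08:10:00', '2018-01-01 08:10:00',
                 '2018-01-02 18:06:40', '2018-01-03 09:05:00'],
    'start_station_id': [1.0, 1.0, 2.0, None],
    'end_station_id': [3.0, 3.0, 4.0, 5.0],
    'start_station_name': ['A', 'A', 'B', None],
    'end_station_name': ['C', 'C', 'D', 'E'],
    'member_birth_year': [1990.0, 1990.0, None, 1985.0],
})
cleaned = clean_trips(toy)
print(cleaned[['duration_minutes', 'age']])
```

In the notebook this could be called as `df_cleaned = clean_trips(pd.read_csv('/content/201801-fordgobike-tripdata.csv.zip'))`. Applying the 60-minute cap to the frame behind the charts changes the rows they use, so figures such as the user type percentages would need to be recomputed.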




## ***4. Data Vizualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1  User Type Distribution

In [None]:
# Chart - 1 visualization code
# Count trips per user type and show their shares as a pie chart
df_cleaned['user_type'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.title('User Type Distribution')
plt.ylabel('')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pie chart because it provides a quick and clear visual comparison of the proportions of each user type. It effectively highlights the dominance of one category over another, making it easy to grasp the overall user base distribution at a glance.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that Subscribers account for the large majority of trips (88.3%), while Customers account for only 11.7%. This suggests that most riding is done under the subscription model rather than by casual Customers, indicating loyalty or frequent use.
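Because each row of the dataset is a trip and there is no rider identifier, the pie chart's percentages are shares of trips rather than of distinct people. The sketch below computes the same percentages as a table, which is convenient for quoting exact figures; the toy `trips` frame is made up and stands in for `df_cleaned`.

```python
import pandas as pd

# Toy data standing in for df_cleaned: each row is one trip
trips = pd.DataFrame({'user_type': ['Subscriber'] * 7 + ['Customer'] * 1})

# Share of trips per user type in percent (the same numbers the pie chart's autopct labels show)
shares = trips['user_type'].value_counts(normalize=True).mul(100).round(1)
print(shares)
```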

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, this insight is valuable. Since subscribers generate the bulk of trips, the business should focus on retaining them by improving their experience and offering loyalty incentives.
Negative Growth Insight:
Yes, one insight that could lead to negative growth is the low share of trips taken by Customers (11.7%).
Justification:
A small share of Customers may indicate poor conversion or limited appeal to new or casual users. If the business depends too heavily on subscribers and fails to attract or convert new users, it may face stagnation in user base growth over time. This limits market expansion and may hurt long-term revenue potential.

#### Chart - 2  Trips per Hour of Day

In [None]:
# Chart - 2 visualization code
# Extract hour from start time
df_cleaned['hour'] = df_cleaned['start_time'].dt.hour

# Count trips by hour
trips_by_hour = df_cleaned['hour'].value_counts().sort_index()

# Plot
trips_by_hour.plot(kind='bar', figsize=(10, 5), title='Number of Trips by Hour of Day')
plt.xlabel('Hour of Day')
plt.ylabel('Number of Trips')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart because it clearly shows when users are most active, which helps identify peak usage hours.

##### 2. What is/are the insight(s) found from the chart?

Usage peaks at 8 AM and 5-6 PM, likely due to commute times.
Usage is very low from midnight to early morning.
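Peak and off-peak hours can also be read off numerically. This sketch uses made-up start times in place of `df_cleaned['start_time']`: it counts trips per hour, reindexes to all 24 hours so hours with no trips appear as zero instead of being missing, and reports the busiest hours and the share of trips starting between midnight and 5 AM. Treating hours 0-4 as the off-peak window is an assumption that mirrors the wording above.

```python
import pandas as pd

# Toy start times standing in for df_cleaned['start_time']
start_time = pd.to_datetime(pd.Series([
    '2018-01-02 08:05', '2018-01-02 08:40', '2018-01-02 17:15',
    '2018-01-02 17:50', '2018-01-02 17:55', '2018-01-03 02:30',
]))

# Trips per hour; reindex so every hour 0-23 is present (0 when no trips started)
trips_by_hour = (start_time.dt.hour.value_counts()
                 .reindex(range(24), fill_value=0)
                 .sort_index())

# Two busiest hours and the share of trips starting in hours 0-4 (midnight to 5 AM)
peak_hours = trips_by_hour.nlargest(2).index.tolist()
off_peak_share = trips_by_hour.loc[0:4].sum() / trips_by_hour.sum()
print(peak_hours, round(off_peak_share, 3))
```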

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

POSITIVE INSIGHT:

Yes, understanding peak hours helps with:
*   Optimizing bike availability during commute peaks.
*   Planning maintenance during low-usage hours.
*   Targeting promotions at off-peak hours.

Negative Growth Insight:

Low activity during off-peak hours shows underutilization, which means lost revenue opportunities.
Reason: bikes sitting idle during off-peak hours represent inefficient asset usage.

#### Chart - 3  Age Distribution of Riders

In [None]:
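# Create age column (age in 2018, the year the trips were taken)
df_cleaned['age'] = 2018 - df_cleaned['member_birth_year']

# Plot age distribution; trips with a missing birth year are left out of the histogram
df_cleaned['age'].plot(kind='hist', bins=30, edgecolor='black', title='Age Distribution of Riders')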
plt.xlabel('Age')
plt.ylabel('Number of Riders')
plt.show()

##### 1. Why did you pick the specific chart?

Distribution of data: A histogram is well suited to showing the distribution of a numeric variable such as age.
Frequency insight: It shows at a glance how often each age range appears in the dataset.

##### 2. What is/are the insight(s) found from the chart?

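Age distribution: Most riders are in their 20s to 40s, with fewer riders at older ages.
Peak age group: The tallest bars fall within this range, confirming that young adults and working professionals form the core ridership.
Outliers detection: Ages far above the main range (for example, over 80) are more likely data-entry errors in member_birth_year than real riders and should be checked before drawing conclusions.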
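To act on the outlier point, a minimal sketch: compute age as `2018 - member_birth_year` (as in the notebook), flag ages above a cap, and count the remaining riders per age bracket with `pd.cut`. The cap of 80 and the bracket edges are hypothetical choices for illustration, and the birth years are toy values, not project data.

```python
import pandas as pd

# Toy birth years standing in for df_cleaned['member_birth_year'] (NaN = not reported)
birth_year = pd.Series([1990, 1985, 1978, 1995, 1900, None], dtype='float')

# Age in 2018, matching the notebook's 2018 - member_birth_year
age = 2018 - birth_year

# Hypothetical plausibility cap: ages above it are treated as likely data-entry errors
MAX_PLAUSIBLE_AGE = 80
suspect = age > MAX_PLAUSIBLE_AGE  # NaN compares as False, so missing ages are not flagged
clean_age = age[age.notna() & ~suspect]

# Group the remaining ages into right-inclusive brackets to compare representation
brackets = pd.cut(clean_age, bins=[0, 19, 29, 39, 49, 59, 80],
                  labels=['<20', '20s', '30s', '40s', '50s', '60-80'])
print(brackets.value_counts().sort_index())
```

Brackets with very low counts point to the under-represented age groups discussed below.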

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

POSITIVE BUSINESS IMPACT:

Targeted marketing: Insights can guide marketing strategies toward the age groups that dominate ridership.
Product development: Services and offerings can be adjusted to the preferences of the largest user group.
Resource allocation: Prioritizing the features and locations favored by the dominant age groups can maximize customer satisfaction and retention.

NEGATIVE GROWTH INSIGHTS:

Potential insights leading to negative growth:
Age groups with low representation, especially older riders.
Justification:
Limited market share: Relying on a narrow age band leaves other segments untapped and makes growth dependent on a single demographic.
Brand perception: If the service is seen as being only for young adults, it may struggle to attract older riders, potentially hindering long-term growth.

#### Chart - 4  How long does the average trip take?

In [None]:
# Chart - 4 visualization code
# Uses df_cleaned from the wrangling step (start_time and end_time are already datetime)

# Trip duration in seconds from the start/end timestamps
df_cleaned['trip_duration'] = (df_cleaned['end_time'] - df_cleaned['start_time']).dt.total_seconds()

# Average trip duration, converted to minutes for readability
average_trip_duration_minutes = df_cleaned['trip_duration'].mean() / 60
print(f'Average trip duration: {average_trip_duration_minutes:.1f} minutes')

# Bar chart for the average trip duration
plt.figure(figsize=(8, 5))
plt.bar(['Average Trip Duration'], [average_trip_duration_minutes], color='skyblue')
plt.ylabel('Duration (minutes)')
plt.title('Average Trip Duration for Ford GoBike in January 2018')
plt.ylim(0, average_trip_duration_minutes + 5)  # Add some space above the bar
plt.grid(axis='y')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a bar chart because a single bar makes the average trip duration easy to read at a glance, and the same format can later be extended with bars for other groups (for example, user types) for comparison.







##### 2. What is/are the insight(s) found from the chart?

The chart gives the average trip duration for Ford GoBike trips in January 2018, a single baseline for how long a typical ride lasts. This baseline can inform operational decisions such as bike availability and station placement. Because the mean is sensitive to a small number of very long trips, it is best read alongside the median trip duration.
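Comparing the mean with the median shows how much a few long trips pull the average up. In this sketch the durations are invented (five short trips and one four-hour trip) and stand in for `df_cleaned['duration_sec']`.

```python
import pandas as pd

# Toy trip durations in seconds: mostly short rides plus one very long trip
duration_sec = pd.Series([420, 540, 600, 660, 720, 14400])

# Mean and median duration in minutes
summary = (duration_sec / 60).agg(['mean', 'median'])
print(summary.round(1))
```

Here the single long trip lifts the mean far above the median, which stays close to the typical ride.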




##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

POSITIVE INSIGHT:

Yes, the insights can drive positive business impact by optimizing bike distribution and enhancing user experience based on average trip durations.

NEGATIVE INSIGHT:

However, if the average trip duration is excessively long, it may indicate inefficiencies or user dissatisfaction, potentially leading to negative growth due to decreased user retention and increased operational costs.

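#### Chart - 5 Is the trip duration affected by weather / months / seasons?
The dataset does not contain weather data, and this file covers only January 2018, so grouping by month produces a single bar. The chart below therefore shows the January average; analyzing seasonal trends would require the trip files for other months.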

In [None]:
# Chart - 5 visualization code
# Extract month
df_cleaned['month'] = df_cleaned['start_time'].dt.month

# Average trip duration per month
monthly_duration = df_cleaned.groupby('month')['duration_sec'].mean() / 60  # in minutes

# Plot
monthly_duration.plot(kind='bar', title='Average Trip Duration by Month')
plt.ylabel('Duration (minutes)')
plt.xlabel('Month')
plt.grid(True)
plt.show()


##### 1. Why did you pick the specific chart?

I chose a bar chart because it is well suited to comparing an average across months. Since this file covers only January 2018, the chart currently shows a single bar, but the same code will show month-to-month differences once more months of data are loaded.





##### 2. What is/are the insight(s) found from the chart?

With a single month of data, the chart shows only the average trip duration for January 2018, so on its own it cannot reveal seasonal trends or the effect of weather on trip length. It does provide a winter baseline against which warmer months can be compared once their data is added, which can then inform operational strategies and marketing efforts.
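Comparing months needs more than this one file. Assuming the trip files for other months share the same columns, a sketch like the following concatenates them and averages trip duration per start month. The function name `monthly_average_minutes` is hypothetical, and the two toy frames stand in for real monthly files.

```python
import pandas as pd


def monthly_average_minutes(frames):
    """Concatenate monthly trip tables and return the mean trip duration per start month, in minutes."""
    trips = pd.concat(frames, ignore_index=True)
    month = pd.to_datetime(trips['start_time']).dt.month.rename('month')
    return trips.groupby(month)['duration_sec'].mean() / 60


# Toy frames standing in for two monthly files
jan = pd.DataFrame({'start_time': ['2018-01-05 08:00', '2018-01-20 17:00'],
                    'duration_sec': [600, 900]})
feb = pd.DataFrame({'start_time': ['2018-02-03 09:00'], 'duration_sec': [1200]})
result = monthly_average_minutes([jan, feb])
print(result)
```

With real data, the argument could be `[pd.read_csv(p) for p in paths]`, where `paths` lists `/content/201801-fordgobike-tripdata.csv.zip` together with the files downloaded for the other months.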

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

POSITIVE INSIGHT:

Yes, once trip data for several months is available, comparing monthly trip durations can help plan fleet size, maintenance, and promotions around seasonal demand.

NEGATIVE INSIGHT:

However, if certain months show significantly longer durations, it may indicate user frustration or inefficiencies, potentially leading to negative growth due to decreased customer satisfaction and retention.

#### Chart - 6  Does the above depend on if a user is a subscriber or customer?

In [None]:
# Chart - 6 visualization code
# Compare average trip duration by user type
user_type_duration = df_cleaned.groupby('user_type')['duration_sec'].mean() / 60

# Print and plot
print(user_type_duration)

user_type_duration.plot(kind='bar', title='Average Trip Duration by User Type')
plt.ylabel('Duration (minutes)')
plt.grid(True)
plt.show()

##### 1. Why did you pick the specific chart?

I selected a bar chart to compare average trip durations by user type because it effectively visualizes categorical data, allowing for straightforward comparisons. This format highlights differences in user behavior, making it easier to identify trends and inform targeted strategies for enhancing user experience and service offerings.

##### 2. What is/are the insight(s) found from the chart?

Insights from the Chart:

User Type Differences: The chart compares average trip durations between Subscribers and Customers, showing whether the two groups use the service differently.

Behavioral Trends: Longer durations for one user type may suggest different travel needs or habits, providing insights into how each group utilizes the bike-sharing service.

Service Improvement Opportunities: Identifying which user type has longer trips can highlight areas for service enhancements, such as bike availability or route optimization, to better meet user demands.
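Averages by user type are easier to interpret with the trip count and median alongside them, since a group with few trips or a handful of very long rides can show a high mean. A minimal sketch with made-up trips in place of `df_cleaned`; the numbers are illustrative, not results from this dataset.

```python
import pandas as pd

# Toy trips standing in for df_cleaned
trips = pd.DataFrame({
    'user_type': ['Subscriber', 'Subscriber', 'Subscriber', 'Customer', 'Customer'],
    'duration_sec': [480, 600, 720, 1200, 3600],
})

# Trip count, mean, and median duration (minutes) per user type
by_type = (trips.assign(duration_min=trips['duration_sec'] / 60)
                .groupby('user_type')['duration_min']
                .agg(['count', 'mean', 'median']))
print(by_type.round(1))
```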

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

POSITIVE INSIGHT:

Yes, insights on average trip durations by user type can drive positive business impact by enabling targeted marketing and service improvements.

NEGATIVE INSIGHT:

However, if one user type consistently shows longer durations, it may indicate dissatisfaction or operational inefficiencies, potentially leading to negative growth due to decreased user retention and loyalty.

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
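A minimal sketch of the computation behind a correlation heatmap, assuming numeric columns like those in `df_cleaned`. Identifier columns such as `bike_id` and the station IDs are left out because correlations between IDs carry no meaning. The random toy data only illustrates the mechanics; it is not the project's data.

```python
import numpy as np
import pandas as pd

# Toy numeric columns standing in for df_cleaned (ID columns deliberately excluded)
rng = np.random.default_rng(0)
n = 200
birth_year = rng.integers(1950, 2000, size=n).astype(float)
duration_sec = rng.integers(120, 3600, size=n).astype(float)
toy = pd.DataFrame({
    'duration_sec': duration_sec,
    'duration_minutes': duration_sec / 60,
    'member_birth_year': birth_year,
    'age': 2018 - birth_year,
})

# Pearson correlation matrix; numeric_only skips text columns such as user_type in real data
corr = toy.corr(numeric_only=True)
print(corr.round(2))
```

The matrix could then be drawn with `sns.heatmap(corr, annot=True, fmt='.2f', cmap='coolwarm')`. In this toy example, `duration_minutes` and `age` correlate perfectly with `duration_sec` and `member_birth_year` because they are derived from them, so derived columns are worth dropping before reading the heatmap.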

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
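A pair plot of the full dataset can be slow, so a common approach is to plot a random sample of a few numeric columns. A minimal sketch, assuming `duration_minutes` and `age` columns exist on the frame being plotted (in the notebook, `age` is added to `df_cleaned` in Chart 3); the toy frame and sample size are illustrative.

```python
import pandas as pd

# Toy frame standing in for df_cleaned
toy = pd.DataFrame({
    'duration_minutes': [5.0, 12.0, 8.0, 30.0, 9.0, 15.0],
    'age': [25.0, 34.0, None, 41.0, 29.0, 52.0],
    'user_type': ['Subscriber', 'Subscriber', 'Customer', 'Customer', 'Subscriber', 'Subscriber'],
})

# Keep a few numeric columns plus the hue column and drop rows with missing values
cols = ['duration_minutes', 'age', 'user_type']
pool = toy[cols].dropna()

# Random, reproducible sample so the pair plot stays fast on the full dataset
sample = pool.sample(n=min(4, len(pool)), random_state=42)
print(sample)
```

The sample can then be passed to `sns.pairplot(sample, hue='user_type')`; with the real data, a sample of a few thousand rows keeps the plot readable.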

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Suggestions for Achieving Business Objectives

To effectively achieve the business objective of enhancing user engagement and optimizing service offerings based on trip duration insights, the following strategies are recommended:

Data-Driven Decision Making:

Regularly assess average trip durations by user type to identify trends and patterns. This data should inform strategic decisions regarding service improvements and marketing efforts.

Targeted Marketing Campaigns:

Develop tailored marketing campaigns for different user types. For instance, if subscribers have shorter trip durations, consider offering incentives for longer rides, such as discounts or loyalty points. Conversely, if casual users exhibit longer trip durations, promote features that enhance their experience, such as guided routes or special offers.

Service Optimization:

Based on the insights from trip durations, optimize bike availability and maintenance schedules. For example, if certain areas show higher usage, ensure that bikes are readily available in those locations. Additionally, consider implementing features like real-time bike tracking to improve user convenience.

User Feedback Mechanism:

Establish a feedback loop with users to gather insights on their experiences. Surveys or in-app feedback options can help identify pain points and areas for improvement. Addressing user concerns promptly can enhance satisfaction and retention.

Continuous Monitoring and Adaptation:

Regularly monitor trip duration data and user engagement metrics to assess the effectiveness of implemented strategies. Be prepared to adapt and refine approaches based on changing user behaviors and preferences.

Invest in Technology:

Consider investing in data analytics tools and technologies that can provide deeper insights into user behavior and preferences. This can facilitate more informed decision-making and enhance overall service delivery.

By implementing these strategies, the client can effectively meet their business objectives, leading to improved user satisfaction, increased retention, and optimized operations.


# **Conclusion**

In conclusion, achieving the business objective of enhancing user engagement and optimizing service offerings requires a data-driven approach. By analyzing trip duration insights from the data, the client can implement targeted marketing campaigns, optimize services, and establish user feedback mechanisms. Continuous monitoring and adaptation of strategies will ensure responsiveness to user needs. Investing in technology for deeper analytics will further enhance decision-making capabilities. By following these recommendations, the client can improve user satisfaction, increase retention rates, and ultimately drive business growth, creating a more effective and user-centric bike-sharing service.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***