<a href="https://colab.research.google.com/github/kaminikumari543/LabMetrix/blob/Development/UberEDA/Uber.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    - Uber




##### **Project Type**    - EDA/Regression/Classification/Unsupervised
##### **Contribution**    - Individual/Team
##### **Team Member 1 -**
##### **Team Member 2 -**
##### **Team Member 3 -**
##### **Team Member 4 -**

# **Project Summary -**



The objective of this project is to perform Exploratory Data Analysis (EDA) on Uber ride request data to uncover trends, patterns, and bottlenecks in the ride-hailing process. The dataset contains detailed information on ride requests, including timestamps, pickup points, driver assignment, and ride status. Through structured data cleaning and analysis, this project aims to derive actionable business insights that can help improve operational efficiency and customer experience.

Data Cleaning
The dataset was first inspected for inconsistencies, missing values, and format issues. The key challenge was the inconsistent format of the Request_timestamp and Drop_timestamp columns. These were stored as strings in formats such as dd/mm/yyyy hh:mm:ss and were converted to datetime values using pd.to_datetime() with dayfirst=True in pandas for further analysis. Rows with missing values in critical columns like Pickup_point and Status were either cleaned or excluded from specific analyses to ensure data integrity.

Text fields like Pickup_point and Status were standardized using str.strip() to remove leading and trailing whitespace, preventing mismatches during grouping and filtering. The final cleaned dataset enabled reliable time-based and categorical aggregations.
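
The timestamp handling described above can be sketched as follows. This is a minimal illustration, not the notebook's actual cleaning code: the helper name `parse_mixed_timestamps`, the two candidate formats, and the sample strings are all assumptions made for the example.

```python
import pandas as pd

def parse_mixed_timestamps(series, formats):
    """Hypothetical helper: try each format in turn, keep the first successful parse."""
    parsed = pd.to_datetime(series, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Fill only the values that earlier formats could not parse
        parsed = parsed.fillna(pd.to_datetime(series, format=fmt, errors="coerce"))
    return parsed

# Made-up sample strings, not rows from the real dataset
raw = pd.Series(["11/07/2016 11:51:00", " 13-07-2016 08:33 ", "not a date"])

# Assumed candidate formats; the real file may use different ones
formats = ["%d/%m/%Y %H:%M:%S", "%d-%m-%Y %H:%M"]

# Strip stray whitespace first, then parse; unparseable values become NaT
parsed = parse_mixed_timestamps(raw.str.strip(), formats)
print(parsed)
```

Parsing with explicit formats makes it easy to count how many values failed (`parsed.isna().sum()`) instead of silently losing them.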

Analysis Insight

1.Peak Demand Hours
By extracting the hour from request timestamps, it was found that the highest demand occurs between 8 AM and 10 AM and between 5 PM and 7 PM, corresponding to office commute times. This insight suggests a need for higher driver availability during these windows.

2.Request Status Distribution
A breakdown of ride request outcomes showed that while a good proportion of rides were completed, a significant number were either cancelled or unfulfilled due to no cars available. These two categories together represented a substantial loss of potential revenue and customer dissatisfaction.

3.Pickup Point vs. Status
Requests originating from the Airport showed a much higher cancellation rate compared to those from the City. This could be due to longer pickup wait times or inadequate driver availability at the airport. On the other hand, City pickups had a higher ride completion rate, indicating better service levels.

4.Driver Unavailability Trends
Analysis revealed that the early morning hours (5–7 AM) saw the most incidents of 'No Cars Available'. This indicates a driver supply issue that could be addressed with scheduling adjustments or driver incentives during these hours.

5.Overall Completion Rate
The project calculated an average trip completion rate of less than 70%, a clear indicator that operational inefficiencies are preventing the platform from meeting user demand consistently. Improvement here could lead to increased revenue and customer retention.
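
The five metrics above could be computed along the following lines. This is a sketch on a made-up six-row table, so the numbers it prints say nothing about the real dataset; the column names `Pickup point`, `Status`, and `Request_hour` follow the wrangling code later in this notebook.

```python
import pandas as pd

# Toy stand-in for the cleaned request data (values are invented)
requests = pd.DataFrame({
    "Pickup point": ["City", "City", "Airport", "Airport", "City", "Airport"],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available",
               "Trip Completed", "Trip Completed", "No Cars Available"],
    "Request_hour": [9, 9, 18, 18, 5, 18],
})

# 1. Demand per hour of day
requests_per_hour = requests["Request_hour"].value_counts().sort_index()

# 2. Share of each request outcome
status_share = requests["Status"].value_counts(normalize=True)

# 3. Outcome shares within each pickup point (each row sums to 1)
status_by_pickup = pd.crosstab(
    requests["Pickup point"], requests["Status"], normalize="index"
)

# 4. Hours with the most 'No Cars Available' requests
no_cars_by_hour = (
    requests.loc[requests["Status"] == "No Cars Available", "Request_hour"]
    .value_counts()
    .sort_index()
)

# 5. Overall completion rate
completion_rate = (requests["Status"] == "Trip Completed").mean()
print(f"Completion rate: {completion_rate:.0%}")
```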







# **GitHub Link -**

Provide your GitHub Link here.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

# **Problem Statement**



Uber is experiencing a high number of ride cancellations and unfulfilled requests, especially during peak hours and at certain locations like airports. The goal is to analyze ride request data to identify demand patterns, operational bottlenecks, and areas for service improvement using exploratory data analysis.

#### **Define Your Business Objective?**

1.Improve ride completion rates by identifying peak demand times and locations.

2.Reduce cancellations and "No Cars Available" cases through better driver allocation.

3.Optimize driver deployment strategies based on demand patterns.

4.Enhance customer satisfaction and service reliability.

5.Increase overall operational efficiency and revenue.

# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure the following format is answered for each and every chart.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv('/content/Uber Request Data (1).csv')


### Dataset First View

In [None]:
# Dataset First Look (df was already loaded in the previous cell)
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()

In [None]:
# Visualizing the missing values
sns.heatmap(df.isnull(), cbar=False)

### What did you know about your dataset?


The dataset contains Uber ride request records, including request and drop timestamps, pickup points (City or Airport), driver IDs, and ride statuses (Trip Completed, Cancelled, or No Cars Available). It helps analyze user demand patterns, service fulfillment, driver availability, and operational performance across time and location.

## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()


### Variables Description


Request_id: Unique identifier for each ride request.

Pickup_point: Location where the ride was requested (City or Airport).

Driver_id: ID of the driver assigned; null if not assigned.

Status: Outcome of the request (Trip Completed, Cancelled, No Cars Available).

Request_timestamp: Date and time when the ride was requested.

Drop_timestamp: Date and time when the trip ended (only for completed rides).
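
The variable descriptions imply a consistency rule that can be checked in code: a completed trip should have both a driver and a drop time, and any other request should have no drop time. The sketch below is illustrative only. The function name is hypothetical, the rows are made up, and the spaced column names (`Driver id`, `Drop timestamp`) are assumed to match the raw CSV headers used in the wrangling code.

```python
import pandas as pd

def find_inconsistent_rows(df):
    """Hypothetical check: return rows whose Status contradicts their Drop timestamp."""
    completed = df["Status"] == "Trip Completed"
    has_drop = df["Drop timestamp"].notna()
    has_driver = df["Driver id"].notna()
    # Completed trips need a driver and a drop time; other requests should have no drop time
    consistent = (completed & has_drop & has_driver) | (~completed & ~has_drop)
    return df[~consistent]

# Made-up rows: the last two break the rule on purpose
toy = pd.DataFrame({
    "Driver id": [1.0, None, None, 2.0],
    "Status": ["Trip Completed", "No Cars Available", "Cancelled", "Trip Completed"],
    "Drop timestamp": pd.to_datetime(
        ["2016-07-11 12:30", None, "2016-07-11 13:00", None]
    ),
})

bad_rows = find_inconsistent_rows(toy)
print(bad_rows)
```

Running a check like this before plotting helps separate genuine missing values from values that are missing by design.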

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Make the dataset analysis ready.

# Convert timestamp columns to datetime (the day comes first in the source strings)
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], dayfirst=True, errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], dayfirst=True, errors='coerce')

# Remove duplicates
df = df.drop_duplicates()

# Clean text columns
df['Pickup point'] = df['Pickup point'].str.strip()
df['Status'] = df['Status'].str.strip()

# Handle missing or invalid data.
# 'Drop timestamp' is deliberately not required here: it is null for every request
# that was not completed, so dropping on it would discard all 'Cancelled' and
# 'No Cars Available' rows.
df = df.dropna(subset=['Request timestamp', 'Pickup point', 'Status']).copy()

# Extract time-based features
df['Request_hour'] = df['Request timestamp'].dt.hour
df['Request_day'] = df['Request timestamp'].dt.day
df['Request_month'] = df['Request timestamp'].dt.month

# Trip duration in minutes, from request to drop (NaN where there is no drop timestamp)
df['Trip_duration'] = (df['Drop timestamp'] - df['Request timestamp']).dt.total_seconds() / 60

# Preview the cleaned dataset
df.head()





### What all manipulations have you done and insights you found?

1.Loaded CSV using pandas.read_csv().

2.Converted timestamps ('Request timestamp', 'Drop timestamp') to datetime using pd.to_datetime() with dayfirst=True.

3.Removed duplicates using drop_duplicates().

4.Handled missing values by dropping rows with a null 'Request timestamp', 'Pickup point', or 'Status'.

5.Cleaned text fields ('Pickup point', 'Status') using str.strip() for consistency.

6.Extracted time-based features: Request_hour, Request_day, and Request_month.

7.Created a trip duration column in minutes (Trip_duration).

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***




#### Chart - 1   Bar Chart

In [None]:
# Chart - 1 visualization code
# Peak demand hours
plt.figure(figsize=(8, 5))
df['Request_hour'].value_counts().sort_index().plot(kind='bar')
plt.title('Peak Demand Hours')
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Requests')


##### 1. Why did you pick the specific chart?

Answer Here.

It clearly shows hourly demand, making it easy to spot peak and low usage times.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

Peak at 9 AM

Second peak at 7 PM

Lowest demand around 1–3 AM and 1 PM

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Positive business impact:
Helps in resource planning, targeted marketing, and choosing low-traffic hours for maintenance.

Negative growth insights:

Low mid-day usage may indicate user drop-off.
Late-night low traffic suggests missed global reach.

#### Chart - 2  Pie Chart

In [None]:
# Chart - 2 visualization code
# Request status breakdown
plt.figure(figsize=(10, 6))
df['Status'].value_counts().plot(kind='pie', autopct='%1.1f%%', startangle=90)
plt.title('Request Status Breakdown')
plt.ylabel('')  # Remove the default label


##### 1. Why did you pick the specific chart?

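A pie chart shows how a whole splits into percentage shares, which suits the breakdown of requests across the three statuses (Trip Completed, Cancelled, No Cars Available).

##### 2. What is/are the insight(s) found from the chart?

In the recorded run, 100% of the plotted requests fall under a single status. This is most likely an artifact of data wrangling, not a real pattern: Drop timestamp is populated only for completed rides, so any step that drops rows with a null Drop timestamp also removes every Cancelled and No Cars Available request before plotting.


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

Once all three statuses are present in the plotted data, the share of Trip Completed gives a direct measure of service reliability.


Negative growth insight:

A chart showing 100% for one status hides the cancelled and unfulfilled requests this project sets out to study. It should be treated as a data-validation warning, not as evidence of perfect performance.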
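
The way a missing-value filter can collapse the status breakdown is easy to reproduce on toy data. The sketch below uses four made-up requests, not rows from the dataset, and assumes (per the variable description) that Drop timestamp is set only for completed rides.

```python
import pandas as pd

# Four invented requests: only completed trips carry a drop timestamp
toy = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Drop timestamp": pd.to_datetime(
        ["2016-07-11 12:30", None, None, "2016-07-11 19:05"]
    ),
})

# Status shares with every request kept
share_all = toy["Status"].value_counts(normalize=True)

# Status shares after dropping rows that lack a drop timestamp
share_filtered = (
    toy.dropna(subset=["Drop timestamp"])["Status"].value_counts(normalize=True)
)

print(share_all)
print(share_filtered)  # only 'Trip Completed' remains
```

Comparing `df['Status'].value_counts()` before and after each wrangling step is a quick way to catch this kind of unintended filtering.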


#### Chart - 3  Bar chart

In [None]:
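# Chart - 3 visualization code
# Status by pickup point (stacked counts)
status_by_pickup = df.groupby('Pickup point')['Status'].value_counts().unstack(fill_value=0)
# Pass figsize to plot(): DataFrame.plot opens its own figure, so a separate plt.figure() would stay empty
status_by_pickup.plot(kind='bar', stacked=True, figsize=(8, 5))
plt.title('Status by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.show()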

##### 1. Why did you pick the specific chart?

Answer Here.

A stacked bar chart is well suited to comparing request outcomes across the two pickup points (Airport vs City).


##### 2. What is/are the insight(s) found from the chart?

Answer Here

More trips are completed from City than Airport.

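Both pickup points appear to have high completion rates, but this reading is reliable only if Cancelled and No Cars Available requests are still present in the plotted data.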


##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Positive business impact:

Helps allocate more drivers to City, where demand and completion are higher.

Ensures efficient fleet management.


Negative growth insight:

Slightly fewer completions at Airport may indicate issues like long wait times or fewer available drivers.

#### Chart - 4  Bar Chart

In [None]:
# Chart - 4 visualization code
# 'No Cars Available' requests by hour
no_cars_hour = (
    df.loc[df['Status'] == 'No Cars Available', 'Request_hour']
    .value_counts()
    .reindex(range(24), fill_value=0)  # keep all 24 hours, including hours with no such requests
)

plt.figure(figsize=(7, 3))
sns.barplot(x=no_cars_hour.index, y=no_cars_hour.values, color='salmon')
plt.title('No Cars Available by Hour')
plt.xlabel('Hour of Day')
plt.ylabel('No Cars Available Count')
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart is suitable for comparing the number of 'No Cars Available' requests across the 24 hours of the day.

##### 2. What is/are the insight(s) found from the chart?

In the recorded run, no hour shows any 'No Cars Available' requests; the count stays at 0.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Positive business impact:

If the zero counts are genuine, they indicate excellent fleet availability, with demand being met in every hour.


Negative growth insight:

The zero counts are more likely a data artifact than a real result. 'No Cars Available' requests have no Drop timestamp, so any wrangling step that drops rows with a null Drop timestamp removes them before plotting and falsely suggests full availability. Validate the filtered data before acting on this chart.



#### Chart - 5  Histogram Chart

In [None]:
# Chart - 5 visualization code
# Trip duration distribution
plt.figure(figsize=(10, 6))
sns.histplot(df['Trip_duration'], bins=20, kde=True)
plt.title('Trip Duration Distribution')
plt.xlabel('Trip Duration (minutes)')
plt.ylabel('Frequency')

##### 1. Why did you pick the specific chart?

Answer Here.

A histogram is the clearest way to see the spread and shape of a numerical variable—in this case trip-duration minutes—so you can spot the typical length and any skew or outliers at a glance.


##### 2. What is/are the insight(s) found from the chart?

Answer Here

Most trips cluster around 45–55 min (mode).

Distribution is roughly bell-shaped but slightly right-skewed; a tail extends beyond 75 min.

Very short (<30 min) and very long (>80 min) trips are rare.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

Positive business impact

Knowing the typical trip length lets you set accurate ETA promises and optimize driver shift planning.

The right-tail alerts ops teams to allocate buffer time for occasional long trips without upsetting overall utilization.


Negative-growth insight

The long-tail (few trips >80 min) could reflect traffic bottlenecks or routing inefficiencies—if ignored, prolonged rides may hurt customer satisfaction and driver turnover. Monitoring and mitigating those extremes prevents reputational drag.
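
One common way to monitor the long tail is to flag trips above an upper fence based on the interquartile range (IQR). This is a sketch of that general technique, not something the notebook itself runs: the function name, the multiplier `k=1.5`, and the sample durations are all assumptions.

```python
import pandas as pd

def flag_long_trips(durations, k=1.5):
    """Hypothetical helper: flag durations above Q3 + k * IQR."""
    q1, q3 = durations.quantile([0.25, 0.75])
    upper_fence = q3 + k * (q3 - q1)
    return durations > upper_fence, upper_fence

# Made-up trip durations in minutes
durations = pd.Series([42.0, 48.0, 50.0, 52.0, 55.0, 58.0, 95.0])

is_long, upper_fence = flag_long_trips(durations)
print(f"Upper fence: {upper_fence:.2f} min")
print(durations[is_long])  # trips treated as unusually long
```

On the real data this would be applied to `df['Trip_duration'].dropna()`, and the flagged trips could then be inspected by hour and pickup point.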


#### Chart - 6

In [None]:
# Chart - 6 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 7

In [None]:
# Chart - 7 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 8

In [None]:
# Chart - 8 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 9

In [None]:
# Chart - 9 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 10

In [None]:
# Chart - 10 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 11

In [None]:
# Chart - 11 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 12

In [None]:
# Chart - 12 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 13

In [None]:
# Chart - 13 visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Answer Here

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code

##### 1. Why did you pick the specific chart?

Answer Here.

##### 2. What is/are the insight(s) found from the chart?

Answer Here

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.


1.Improve Completion Rate

 Deploy more drivers during peak hours and high-demand zones using demand forecasting.

2.Reduce Cancellations & No Cars Available

 Implement incentive programs for drivers to stay active during early mornings and rush hours.

3.Optimize Driver Deployment

 Use pickup location trends (e.g., Airport vs City) to adjust driver positioning in real-time.

4.Enhance Customer Satisfaction

Reduce wait times through predictive dispatching and better ETA accuracy.

5.Increase Operational Efficiency & Revenue

 Improve allocation algorithms and avoid revenue loss from unserved requests.
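
As a starting point for the deployment suggestions above, unmet demand can be ranked by pickup point and hour. The sketch below is a hypothetical illustration on made-up rows: the function name and the definition of "unmet" (any status other than Trip Completed) are assumptions, and the column names follow the wrangling code.

```python
import pandas as pd

def unmet_demand_table(df):
    """Hypothetical helper: count requests and unmet requests per pickup point and hour."""
    # Assumption: every request that did not end in a completed trip counts as unmet
    unmet = df["Status"] != "Trip Completed"
    grouped = df.assign(unmet=unmet).groupby(["Pickup point", "Request_hour"])["unmet"]
    table = grouped.agg(requests="count", unmet="sum")
    table["unmet_rate"] = table["unmet"] / table["requests"]
    # Largest gaps first, so the top rows are candidates for extra drivers
    return table.sort_values("unmet", ascending=False)

# Invented requests for illustration only
toy = pd.DataFrame({
    "Pickup point": ["City", "City", "Airport", "Airport", "Airport", "City"],
    "Request_hour": [9, 9, 18, 18, 18, 5],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available",
               "No Cars Available", "Trip Completed", "Trip Completed"],
})

table = unmet_demand_table(toy)
print(table)
```

Sorting by the absolute count, not the rate, keeps low-volume hours from dominating the ranking; either choice is defensible depending on the goal.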

# **Conclusion**


This EDA project highlighted key operational challenges faced by Uber, particularly time-based demand spikes, driver shortages, and location-specific inefficiencies. Through data cleaning and analysis, the project transformed raw ride request logs into actionable insights. These findings can guide decisions on driver scheduling, surge pricing, location-based dispatch strategies, and service reliability improvements. Ultimately, the goal is to use data to create a more balanced and responsive ride-hailing service for both drivers and riders.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***