# **Project Name**    - Uber Requests



##### **Project Type**    - EDA
##### **Contribution**    - Individual

# **Project Summary -**

This project focuses on performing Exploratory Data Analysis (EDA) on the Uber Request dataset to understand ride demand, supply issues, and trip completion patterns. The dataset contains information such as request ID, pickup point, driver ID, trip status, request date and time, and drop date and time.

The analysis was carried out using Excel, SQL, and Python (Pandas & NumPy) to cover the complete data analysis lifecycle. Initially, the raw data was cleaned using Excel, where missing values were handled, date and time fields were separated, time slots were created, and new features such as request hour, drop hour, and trip duration were derived. Pivot tables and dashboards were also created in Excel to visualize key trends and patterns.

Next, SQL queries were used to extract meaningful insights from the data. These queries helped in identifying total requests, status-wise distribution of trips, pickup point analysis (Airport vs City), peak demand hours, and time periods with maximum “No Cars Available” issues. SQL made it easier to perform aggregation and grouping operations to support data-driven conclusions.

Further, Python (Pandas) was used for Exploratory Data Analysis (EDA) to validate findings and understand data distribution. Using Python, missing values were analyzed, categorical distributions were examined, and summary statistics were generated to support the insights obtained from Excel and SQL.

The analysis revealed that a significant number of ride failures were due to No Cars Available, especially during morning and evening peak hours. Airport pickup points showed higher supply-demand gaps compared to city pickups. Additionally, completed airport trips generally had longer durations than city trips.

Overall, this project demonstrates how combining Excel for data cleaning and visualization, SQL for querying insights, and Python for EDA can provide a comprehensive understanding of real-world operational data. The insights from this analysis can help in improving driver allocation strategies and reducing supply shortages during peak demand periods.

# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The objective of this project is to perform Exploratory Data Analysis (EDA) on the Uber request dataset to understand ride request patterns, booking status, and operational challenges. The dataset contains the request ID, pickup point, driver ID, ride status, request timestamp, and drop timestamp. EDA is used to identify demand-supply gaps, missing values, ride cancellations, and time-based trends that affect ride completion.


#### **Define Your Business Objective?**

The business objective of this analysis is to identify key factors affecting ride fulfillment
in the Uber request system. The analysis aims to understand patterns of completed,
cancelled, and unassigned requests based on pickup locations and time slots.
These insights can help improve driver allocation, reduce ride cancellations,
and enhance overall customer satisfaction.


# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure each chart is accompanied by the following:
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

### Dataset Loading

In [None]:

df = pd.read_csv('Uber Request Data.csv') 

### Dataset First View

In [None]:
# Dataset First Look
df.head()


### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print("Number of Rows:", df.shape[0])
print("Number of Columns:", df.shape[1])


### Dataset Information

In [None]:
# Dataset Info
df.info()


#### Duplicate Values

In [None]:
# Dataset Duplicate Value Count
df.duplicated().sum()


#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
df.isnull().sum()


In [None]:
# Visualizing the missing values (seaborn and matplotlib are already imported above)

plt.figure(figsize=(10,5))
sns.heatmap(df.isnull(), cbar=False)
plt.title("Missing Values Heatmap")
plt.show()


### What did you know about your dataset?

From the initial exploration of the Uber Request dataset, it was observed that the dataset contains multiple categorical and time-based features such as request timestamps, pickup points, driver availability, and trip status. Missing values are present mainly in the driver-related and drop timestamp columns, indicating cancelled or unfulfilled requests. No major duplicate records were found in the dataset. The dataset is suitable for time-based and demand-supply analysis to identify peak hours, cancellation patterns, and operational gaps.
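
The link between missing values and trip status described above can be checked by counting nulls per status. The following is a minimal sketch, not the notebook's actual output: the `sample` rows are made up for illustration, `missing_by_status` is a hypothetical helper, and the column names are assumed to match the ones used in the rest of this notebook.

```python
import pandas as pd

def missing_by_status(frame, columns=("Driver_id", "Drop_timestamp")):
    """Count missing values in `columns` for each value of Status."""
    # isnull() yields booleans; summing them per group gives null counts
    return frame[list(columns)].isnull().groupby(frame["Status"]).sum()

# Made-up rows for illustration only (not taken from the real dataset)
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available"],
    "Driver_id": [1.0, 2.0, None],
    "Drop_timestamp": ["2016-07-11 12:00", None, None],
})

null_counts = missing_by_status(sample)
print(null_counts)
```

Running `missing_by_status(df)` on the loaded data would show which statuses account for the nulls in each column.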


## ***2. Understanding Your Variables***

The Uber Request dataset contains both categorical and time-based variables. Key variables include request timestamp, drop timestamp, pickup point, and trip status. The timestamp variables capture the time of request and drop, which are crucial for analyzing peak demand hours and delays. Categorical variables such as pickup point and status help in understanding demand distribution and cancellation patterns. Some variables contain missing values, particularly in driver assignment and drop time, indicating unfulfilled or cancelled trips.


In [None]:
# Dataset Describe
# Summary statistics for all columns (numeric and categorical)
df.describe(include='all')


### Variables Description

- **Request_id**: A unique identifier for each Uber ride request.
- **Pickup_point**: Location from where the customer requested the ride (City or Airport).
- **Driver_id**: Unique ID assigned to the driver. Missing values indicate that no driver was assigned.
- **Status**: Current status of the ride request such as Trip Completed, Cancelled, or No Cars Available.
- **Request_timestamp**: Date and time when the ride was requested.
- **Drop_timestamp**: Date and time when the ride was completed. This is missing for cancelled or unfulfilled requests.
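
The column list above can be turned into a quick schema check at load time. This is a sketch under assumptions: `load_requests` and `EXPECTED_COLUMNS` are hypothetical names, the header normalisation assumes the raw export may spell columns with spaces (for example `Request id`), and the one-row CSV is made up for the demo.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = ["Request_id", "Pickup_point", "Driver_id",
                    "Status", "Request_timestamp", "Drop_timestamp"]

def load_requests(source):
    """Read the CSV and verify that the expected columns are present."""
    try:
        frame = pd.read_csv(source)
    except FileNotFoundError as exc:
        raise FileNotFoundError(f"Dataset not found: {source!r}") from exc
    # Assumption: headers may contain spaces or odd casing, e.g. "Request id"
    frame.columns = [str(c).strip().replace(" ", "_").capitalize()
                     for c in frame.columns]
    missing = [c for c in EXPECTED_COLUMNS if c not in frame.columns]
    if missing:
        raise ValueError(f"Missing expected columns: {missing}")
    return frame

# Made-up single row, used only to exercise the function
demo_csv = io.StringIO(
    "Request id,Pickup point,Driver id,Status,Request timestamp,Drop timestamp\n"
    "1,Airport,7,Trip Completed,11/7/2016 11:51,11/7/2016 13:00\n"
)
demo = load_requests(demo_csv)
print(demo.columns.tolist())
```

Failing early with a clear message keeps the rest of the notebook from breaking on a missing file or a renamed column.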


### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.

for column in df.columns:
    print(f"\nUnique values in {column}:")
    print(df[column].unique())


## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Convert timestamps to datetime format
df['Request_timestamp'] = pd.to_datetime(df['Request_timestamp'])
df['Drop_timestamp'] = pd.to_datetime(df['Drop_timestamp'])

# Extract hour and day from request timestamp
df['Request_hour'] = df['Request_timestamp'].dt.hour
df['Request_day'] = df['Request_timestamp'].dt.day_name()

# Check missing values
df.isnull().sum()


### What all manipulations have you done and insights you found?

The following data manipulations were performed to make the dataset analysis-ready:

1. Converted the request and drop timestamp columns to datetime format.
2. Extracted request hour and request day from the request timestamp to support time-based analysis.
3. Checked missing values, especially in the driver ID and drop timestamp columns.
4. Checked for duplicate records to ensure data consistency.

Key insights found during data wrangling:
- Missing driver IDs mark requests to which no driver was assigned.
- Drop timestamp is missing for cancelled and no-car-available requests.
- Time-based features helped identify peak demand hours and operational gaps.
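
The timestamp conversion and derived features listed above can be sketched as follows. This is an illustration under assumptions, not the exact cleaning that was applied: it assumes timestamps are day-first and may appear in two layouts (`11/7/2016 11:51` and `13-07-2016 08:33:16`), and `parse_timestamps`, `add_time_features`, and `Trip_duration_min` are hypothetical names. The sample rows are made up.

```python
import pandas as pd

def parse_timestamps(series):
    """Parse day-first timestamps written in either of two assumed layouts."""
    # Each pass parses one layout; values that do not match become NaT
    slash = pd.to_datetime(series, format="%d/%m/%Y %H:%M", errors="coerce")
    dash = pd.to_datetime(series, format="%d-%m-%Y %H:%M:%S", errors="coerce")
    # Fill the gaps of the first pass with the results of the second
    return slash.fillna(dash)

def add_time_features(frame):
    """Return a copy with request hour, day name, and trip duration."""
    out = frame.copy()
    out["Request_timestamp"] = parse_timestamps(out["Request_timestamp"])
    out["Drop_timestamp"] = parse_timestamps(out["Drop_timestamp"])
    out["Request_hour"] = out["Request_timestamp"].dt.hour
    out["Request_day"] = out["Request_timestamp"].dt.day_name()
    # Duration stays NaN when there is no drop timestamp
    delta = out["Drop_timestamp"] - out["Request_timestamp"]
    out["Trip_duration_min"] = delta.dt.total_seconds() / 60
    return out

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Request_timestamp": ["11/7/2016 11:51", "13-07-2016 08:33:16"],
    "Drop_timestamp": ["11/7/2016 13:00", None],
})
features = add_time_features(sample)
print(features[["Request_hour", "Request_day", "Trip_duration_min"]])
```

Parsing each layout explicitly avoids silently swapping day and month, which would distort the hour-wise and day-wise charts.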


## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code

# Ride Status Distribution
plt.figure(figsize=(8,5))
sns.countplot(x='Status', data=df)
plt.title("Ride Status Distribution")
plt.xlabel("Ride Status")
plt.ylabel("Number of Requests")
plt.show()


##### 1. Why did you pick the specific chart?

A bar chart was chosen to clearly compare the number of ride requests across different ride statuses.
It helps in easily identifying which status occurs most frequently.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that although a significant number of rides are completed,
a large portion of requests are either cancelled or marked as no cars available.
This highlights supply-demand imbalance during certain time periods.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High cancellation rates and no-car-available requests negatively impact customer satisfaction.
These insights can help Uber improve driver allocation strategies during peak hours
to reduce lost business opportunities.

#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(8,5))
sns.countplot(x='Pickup_point', hue='Status', data=df)
plt.title("Pickup Point vs Ride Status")
plt.xlabel("Pickup Point")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart was selected to compare ride outcomes such as completed, cancelled, and no-car-available requests across different pickup locations. Using a grouped bar chart makes it easier to understand how location impacts ride fulfillment.

##### 2. What is/are the insight(s) found from the chart?

The chart clearly shows that airport pickup points have a higher number of “No Cars Available” requests compared to city pickups. City locations have relatively better ride completion rates, indicating better driver availability.
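
Because the two pickup points may receive different numbers of requests, completion rates are easier to compare as shares than as raw counts. The sketch below shows one way to compute them with a row-normalised cross-tabulation; `status_share_by_pickup` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd

def status_share_by_pickup(frame):
    """Share of each Status within each Pickup_point (rows sum to 1)."""
    return pd.crosstab(frame["Pickup_point"], frame["Status"],
                       normalize="index")

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Pickup_point": ["Airport", "Airport", "City", "City"],
    "Status": ["Trip Completed", "No Cars Available",
               "Trip Completed", "Cancelled"],
})
shares = status_share_by_pickup(sample)
print(shares.round(2))
```

Calling `status_share_by_pickup(df)` on the full data gives the rates behind the grouped bar chart.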

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

High unfulfilled requests at airport locations negatively impact customer satisfaction and may lead to revenue loss. Improving driver availability at airports can significantly improve service quality and business performance.

#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_hour', data=df)
plt.title("Hour-wise Ride Requests")
plt.xlabel("Hour of Day")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart helps in understanding how ride demand varies throughout the day. Hour-wise analysis is important for identifying peak and non-peak demand periods.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that ride requests peak during morning and evening hours, which typically correspond to office commute times. Late night and early morning hours have comparatively lower demand.
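
Hourly counts can be grouped into named time slots to make the morning and evening peaks easier to summarise. The following is a sketch only: the slot boundaries and labels are hypothetical and should be adjusted to the peaks actually seen in the chart.

```python
import pandas as pd

# Hypothetical slot boundaries (upper edges are inclusive): 0-4, 5-9, 10-16, 17-21, 22-23
SLOT_BINS = [-1, 4, 9, 16, 21, 23]
SLOT_LABELS = ["Late Night", "Morning", "Daytime", "Evening", "Night"]

def to_time_slot(hours):
    """Map hour-of-day values (0-23) to named time slots."""
    return pd.cut(hours, bins=SLOT_BINS, labels=SLOT_LABELS)

# One example hour per slot
slots = to_time_slot(pd.Series([0, 5, 12, 18, 23]))
print(slots.tolist())
```

With the real data this could be applied as `df['Time_slot'] = to_time_slot(df['Request_hour'])`, after which the slot column can be used in any of the count plots.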

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight helps in planning driver shifts more efficiently. Increasing driver supply during peak hours can reduce cancellations and improve overall ride completion rates.

#### Chart - 4

In [None]:
# Chart - 4 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_hour', data=df[df['Status'] == 'No Cars Available'])
plt.title("No Cars Available Requests by Hour")
plt.xlabel("Hour of Day")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart focuses specifically on failed requests due to car unavailability. It helps identify time periods where supply is not meeting demand.

##### 2. What is/are the insight(s) found from the chart?

The highest number of no-car-available requests occurs during peak demand hours. This indicates that driver supply is insufficient exactly when demand is highest.
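
The supply-demand gap can also be expressed as a number per hour. The sketch below rests on a stated assumption: demand is taken as all requests in an hour and supply as the completed trips in that hour, which is one possible definition rather than the only one. `hourly_gap` is a hypothetical helper and the sample rows are made up.

```python
import pandas as pd

def hourly_gap(frame):
    """Requests, completed trips, and their difference for each hour."""
    demand = frame.groupby("Request_hour").size()
    completed = frame[frame["Status"] == "Trip Completed"]
    supply = completed.groupby("Request_hour").size()
    # Hours with no completed trips are missing from `supply`, so fill with 0
    table = pd.DataFrame({"demand": demand, "supply": supply})
    table = table.fillna(0).astype(int)
    table["gap"] = table["demand"] - table["supply"]
    return table

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Request_hour": [5, 5, 5, 12, 18],
    "Status": ["Trip Completed", "No Cars Available", "Cancelled",
               "Trip Completed", "No Cars Available"],
})
gap_table = hourly_gap(sample)
print(gap_table)
```

Sorting the result by `gap` would list the hours where additional drivers are most needed.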

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This is a negative business indicator as customers are unable to book rides during critical hours. Addressing this gap can directly improve customer retention and revenue.

#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(8,5))
sns.countplot(x='Request_day', data=df)
plt.title("Day-wise Ride Requests")
plt.xlabel("Day of Week")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart is useful for analyzing ride demand patterns across different days of the week. It helps identify busy and slow days.

##### 2. What is/are the insight(s) found from the chart?

Certain weekdays show higher ride demand compared to others. This suggests that demand is influenced by regular work schedules rather than being evenly distributed.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding day-wise demand helps in weekly driver planning and incentive allocation. This insight has a positive impact by enabling better operational planning.

#### Chart - 6

In [None]:
# Chart - 6 visualization code
df['Status'].value_counts().plot.pie(autopct='%1.1f%%', figsize=(6,6))
plt.title("Ride Status Distribution")
plt.ylabel("")
plt.show()

##### 1. Why did you pick the specific chart?

A pie chart is effective for showing the proportion of different ride statuses in the dataset. It gives a quick overview of overall ride outcomes.

##### 2. What is/are the insight(s) found from the chart?

A significant portion of ride requests are either cancelled or marked as no cars available. Completed rides do not dominate the dataset, contrary to what might be expected.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This indicates a negative business impact due to lost ride opportunities. Reducing cancellations and unfulfilled requests can greatly improve profitability and customer trust.

#### Chart - 7

In [None]:
# Chart - 7 visualization code
df['Driver_Assigned'] = df['Driver_id'].notnull()

plt.figure(figsize=(8,5))
sns.countplot(x='Driver_Assigned', hue='Status', data=df)
plt.title("Driver Assignment vs Ride Status")
plt.xlabel("Driver Assigned (True / False)")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart was chosen to understand the relationship between driver availability and ride outcomes. It helps analyze how the presence or absence of a driver affects ride completion.

##### 2. What is/are the insight(s) found from the chart?

The chart shows that when drivers are not assigned, most requests result in “No Cars Available” or cancellation. Requests with assigned drivers have a much higher completion rate.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight highlights a negative impact caused by driver shortages. Improving driver availability can directly increase completed rides and customer satisfaction.

#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_hour', data=df[df['Pickup_point'] == 'Airport'])
plt.title("Hour-wise Airport Ride Requests")
plt.xlabel("Hour of Day")
plt.ylabel("Number of Airport Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart focuses only on airport pickups to understand demand patterns at airports. Airport rides often have different demand behavior compared to city rides.

##### 2. What is/are the insight(s) found from the chart?

Airport ride requests peak during specific morning and evening hours, likely aligned with flight schedules. Demand is not evenly distributed throughout the day.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight allows targeted driver deployment at airports during peak hours, positively impacting service efficiency and reducing missed ride opportunities.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_hour', data=df[df['Pickup_point'] == 'City'])
plt.title("Hour-wise City Ride Requests")
plt.xlabel("Hour of Day")
plt.ylabel("Number of City Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart helps analyze city ride demand separately to compare it with airport demand. It provides clarity on urban travel patterns.

##### 2. What is/are the insight(s) found from the chart?

City ride demand is higher during office commute hours, particularly in the morning and evening. Midday demand is comparatively stable.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Understanding city demand helps in better scheduling of drivers and improving ride availability during peak commuting hours, leading to a positive business impact.

#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_day', hue='Status', data=df)
plt.title("Day-wise Ride Status Distribution")
plt.xlabel("Day of Week")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart was selected to analyze how ride outcomes vary across different days of the week. It helps identify operational issues on specific days.

##### 2. What is/are the insight(s) found from the chart?

Some weekdays show higher cancellations and no-car-available requests compared to others. This suggests uneven driver availability across the week.
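
A table of status counts per day, listed in calendar order, makes the day-to-day differences easier to read than the chart alone. The sketch below is illustrative: `status_by_day` and `DAY_ORDER` are hypothetical names and the sample rows are made up.

```python
import pandas as pd

DAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def status_by_day(frame):
    """Status counts per day of week, with rows in calendar order."""
    counts = pd.crosstab(frame["Request_day"], frame["Status"])
    # Days that never occur in the data appear as rows of zeros
    return counts.reindex(DAY_ORDER, fill_value=0)

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Request_day": ["Wednesday", "Monday", "Monday"],
    "Status": ["Trip Completed", "Cancelled", "No Cars Available"],
})
day_table = status_by_day(sample)
print(day_table)
```

Rows of zeros also make it obvious when the dataset does not cover certain days at all, which matters before comparing weekdays with weekends.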

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can be used to plan weekly driver incentives and staffing strategies. Proper planning can reduce cancellations and improve consistency.

#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(10,5))
sns.countplot(x='Request_hour', data=df[df['Status'] == 'Trip Completed'])
plt.title("Hour-wise Completed Rides")
plt.xlabel("Hour of Day")
plt.ylabel("Completed Trips")
plt.show()

##### 1. Why did you pick the specific chart?

This chart focuses only on successful trips to understand when rides are most efficiently completed.

##### 2. What is/are the insight(s) found from the chart?

More rides are completed during non-peak hours than during peak hours, when cancellations and no-car-available issues are more frequent.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This indicates a negative impact during peak hours due to operational strain. Improving supply during peak times can increase completion rates.

#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(6,5))
sns.countplot(x='Pickup_point', data=df[df['Status'] == 'No Cars Available'])
plt.title("No Cars Available by Pickup Point")
plt.xlabel("Pickup Point")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart isolates failed requests due to car unavailability and compares them across pickup locations.

##### 2. What is/are the insight(s) found from the chart?

Airport pickup points experience significantly higher no-car-available requests compared to city pickups.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This represents a negative growth indicator. Increasing driver availability at airports can directly reduce lost revenue and improve customer experience.

#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(8,5))
order = ['Monday','Tuesday','Wednesday','Thursday','Friday','Saturday','Sunday']
sns.countplot(x='Request_day', data=df, order=order)
plt.title("Day-wise Ride Demand Trend")
plt.xlabel("Day of Week")
plt.ylabel("Number of Requests")
plt.show()

##### 1. Why did you pick the specific chart?

This chart helps understand weekly demand patterns and highlights which days consistently receive higher ride requests.

##### 2. What is/are the insight(s) found from the chart?

Weekdays show higher demand compared to weekends, indicating rides are largely driven by office and routine travel needs.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This has a positive impact on planning weekly operations. Driver incentives can be optimized for high-demand weekdays to improve fulfillment.

#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
# Select numerical columns; 'number' matches every integer and float width,
# so derived columns such as Request_hour are kept regardless of pandas version
num_df = df.select_dtypes(include='number')

plt.figure(figsize=(8,6))
sns.heatmap(num_df.corr(), annot=True, cmap='coolwarm')
plt.title("Correlation Heatmap of Numerical Variables")
plt.show()

##### 1. Why did you pick the specific chart?

A correlation heatmap is used to understand the relationship between numerical variables in the dataset. It helps identify whether changes in one variable are associated with changes in another variable, either positively or negatively.

##### 2. What is/are the insight(s) found from the chart?

The heatmap shows that most numerical variables in the dataset have weak to moderate correlations with each other. Time-based variables such as request hour show some relationship with completion and driver assignment patterns, while no strong correlation exists among most features.
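
Trip status and driver assignment are not numeric columns, so relating them to request hour requires encoding them first. The sketch below shows one way to do that with 0/1 indicator columns; `outcome_correlations` and the `Completed` column are hypothetical, and the sample rows are made up. Pearson correlation on indicators is only a rough guide, and hour of day is cyclic, so the values should be read with caution.

```python
import pandas as pd

def outcome_correlations(frame):
    """Correlate request hour with 0/1 indicators of trip outcomes."""
    encoded = pd.DataFrame({
        "Request_hour": frame["Request_hour"],
        # 1 when the trip was completed, 0 otherwise
        "Completed": (frame["Status"] == "Trip Completed").astype(int),
        # 1 when a driver ID is present, 0 otherwise
        "Driver_Assigned": frame["Driver_id"].notnull().astype(int),
    })
    return encoded.corr()

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Request_hour": [5, 6, 12, 13],
    "Status": ["No Cars Available", "Cancelled",
               "Trip Completed", "Trip Completed"],
    "Driver_id": [None, 3.0, 4.0, 5.0],
})
corr = outcome_correlations(sample)
print(corr.round(2))
```

The resulting matrix can be passed to `sns.heatmap` in the same way as `num_df.corr()`.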

#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
# Select numerical columns; 'number' matches every integer and float width,
# so derived columns such as Request_hour are kept regardless of pandas version
num_df = df.select_dtypes(include='number')

sns.pairplot(num_df)
plt.suptitle("Pair Plot of Numerical Variables", y=1.02)
plt.show()

##### 1. Why did you pick the specific chart?

A pair plot was chosen to analyze the relationship between multiple numerical variables simultaneously.
It helps in understanding data distribution, identifying patterns, and checking whether any strong
linear relationship exists between variables.

##### 2. What is/are the insight(s) found from the chart?

The pair plot shows that most numerical variables do not have strong linear relationships with each other.
The data points are widely scattered, indicating that ride outcomes are influenced by multiple independent
factors rather than a single numerical variable.

## **5. Solution to Business Objective**

To achieve the business objective of improving ride completion rates and customer satisfaction,
the company should focus on balancing supply and demand more effectively. Increasing driver
availability during peak hours, especially at airport locations, can significantly reduce
no-car-available requests. Implementing time-based incentives for drivers and using historical
demand data for better forecasting will help optimize operations and increase revenue.
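
To decide where time-based incentives might matter most, the pickup point and hour combinations with the highest share of unfulfilled requests can be ranked. This is a sketch under assumptions: a request counts as unfulfilled when its status is anything other than `Trip Completed`, `unfulfilled_hotspots` is a hypothetical helper, and the sample rows are made up.

```python
import pandas as pd

def unfulfilled_hotspots(frame, top_n=3):
    """Rank (pickup point, hour) pairs by share of unfulfilled requests."""
    # Assumption: anything other than a completed trip is unfulfilled
    flagged = frame.assign(Unfulfilled=frame["Status"] != "Trip Completed")
    summary = (
        flagged.groupby(["Pickup_point", "Request_hour"])["Unfulfilled"]
        .agg(requests="count", unfulfilled_rate="mean")
        .reset_index()
    )
    ranked = summary.sort_values(["unfulfilled_rate", "requests"],
                                 ascending=False)
    return ranked.head(top_n)

# Made-up rows for illustration only
sample = pd.DataFrame({
    "Pickup_point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Request_hour": [18, 18, 18, 8, 12, 12],
    "Status": ["No Cars Available", "No Cars Available", "Trip Completed",
               "Cancelled", "Trip Completed", "Trip Completed"],
})
hotspots = unfulfilled_hotspots(sample, top_n=2)
print(hotspots)
```

Ranking by rate alone favours cells with very few requests, so in practice a minimum request count would probably be applied before ranking.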

# **Conclusion**

This exploratory data analysis provided valuable insights into ride request patterns, driver
availability, and ride outcomes. The analysis revealed that peak hours and airport pickup points
experience the highest demand and also the highest number of unfulfilled requests. By leveraging
time-based and location-based insights, the business can improve driver allocation strategies,
reduce cancellations, and enhance overall customer experience. This EDA demonstrates how
data-driven decision making can help improve operational efficiency and business performance.

### ***Hurrah! You have successfully completed your EDA Capstone Project !!!***