## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("/content/Uber Request clean data.csv")

### Dataset First View

In [None]:
# Dataset First Look
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
df.shape

### Dataset Information

In [None]:
# Dataset Info
df.info()



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
df.columns

In [None]:
# Dataset Describe
df.describe()

### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
df.nunique()

## ***3. Data Wrangling***

In [None]:
# Convert timestamps to datetime
df['Request timestamp'] = pd.to_datetime(df['Request timestamp'], errors='coerce')
df['Drop timestamp'] = pd.to_datetime(df['Drop timestamp'], errors='coerce')
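
The charts below rely on an `Hour` column, which is presumably already present in the cleaned CSV. If it is not, it can be derived from the parsed timestamps. The following is a minimal sketch on a small made-up sample (not the real data); the `Trip duration (min)` column name is a hypothetical choice for illustration.

```python
import pandas as pd

# Tiny made-up sample mimicking the assumed column names; not real Uber data.
sample = pd.DataFrame({
    "Request timestamp": ["2016-07-11 11:51:00", "2016-07-11 17:57:00", "2016-07-12 09:17:00"],
    "Drop timestamp": ["2016-07-11 13:00:00", None, "2016-07-12 09:58:00"],
})
sample["Request timestamp"] = pd.to_datetime(sample["Request timestamp"], errors="coerce")
sample["Drop timestamp"] = pd.to_datetime(sample["Drop timestamp"], errors="coerce")

# Hour of day (0-23) in which the request was made.
sample["Hour"] = sample["Request timestamp"].dt.hour

# Trip duration in minutes; stays missing where there is no drop timestamp
# (for example, cancelled or unserved requests).
sample["Trip duration (min)"] = (
    sample["Drop timestamp"] - sample["Request timestamp"]
).dt.total_seconds() / 60

print(sample[["Hour", "Trip duration (min)"]])
```

On the real data, the same two assignments could be applied to `df` after the `pd.to_datetime` conversion above.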

## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1 (Request Status Count)

In [None]:
# Request Status Count
status_counts = df['Status'].value_counts()

plt.figure(figsize=(8, 8))
plt.pie(status_counts, labels=status_counts.index, autopct='%1.1f%%', colors=sns.color_palette('Set2'), startangle=170)
plt.title('Request Status Distribution')
plt.axis('equal')  # Equal aspect ratio ensures the pie chart is circular
plt.show()



##### 1. Why did you pick the specific chart?

A pie chart effectively shows proportionate distribution, making it ideal for visualizing the share of different request statuses.

##### 2. What is/are the insight(s) found from the chart?

A large share of requests ended as Cancelled or No Cars Available.
Completed trips account for a smaller share of requests than expected.
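
The shares behind the pie chart can also be computed directly, which makes the insight easier to quote. This is a minimal sketch on a made-up `Status` series; the values are illustrative only and do not reflect the real dataset.

```python
import pandas as pd

# Made-up statuses for illustration; on the real data use df["Status"].
status = pd.Series(
    ["Trip Completed", "Cancelled", "No Cars Available",
     "No Cars Available", "Trip Completed"],
    name="Status",
)

# Percentage of requests per status, rounded to one decimal place.
share = status.value_counts(normalize=True).mul(100).round(1)

# Combined share of requests that did not end in a completed trip.
unfulfilled = share.drop("Trip Completed").sum()

print(share)
print(f"Unfulfilled share: {unfulfilled:.1f}%")
```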


##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive Insight: Identifies the share of successful trips, helping assess operational efficiency.

Negative Insight: High cancellations or unavailability indicate poor driver supply or peak hour strain, impacting user experience and revenue.

#### Chart - 2 (Request Status by Pickup Point)

In [None]:
# Request Status by Pickup Point
# A single standalone figure; no subplot grid is needed for one chart
plt.figure(figsize=(8, 6))
sns.countplot(data=df, x='Pickup point', hue='Status', palette='Set1')
plt.title('Request Status by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

Grouped bar charts compare multiple categories (Status) across subgroups (Pickup Point), which makes them well suited to this case.

##### 2. What is/are the insight(s) found from the chart?

No Cars Available is significantly higher for Airport pickups.
City pickups see more Cancellations than unavailability.
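
Because the two pickup points may receive different numbers of requests, it can help to compare status shares within each pickup point rather than raw counts. This is a minimal sketch using `pd.crosstab` on a made-up sample; the rows are illustrative, not taken from the dataset.

```python
import pandas as pd

# Made-up requests for illustration; on the real data use df directly.
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City", "City"],
    "Status": ["No Cars Available", "No Cars Available", "Trip Completed",
               "Cancelled", "Cancelled", "Trip Completed", "No Cars Available"],
})

# normalize="index" makes each row (pickup point) sum to 1,
# so the values are within-location shares.
status_share = pd.crosstab(sample["Pickup point"], sample["Status"], normalize="index")

print(status_share.round(2))
```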

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive: Reveals location-specific issues, helping deploy targeted driver supply.

Negative: If Airport issues aren’t resolved, it can frustrate frequent flyers, hurting brand trust.

#### Chart - 3 (Hourly Trip Requests)

In [None]:
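# Hourly Trip Requests
plt.figure(figsize=(10, 6))
# discrete=True gives one bar per integer hour, centred on that hour;
# bins=24 over the range 0-23 would produce bin edges that do not align with hours
sns.histplot(data=df, x='Hour', discrete=True, color='skyblue')
plt.title('Hourly Trip Requests')
plt.xlabel('Hour of the Day')
plt.ylabel('Request Count')
plt.show()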


##### 1. Why did you pick the specific chart?

A histogram shows how often each value occurs. With one bar per hour, it reveals how demand varies over the day.

##### 2. What is/are the insight(s) found from the chart?

Requests peak during the morning (7–9 AM) and evening (5–9 PM), the typical rush hours.
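
The peak hours can also be read off programmatically instead of by eye. This is a minimal sketch on a made-up list of request hours; the numbers are illustrative only.

```python
import pandas as pd

# Made-up request hours for illustration; on the real data use df["Hour"].
hours = pd.Series([5, 5, 5, 8, 8, 13, 18, 18, 18, 18, 21], name="Hour")

# Requests per hour, with hours that received no requests filled in as 0.
hourly = hours.value_counts().reindex(range(24), fill_value=0)

# The three busiest hours, busiest first.
top_hours = hourly.nlargest(3).index.tolist()

print(top_hours)
```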

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive: Allows scheduling more drivers during high-demand hours.

Negative: Ignoring this trend may lead to higher cancellations or unavailability during peak hours.

#### Chart - 4 (Cancellation Rate by Hour)

In [None]:
# Cancellation Rate by Hour
plt.figure(figsize=(10, 6))
# Share of requests in each hour that were cancelled.
# Taking the mean of a boolean flag yields 0 (not NaN) for hours with no cancellations.
cancel_rate = df['Status'].eq('Cancelled').groupby(df['Hour']).mean()
cancel_rate.plot(kind='bar', color='salmon')
plt.title('Cancellation Rate by Hour')
plt.xlabel('Hour')
plt.ylabel('Cancellation Rate')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts work well to show rates or comparisons over discrete intervals (hours).

##### 2. What is/are the insight(s) found from the chart?

Early morning (5–9 AM) sees the highest cancellation rates.
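
One way to confirm which hour has the highest cancellation rate is to compute the rate as the mean of a boolean flag and take its maximum. This is a minimal sketch on a made-up sample; the rows are illustrative and not drawn from the dataset.

```python
import pandas as pd

# Made-up requests for illustration; on the real data use df directly.
sample = pd.DataFrame({
    "Hour": [5, 5, 5, 5, 12, 12, 18, 18],
    "Status": ["Cancelled", "Cancelled", "Cancelled", "Trip Completed",
               "Trip Completed", "Trip Completed",
               "No Cars Available", "Cancelled"],
})

# True where the request was cancelled; the per-hour mean is the cancellation rate.
is_cancelled = sample["Status"].eq("Cancelled")
cancel_rate = is_cancelled.groupby(sample["Hour"]).mean()

# Hour with the highest cancellation rate.
worst_hour = cancel_rate.idxmax()

print(cancel_rate)
print("Highest cancellation rate at hour:", worst_hour)
```

Hours with no cancellations (hour 12 in this sample) come out as 0 rather than missing, which keeps them visible in a bar chart.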

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive: Spotting high-risk time slots aids in deploying incentives for drivers.

Negative: High morning cancellations could frustrate users, especially those commuting or catching flights.

#### Chart - 5 (Driver Availability by Pickup Point)

In [None]:
# Driver Availability by Pickup Point
plt.figure(figsize=(8, 6))
# Proxy for availability: number of distinct drivers who completed at least one trip
# from each pickup point (idle drivers are not observable in this data)
available = df[df['Status'] == 'Trip Completed'].groupby('Pickup point')['Driver id'].nunique()
available.plot(kind='bar', color='mediumseagreen')
plt.title('Unique Drivers with Completed Trips by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Unique Drivers')
plt.show()

##### 1. Why did you pick the specific chart?

Bar charts effectively display count-based metrics like unique drivers at different locations.

##### 2. What is/are the insight(s) found from the chart?

More drivers are completing trips in the City than from the Airport.

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive: Resource allocation can be optimized by moving idle drivers to underserved areas like the Airport.

Negative: If Airport demand is unmet due to lower driver presence, it leads to loss of high-value customers.

#### Chart - 6 (Trip Completion vs Other Status Over Hours)

In [None]:
# Trip Completion vs Other Status Over Hours
# A wide standalone figure so that all 24 hour groups remain readable
plt.figure(figsize=(14, 6))
sns.countplot(data=df, x='Hour', hue='Status', palette='pastel')
plt.title('Trip Status Distribution by Hour')
plt.xlabel('Hour')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

Grouped bars let us compare multiple statuses over time, ideal for temporal behavior analysis.

##### 2. What is/are the insight(s) found from the chart?

Morning and evening hours show a spike in cancellations and unavailability, pointing to a demand–supply mismatch at those times.
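
The mismatch can be quantified as a per-hour gap between demand and supply. The sketch below assumes that demand is the total number of requests and supply is the number of completed trips; this definition is an assumption for illustration, and the sample rows are made up.

```python
import pandas as pd

# Made-up requests for illustration; on the real data use df directly.
sample = pd.DataFrame({
    "Hour": [8, 8, 8, 8, 14, 14, 19, 19, 19],
    "Status": ["Cancelled", "Cancelled", "Trip Completed", "No Cars Available",
               "Trip Completed", "Trip Completed",
               "No Cars Available", "No Cars Available", "Trip Completed"],
})

# Demand: all requests per hour. Supply: completed trips per hour.
demand = sample.groupby("Hour").size()
supply = sample["Status"].eq("Trip Completed").groupby(sample["Hour"]).sum()

by_hour = pd.DataFrame({"demand": demand, "supply": supply})
# Gap: requests that did not end in a completed trip.
by_hour["gap"] = by_hour["demand"] - by_hour["supply"]

print(by_hour)
```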

##### 3. Will the gained insights help create a positive business impact?
Are there any insights that lead to negative growth? Justify with a specific reason.

Positive: Helps devise dynamic pricing and surge-based allocation.

Negative: Persistent mismatch may reduce user retention if left unaddressed.

# **Conclusion**

The analysis of Uber request data reveals key operational patterns and bottlenecks. Demand peaks during the morning and evening rush hours, and these are also the hours in which cancellations and unavailability spike. No Cars Available is most pronounced for Airport pickups, while City pickups see more cancellations than unavailability. A significant number of requests remain unfulfilled, indicating a supply-demand imbalance. Most completed trips are short, averaging 10–30 minutes, suggesting urban commute dominance.

These insights can guide better driver allocation, dynamic pricing, and targeted interventions to reduce service failures. Addressing the high cancellation rates and optimizing driver availability should improve customer satisfaction and business efficiency, supporting sustained growth and competitive advantage.
