<a href="https://colab.research.google.com/github/rgguptaruchi1999/EDA-amazon-prime-videos-project-lvideo-link/blob/main/Uber_Supply_Demand_Gap_in_EDA.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Project Name**    -  Uber Supply - Demand Gap in EDA



##### **Project Type**    - EDA
##### **Contribution**    - Individual
##### **Team Member 1 -** - Individual

# **Project Summary -**

The project analyzes the supply-demand gap for Uber, focusing on why drivers cancel requests and why cars are unavailable. The goal is to find the root causes of these problems and suggest solutions. By doing so, Uber can improve customer satisfaction and increase its revenue.

This project dives into a common problem for ride-sharing services like Uber: the gap between passenger demand and driver supply. When a rider requests a car and either the driver cancels or no cars are available, it creates a negative customer experience and directly impacts Uber's revenue.

The analysis focuses on identifying the root causes behind these two key issues:

*   Cancellations: Why do drivers cancel trips after accepting them?
*   Non-availability: Why are there not enough cars available in certain areas or at certain times?

By analyzing the provided data, the project aims to uncover patterns and insights related to these problems. The ultimate goal is to give Uber a clear understanding of the underlying issues and to recommend data-driven solutions that improve driver availability and reduce cancellations, leading to a better customer experience and increased revenue.



# **GitHub Link -**

Provide your GitHub Link here.

# **Problem Statement**


The aim of the analysis is to identify the root causes of the problem (i.e. cancellation and non-availability of cars) and recommend ways to improve the situation. The outcome is a presentation to the client of the root cause(s), possible hypotheses about the problem(s), and recommendations for improving them.

#### **Define Your Business Objective?**

Most of us have some experience of travelling to and from the airport. Have you used Uber or any other cab service for this travel? Did you at any time face the problem of cancellation by the driver or non-availability of cars?

Well, if these are the problems faced by customers, these very issues also impact the business of Uber. If drivers cancel the request of riders or if cars are unavailable, Uber loses out on its revenue. Let's hear more about such problems that Uber faces during its operations.




There are six attributes associated with each request made by a customer:

Request id: A unique identifier of the request

Time of request: The date and time at which the customer made the trip request

Drop-off time: The drop-off date and time, in case the trip was completed

Pick-up point: The point from which the request was made

Driver id: The unique identification number of the driver

Status of the request: The final status of the trip, that can be either completed, cancelled by the driver or no cars available



# **General Guidelines** : -  

1.   Well-structured, formatted, and commented code is required.
2.   Exception Handling, Production Grade Code & Deployment Ready Code will be a plus. Those students will be awarded some additional credits.
     
     The additional credits will have advantages over other students during Star Student selection.
       
             [ Note: - Deployment Ready Code is defined as, the whole .ipynb notebook should be executable in one go
                       without a single error logged. ]

3.   Each and every logic should have proper comments.
4. You may add as many charts as you want. Make sure that each chart answers the following questions.
        

```
# Chart visualization code
```
            

*   Why did you pick the specific chart?
*   What is/are the insight(s) found from the chart?
* Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

5. You have to create at least 20 logical & meaningful charts having important insights.


[ Hints : - Do the Visualization in a structured way while following "UBM" Rule.

U - Univariate Analysis,

B - Bivariate Analysis (Numerical - Categorical, Numerical - Numerical, Categorical - Categorical)

M - Multivariate Analysis
 ]





# ***Let's Begin !***

## ***1. Know Your Data***

### Import Libraries

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

### Dataset Loading

In [None]:
# Load Dataset
df = pd.read_csv("Uber Request Data (1).csv")

### Dataset First View

In [None]:
df.head()

### Dataset Rows & Columns count

In [None]:
# Dataset Rows & Columns count
print(f"The dataset has {df.shape[0]} rows and {df.shape[1]} columns.")

### Dataset Information

In [None]:
# Dataset Info
df.info()

#### Duplicate Values

In [None]:
# Duplicate Values
if df.duplicated().sum() > 0:
    print("The dataset contains duplicate rows.")
else:
    print("The dataset does not contain any duplicate rows.")

#### Missing Values/Null Values

In [None]:
# Missing Values/Null Values Count
print(df.isnull().sum())

In [None]:
# Visualizing the missing values
plt.figure(figsize=(10, 6))
sns.heatmap(df.isnull(), cbar=False, cmap='viridis')
plt.title('Missing Values Heatmap')
plt.show()

### What did you know about your dataset?

The dataset contains 6745 rows and 8 columns. There are no duplicate rows.

Missing Values:

*   Driver id: 2650 missing values
*   Drop timestamp: 3914 missing values
*   Drop timestamp.1: 3914 missing values

The missing values in Driver id and Drop timestamp likely correspond to trips that were not completed (i.e., 'Cancelled' or 'No Cars Available'). The two timestamp columns, Request timestamp and Drop timestamp, seem to be split into two columns each, one for the date and one for the time. These will need to be combined and converted to a datetime format for proper analysis.
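
One way to check the assumption that the missing values belong to uncompleted trips is to count nulls per status. The sketch below uses a small hypothetical sample with the same column names, not the real data; on the notebook's `df` the same `groupby` would apply.

```python
import numpy as np
import pandas as pd

# Hypothetical sample that mimics the dataset's columns; not real Uber data.
sample = pd.DataFrame({
    "Status": ["Trip Completed", "Cancelled", "No Cars Available", "Trip Completed"],
    "Driver id": [1.0, 2.0, np.nan, 3.0],
    "Drop timestamp": ["11-07-2016", np.nan, np.nan, "12-07-2016"],
})

# Count missing values per column within each trip status.
missing_by_status = (
    sample[["Driver id", "Drop timestamp"]]
    .isnull()
    .groupby(sample["Status"])
    .sum()
)
print(missing_by_status)
```

If the assumption holds, the 'Trip Completed' row should show zero missing values in both columns.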



## ***2. Understanding Your Variables***

In [None]:
# Dataset Columns
print(df.columns)

In [None]:
df.describe()

### Variables Description

*   Request id: A unique identifier for each request.
*   Pickup point: The location where the rider requested the pickup (Airport or City).
*   Driver id: A unique identifier for each driver.
*   Status: The status of the request (Trip Completed, Cancelled, No Cars Available).
*   Request timestamp: The date and time when the request was made.
*   Drop timestamp: The date and time when the trip was completed.
*   Request timestamp.1: The time portion of the request timestamp.
*   Drop timestamp.1: The time portion of the drop timestamp.



### Check Unique Values for each variable.

In [None]:
# Check Unique Values for each variable.
for col in df.columns:
    print(f"Unique values in {col}:")
    print(df[col].unique())
    print("-" * 30)

## 3. ***Data Wrangling***

### Data Wrangling Code

In [None]:
# Load Dataset
df = pd.read_csv("Uber Request Data (1).csv")

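# Data Wrangling Code
def parse_time(date_str, time_str):
    """Combine a date string and a time string into a single Timestamp."""
    # Uncompleted trips have no drop date/time, so return NaT for them
    if pd.isna(date_str) or pd.isna(time_str):
        return pd.NaT
    combined = date_str + ' ' + time_str
    try:
        # Let pandas infer the format; dates are day-first, as in the fallback format
        return pd.to_datetime(combined, dayfirst=True)
    except (ValueError, TypeError):
        # Fall back to the explicit 12-hour format that uses dots as separators
        return pd.to_datetime(combined, format='%d-%m-%Y %I.%M.%S %p')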

df['Request datetime'] = df.apply(lambda row: parse_time(row['Request timestamp'], row['Request timestamp.1']), axis=1)
df['Drop datetime'] = df.apply(lambda row: parse_time(row['Drop timestamp'], row['Drop timestamp.1']), axis=1)


# Drop the original timestamp columns
cols_to_drop = ['Request timestamp', 'Drop timestamp', 'Request timestamp.1', 'Drop timestamp.1']
df = df.drop([col for col in cols_to_drop if col in df.columns], axis=1)

# Display the first few rows with the updated timestamp columns
df.head()

### What all manipulations have you done and insights you found?

**Data Manipulations:**

*   Loaded the dataset: Imported the data from the CSV file into a pandas DataFrame.
*   Handled Timestamps: Combined the separate date and time columns ('Request timestamp' with 'Request timestamp.1', and 'Drop timestamp' with 'Drop timestamp.1') into single datetime columns ('Request datetime' and 'Drop datetime'). This was done using a custom function that handles different time formats. The original, redundant timestamp columns were then dropped.

**Insights Found:**

*   High Rate of Uncompleted Trips: The initial analysis of the 'Status' column revealed a significant number of trips that were not completed. There are a large number of cancellations by drivers and instances where no cars were available.
*   Data Quality Issues: The initial exploration of the data revealed inconsistencies in the timestamp columns, which required data wrangling to fix. There are also a significant number of missing values in the 'Driver id' and 'Drop timestamp' columns, which are likely related to the uncompleted trips.
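
As an alternative to parsing row by row, mixed timestamp formats can also be handled column-wise. This is a minimal sketch, assuming two layouts: the 12-hour dotted format used as the fallback in the wrangling code, and a 24-hour colon format that is purely an assumption. The sample strings are hypothetical.

```python
import pandas as pd

# Hypothetical combined date/time strings in two layouts (assumed formats).
raw = pd.Series(["13-07-2016 08:33:16", "11-07-2016 11.51.00 AM", None])

# Try the first format; rows that do not match become NaT.
parsed = pd.to_datetime(raw, format="%d-%m-%Y %H:%M:%S", errors="coerce")

# Try the second format and use it only where the first one failed.
fallback = pd.to_datetime(raw, format="%d-%m-%Y %I.%M.%S %p", errors="coerce")
parsed = parsed.fillna(fallback)

print(parsed)
```

Rows that match neither format, or that are missing to begin with, stay as `NaT`, which keeps uncompleted trips distinguishable.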




## ***4. Data Visualization, Storytelling & Experimenting with charts : Understand the relationships between variables***

#### Chart - 1

In [None]:
# Chart - 1 visualization code
plt.figure(figsize=(8, 6))
sns.countplot(x='Status', data=df)
plt.title('Distribution of Trip Status')
plt.xlabel('Trip Status')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot because I wanted to show the distribution of a categorical variable, 'Status'. This chart makes it easy to see how many requests were completed, cancelled, or left unserved because no cars were available, which is the first step in understanding the supply-demand gap. A pie chart could also have been used, but a bar chart is often easier to read and compare values.



##### 2. What is/are the insight(s) found from the chart?

The main insight from this chart is that there's a significant problem with unfulfilled ride requests. Here's a breakdown:

*   High Number of "No Cars Available": This is the largest category after "Trip Completed". It means that a large number of customers are trying to get a ride, but there are no drivers available to pick them up. This points to a shortage of drivers in certain areas or at certain times.
*   High Number of "Cancelled": This is also a significant category. It means that even when a driver accepts a request, they often cancel it. This could be due to a variety of reasons, such as the driver getting a better offer, the pickup location being too far away, or the driver not wanting to go to the drop-off location.

Together, these two categories show a major imbalance between the demand for rides and the supply of drivers. This is a serious problem for Uber because it leads to a bad customer experience and lost revenue.
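
To put numbers on the imbalance rather than reading it off the bars, the share of each status can be computed directly. The sketch below uses hypothetical status values, not the real counts from the dataset; on the notebook's data the same calls would be applied to `df['Status']`.

```python
import pandas as pd

# Hypothetical status values; not the real counts from the dataset.
status = pd.Series(
    ["Trip Completed"] * 4 + ["No Cars Available"] * 4 + ["Cancelled"] * 2,
    name="Status",
)

# Absolute counts and percentage share of each status.
counts = status.value_counts()
share = (status.value_counts(normalize=True) * 100).round(1)
summary = pd.DataFrame({"count": counts, "share_pct": share})

# Everything that is not a completed trip counts as unfulfilled demand.
unfulfilled_share = share.drop("Trip Completed").sum()

print(summary)
print(f"Unfulfilled requests: {unfulfilled_share:.1f}%")
```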



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights we've gained can have a significant positive impact on the business. Here's how:

*   Identifying the Core Problem: The data clearly shows that the main problem is the high number of unfulfilled requests ("No Cars Available" and "Cancelled"). This is a crucial first step. You can't solve a problem until you know what it is.
*   Targeted Solutions: Now that we know the problem, we can start to look for solutions. For example, if we see that there are a lot of "No Cars Available" requests in a certain area at a certain time, Uber can offer incentives to drivers to go to that area. If we see that a lot of drivers are canceling trips, we can investigate why and try to address their concerns.
*   Improved Customer Experience: By reducing the number of unfulfilled requests, we can make customers happier. This will lead to more repeat business and a better reputation for Uber.
*   Increased Revenue: Every completed trip is revenue for Uber. By reducing the number of unfulfilled requests, we can increase the number of completed trips and, therefore, increase revenue.

In short, the insights we've gained allow us to move from simply knowing that there's a problem to understanding the nature of the problem and developing targeted solutions to fix it.



#### Chart - 2

In [None]:
# Chart - 2 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Pickup point', hue='Status', data=df)
plt.title('Trip Status by Pickup Point')
plt.xlabel('Pickup Point')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to compare the trip status across different pickup points (Airport and City). This chart makes it easy to see the distribution of completed, canceled, and unavailable trips for each location, allowing for a direct comparison. This helps us to identify if the supply-demand gap is more pronounced in one location over the other.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals a fundamental difference in the nature of the supply-demand problem between the two locations. At the airport, the issue is primarily a lack of supply. There simply aren't enough drivers to meet the demand. In the city, however, the problem is more about supply inefficiency. While there are more drivers available, a significant portion of them are canceling trips, which suggests there are other factors at play, such as trip distance, destination, or traffic. This distinction is crucial because it suggests that different solutions are needed for each location.
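
The contrast between the two locations can be checked with a row-normalized cross-tabulation, which shows the share of each status within each pickup point. This is a minimal sketch on hypothetical rows; the column names match the dataset, but the values are made up for illustration.

```python
import pandas as pd

# Hypothetical requests; values are illustrative only.
sample = pd.DataFrame({
    "Pickup point": ["Airport", "Airport", "Airport", "City", "City", "City"],
    "Status": [
        "No Cars Available", "No Cars Available", "Trip Completed",
        "Cancelled", "Cancelled", "Trip Completed",
    ],
})

# normalize="index" makes each row (pickup point) sum to 1.
status_share = pd.crosstab(
    sample["Pickup point"], sample["Status"], normalize="index"
)
print(status_share.round(2))
```

Comparing shares rather than raw counts avoids being misled when one location simply receives more requests than the other.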



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

*   Targeted Solutions: By understanding that the problem at the airport is a lack of drivers, Uber can implement targeted incentives to encourage more drivers to go to the airport, such as offering bonuses for airport pickups. For the city, where the problem is cancellations, Uber can investigate the reasons for cancellations and address them. For example, they could provide drivers with more information about a trip before they accept it, or they could penalize drivers for excessive cancellations.
*   Improved Customer Experience: By addressing the specific issues in each location, Uber can improve the customer experience. At the airport, customers will have to wait less time for a car. In the city, customers will experience fewer cancellations.
*   Increased Revenue: By increasing the number of completed trips in both locations, Uber can increase its revenue.

In short, by understanding the nuances of the supply-demand problem in different locations, Uber can develop more effective and targeted solutions, which will lead to a better customer experience and increased revenue.

The insights that point to negative growth are the high numbers of "Cancelled" trips and "No Cars Available" requests. Here's the justification:

*   Lost Revenue: Every time a trip is cancelled or a car is unavailable, Uber loses out on the revenue it would have earned from that trip. This has a direct and immediate negative impact on the company's bottom line.
*   Poor Customer Experience: When a customer's request is cancelled or they can't get a car, it creates a frustrating and negative experience. This can lead to them choosing a competitor for their next ride, resulting in customer churn and a loss of future revenue.
*   Damaged Brand Reputation: If customers consistently have a bad experience with Uber, it will damage the company's brand reputation. This can make it harder to attract new customers and retain existing ones, leading to long-term negative growth.

In essence, the high number of unfulfilled requests is a clear indicator of a problem that is actively harming the business and preventing it from growing.



#### Chart - 3

In [None]:
# Chart - 3 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Driver id', hue='Status', data=df)
plt.title('Trip Status by Driver')
plt.xlabel('Driver ID')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped count plot to compare the `Status` of trips (a categorical variable) across individual drivers (`Driver id`). A grouped bar chart places the bars for each status side by side within each group, which makes the comparison direct and intuitive when the number of groups is small, as it was for the Airport versus City comparison in Chart 2.

Alternative charts and why they are less suitable:

*   Stacked Bar Chart: A stacked bar chart would show the proportion of each status within each group, but it would be difficult to compare the absolute number of, say, "Cancelled" trips between groups.
*   Pie Charts: Humans are not very good at comparing angles in pie charts, so it would be difficult to accurately compare the proportions of each status between groups. Also, pie charts don't show the absolute numbers, only the proportions.
*   Separate Bar Charts: Placing the bars side by side in a single grouped bar chart makes the comparison more direct and efficient than several separate charts.



##### 2. What is/are the insight(s) found from the chart?

*   Too Many Drivers: The chart is trying to show the trip status for every single driver. Since there are hundreds of drivers, the bars are all squished together, making it impossible to read the driver IDs or see any clear patterns.
*   "No Cars Available" is Not Linked to a Driver: The "No Cars Available" status isn't associated with a specific driver, so including it in this chart is a bit misleading.

Because of these issues, it's very difficult to draw any meaningful conclusions from this chart. It might be more useful to look at the top N drivers with the most completed trips or the most cancellations, or to group the drivers in some way.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Because the chart showing the trip status by driver is not very clear or insightful in its current form, we can't really gain any insights from it that would help create a positive business impact. To get there, we would need to modify the chart to show more meaningful information. For example, we could:

*   Show the top 10 drivers with the most completed trips.
*   Show the top 10 drivers with the most cancelled trips.
*   Group the drivers by some other characteristic, such as the area they typically work in.

By doing this, we might be able to identify high-performing drivers that we can reward, or low-performing drivers that we can offer additional training or support to.
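
The "top N drivers" idea can be sketched as a simple ranking before any plotting. The example below runs on hypothetical rows, and `TOP_N` is an illustrative parameter, not something defined in the notebook. Rows without a driver are dropped first, since "No Cars Available" requests are not linked to anyone.

```python
import numpy as np
import pandas as pd

# Hypothetical requests; driver ids and statuses are illustrative only.
sample = pd.DataFrame({
    "Driver id": [1, 1, 1, 2, 2, 3, np.nan],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed", "Cancelled",
        "Trip Completed", "Trip Completed", "No Cars Available",
    ],
})

TOP_N = 2  # hypothetical cut-off; the text above suggests 10 for the full data

# Keep only requests that were assigned to a driver.
with_driver = sample.dropna(subset=["Driver id"])

# Rank drivers by their number of cancelled requests.
top_cancellers = (
    with_driver.loc[with_driver["Status"] == "Cancelled", "Driver id"]
    .value_counts()
    .head(TOP_N)
)
print(top_cancellers)
```

The resulting Series has only `TOP_N` entries, so a bar chart of it stays readable, unlike a chart with one group per driver.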



Yes, there is a risk of negative growth too: because the chart is unreadable in its current form, it cannot inform any decision until it is modified as described above.



#### Chart - 4

In [None]:
df['request_hour'] = df['Request datetime'].dt.hour
plt.figure(figsize=(12, 6))
sns.countplot(x='request_hour', data=df)
plt.title('Number of Uber Requests per Hour')
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot to visualize the number of Uber requests per hour because it's a very effective way to show the distribution of a discrete variable, in this case, the hour of the day. The height of each bar represents the total number of requests made during that hour, making it very easy to see which hours have the highest and lowest demand. This allows us to quickly identify the peak hours for Uber requests, which is a crucial piece of information for understanding the supply-demand gap.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals a clear bimodal distribution of demand for Uber rides throughout the day. This means that there are two distinct periods of high demand, separated by a period of lower demand. This pattern is likely driven by the daily routines of a large portion of the population, specifically the morning and evening commutes. Understanding this bimodal distribution is crucial for Uber because it means that they need to be able to scale their supply of drivers up and down to meet these predictable peaks and troughs in demand. If they can't, they will either have a surplus of drivers during off-peak hours or a shortage of drivers during peak hours, both of which are bad for business.
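
The peaks can also be located numerically instead of by eye. This is a minimal sketch on hypothetical request times; treating the two busiest hours as "peaks" is an assumption made for illustration, and on the notebook's data the input would be `df['Request datetime']`.

```python
import pandas as pd

# Hypothetical request times; not taken from the dataset.
requests = pd.to_datetime(pd.Series([
    "2016-07-11 05:10", "2016-07-11 05:40", "2016-07-11 05:55",
    "2016-07-11 12:00",
    "2016-07-11 18:05", "2016-07-11 18:30", "2016-07-11 18:45", "2016-07-11 18:50",
]))

# Requests per hour, with hours that have no requests filled in as 0.
hourly = requests.dt.hour.value_counts().reindex(range(24), fill_value=0)

# Assumption: call the two busiest hours the "peaks".
peak_hours = sorted(hourly.nlargest(2).index.tolist())
print(peak_hours)
```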

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights from the chart can have a significant positive impact on the business. Here's how:

*   Optimized Resource Allocation: By knowing when the peak demand hours are, Uber can more effectively allocate its resources (i.e., drivers). They can use this information to create incentives for drivers to be on the road during these high-demand periods, ensuring that there are enough cars available to meet customer needs. This leads to a more efficient use of their driver network.
*   Improved Customer Satisfaction: When a customer requests a ride during a peak hour and is able to get one quickly, their satisfaction with the service increases. By understanding the demand patterns, Uber can work to ensure that this is the case more often than not. This leads to a better customer experience and increased customer loyalty.
*   Increased Revenue: A more efficient allocation of drivers and higher customer satisfaction will naturally lead to an increase in the number of completed trips. This, in turn, will lead to an increase in revenue for Uber.
*   Proactive vs. Reactive: By understanding the predictable nature of the demand, Uber can move from a reactive model (i.e., responding to surges in demand as they happen) to a more proactive model (i.e., anticipating surges in demand and preparing for them in advance). This allows for a more stable and reliable service.

In essence, by understanding the rhythm of the city as reflected in the ride request data, Uber can fine-tune its operations to be more efficient, more customer-friendly, and more profitable.







The chart shows us that there are predictable peak hours of demand. If Uber is unable to meet this demand, it will lead to a negative customer experience. Customers who are unable to get a ride during these peak hours will be frustrated and may turn to other ride-sharing services. This can lead to:

*   Customer Churn: Losing existing customers to competitors.
*   Negative Word-of-Mouth: Unhappy customers are likely to share their negative experiences with others, which can damage Uber's reputation and make it harder to attract new customers.
*   Lost Revenue: Every unmet request is a loss of potential revenue.

So, while the chart itself is just a representation of demand, it's a clear warning sign. If Uber doesn't have a strategy to meet the demand during these peak hours, it will inevitably lead to negative growth.






#### Chart - 5

In [None]:
# Chart - 5 visualization code
plt.figure(figsize=(12, 6))
sns.countplot(x='request_hour', hue='Status', data=df)
plt.title('Trip Status by Hour of the Day')
plt.xlabel('Hour of the Day')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

A grouped bar chart is the ideal choice here for a few key theoretical reasons:

*   **Comparing a Categorical Variable Across Different Groups:** Our main goal is to compare the `Status` of trips (a categorical variable with three levels: "Trip Completed", "Cancelled", and "No Cars Available") across different hours of the day (a discrete numerical variable that we are treating as categories). A grouped bar chart is specifically designed for this purpose. It allows us to place the bars for each status side-by-side for each hour, making the comparison direct and intuitive.

*   **Clear Visual Comparison:** The grouped nature of the chart makes it very easy to visually compare the counts of each status for a given hour and also to see how these counts change across the day. For example, you can immediately see that the green bar ("No Cars Available") is much taller in the evening hours, while the orange bar ("Cancelled") is more prominent in the morning. This kind of direct visual comparison would be much harder with other chart types.

*   **Alternative Charts and Why They Are Less Suitable:**
    *   **Stacked Bar Chart:** A stacked bar chart would show the proportion of each status within each hour, but it would be difficult to compare the absolute number of, say, "Cancelled" trips between 8 AM and 9 AM.
    *   **Line Chart:** A line chart could be used to show the trend of each status over time, but it would be less effective at showing the absolute number of requests for each status at each hour.
    *   **Pie Charts:** We could create 24 separate pie charts, one for each hour. However, it's well-known that humans are not very good at comparing angles in pie charts, so it would be difficult to accurately compare the proportions of each status between different hours. Also, pie charts don't show the absolute numbers, only the proportions.

In summary, the grouped bar chart is the most effective choice because it directly supports our analytical goal of comparing the distribution of a categorical variable across different groups, and it does so in a way that is clear, intuitive, and visually effective.
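
The counts behind such a grouped bar chart can be tabulated directly, which also makes the hourly supply-demand gap explicit. The sketch below assumes the gap is defined as total requests minus completed trips; that definition and the sample rows are illustrative assumptions, not taken from the notebook.

```python
import pandas as pd

# Hypothetical requests with the hour already extracted.
sample = pd.DataFrame({
    "request_hour": [8, 8, 8, 19, 19, 19, 19],
    "Status": [
        "Cancelled", "Cancelled", "Trip Completed",
        "No Cars Available", "No Cars Available", "No Cars Available",
        "Trip Completed",
    ],
})

# One row per hour, one column per status.
by_hour = pd.crosstab(sample["request_hour"], sample["Status"])

# Assumed definition: demand = all requests, gap = requests that were not completed.
by_hour["demand"] = by_hour.sum(axis=1)
by_hour["gap"] = by_hour["demand"] - by_hour["Trip Completed"]
print(by_hour)
```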

##### 2. What is/are the insight(s) found from the chart?

The chart shows that the reason for unfulfilled requests changes over the day:

*   Morning hours: "Cancelled" requests are more prominent.
*   Evening hours: "No Cars Available" requests are much more frequent.

Because the dominant failure mode differs between the morning and the evening, each period is likely to need its own remedy.



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.


The chart reveals a clear bimodal distribution of demand for Uber rides throughout the day. This means that there are two distinct periods of high demand, separated by a period of lower demand. This pattern is likely driven by the daily routines of a large portion of the population, specifically the morning and evening commutes. Understanding this bimodal distribution is crucial for Uber because it means that they need to be able to scale their supply of drivers up and down to meet these predictable peaks and troughs in demand. If they can't, they will either have a surplus of drivers during off-peak hours or a shortage of drivers during peak hours, both of which are bad for business.
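The two demand peaks can also be located numerically instead of being read off the chart. The following is a minimal sketch, not the notebook's own method: it flags every hour whose request count exceeds a multiple of the mean hourly count. The function name `peak_hours`, the `factor` threshold of 1.5, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def peak_hours(hours: pd.Series, factor: float = 1.5) -> list:
    """Return the hours whose request count exceeds `factor` times the mean hourly count.

    `factor` is an illustrative threshold, not a value taken from the analysis.
    """
    # Count requests per hour and make sure all 24 hours are present
    counts = hours.value_counts().reindex(range(24), fill_value=0)
    threshold = factor * counts.mean()
    return sorted(int(h) for h in counts[counts > threshold].index)


# Hypothetical toy data: two busy periods and a few quiet hours
sample_hours = pd.Series(
    [8] * 30 + [9] * 25 + [18] * 28 + [19] * 32 + [13] * 5 + [2] * 3 + [23] * 4
)
print(peak_hours(sample_hours))
```

On the real data this could be called as `peak_hours(df['request_hour'])`, assuming the `request_hour` column has already been derived.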





The chart shows us that there are predictable peak hours of demand. If Uber is unable to meet this demand, it will lead to a negative customer experience. Customers who are unable to get a ride during these peak hours will be frustrated and may turn to other ride-sharing services. This can lead to:

*   **Customer Churn:** Losing existing customers to competitors.
*   **Negative Word-of-Mouth:** Unhappy customers are likely to share their negative experiences with others, which can damage Uber's reputation and make it harder to attract new customers.
*   **Lost Revenue:** Every unmet request is a loss of potential revenue.

So, while the chart itself is just a representation of demand, it's a clear warning sign. If Uber doesn't have a strategy to meet the demand during these peak hours, it will inevitably lead to negative growth.


#### Chart - 6

In [None]:
# Chart - 6 visualization code
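df['request_hour'] = df['Request datetime'].dt.hour
# catplot is a figure-level function that creates its own figure, so the size is
# set through height/aspect instead of a separate plt.figure() call
g = sns.catplot(x='request_hour', hue='Status', col='Pickup point', data=df,
                kind='count', height=6, aspect=1.25)
g.fig.suptitle('Trip Status by Pickup Point and Hour of the Day', y=1.05)
plt.show()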

##### 1. Why did you pick the specific chart?

I chose a catplot with kind='count' to create a faceted bar chart. This allows us to see the distribution of trip statuses for each hour of the day, separated by pickup point. This is a very effective way to compare the supply-demand gap at the airport and in the city.




##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the "No Cars Available" status is a major problem at the airport, especially during the evening peak hours. This is likely because there are not enough drivers who are willing to go to the airport to pick up passengers. In the city, the main problem is "Cancelled" trips, which are most common during the morning peak hours. This could be because drivers are more selective with their trips during the morning rush hour, or because they are trying to avoid traffic.
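The pattern read from the faceted chart can be cross-checked with a small aggregation. This is a sketch under stated assumptions: the rows are hypothetical toy data (only the column names follow the notebook), and the label "Trip Completed" for fulfilled requests is assumed. For each pickup point, it finds the status and hour with the most unfulfilled requests.

```python
import pandas as pd

# Hypothetical toy data; the column names match the notebook, the rows do not
rows = (
    [("Airport", "No Cars Available", 18)] * 3
    + [("Airport", "No Cars Available", 19)]
    + [("Airport", "Cancelled", 18)]
    + [("City", "Cancelled", 8)] * 3
    + [("City", "Cancelled", 9)] * 2
    + [("City", "No Cars Available", 20)]
    + [("City", "Trip Completed", 8)] * 2
)
sample = pd.DataFrame(rows, columns=["Pickup point", "Status", "request_hour"])

# Keep only unfulfilled requests ("Trip Completed" is an assumed label)
unfulfilled = sample[sample["Status"] != "Trip Completed"]

# Count requests for every (pickup point, status, hour) combination
counts = (
    unfulfilled.groupby(["Pickup point", "Status", "request_hour"])
    .size()
    .reset_index(name="requests")
)

# For each pickup point, keep the combination with the most unfulfilled requests
worst = counts.loc[counts.groupby("Pickup point")["requests"].idxmax()]
print(worst)
```

Replacing `sample` with the notebook's `df` would give the same summary for the real requests.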

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

Yes, the insights can have a positive business impact:

*   **Identifying the Core Problem:** The data clearly shows that the main problem is the high number of unfulfilled requests ("No Cars Available" and "Cancelled"). This is a crucial first step. You can't solve a problem until you know what it is.
*   **Targeted Solutions:** Now that we know the problem, we can start to look for solutions. For example, if we see that there are a lot of "No Cars Available" requests in a certain area at a certain time, Uber can offer incentives to drivers to go to that area. If we see that a lot of drivers are cancelling trips, we can investigate why and try to address their concerns.
*   **Improved Customer Experience:** By reducing the number of unfulfilled requests, we can make customers happier. This will lead to more repeat business and a better reputation for Uber.
*   **Increased Revenue:** Every completed trip is revenue for Uber. By reducing the number of unfulfilled requests, we can increase the number of completed trips and, therefore, increase revenue.

In short, the insights we've gained allow us to move from simply knowing that there's a problem to understanding the nature of the problem and developing targeted solutions to fix it.


**Yes, there are insights that point to negative growth.**

The insights that point to negative growth are the high numbers of "Cancelled" trips and "No Cars Available" requests. Here's the justification:

*   **Lost Revenue:** Every time a trip is cancelled or a car is unavailable, Uber loses out on the revenue it would have earned from that trip. This has a direct and immediate negative impact on the company's bottom line.
*   **Poor Customer Experience:** When a customer's request is cancelled or they can't get a car, it creates a frustrating and negative experience. This can lead to them choosing a competitor for their next ride, resulting in customer churn and a loss of future revenue.
*   **Damaged Brand Reputation:** If customers consistently have a bad experience with Uber, it will damage the company's brand reputation. This can make it harder to attract new customers and retain existing ones, leading to long-term negative growth.

In essence, the high number of unfulfilled requests is a clear indicator of a problem that is actively harming the business and preventing it from growing.




#### Chart - 7

In [None]:
# Chart - 7 visualization code
def get_time_slot(hour):
    if 5 <= hour < 12:
        return 'Morning'
    elif 12 <= hour < 17:
        return 'Afternoon'
    elif 17 <= hour < 22:
        return 'Evening'
    else:
        return 'Night'

df['time_slot'] = df['request_hour'].apply(get_time_slot)

plt.figure(figsize=(10, 6))
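# Fix the bar order so the slots appear chronologically, not in order of appearance
sns.countplot(x='time_slot', data=df, order=['Morning', 'Afternoon', 'Evening', 'Night'])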
plt.title('Number of Uber Requests per Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Number of Requests')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a count plot to visualize the number of Uber requests per time slot because it's a very effective way to show the distribution of a categorical variable, in this case, the time slot. The height of each bar represents the total number of requests made during that time slot, making it very easy to see which time slots have the highest and lowest demand.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the "Morning" and "Evening" time slots have the highest demand for Uber rides. This is consistent with the bimodal distribution of demand that we saw in the hourly chart. The "Afternoon" and "Night" time slots have lower demand.
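The bar heights can also be expressed as counts and shares per time slot. The sketch below reuses the slot boundaries from the Chart 7 code; the request hours are hypothetical toy values, and the names `slot_counts` and `slot_share` are introduced only for this illustration.

```python
import pandas as pd


def get_time_slot(hour: int) -> str:
    """Same slot boundaries as the Chart 7 code."""
    if 5 <= hour < 12:
        return "Morning"
    elif 12 <= hour < 17:
        return "Afternoon"
    elif 17 <= hour < 22:
        return "Evening"
    return "Night"


# Hypothetical request hours
sample_hours = pd.Series([6, 7, 8, 9, 13, 18, 19, 20, 23, 2])
slots = sample_hours.apply(get_time_slot)

# Count requests per slot in chronological order, then convert to shares
slot_order = ["Morning", "Afternoon", "Evening", "Night"]
slot_counts = slots.value_counts().reindex(slot_order, fill_value=0)
slot_share = (slot_counts / slot_counts.sum()).round(2)
print(pd.DataFrame({"requests": slot_counts, "share": slot_share}))
```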

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

The insights we've gained can have a significant positive impact on the business. They identify the core problem, the high number of unfulfilled requests ("No Cars Available" and "Cancelled"), which makes targeted solutions possible: Uber can offer incentives to drivers in areas and at times with many "No Cars Available" requests, and it can investigate why drivers cancel trips and address their concerns. Fewer unfulfilled requests mean happier customers, more repeat business, a better reputation, and more completed trips, and therefore more revenue.



#### Chart - 8

In [None]:
# Chart - 8 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='time_slot', hue='Status', data=df,
              order=['Morning', 'Afternoon', 'Evening', 'Night'])  # chronological order
plt.title('Trip Status by Time Slot')
plt.xlabel('Time Slot')
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the trip status by time slot because it allows for a direct comparison of the number of completed, cancelled, and unavailable trips in each time slot. This helps us to see how the supply-demand gap changes throughout the day.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the "No Cars Available" status is most prominent during the evening, while "Cancelled" trips are most common in the morning. This is consistent with what we saw in the hourly chart.
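Raw counts favor the busiest slots, so it can help to compare the share of each status within a slot. A minimal sketch, assuming hypothetical toy rows and the assumed label "Trip Completed" for fulfilled requests:

```python
import pandas as pd

# Hypothetical toy data: one row per request
rows = (
    [("Morning", "Cancelled")] * 3
    + [("Morning", "Trip Completed")]
    + [("Evening", "No Cars Available")] * 3
    + [("Evening", "Trip Completed")]
)
sample = pd.DataFrame(rows, columns=["time_slot", "Status"])

# normalize="index" turns each row (time slot) into shares that sum to 1
status_share = pd.crosstab(sample["time_slot"], sample["Status"], normalize="index")
print(status_share.round(2))
```

With the notebook's data, `pd.crosstab(df['time_slot'], df['Status'], normalize='index')` would produce the same kind of table.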

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help Uber to develop targeted solutions to address the supply-demand gap at different times of the day. For example, they could offer incentives to drivers to be on the road during the evening peak hours, and they could investigate the reasons for the high number of cancellations during the morning peak hours. By addressing these issues, Uber can improve the customer experience and increase revenue.

**Yes, there are insights that point to negative growth.**

The high numbers of "Cancelled" trips in the morning and "No Cars Available" requests in the evening point to negative growth. Every unfulfilled request is lost revenue, and the frustrating experience pushes customers toward competitors and, if it persists, damages Uber's brand reputation, which makes it harder to attract and retain customers.

#### Chart - 9

In [None]:
# Chart - 9 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='time_slot', hue='Pickup point', data=df)
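# The bars count requests per time slot and pickup point; trip status is not plotted here
plt.title('Number of Requests by Time Slot and Pickup Point')
plt.xlabel('Time Slot')
plt.ylabel('Count')
plt.show()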

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the number of requests per time slot, broken down by pickup point. This allows for a direct comparison of the demand at the airport and in the city at different times of the day.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the demand for Uber rides is higher in the city than at the airport across all time slots. The demand in the city is highest in the morning and evening, while the demand at the airport is highest in the evening.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can help Uber to allocate its resources more effectively. For example, they can ensure that there are more drivers available in the city during the morning and evening peak hours, and more drivers available at the airport during the evening peak hours.


Yes. Wherever this demand goes unmet, through "Cancelled" trips or "No Cars Available", it points to negative growth: Uber loses the revenue from each such trip, the customer has a frustrating experience and may choose a competitor next time, and repeated bad experiences damage the brand's reputation.





#### Chart - 10

In [None]:
# Chart - 10 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Status', hue='Pickup point', data=df)
plt.title('Trip Status by Pickup Point')
plt.xlabel('Trip Status')  # the x-axis shows Status; Pickup point is the hue
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the trip status by pickup point. This allows for a direct comparison of the number of completed, cancelled, and unavailable trips at the airport and in the city.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the number of "No Cars Available" is much higher at the airport than in the city. This suggests that there is a shortage of drivers at the airport. On the other hand, the number of "Cancelled" trips is much higher in the city than at the airport. This could be because drivers in the city are more likely to cancel trips due to traffic or other reasons.
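To put the two locations on a common scale, the counts can be reduced to a fulfilment rate per pickup point. The sketch below uses hypothetical toy rows; the column name `fulfilment_rate` and the label "Trip Completed" are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data: one row per request
rows = (
    [("Airport", "Trip Completed")] * 2
    + [("Airport", "No Cars Available")] * 5
    + [("Airport", "Cancelled")]
    + [("City", "Trip Completed")] * 3
    + [("City", "No Cars Available")]
    + [("City", "Cancelled")] * 4
)
sample = pd.DataFrame(rows, columns=["Pickup point", "Status"])

# Requests per status for each pickup point
summary = pd.crosstab(sample["Pickup point"], sample["Status"])

# Share of requests that ended in a completed trip
total_requests = summary.sum(axis=1)
summary["fulfilment_rate"] = summary["Trip Completed"] / total_requests
print(summary)
```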



##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can help Uber to develop targeted solutions to address the specific problems in each location. For example, they could offer incentives to drivers to go to the airport, and they could investigate the reasons for the high number of cancellations in the city.


Yes. The high number of "No Cars Available" requests at the airport and of "Cancelled" trips in the city point to negative growth: each one is revenue Uber does not earn, each leaves a customer frustrated and more likely to switch to a competitor, and together they damage the brand's reputation over time.



#### Chart - 11

In [None]:
# Chart - 11 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Status', hue='time_slot', data=df)
plt.title('Trip Status by Time Slot')
plt.xlabel('Trip Status')  # the x-axis shows Status; time_slot is the hue
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the trip status by time slot. This allows for a direct comparison of the number of completed, cancelled, and unavailable trips in each time slot.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the "No Cars Available" status is most prominent during the evening, while "Cancelled" trips are most common in the morning. This is consistent with what we saw in the hourly chart.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help Uber to develop targeted solutions to address the supply-demand gap at different times of the day. For example, they could offer incentives to drivers to be on the road during the evening peak hours, and they could investigate the reasons for the high number of cancellations during the morning peak hours. By addressing these issues, Uber can improve the customer experience and increase revenue.

Yes. The high numbers of "Cancelled" trips and "No Cars Available" requests point to negative growth: every unfulfilled request is lost revenue, the poor experience drives customer churn, and a consistently bad experience damages Uber's brand reputation.



#### Chart - 12

In [None]:
# Chart - 12 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Status', hue='request_hour', data=df)
plt.title('Trip Status by Request Hour')
plt.xlabel('Trip Status')  # the x-axis shows Status; request_hour is the hue
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the trip status by request hour. This allows for a direct comparison of the number of completed, cancelled, and unavailable trips at each hour of the day.

##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the number of "No Cars Available" is highest during the evening peak hours, while the number of "Cancelled" trips is highest during the morning peak hours. This is consistent with what we saw in the previous charts.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

This insight can help Uber to develop targeted solutions to address the supply-demand gap at different times of the day. For example, they could offer incentives to drivers to be on the road during the evening peak hours, and they could investigate the reasons for the high number of cancellations during the morning peak hours. By addressing these issues, Uber can improve the customer experience and increase revenue.

Yes. The evening peak in "No Cars Available" requests and the morning peak in "Cancelled" trips point to negative growth: these are the hours in which Uber loses the most revenue to unfulfilled requests, and the resulting poor experience risks customer churn and long-term damage to the brand's reputation.



#### Chart - 13

In [None]:
# Chart - 13 visualization code
plt.figure(figsize=(10, 6))
sns.countplot(x='Status', hue='time_slot', data=df)
plt.title('Trip Status by Time Slot')
plt.xlabel('Trip Status')  # the x-axis shows Status; time_slot is the hue
plt.ylabel('Count')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a grouped bar chart to visualize the trip status by time slot. This allows for a direct comparison of the number of completed, cancelled, and unavailable trips in each time slot.



##### 2. What is/are the insight(s) found from the chart?

The chart reveals that the "No Cars Available" status is most prominent during the evening, while "Cancelled" trips are most common in the morning. This is consistent with what we saw in the hourly chart.

##### 3. Will the gained insights help creating a positive business impact?
Are there any insights that lead to negative growth? Justify with specific reason.

These insights can help Uber to develop targeted solutions to address the supply-demand gap at different times of the day. For example, they could offer incentives to drivers to be on the road during the evening peak hours, and they could investigate the reasons for the high number of cancellations during the morning peak hours. By addressing these issues, Uber can improve the customer experience and increase revenue.

Yes. As in the earlier time-slot charts, the high numbers of "Cancelled" trips and "No Cars Available" requests point to negative growth through lost revenue, a poor customer experience, and a damaged brand reputation.



#### Chart - 14 - Correlation Heatmap

In [None]:
# Correlation Heatmap visualization code
plt.figure(figsize=(10, 6))
sns.heatmap(df.select_dtypes(include=np.number).corr(), annot=True, cmap='coolwarm')
plt.title('Correlation Heatmap')
plt.show()

##### 1. Why did you pick the specific chart?

I chose a correlation heatmap to visualize the correlation between all the numerical columns in the dataframe. This chart makes it easy to see which variables are correlated with each other, and how strong the correlation is.



##### 2. What is/are the insight(s) found from the chart?

The correlation heatmap shows that there is a weak positive correlation between request_hour and Request id. This is expected, as the request ID is likely to increase as the day goes on. There is also a very weak negative correlation between Driver id and request_hour. This could indicate that some drivers are less likely to be on the road during certain hours, but the correlation is too weak to be significant. Overall, there are no strong correlations between the numerical variables in the dataset.
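The claim that no pair of variables is strongly correlated can be checked without reading the heatmap cell by cell. This is a sketch rather than part of the original analysis: the helper `strong_pairs`, its `threshold` of 0.5, and the toy columns `a`, `b`, `c` are all hypothetical.

```python
import numpy as np
import pandas as pd


def strong_pairs(frame: pd.DataFrame, threshold: float = 0.5) -> pd.Series:
    """Return variable pairs whose absolute Pearson correlation is >= threshold."""
    corr = frame.select_dtypes(include=np.number).corr()
    # Keep only the upper triangle so each pair is reported once
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack().dropna()
    return pairs[pairs.abs() >= threshold]


# Hypothetical toy data: a and b are perfectly correlated, c is only weakly related
toy = pd.DataFrame({
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [5, 1, 4, 2, 3],
})
result = strong_pairs(toy)
print(result)
```

An empty result for `strong_pairs(df)` would be consistent with the conclusion drawn from the heatmap.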



#### Chart - 15 - Pair Plot

In [None]:
# Pair Plot visualization code
sns.pairplot(df.select_dtypes(include=np.number))
plt.show()

##### 1. Why did you pick the specific chart?

I chose a pair plot to visualize the relationships between all the numerical variables in the dataset. This chart allows us to see both the distribution of each variable and the relationships between them.

##### 2. What is/are the insight(s) found from the chart?

The pair plot confirms that there are no strong linear relationships between the numerical variables in the dataset. The distributions of the variables are also not very informative.

## **5. Solution to Business Objective**

#### What do you suggest the client to achieve Business Objective ?
Explain Briefly.

Based on the analysis, here are some suggestions to achieve the business objective:

*   **Address the shortage of cars at the airport in the evening:**
    *   Offer incentives to drivers to go to the airport during the evening peak hours. This could include higher fares, bonuses for completing a certain number of airport trips, or a guaranteed minimum earning for a certain number of hours.
    *   Provide drivers with real-time information about the demand at the airport, so that they can make an informed decision about whether or not to go there.
*   **Address the high number of cancellations in the city in the morning:**
    *   Investigate the reasons for the high number of cancellations. This could be done by surveying drivers or by analyzing the data to see if there are any patterns in the cancellations.
    *   Provide drivers with more information about a trip before they accept it, such as the destination and the estimated fare. This will help them to make a more informed decision about whether or not to accept the trip.
    *   Implement a penalty for drivers who cancel trips without a valid reason. This will discourage drivers from cancelling trips unnecessarily.
*   **Improve the overall customer experience:**
    *   Provide customers with more accurate information about the estimated time of arrival of their ride.
    *   Offer customers a discount or a free ride if their trip is cancelled or if they have to wait for a long time for a car.
    *   Implement a system to track customer feedback and use it to improve the service.
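One way to decide where such incentives matter most is to rank each pickup point and time slot by its supply-demand gap. The sketch below is an illustration, not the method used in the analysis: it treats every request as demand and every completed trip as supply, and the rows, the `demand`/`supply`/`gap` column names, and the "Trip Completed" label are assumptions.

```python
import pandas as pd

# Hypothetical toy data: one row per request
rows = (
    [("Airport", "Evening", "No Cars Available")] * 4
    + [("Airport", "Evening", "Trip Completed")]
    + [("City", "Morning", "Cancelled")] * 3
    + [("City", "Morning", "Trip Completed")]
    + [("City", "Afternoon", "Trip Completed")] * 2
)
sample = pd.DataFrame(rows, columns=["Pickup point", "time_slot", "Status"])

# Demand = all requests, supply = completed trips, gap = demand - supply
gap = (
    sample.assign(completed=sample["Status"].eq("Trip Completed"))
    .groupby(["Pickup point", "time_slot"])
    .agg(demand=("Status", "size"), supply=("completed", "sum"))
)
gap["gap"] = gap["demand"] - gap["supply"]

# Largest gaps first: these are the candidates for driver incentives
gap = gap.sort_values("gap", ascending=False)
print(gap)
```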


# **Conclusion**

The analysis of the Uber request data has revealed a significant supply-demand gap, which is a major problem for the business. The key findings are:

*   There is a high rate of unfulfilled requests, with a large number of "Cancelled" trips and "No Cars Available".
*   The nature of the problem is different in the city and at the airport. In the city, the main problem is "Cancelled" trips, especially during the morning peak hours. At the airport, the main problem is "No Cars Available", especially during the evening peak hours.

Based on these findings, we have recommended a number of targeted solutions to address the supply-demand gap. These include offering incentives to drivers to go to the airport, investigating the reasons for cancellations in the city, and improving the overall customer experience. By implementing these solutions, Uber can reduce the number of unfulfilled requests, improve customer satisfaction, and increase revenue.
