
# Project Python Foundations: FoodHub Data Analysis

**Marks: 60**

### Context

The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

### Objective

The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.

### Data Description

The dataset contains information about each food order. The detailed data dictionary is given below.

### Data Dictionary

* order_id: Unique ID of the order
* customer_id: ID of the customer who ordered the food
* restaurant_name: Name of the restaurant
* cuisine_type: Cuisine ordered by the customer
* cost_of_the_order: Cost of the order (in dollars)
* day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
* rating: Rating given by the customer out of 5
* food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
* delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off confirmation.

### Let us start by importing the required libraries

In [1]:
# import libraries for data manipulation
import numpy as np
import pandas as pd

# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

### Understanding the structure of the data

In [2]:
# mount Google Drive (needed only when running in Google Colab)
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [85]:
# read the data
df = pd.read_csv('/content/drive/MyDrive/Projectwork/foodhub_order.csv')
df = df.copy()
# returns the first 5 rows
df.head()

Unnamed: 0,order_id,customer_id,restaurant_name,cuisine_type,cost_of_the_order,day_of_the_week,rating,food_preparation_time,delivery_time
0,1477147,337525,Hangawi,Korean,30.75,Weekend,Not given,25,20
1,1477685,358141,Blue Ribbon Sushi Izakaya,Japanese,12.08,Weekend,Not given,25,23
2,1477070,66393,Cafe Habana,Mexican,12.23,Weekday,5,23,28
3,1477334,106968,Blue Ribbon Fried Chicken,American,29.2,Weekend,3,25,15
4,1478249,76942,Dirty Bird to Go,American,11.59,Weekday,4,25,24


#### Observations:

The DataFrame has 9 columns as mentioned in the Data Dictionary. Data in each row corresponds to the order placed by a customer.

### **Question 1:** How many rows and columns are present in the data? [0.5 mark]

In [26]:
# Write your code here
# Get the number of rows and columns
rows = len(df.axes[0])
cols = len(df.axes[1])
print(df.shape)
# Print the number of rows and columns
print("Number of Rows: " + str(rows))
print("Number of Columns: " + str(cols))

(1898, 9)
Number of Rows: 1898
Number of Columns: 9


#### Observations:
The dataset has 1898 rows and 9 columns.


### **Question 2:** What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]

In [27]:
# Use info() to print a concise summary of the DataFrame
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1898 entries, 0 to 1897
Data columns (total 9 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   order_id               1898 non-null   int64  
 1   customer_id            1898 non-null   int64  
 2   restaurant_name        1898 non-null   object 
 3   cuisine_type           1898 non-null   object 
 4   cost_of_the_order      1898 non-null   float64
 5   day_of_the_week        1898 non-null   object 
 6   rating                 1898 non-null   object 
 7   food_preparation_time  1898 non-null   int64  
 8   delivery_time          1898 non-null   int64  
dtypes: float64(1), int64(4), object(4)
memory usage: 133.6+ KB


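#### Observations: The dataset has one float64 column (cost_of_the_order), four int64 columns (order_id, customer_id, food_preparation_time, delivery_time), and four object columns (restaurant_name, cuisine_type, day_of_the_week, rating). The rating column is stored as object rather than as a number because it contains the text value 'Not given'.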


### **Question 3:** Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]

In [28]:
# Write your code here
missing_per_column = df.isnull().sum()
print(missing_per_column)


order_id                 0
customer_id              0
restaurant_name          0
cuisine_type             0
cost_of_the_order        0
day_of_the_week          0
rating                   0
food_preparation_time    0
delivery_time            0
dtype: int64


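#### Observations: No. The output above shows zero null values in every column, so no treatment is needed. Note, however, that unrated orders are recorded as the text 'Not given' in the rating column, which isnull() does not count as missing.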
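
A text placeholder such as 'Not given' is invisible to a null check. The sketch below illustrates this on a small hypothetical Series (the values are made up to mirror the rating column, not taken from the dataset) and shows one possible way to expose the placeholder as a real missing value with `pd.to_numeric`.

```python
import pandas as pd

# Hypothetical sample mirroring the text-valued 'rating' column
ratings = pd.Series(["5", "Not given", "3", "4", "Not given"], name="rating")

# "Not given" is an ordinary string, so the null check finds nothing
print(ratings.isnull().sum())

# errors="coerce" converts every non-numeric entry to NaN
numeric_ratings = pd.to_numeric(ratings, errors="coerce")
print(numeric_ratings.isnull().sum())

# NaN values are skipped by default when averaging
print(numeric_ratings.mean())
```

Whether unrated orders should be converted this way or simply filtered out depends on the question being answered; this is only one option.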


### **Question 4:** Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]

In [29]:
# Write your code here
df.describe()

Unnamed: 0,order_id,customer_id,cost_of_the_order,food_preparation_time,delivery_time
count,1898.0,1898.0,1898.0,1898.0,1898.0
mean,1477496.0,171168.478398,16.498851,27.37197,24.161749
std,548.0497,113698.139743,7.483812,4.632481,4.972637
min,1476547.0,1311.0,4.47,20.0,15.0
25%,1477021.0,77787.75,12.08,23.0,20.0
50%,1477496.0,128600.0,14.14,27.0,25.0
75%,1477970.0,270525.0,22.2975,31.0,28.0
max,1478444.0,405334.0,35.41,35.0,33.0


#### Observations: Based on the output of df.describe(), the food preparation times are as follows.

**Minimum time:** 20 minutes,
**Maximum time:** 35 minutes,
**Average time:** about 27.37 minutes (roughly 27 minutes and 22 seconds).
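
The mean reported by describe() is in decimal minutes, so the fractional part has to be multiplied by 60 to get seconds. A minimal sketch of that conversion follows; the helper name `to_minutes_seconds` is illustrative and not part of the notebook.

```python
def to_minutes_seconds(decimal_minutes):
    """Split a duration in decimal minutes into (whole minutes, rounded seconds)."""
    total_seconds = round(decimal_minutes * 60)
    return divmod(total_seconds, 60)

# Mean of food_preparation_time as shown in the describe() output
mean_prep = 27.37197
minutes, seconds = to_minutes_seconds(mean_prep)
print(f"{minutes} min {seconds} s")
```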


### **Question 5:** How many orders are not rated? [1 mark]

In [30]:
# Write the code here
rating_counts = df.groupby(['rating']).size()
print(rating_counts)

rating
3            188
4            386
5            588
Not given    736
dtype: int64


#### Observations:

736 of the 1898 orders (about 38.8%) are not rated.


### Exploratory Data Analysis (EDA)

### Univariate Analysis

### **Question 6:** Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]

In [48]:
# Write the code here
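
This cell was left empty in the notebook. As a rough starting point, the sketch below summarizes each variable numerically, using only the five rows shown earlier by df.head(), so the numbers are illustrative rather than findings. On the full DataFrame the same summaries would normally be paired with plots such as `sns.histplot`, `sns.boxplot`, and `sns.countplot`.

```python
import pandas as pd

# The five rows shown by df.head(); the real analysis would use df itself
sample = pd.DataFrame({
    "cuisine_type": ["Korean", "Japanese", "Mexican", "American", "American"],
    "cost_of_the_order": [30.75, 12.08, 12.23, 29.20, 11.59],
    "day_of_the_week": ["Weekend", "Weekend", "Weekday", "Weekend", "Weekday"],
    "food_preparation_time": [25, 25, 23, 25, 25],
    "delivery_time": [20, 23, 28, 15, 24],
})

# Categorical columns: frequency of each level (the numbers behind a countplot)
for col in sample.select_dtypes(exclude="number").columns:
    print(sample[col].value_counts())

# Numeric columns: five-number summary (the numbers behind a boxplot)
numeric_summary = sample.select_dtypes(include="number").describe()
print(numeric_summary.loc[["min", "25%", "50%", "75%", "max"]])
```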

### **Question 7**: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

In [49]:
df.groupby('restaurant_name').size().sort_values(ascending=False).head(5)


restaurant_name
Shake Shack                  219
The Meatball Shop            132
Blue Ribbon Sushi            119
Blue Ribbon Fried Chicken     96
Parm                          68
dtype: int64

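#### Observations:
The top 5 restaurants by number of orders received are:

* Shake Shack: 219 orders
* The Meatball Shop: 132 orders
* Blue Ribbon Sushi: 119 orders
* Blue Ribbon Fried Chicken: 96 orders
* Parm: 68 orders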

### **Question 8**: Which is the most popular cuisine on weekends? [1 mark]

In [50]:
df[df['day_of_the_week'] == 'Weekend']['cuisine_type'].value_counts().idxmax()

'American'


#### Observations:
The most popular cuisine on weekends is **American**.

### **Question 9**: What percentage of the orders cost more than 20 dollars? [2 marks]

In [47]:
percent_cost_over_20 = len(df[df['cost_of_the_order'] > 20]) / len(df) * 100
print(f'Percentage of orders costing more than 20 dollars: {percent_cost_over_20:.2f}%')


Percentage of orders costing more than 20 dollars: 29.24%


#### Observations:
Percentage of orders costing more than 20 dollars: **29.24%**

### **Question 10**: What is the mean order delivery time? [1 mark]

In [51]:
df['delivery_time'].mean()

24.161749209694417

#### Observations:
The mean order delivery time is about **24.16** minutes.

### **Question 11:** The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]

In [53]:
top_3 = df.groupby('customer_id').size().sort_values(ascending=False).head(3)
print(top_3)


customer_id
52832    13
47440    10
83287     9
dtype: int64



#### Observations:
The top 3 customer IDs and the number of orders they placed are:

Customer ID **52832** placed **13** orders,
Customer ID **47440** placed **10** orders,
Customer ID **83287** placed **9** orders.


### Multivariate Analysis

### **Question 12**: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]


In [None]:
# Write the code here
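
This cell was also left empty. The sketch below shows three common multivariate summaries on the five rows from df.head(); with so few rows the values are illustrative only, and the choice of column pairs is an assumption rather than the notebook's own analysis.

```python
import pandas as pd

# The five rows shown by df.head(); the real analysis would use df itself
sample = pd.DataFrame({
    "cuisine_type": ["Korean", "Japanese", "Mexican", "American", "American"],
    "cost_of_the_order": [30.75, 12.08, 12.23, 29.20, 11.59],
    "day_of_the_week": ["Weekend", "Weekend", "Weekday", "Weekend", "Weekday"],
    "food_preparation_time": [25, 25, 23, 25, 25],
    "delivery_time": [20, 23, 28, 15, 24],
})

# Numeric vs numeric: pairwise Pearson correlation matrix
numeric_cols = ["cost_of_the_order", "food_preparation_time", "delivery_time"]
corr = sample[numeric_cols].corr()
print(corr.round(2))

# Numeric vs categorical: mean delivery time for each day type
mean_by_day = sample.groupby("day_of_the_week")["delivery_time"].mean()
print(mean_by_day)

# Categorical vs categorical: order counts per cuisine and day type
counts = pd.crosstab(sample["cuisine_type"], sample["day_of_the_week"])
print(counts)
```

On the full DataFrame, the correlation matrix could be drawn with `sns.heatmap` and the grouped comparison with `sns.boxplot`.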

### **Question 13:** The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]

In [None]:
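# keep only rated orders and convert the rating text to integers
dfValidRating = df[df['rating'] != 'Not given'].copy()
dfValidRating['rating'] = dfValidRating['rating'].astype(int)
# rating count and average rating per restaurant
df_rating = dfValidRating.groupby('restaurant_name')['rating'].agg(['count', 'mean']).reset_index()
# apply the offer criteria: more than 50 ratings and an average rating above 4
df_rating = df_rating[(df_rating['count'] > 50) & (df_rating['mean'] > 4)]
df_rating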


#### Observations:
The restaurants fulfilling these criteria are **Blue Ribbon Fried Chicken, Blue Ribbon Sushi, Shake Shack, and The Meatball Shop.**

### **Question 14:** The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]

In [83]:
df['revenue'] = df['cost_of_the_order'].apply(lambda x: x*0.25 if x>20 else x*0.15 if x>5 else 0)
print(df['revenue'].sum())


6166.303


#### Observations:
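The net revenue generated by the company across all orders is about **6166.30** dollars. This applies 25% to orders costing more than 20 dollars, 15% to orders costing more than 5 and up to 20 dollars, and no charge to orders of 5 dollars or less.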
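
The two charge conditions overlap (an order above 20 dollars is also above 5 dollars), so the calculation has to check the higher tier first. The sketch below spells out that reading as a plain function, which is an interpretation of the question rather than a confirmed business rule, and applies it to the five order costs shown by df.head(). The function name `commission` is illustrative.

```python
def commission(cost):
    """Return the company's charge for one order under the tiered rule."""
    if cost > 20:        # higher tier is checked first
        return cost * 0.25
    if cost > 5:         # more than 5 and up to 20 dollars
        return cost * 0.15
    return 0.0           # 5 dollars or less: no charge

# Costs of the first five orders shown by df.head()
sample_costs = [30.75, 12.08, 12.23, 29.20, 11.59]
revenue = sum(commission(c) for c in sample_costs)
print(round(revenue, 4))
```

Note that an order costing exactly 20 dollars falls in the 15% tier and one costing exactly 5 dollars generates no revenue under this reading.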

### **Question 15:** The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]

In [84]:
total_time = df['food_preparation_time'] + df['delivery_time']
more_than_60 = total_time > 60
percentage = more_than_60.mean() * 100
print(f'{percentage:.2f}% of orders take more than 60 minutes to get delivered.')


10.54% of orders take more than 60 minutes to get delivered.


#### Observations:
**10.54%** of orders take more than 60 minutes in total (food preparation plus delivery) to reach the customer.

### **Question 16:** The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]

In [86]:
import pandas as pd

# Group the DataFrame by day_of_the_week
df_grouped = df.groupby('day_of_the_week')

# Calculate the mean delivery time for each group
mean_delivery_time = df_grouped['delivery_time'].mean()

# Print the mean delivery time for weekdays and weekends
print("Mean delivery time on weekdays:", mean_delivery_time['Weekday'])
print("Mean delivery time on weekends:", mean_delivery_time['Weekend'])


Mean delivery time on weekdays: 28.340036563071298
Mean delivery time on weekends: 22.4700222057735


#### Observations:

The mean delivery time is **28.34** minutes on weekdays and **22.47** minutes on weekends, so weekday deliveries take about 5.87 minutes longer on average.
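
The overall mean delivery time from Question 10 is a weighted average of the two group means, so the printed values can be cross-checked against each other and used to recover how many orders fall in each group. A small sketch using only numbers printed in this notebook; the variable names are illustrative.

```python
total_orders = 1898                   # from df.shape
mean_all = 24.161749209694417         # overall mean (Question 10)
mean_weekday = 28.340036563071298     # weekday mean (Question 16)
mean_weekend = 22.4700222057735       # weekend mean (Question 16)

# Solve mean_all = w * mean_weekend + (1 - w) * mean_weekday for w
weekend_share = (mean_weekday - mean_all) / (mean_weekday - mean_weekend)
weekend_orders = round(weekend_share * total_orders)
weekday_orders = total_orders - weekend_orders
print(weekend_orders, weekday_orders)

# Rebuild the overall mean from the recovered group sizes as a sanity check
rebuilt = (weekend_orders * mean_weekend + weekday_orders * mean_weekday) / total_orders
print(abs(rebuilt - mean_all) < 1e-6)
```

A direct `df['day_of_the_week'].value_counts()` on the data would be the simpler way to confirm these group sizes.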

### Conclusion and Recommendations

### **Question 17:** What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]

### Conclusions:

*   There are 1898 orders in the dataset.
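*   The most popular restaurant is Shake Shack with 219 orders, followed by The Meatball Shop (132) and Blue Ribbon Sushi (119).
*   American is the most popular cuisine on weekends.
*   Most orders are placed on weekends: the overall mean delivery time (24.16 minutes) sits much closer to the weekend mean (22.47 minutes) than to the weekday mean (28.34 minutes), which implies that roughly 71% of orders are weekend orders.
*   Rated orders skew high: 588 of the 1162 rated orders (about 51%) received a rating of 5, and no rating is below 3. However, 736 orders (about 39%) were not rated at all.
*   The average food preparation time is 27.37 minutes with a standard deviation of 4.63 minutes.
*   The average delivery time is 24.16 minutes with a standard deviation of 4.97 minutes.
*   The average order cost is 16.50 dollars with a standard deviation of 7.48 dollars, and 29.24% of orders cost more than 20 dollars.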

### Recommendations:

*   Increase the number of restaurants offering the most-ordered cuisines. American is the most popular cuisine on weekends, so expanding the selection in high-demand cuisines would likely lead to increased sales.

*   Offer discounts or promotions on weekdays. This would encourage more customers to order during the week, which could help to even out the workload for the restaurant staff.

*   Improve the food preparation and delivery times. The average food preparation time is **27.37 minutes**, and customers are more likely to be satisfied with their orders if they don't have to wait too long for their food.

*   Partner with other restaurants to offer a wider variety of cuisines. This will attract a wider range of customers and increase sales.

*   Segment customers based on their order history and preferences to create targeted marketing campaigns and personalized recommendations.

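*   Analyze the distribution of orders between weekdays and weekends to identify peak ordering periods and adjust staffing and resources accordingly. The current dataset does not record the time of day, so capturing order timestamps would allow a finer analysis.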



---