# Project Foundations for Data Science: FoodHub Data Analysis

**Marks: 60**

### Context

The number of restaurants in New York is increasing day by day. Lots of students and busy professionals rely on those restaurants due to their hectic lifestyles. Online food delivery service is a great option for them. It provides them with good food from their favorite restaurants. A food aggregator company FoodHub offers access to multiple restaurants through a single smartphone app.

The app allows the restaurants to receive a direct online order from a customer. The app assigns a delivery person from the company to pick up the order after it is confirmed by the restaurant. The delivery person then uses the map to reach the restaurant and waits for the food package. Once the food package is handed over to the delivery person, he/she confirms the pick-up in the app and travels to the customer's location to deliver the food. The delivery person confirms the drop-off in the app after delivering the food package to the customer. The customer can rate the order in the app. The food aggregator earns money by collecting a fixed margin of the delivery order from the restaurants.

### Objective

The food aggregator company has stored the data of the different orders made by the registered customers in their online portal. They want to analyze the data to get a fair idea about the demand of different restaurants which will help them in enhancing their customer experience. Suppose you are hired as a Data Scientist in this company and the Data Science team has shared some of the key questions that need to be answered. Perform the data analysis to find answers to these questions that will help the company to improve the business.

### Data Description

The data contains information about individual food orders. The detailed data dictionary is given below.

### Data Dictionary

* order_id: Unique ID of the order
* customer_id: ID of the customer who ordered the food
* restaurant_name: Name of the restaurant
* cuisine_type: Cuisine ordered by the customer
* cost_of_the_order: Cost of the order
* day_of_the_week: Indicates whether the order is placed on a weekday or weekend (The weekday is from Monday to Friday and the weekend is Saturday and Sunday)
* rating: Rating given by the customer out of 5
* food_preparation_time: Time (in minutes) taken by the restaurant to prepare the food. This is calculated by taking the difference between the timestamps of the restaurant's order confirmation and the delivery person's pick-up confirmation.
* delivery_time: Time (in minutes) taken by the delivery person to deliver the food package. This is calculated by taking the difference between the timestamps of the delivery person's pick-up confirmation and drop-off confirmation.
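
The two duration columns are described as differences between timestamps. The raw timestamps are not part of the dataset, so the sketch below uses a hypothetical event log (the column names `restaurant_confirmed_at`, `picked_up_at`, and `dropped_off_at` and all values are assumptions) to illustrate how such durations could be derived.

```python
import pandas as pd

# Hypothetical event log; these timestamp columns are NOT part of the FoodHub dataset
events = pd.DataFrame({
    "order_id": [1, 2],
    "restaurant_confirmed_at": pd.to_datetime(["2023-01-07 12:00", "2023-01-07 12:05"]),
    "picked_up_at": pd.to_datetime(["2023-01-07 12:25", "2023-01-07 12:35"]),
    "dropped_off_at": pd.to_datetime(["2023-01-07 12:45", "2023-01-07 12:52"]),
})

# Preparation time: restaurant confirmation -> pick-up confirmation, in minutes
events["food_preparation_time"] = (
    (events["picked_up_at"] - events["restaurant_confirmed_at"]).dt.total_seconds() / 60
)

# Delivery time: pick-up confirmation -> drop-off confirmation, in minutes
events["delivery_time"] = (
    (events["dropped_off_at"] - events["picked_up_at"]).dt.total_seconds() / 60
)

print(events[["order_id", "food_preparation_time", "delivery_time"]])
```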

### <u>Importing the required libraries</u>

In [None]:
# Importing library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Importing libraries for reading data and data manipulation
import numpy as np
import pandas as pd

# Importing libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### <u>Loading the dataset in dataframe and understanding the structure of the data</u>

In [None]:
# Reading data from csv file to a dataframe
df = pd.read_csv('foodhub_order.csv')

# Returns first 5 rows from the dataframe
df.head()


#### Observations:

* The DataFrame has 9 columns as mentioned in the Data Dictionary. Data in each row corresponds to the order placed by a customer.

### **Question 1:** How many rows and columns are present in the data? [0.5 mark]

In [None]:
# Checking number of rows and columns in the dataset
df.shape

#### Observations:
* There are 1898 rows and 9 columns present in the dataset.

### **Question 2:** What are the datatypes of the different columns in the dataset? (The info() function can be used) [0.5 mark]

In [None]:
# Using info() to check the datatypes of the columns and give concise summary of the DataFrame
df.info()

#### Observations:
* There is 1 float column, 4 integer columns and 4 object columns in the dataset.

### **Question 3:** Are there any missing values in the data? If yes, treat them using an appropriate method. [1 mark]

In [None]:
# Using isna() to find if any column in the data has missing values.
df.isna().sum()

#### Observations:
* There are no missing values in the data.
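
One caveat: `isna()` only detects true nulls. The `rating` column stores unrated orders as the string `'Not given'` (the value filtered on later in this notebook), so those rows are not counted as missing. A minimal sketch on toy data (the values below are illustrative, not taken from the dataset) shows one possible way to expose them with `pd.to_numeric`:

```python
import pandas as pd

# Toy stand-in for the rating column (illustrative values, not the real data)
ratings = pd.Series(["5", "Not given", "4", "Not given", "3"], name="rating")

# isna() reports no missing values because "Not given" is an ordinary string
print(ratings.isna().sum())

# Coercing to numeric turns the sentinel string into NaN so it can be counted as missing
numeric_ratings = pd.to_numeric(ratings, errors="coerce")
print(numeric_ratings.isna().sum())
```

Whether unrated orders should be treated as missing values is an analysis decision; this sketch only shows how they could be surfaced.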

### **Question 4:** Check the statistical summary of the data. What is the minimum, average, and maximum time it takes for food to be prepared once an order is placed? [2 marks]

In [None]:
# Using describe() to get descriptive statistical summary of the numerical data
df.describe()

#### Observations:
* The minimum, average and maximum time it takes for the food to be prepared are 20 minutes, 27.37 minutes and 35 minutes respectively.
* Total 1898 orders have been placed by the customers.
* Average cost of the order is 16.49. The minimum cost of the order is 4.47 and maximum is 35.41.
* Average order delivery time is 24.16 minutes. The fastest delivery time is 15 minutes and the longest delivery time it took is 33 minutes.

In [None]:
# Using describe() to get descriptive statistical summary of the object type data
df.describe(include=['O'])

#### Observations:
* There are 178 restaurants available to order from on the FoodHub app.
* The most frequently ordered-from restaurant is Shake Shack, with 219 of the 1898 orders.
* There are 14 unique cuisines available to order from.
* The most ordered cuisine is American, with 584 of the 1898 orders.
* Most orders are placed on weekends. The total number of orders placed on weekends is 1351.
* 736 orders have not been rated by the customers.

### **Question 5:** How many orders are not rated? [1 mark]

In [None]:
# Using value_counts() to find the count of unique values present in rating column
df['rating'].value_counts()

In [None]:
# Finding the percentage of orders that have not been rated
df['rating'].value_counts(normalize=True)

#### Observations:
* Out of 1898 orders, 736 orders have not been rated.
* In terms of percentage, 38.77% of orders have not been given any rating which is a significantly high number.
* No order has been given a rating of 1 or 2.
* 30.98% of orders have been rated 5 by the customers followed by 20.33% orders with rating of 4.

### <u>Exploratory Data Analysis (EDA)</u>

### Univariate Analysis

### **Question 6:** Explore all the variables and provide observations on their distributions. (Generally, histograms, boxplots, countplots, etc. are used for univariate exploration.) [9 marks]

### >> Observations on Order ID

In [None]:
# Performing univariate analysis on order ID
df['order_id'].nunique()

**Observation:**
* There are 1898 unique order IDs, one for each order placed by the customers.

### >> Observations on Customer ID

In [None]:
# Finding number of unique customer ID
df['customer_id'].nunique()

In [None]:
# Finding number of customers who placed 1 or more orders
custid = pd.DataFrame(df['customer_id'].value_counts())
custid.value_counts()

In [None]:
# Finding the top customers
df['customer_id'].value_counts()

In [None]:
# Plotting bar chart for count of top 10 customer ID
df['customer_id'].value_counts().head(10).plot.bar()
plt.xlabel('Customer ID')
plt.ylabel('Number of orders placed')
plt.show()

**Observation:**
* 1200 registered customers have placed orders using the FoodHub app.
* 784 customers placed exactly 1 order on the FoodHub app, followed by 267 customers who placed 2 orders.
* Customer ID 52832 is the top customer, with 13 orders placed using the FoodHub app.
* The top 5 most frequent customers are customer IDs 52832, 47440, 83287, 250494 and 259341.
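
The "customers per order count" breakdown above comes from applying `value_counts()` twice. A minimal sketch on toy data (the customer IDs below are hypothetical) makes the two steps explicit and also derives the share of one-time customers:

```python
import pandas as pd

# Toy customer IDs (hypothetical values)
customer_ids = pd.Series([101, 102, 101, 103, 101, 102, 104], name="customer_id")

# First value_counts(): number of orders placed by each customer
orders_per_customer = customer_ids.value_counts()

# Second value_counts(): number of customers who placed 1, 2, 3, ... orders
customers_per_order_count = orders_per_customer.value_counts().sort_index()
print(customers_per_order_count)

# Share of customers who ordered exactly once
single_order_share = (orders_per_customer == 1).mean()
print(f"{single_order_share:.0%} of customers placed a single order")
```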

### >> Observations on Restaurant Name

In [None]:
# Checking number of unique restaurant name
df['restaurant_name'].nunique()

In [None]:
# Checking the number of orders placed with each restaurant
df['restaurant_name'].value_counts()

In [None]:
# Checking the percentage of orders placed with each restaurant
df['restaurant_name'].value_counts(normalize=True)

In [None]:
# Plotting countplot for top 10 restaurants
sns.countplot(x = df['restaurant_name'], order = df.restaurant_name.value_counts().iloc[:10].index)
plt.xticks(rotation=90)
plt.xlabel("Restaurant Name")
plt.ylabel("Order Count")
plt.show()

**Observations:**
* There are 178 restaurants available on the FoodHub app for ordering.
* Shake Shack is the most popular restaurant, with 219 orders received (11.5% of the total).
* The top 5 restaurants are Shake Shack, The Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken and Parm accounting for 33.4% of the total orders placed.
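
The combined share of the top restaurants can be computed by summing the first few normalized counts. The sketch below uses toy data (the letters `A` to `F` stand in for restaurant names, and `top_n` is a hypothetical parameter) rather than the FoodHub file:

```python
import pandas as pd

# Toy restaurant names (hypothetical; "A".."F" stand in for real restaurants)
names = pd.Series(list("AAAABBBCCDEF"), name="restaurant_name")

# value_counts(normalize=True) returns shares sorted from largest to smallest
shares = names.value_counts(normalize=True)

# Summing the first top_n shares gives the combined share of the top restaurants
top_n = 3
top_share = shares.head(top_n).sum()
print(f"Top {top_n} restaurants account for {top_share:.1%} of orders")
```

On the real data, the same pattern with `top_n = 5` on `df['restaurant_name']` would be one way to check the combined share quoted above.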

### >> Observations on Cuisine Type

In [None]:
# Checking number of unique cuisine types available for ordering
df['cuisine_type'].nunique()

In [None]:
# Checking the number of orders placed for each cuisine type
df['cuisine_type'].value_counts()

In [None]:
# Checking the percentage of orders placed for each cuisine type
df['cuisine_type'].value_counts(normalize=True)

In [None]:
# Checking distribution of cuisine_type on countplot
plt.figure(figsize=(15,6))
sns.countplot(data=df, x='cuisine_type')
plt.xticks(rotation=90)
plt.show()

**Observations:**
* There are 14 unique cuisine types available on the FoodHub app.
* The top 4 cuisine types are American, Japanese, Italian and Chinese. Together, these 4 cuisine types account for 82.56% of all orders.
* American cuisine is the most popular with 30.77% of all orders, followed by Japanese cuisine with a 24.76% order share.

### >> Observations on Cost of the order

In [None]:
# Plotting Histogram to understand the distribution of cost of the order
plt.axvline(df['cost_of_the_order'].mean(), color='r')
sns.histplot(data = df, x = 'cost_of_the_order', kde = True)
plt.show()

# Plotting Boxplot to further understand the distribution of cost of the order
sns.boxplot(x='cost_of_the_order',data=df)
plt.show()

**Observations:**
* The distribution is right-skewed.
* The average order cost is around 16 dollars.
* 50% of orders cost below 14 dollars and 75% of orders cost below 22 dollars.
* The minimum order cost is around 4 dollars and the maximum is around 35 dollars.

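
The skew and quartile statements above are read off the plots. They could also be checked numerically, as in this minimal sketch on toy data (the cost values are hypothetical and chosen to have a long right tail):

```python
import pandas as pd

# Toy order costs in dollars (hypothetical values with a long right tail)
cost = pd.Series([5, 8, 9, 10, 11, 12, 12, 13, 14, 16, 20, 25, 33],
                 name="cost_of_the_order")

# Positive sample skewness indicates a right-skewed distribution
print("skewness:", round(cost.skew(), 2))

# In a right-skewed distribution the mean typically sits above the median
print("mean:", round(cost.mean(), 2), "median:", cost.median())

# Quartiles correspond to the box edges and centre line of a boxplot
print(cost.quantile([0.25, 0.5, 0.75]))
```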
### >> Observations on Day of the week

In [None]:
# Checking number of unique values present in day_of_the_week column
df['day_of_the_week'].nunique()

In [None]:
# Checking the unique values present in day_of_the_week column
df['day_of_the_week'].unique()

In [None]:
# Checking total number of orders placed on weekend and weekdays
df['day_of_the_week'].value_counts()

In [None]:
# Checking percentage of orders placed on weekend and weekdays
df['day_of_the_week'].value_counts(normalize=True)

In [None]:
# Checking distribution of number of orders placed on weekdays and weekends on countplot
sns.countplot(data=df, x='day_of_the_week')
plt.show()

**Observations:**
* Most of the orders are placed on weekends (Saturday and Sunday).
* 71.2% (around 1350) of the orders are placed on weekends and 28.8% (around 550) are placed on weekdays.

### >> Observations on Rating

In [None]:
# Checking percentage of orders based on rating given by the customers
df['rating'].value_counts(normalize=True)

In [None]:
# Checking distribution of ratings given by the customers using countplot
sns.countplot(x='rating', data=df)
plt.show()

**Observations:**
* Around 740 orders (38.77%) were not given any rating by customers. This share is high and needs to be thoroughly analyzed to decide how best to handle these orders.
* Around 580 orders (30.98%) were given a rating of 5 and around 400 orders (20.33%) were rated 4.
* Around 190 orders were given a rating of 3, which accounts for about 9.9% of the total orders.
* None of the orders received a rating of 1 or 2.

### >> Observations on Food Preparation Time

In [None]:
# Plotting Boxplot to understand the distribution of the food preparation time
sns.boxplot(x='food_preparation_time',data=df)
plt.show()

**Observations:**
* The distribution of food preparation time is symmetrical.
* The minimum time taken to prepare an order is 20 minutes and the maximum is 35 minutes.
* On an average an order takes 27 minutes to be prepared.
* 75% of the orders take 20-31 minutes to be prepared.

### >> Observations on Delivery time

In [None]:
# Plotting Boxplot to understand the distribution of the food delivery time
sns.boxplot(x='delivery_time',data=df)
plt.show()

**Observations:**
* The distribution is a little left skewed.
* The average time taken by the delivery person is around 24 minutes.
* The fastest food delivery took 15 minutes and the maximum time taken for delivery is 33 minutes.
* 75% of the orders are delivered between 15-28 minutes.

### **Question 7**: Which are the top 5 restaurants in terms of the number of orders received? [1 mark]

In [None]:
# Finding the top restaurants in terms of the number of orders received
df['restaurant_name'].value_counts()

#### Observations:
* The top 5 restaurants are Shake Shack, The Meatball Shop, Blue Ribbon Sushi, Blue Ribbon Fried Chicken and Parm.
* Shake Shack is the most famous restaurant among all with 219 orders received.
* 33.4% of the total orders were placed in these top 5 restaurants.

### **Question 8**: Which is the most popular cuisine on weekends? [1 mark]

In [None]:
# Finding the most popular cuisine on weekends
df[df['day_of_the_week']=='Weekend']['cuisine_type'].value_counts()

#### Observations:
* American cuisine is the most popular cuisine on weekends.
* 415 orders (30.7%) were placed by customers on weekends for American cuisine.

### **Question 9**: What percentage of the orders cost more than 20 dollars? [2 marks]

In [None]:
# Finding total number of orders that cost more than 20 dollars and storing in a variable
more_than_20dollar_order = df[df['cost_of_the_order']>20].shape[0]
print("Number of orders that cost more than 20 dollars:",more_than_20dollar_order)

# Finding total number of orders placed using FoodHub app and storing in a variable
total_order = df.shape[0]
print("Total number of orders placed using FoodHub app:",total_order)

# Calculating total percentage of the orders that cost more than 20 dollars
order_percentage = (more_than_20dollar_order/total_order)*100
print(f"Total percentage of orders that cost more than 20 dollars: {round(order_percentage,2)}%")

#### Observations:
* Total percentage of orders that cost more than 20 dollars is 29.24%
* Out of 1898 orders, 555 orders cost more than 20 dollars.
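
The same percentage can be obtained in one step, because the mean of a boolean mask equals the fraction of `True` values. A minimal sketch on toy data (the cost values are hypothetical):

```python
import pandas as pd

# Toy order costs (hypothetical values); note that 20.0 itself is not "more than 20"
cost = pd.Series([4.5, 12.0, 20.0, 21.5, 30.0], name="cost_of_the_order")

# A boolean mask averages to the fraction of True values
pct_over_20 = (cost > 20).mean() * 100
print(f"{pct_over_20:.2f}% of orders cost more than 20 dollars")
```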

### **Question 10**: What is the mean order delivery time? [1 mark]

In [None]:
# Finding the average order delivery time
round(df['delivery_time'].mean(),2)

#### Observations:
* The mean order delivery time is 24.16 minutes.

### **Question 11:** The company has decided to give 20% discount vouchers to the top 3 most frequent customers. Find the IDs of these customers and the number of orders they placed. [1 mark]

In [None]:
# Finding top 3 most frequent customers
df['customer_id'].value_counts().head(3)

#### Observations:
* The top 3 most frequent customers are customer IDs 52832, 47440 and 83287, who placed 13, 10 and 9 orders respectively.

### Multivariate Analysis

### **Question 12**: Perform a multivariate analysis to explore relationships between the important variables in the dataset. (It is a good idea to explore relations between numerical variables as well as relations between numerical and categorical variables) [10 marks]


### >> Correlation analysis between numerical variables in the data to understand their relationships

In [None]:
# Checking correlation between important numerical variables using Heatmap
columns = ['cost_of_the_order','food_preparation_time','delivery_time']
sns.heatmap(df[columns].corr(),annot=True,cmap='coolwarm');

**Observations:**
* Food preparation time and cost of the order are essentially uncorrelated (correlation of 0.042).
* Food preparation time and delivery time are essentially uncorrelated (correlation of 0.011).
* The correlation between delivery time and cost of the order is negative, but its magnitude is very low, so no meaningful relationship is apparent.

### >> Plotting pairplot between important numerical variables to understand their relationships

In [None]:
# Using pairplot to further understand the relationship between food preparation time, cost of the order and delivery_time.
sns.pairplot(data=df[['cost_of_the_order','food_preparation_time','delivery_time']])
plt.show()

**Observations:**
* No relationship observed between food preparation time, cost of the order and delivery_time.

### >> Analysis of relationship between Day of the week and Delivery Time

In [None]:
# Calculating the total delivery time based on day of the week
df.groupby('day_of_the_week')['delivery_time'].sum()

In [None]:
# Using barplot to understand the relationship between Day of the week and Delivery Time
sns.barplot(x=df['day_of_the_week'],y=df['delivery_time'])
plt.show()

In [None]:
# Using boxplot to understand the distribution of Delivery Time by Day of the week
sns.boxplot(x=df['day_of_the_week'],y=df['delivery_time'])
plt.show()

**Observations:**
* The total delivery time on weekends is almost double that on weekdays. This reflects the larger number of orders placed on weekends, not slower deliveries.
* Weekdays have a higher average delivery time (28 minutes) than weekends (22 minutes). This could be due to heavier traffic on weekdays or fewer FoodHub delivery employees being available on weekdays.
* The fastest delivery time reported on weekday is 24 minutes and on weekend is 15 minutes.

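
Because a summed delivery time mixes order volume with delivery speed, it can help to look at count, mean, sum and minimum side by side. A minimal sketch on toy data (the values are hypothetical, not the FoodHub figures):

```python
import pandas as pd

# Toy data (hypothetical values): more weekend orders, slower weekday deliveries
toy = pd.DataFrame({
    "day_of_the_week": ["Weekend", "Weekend", "Weekend", "Weekend", "Weekday", "Weekday"],
    "delivery_time": [20, 22, 24, 22, 28, 30],
})

# count separates order volume from speed; mean and min describe the speed itself
summary = toy.groupby("day_of_the_week")["delivery_time"].agg(["count", "mean", "sum", "min"])
print(summary)
```

In this toy example the weekend sum is larger only because there are more weekend orders, while the weekend mean is lower.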
### >> Analysis of relationship between Day of the week and Food Preparation Time

In [None]:
# Calculating average food preparation time on weekday and weekend
df.groupby('day_of_the_week')['food_preparation_time'].mean()

In [None]:
# Using barplot to understand the relationship between Day of the week and Food Preparation Time
sns.barplot(x=df['day_of_the_week'],y=df['food_preparation_time'])
plt.show()

In [None]:
# Using boxplot to understand the distribution of Food Preparation Time by Day of the week
sns.boxplot(x=df['day_of_the_week'],y=df['food_preparation_time'])
plt.show()

**Observations:**
* The average food preparation time on weekend and weekday is similar.

### >> Analysis of relationship between Day of the week and Cost of the order

In [None]:
# Calculating average cost of the order by day of the week
df.groupby(['day_of_the_week'])['cost_of_the_order'].mean()

In [None]:
# Creating barplot to compare the average cost of the order by day of the week
sns.barplot(data = df, x = 'day_of_the_week', y = 'cost_of_the_order')
plt.show()

**Observations:**
* The average cost of the order is slightly higher on weekends than weekdays.

### >> Analysis of distribution between Cuisine Type and Cost of the order

In [None]:
# Creating barplot to understand the distribution of total number of orders by cuisine type
df['cuisine_type'].value_counts().plot.bar()
plt.xticks(rotation=90)
plt.show()

In [None]:
# Creating boxplot to understand the distribution of Cost of the order by Cuisine Type
sns.boxplot(x=df['cuisine_type'],y=df['cost_of_the_order'])
plt.xticks(rotation=90)
plt.show()

**Observations:**
* American cuisine is the most popular with around 580 orders placed by customers.
* Vietnamese cuisine was the least ordered.
* Korean cuisine is the cheapest: apart from outliers, its order costs stay under 14 dollars.
* Korean cuisine has 3 outlier orders costing between 6 and 8 dollars and 2 outlier orders costing between 28 and 31 dollars.
* There is 1 outlier order for Vietnamese cuisine and there are 4 outliers for Mediterranean cuisine.

### >> Analysis of distribution between Cuisine Type and Food Preparation Time

In [None]:
# Calculating average food preparation time by cuisine type
df.groupby('cuisine_type')['food_preparation_time'].mean().sort_values(ascending=False)

In [None]:
# Creating boxplot to understand the distribution of food preparation time by cuisine type
sns.boxplot(x=df['cuisine_type'],y=df['food_preparation_time'])
plt.xticks(rotation=90)
plt.show()

**Observations:**
* Based on the average food preparation time, Southern cuisine takes the longest to prepare, followed by Chinese and Japanese cuisine.
* Korean cuisine takes the least time to prepare, followed by Vietnamese cuisine.

### >> Analysis of distribution between Cuisine Type and Rating

In [None]:
# Creating countplot to understand the distribution of Cuisine Type and Rating
plt.figure(figsize=(15,7))
sns.countplot(x=df['cuisine_type'],hue=df['rating'],order=df['cuisine_type'].value_counts().index)
plt.xticks(rotation=90)
plt.show()

**Observations:**
* American, Japanese, Italian and Chinese cuisines have the largest number of rated orders.
* Vietnamese cuisine has received very few ratings from customers. This is likely because fewer orders are placed for Vietnamese cuisine.

### >> Analysis of distribution between Rating and Cost of the Order

In [None]:
# Calculating average cost of the order by rating and storing in a new dataframe
avg_cost_by_rating = df.groupby('rating')['cost_of_the_order'].mean().reset_index()
avg_cost_by_rating.rename(columns={'cost_of_the_order': 'avg_cost_of_the_order'},inplace=True)

# Creating Lineplot to understand the distribution of average cost of the order and Rating
sns.lineplot(data=avg_cost_by_rating, x='rating', y='avg_cost_of_the_order')

# Setting the x and y axis labels
plt.xlabel('Rating')
plt.ylabel('Average Cost of the Order')
plt.show()

In [None]:
# Creating boxplot to understand the distribution of Cost of the order by Rating
sns.boxplot(x=df['rating'],y=df['cost_of_the_order'],order=df['rating'].value_counts(ascending=True).index)
plt.show()

**Observations:**
* The orders rated 5 have the highest average cost followed by orders with rating of 4 and 3.
* The orders that were not given any rating have the lowest average cost.

### >> Analysis of distribution between Rating and Delivery Time

In [None]:
# Calculating average delivery time of the order by rating
df.groupby('rating')['delivery_time'].mean()

In [None]:
# Calculating average delivery time by rating and storing in a new dataframe
avg_delvtime_by_rating = df.groupby('rating')['delivery_time'].mean().reset_index()
avg_delvtime_by_rating.rename(columns={'delivery_time': 'avg_delivery_time'},inplace=True)

# Creating Lineplot to understand the distribution of average delivery time of the order and Rating
sns.lineplot(data=avg_delvtime_by_rating, x='rating', y='avg_delivery_time')

# Setting the x and y axis labels
plt.xlabel('Rating')
plt.ylabel('Average delivery time of the Order')
plt.show()

In [None]:
# Creating boxplot to understand the distribution of delivery time by rating
sns.boxplot(x=df['rating'], y=df['delivery_time'])
plt.show()

**Observations:**
* Orders that received a rating of 4 have the lowest average delivery time (23.8 minutes).
* The average delivery time is 24.2 minutes for orders with a rating of 5.
* Orders that received a rating of 3 have the highest average delivery time (24.5 minutes).
* Orders that did not receive any rating have about the same average delivery time as orders with a rating of 5.

### >> Analysis of distribution between Rating and Food Preparation Time

In [None]:
# Calculating average food preparation time by rating
df.groupby('rating')['food_preparation_time'].mean()

In [None]:
# Creating barplot to compare the average food preparation time by rating
sns.barplot(x=df['rating'], y=df['food_preparation_time'])
plt.show()

**Observations:**
* The average food preparation time is similar for all the orders that received rating or didn't receive rating which is around 27 minutes.

### >> Analysis of distribution between Rating and Restaurant Name


In [None]:
df['restaurant_name'].value_counts().iloc[:10]

In [None]:
# Creating countplot to find the top 10 restaurants by rating
plt.figure(figsize=(15,7))
plt.xticks(rotation=90)
sns.countplot(x=df['restaurant_name'],hue=df['rating'],order=df['restaurant_name'].value_counts().iloc[:10].index)
plt.show()

**Observations:**
* The top 3 restaurants are Shake Shack, The Meatball Shop and Blue Ribbon Sushi.
* Around 85 orders out of 219 orders from Shake Shack were not rated by customers.
* All the top 10 restaurants have some orders that were not given any rating by customers.

### **Question 13:** The company wants to provide a promotional offer in the advertisement of the restaurants. The condition to get the offer is that the restaurants must have a rating count of more than 50 and the average rating should be greater than 4. Find the restaurants fulfilling the criteria to get the promotional offer. [3 marks]

In [None]:
# Keeping only the orders that were rated by customers and storing them in a new dataframe
rated_orders = df[df['rating'] != 'Not given'].copy()

#Converting rating column datatype to integer
rated_orders['rating'] = rated_orders['rating'].astype(int)

In [None]:
# Finding the restaurants whose rating count is more than 50
rated_orders['restaurant_name'].value_counts() > 50

In [None]:
# Keeping only the restaurants whose rating count is more than 50 and storing them in a new dataframe
rating_counts = rated_orders['restaurant_name'].value_counts()
restaurants_over_50 = rating_counts[rating_counts > 50].index
top_rated_restaurants = rated_orders[rated_orders['restaurant_name'].isin(restaurants_over_50)]

#Calculating average rating of the restaurants whose rating count is more than 50
top_rated_restaurants.groupby('restaurant_name')['rating'].mean().sort_values(ascending=False)

#### Observations:
* Only 4 restaurants fulfill the criteria to get the promotional offer.
* The 4 restaurants are The Meatball Shop, Blue Ribbon Fried Chicken, Shake Shack and Blue Ribbon Sushi with average rating greater than 4.
* Among the qualified restaurants, The Meatball Shop is the highest rated and Blue Ribbon Sushi is the lowest rated with average rating of 4.5 and 4.2 respectively.

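
Both promotional criteria can also be applied in a single pass by aggregating the rating count and the average rating per restaurant. The sketch below runs on toy data (restaurant names `A` to `C` and the lowered threshold `MIN_COUNT = 2` are assumptions made so the tiny example has a qualifying row; the actual criterion is a rating count of more than 50):

```python
import pandas as pd

# Toy data (hypothetical restaurants and ratings)
toy = pd.DataFrame({
    "restaurant_name": ["A", "A", "A", "B", "B", "C", "C", "C"],
    "rating": ["5", "4", "Not given", "3", "4", "5", "5", "4"],
})

# Keep rated orders only and convert the rating to integer
rated = toy[toy["rating"] != "Not given"].copy()
rated["rating"] = rated["rating"].astype(int)

# Rating count and average rating per restaurant
stats = rated.groupby("restaurant_name")["rating"].agg(
    rating_count="count", avg_rating="mean"
)

MIN_COUNT = 2  # assumption for the toy data; the question uses 50
MIN_AVG = 4    # average rating must be strictly greater than 4

# Apply both conditions at once
eligible = stats[(stats["rating_count"] > MIN_COUNT) & (stats["avg_rating"] > MIN_AVG)]
print(eligible)
```

This avoids hard-coding restaurant names, so the result would update automatically if the data changed.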
### **Question 14:** The company charges the restaurant 25% on the orders having cost greater than 20 dollars and 15% on the orders having cost greater than 5 dollars. Find the net revenue generated by the company across all orders. [3 marks]

In [None]:
# Looping through the cost of the order column for each order and computing the total revenue based on the order value
count20 = 0
count5 = 0
total = 0
for price in df['cost_of_the_order']:
    if price > 20:
        total = total + (0.25*price)
        count20 += 1
    elif price > 5:
        total = total + (0.15*price)
        count5 += 1
    else:
        continue
print(f"Total number of orders with order value more than 20 dollars: {count20}")
print(f"Total number of orders with order value more than 5 and up to 20 dollars: {count5}")
print(f"The net revenue generated by the company across all orders: {round(total,2)} USD")

#### Observations:
* The net revenue generated by FoodHub across all orders is 6,166.30 USD
* Out of 1898 orders, 555 orders cost more than 20 USD each.
* Out of 1898 orders, 1334 orders cost more than 5 USD but not more than 20 USD.

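
The tiered commission can also be expressed without an explicit loop. This is a minimal sketch using `np.select` on toy data (the cost values are hypothetical); the first matching condition wins, which mirrors the `if`/`elif` order of the loop above:

```python
import numpy as np
import pandas as pd

# Toy order costs (hypothetical values)
cost = pd.Series([4.0, 10.0, 20.0, 30.0], name="cost_of_the_order")

# Conditions are checked in order: 25% above 20 dollars, 15% above 5 dollars, else 0%
commission_rate = np.select([cost > 20, cost > 5], [0.25, 0.15], default=0.0)

# Revenue is the sum of cost times the applicable commission rate
revenue = (cost * commission_rate).sum()
print(round(revenue, 2))
```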
### **Question 15:** The company wants to analyze the total time required to deliver the food. What percentage of orders take more than 60 minutes to get delivered from the time the order is placed? (The food has to be prepared and then delivered.) [2 marks]

In [None]:
# Finding total number of orders
total_order = df.shape[0]
print(f"Total Number of orders: {total_order}")

# Finding total number of orders that takes more than 60 minutes to deliver since the time the order is placed
order_num_morethan60 = df[df['food_preparation_time']+df['delivery_time'] > 60].shape[0]
print(f"Total number of orders that takes more than 60 minutes to get delivered: {order_num_morethan60}")

# Calculating percentage of order that takes more than 60 minutes to be delivered
print(f"Percentage of order that takes more than 60 minutes: {round((order_num_morethan60/total_order)*100,2)}%")

#### Observations:
* 10.54% orders take more than 60 minutes to get delivered from the time the order is placed.
* Out of 1898 orders, 200 orders take more than 60 minutes to be delivered from the time the order is placed by the customer.

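
Storing the total time as its own column makes the threshold check reusable for other cut-offs. A minimal sketch on toy data (the values and the `total_time` column name are assumptions):

```python
import pandas as pd

# Toy data (hypothetical values, in minutes)
toy = pd.DataFrame({
    "food_preparation_time": [20, 30, 35, 28],
    "delivery_time": [25, 31, 33, 32],
})

# Total time from order placement to drop-off
toy["total_time"] = toy["food_preparation_time"] + toy["delivery_time"]

# Fraction of orders strictly above 60 minutes, as a percentage
pct_over_60 = (toy["total_time"] > 60).mean() * 100
print(f"{pct_over_60:.2f}% of orders take more than 60 minutes")
```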
### **Question 16:** The company wants to analyze the delivery time of the orders on weekdays and weekends. How does the mean delivery time vary during weekdays and weekends? [2 marks]

In [None]:
# Calculating mean delivery time by day of the week
df.groupby('day_of_the_week')['delivery_time'].mean()

#### Observations:
* The average delivery time is almost 6 minutes lower on weekends than on weekdays. This could be due to lighter traffic on weekends or fewer FoodHub delivery employees being available on weekdays.
* The mean delivery time on weekdays is 28.34 minutes.
* The mean delivery time on weekends is 22.47 minutes.

### Conclusion and Recommendations

### **Question 17:** What are your conclusions from the analysis? What recommendations would you like to share to help improve the business? (You can use cuisine type and feedback ratings to drive your business recommendations.) [6 marks]

### Conclusions:
* A total of 1898 orders were placed using the FoodHub app.
* About 65% of FoodHub customers (784 customers) have placed only a single order, followed by 22% (267 customers) who placed only two orders.
* The majority of orders (71.2%) were placed on weekends (Saturday and Sunday) and 28.8% were placed on weekdays (Monday to Friday).
* 38.77% of orders have not been given any rating, which is a high share, while the remaining orders were rated 3, 4 or 5.
* Out of 178 restaurants available on the FoodHub app, Shake Shack is the most popular, followed by The Meatball Shop and Blue Ribbon Sushi.
* American cuisine is the most popular among the 14 cuisines available on the FoodHub app, followed by Japanese and Italian cuisine.
* The average cost of an order placed on the app is around 16 USD.
* Food preparation takes 27 minutes on average, and delivery takes around 24 minutes.
* Weekdays have a higher average delivery time (28 minutes) than weekends (22 minutes). This could be due to heavier traffic or fewer FoodHub delivery employees being available on weekdays.

### Recommendations:
* Investigate why a significant number of customers placed orders on the FoodHub app only once or twice. Identifying the root cause will help focus retention efforts more effectively.
* Offer discounts and special promotions to attract new customers and retain existing ones.
* Encourage customers to provide reviews and feedback. Use the input to identify areas for improvement and continuously enhance the customer experience.
* Provide offers and deals on restaurants that are less popular among customers.
* Position the menu items within the app strategically. Highlight popular dishes and create sections that are visually appealing and easy to navigate.
* As weekdays have a higher delivery time, FoodHub should analyze the availability of delivery employees and optimize delivery routes using route planning software to minimize delivery times.
* Since the majority of orders are placed on weekends, FoodHub should introduce weekend-specific deals to increase customer engagement. FoodHub should also review the availability of delivery employees on weekends so that better customer service can be provided.

---