# Pandas Exploration of Restaurant Data

## Data exploration and data cleaning with pandas

## Objective:

The objective of this analysis is to provide data insights to prospective restaurant owners who plan to open a food delivery store but have not yet settled key business decisions, such as which cuisine to offer, which zone to choose for the kitchen site to get the most sales, and which time of day is most effective for marketing. To help these owners, we use pandas to answer the following questions:

- Which restaurant received the most orders?
- Which restaurant saw the most sales?
- Which customer ordered the most?
- Which restaurants have a name starting with S, and what are their categories?
- Which is the most liked cuisine?
- Which zone has the most sales?
- Which payment mode was used the most?
- Which restaurants received a delivery rating greater than 4?
- Which restaurant had the maximum delivery time?
- What are the customer name, restaurant name and category where the category is Ordinary and the restaurant name starts with D?

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
pd.set_option('display.max_rows', None)

In [2]:
df1 = pd.read_csv('data/Orders.csv', low_memory=False)
df1.head()

| | Order ID | Customer Name | Restaurant ID | Order Date | Quantity of Items | Order Amount | Payment Mode | Delivery Time Taken (mins) | Customer Rating-Food | Customer Rating-Delivery |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OD1 | Srini | 6 | 1/1/22 23:15 | 5 | 633 | Debit Card | 47 | 5 | 3 |
| 1 | OD2 | Revandh | 13 | 1/1/22 19:21 | 5 | 258 | Credit Card | 41 | 3 | 5 |
| 2 | OD3 | David | 9 | 1/1/22 23:15 | 7 | 594 | Cash on Delivery | 30 | 3 | 4 |
| 3 | OD4 | Selva | 4 | 1/1/22 20:31 | 5 | 868 | Cash on Delivery | 30 | 3 | 4 |
| 4 | OD5 | Vinny | 4 | 1/1/22 11:10 | 4 | 170 | Debit Card | 18 | 4 | 3 |


In [3]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 10 columns):
 #   Column                      Non-Null Count  Dtype 
---  ------                      --------------  ----- 
 0   Order ID                    500 non-null    object
 1   Customer Name               500 non-null    object
 2   Restaurant ID               500 non-null    int64 
 3   Order Date                  500 non-null    object
 4   Quantity of Items           500 non-null    int64 
 5   Order Amount                500 non-null    int64 
 6   Payment Mode                500 non-null    object
 7   Delivery Time Taken (mins)  500 non-null    int64 
 8   Customer Rating-Food        500 non-null    int64 
 9   Customer Rating-Delivery    500 non-null    int64 
dtypes: int64(6), object(4)
memory usage: 39.2+ KB
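
The `info()` output shows `Order Date` stored as `object`, that is, as plain strings. Answering the time-of-day question from the objective would require parsing it first. The following is a minimal sketch on toy rows, not part of the original notebook. It assumes the strings are month/day/two-digit year followed by a 24-hour time; the sample rows shown above do not tell month-first apart from day-first, so that format string is an assumption.

```python
import pandas as pd

# Toy rows shaped like Orders.csv (values are illustrative, not the real data).
orders = pd.DataFrame({
    "Order ID": ["OD1", "OD2", "OD3", "OD4"],
    "Order Date": ["1/1/22 23:15", "1/1/22 19:21", "1/2/22 23:40", "1/2/22 11:10"],
})

# Assumption: month/day/two-digit year, then 24-hour time.
orders["Order Date"] = pd.to_datetime(orders["Order Date"], format="%m/%d/%y %H:%M")

# Count orders per hour of the day, sorted by hour.
orders_by_hour = orders["Order Date"].dt.hour.value_counts().sort_index()
print(orders_by_hour)
```

With the real data, the busiest hours in `orders_by_hour` would be candidates for marketing time slots.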


In [4]:
df2 = pd.read_csv('data/Restaurants.csv', low_memory=False)
df2.head()

| | RestaurantID | RestaurantName | Cuisine | Zone | Category |
|---|---|---|---|---|---|
| 0 | 1 | The Cave Hotel | Continental | Zone B | Pro |
| 1 | 2 | SSK Hotel | North Indian | Zone D | Pro |
| 2 | 3 | ASR Restaurant | South Indian | Zone D | Ordinary |
| 3 | 4 | Win Hotel | South Indian | Zone D | Ordinary |
| 4 | 5 | Denver Restaurant | Continental | Zone D | Pro |


In [5]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 5 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   RestaurantID    20 non-null     int64 
 1   RestaurantName  20 non-null     object
 2   Cuisine         20 non-null     object
 3   Zone            20 non-null     object
 4   Category        20 non-null     object
dtypes: int64(1), object(4)
memory usage: 928.0+ bytes


This is not a large dataset by any means (500 orders across 20 restaurants), but it works well for beginner projects.

In [6]:
df1.isna().sum()

Order ID                      0
Customer Name                 0
Restaurant ID                 0
Order Date                    0
Quantity of Items             0
Order Amount                  0
Payment Mode                  0
Delivery Time Taken (mins)    0
Customer Rating-Food          0
Customer Rating-Delivery      0
dtype: int64

In [7]:
df2.isna().sum()

RestaurantID      0
RestaurantName    0
Cuisine           0
Zone              0
Category          0
dtype: int64

In [8]:
df1 = df1.rename(columns={'Restaurant ID': 'RestaurantID'})

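Neither table has null values, so no rows need to be dropped or imputed before answering the questions. The only cleaning step was renaming `Restaurant ID` in the orders table to `RestaurantID`, so that it matches the key column in the restaurants table and the two can be merged.

Now that we have inspected the data and done the little cleaning it needs, we can start answering the questions.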
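
Every answer below that needs a restaurant name relies on merging the two tables on `RestaurantID`, so it can be worth checking that the merge behaves as expected. The sketch below is an illustration on toy frames, not part of the original notebook. It uses the `validate` and `indicator` parameters of `DataFrame.merge`; the frame names `orders` and `restaurants` and the unmatched ID 99 are made up for the example.

```python
import pandas as pd

orders = pd.DataFrame({
    "Order ID": ["OD1", "OD2", "OD3"],
    "RestaurantID": [1, 2, 99],  # 99 deliberately has no matching restaurant
})
restaurants = pd.DataFrame({
    "RestaurantID": [1, 2],
    "RestaurantName": ["The Cave Hotel", "SSK Hotel"],
})

# validate="many_to_one" raises MergeError if RestaurantID repeats in `restaurants`.
# indicator=True adds a "_merge" column recording where each row came from.
checked = orders.merge(
    restaurants,
    on="RestaurantID",
    how="left",
    validate="many_to_one",
    indicator=True,
)

# Orders whose restaurant ID was not found in the restaurant table.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["Order ID", "RestaurantID"]])
```

The default inner merge used in this notebook would silently drop such unmatched orders, which is why a left merge is used for the check.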

#### Which restaurant received the most orders?


In [9]:
order_count_by_restaurant = df1.merge(df2, on='RestaurantID')['RestaurantName'].value_counts()
# .max() returns the highest order count, not the restaurant's name;
# order_count_by_restaurant.idxmax() would return the name itself.
most_orders_received = order_count_by_restaurant.max()


In [10]:
print(f"The most orders received by a single restaurant is: {most_orders_received}")


The most orders received by a single restaurant is: 32
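
The value 32 is an order count, so the question of which restaurant it belongs to still needs the index label. The sketch below uses toy data, not the notebook's dataset, and shows how `idxmax()` and `max()` on the same `value_counts()` result give the name and the count respectively.

```python
import pandas as pd

# Toy restaurant names, one entry per order (illustrative only).
names = pd.Series(["Win Hotel", "SSK Hotel", "Win Hotel", "Win Hotel", "SSK Hotel"])

counts = names.value_counts()
top_name = counts.idxmax()   # index label of the largest count
top_count = counts.max()     # the largest count itself

print(f"{top_name} received the most orders ({top_count})")
```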


#### Which restaurant saw the most sales?

In [11]:
sales_by_restaurant = df1.merge(df2, on='RestaurantID').groupby('RestaurantName')['Order Amount'].sum()
# .max() returns the largest sales total; .idxmax() would return the restaurant's name.
highest_sales_amount = sales_by_restaurant.max()


In [12]:
print(f"The highest total sales for a single restaurant is: {highest_sales_amount}")


The highest total sales for a single restaurant is: 19168
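
A single maximum hides how close the runners-up are. As a hedged illustration on toy data (the amounts are made up), `Series.nlargest` returns the top restaurants together with their totals, which keeps both the name and the figure visible.

```python
import pandas as pd

# Toy merged orders (illustrative values only).
merged = pd.DataFrame({
    "RestaurantName": ["Win Hotel", "SSK Hotel", "Win Hotel", "Sam Hotel"],
    "Order Amount": [868, 258, 170, 594],
})

sales = merged.groupby("RestaurantName")["Order Amount"].sum()

# Top two restaurants by total sales, largest first.
top_two = sales.nlargest(2)
print(top_two)
```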


#### Which customer ordered the most?

In [13]:
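# .max() returns the highest number of orders; .idxmax() would return the customer's name.
most_orders_by_customer = df1['Customer Name'].value_counts().max()


In [14]:
print(f"The most orders placed by a single customer is: {most_orders_by_customer}")


The most orders placed by a single customer is: 34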
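
When several customers share the top count, `idxmax()` reports only the first of them. The sketch below uses toy data and shows one way to list every customer tied for first place. Whether any tie exists in the real dataset is not known from the output above.

```python
import pandas as pd

# Toy customer names, one entry per order (illustrative only).
customers = pd.Series(["Srini", "Vinny", "Srini", "Vinny", "David"])

counts = customers.value_counts()

# Keep every customer whose count equals the maximum, so ties are not lost.
top_customers = counts[counts == counts.max()]
print(top_customers)
```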


#### Display the restaurant name and category where the name starts with S

In [15]:
restaurants_starting_with_s = df2[df2['RestaurantName'].str.startswith('S')]
restaurant_category_starting_with_s = restaurants_starting_with_s[['RestaurantName', 'Category']]


In [16]:
print(restaurant_category_starting_with_s)


   RestaurantName  Category
1       SSK Hotel       Pro
18      Sam Hotel  Ordinary
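
`str.startswith('S')` is case-sensitive and does not ignore leading whitespace. If the names were less clean than they are here, a normalised comparison would be safer. The sketch below is an illustration on deliberately messy toy names, not the notebook's data.

```python
import pandas as pd

restaurants = pd.DataFrame({
    "RestaurantName": ["SSK Hotel", "sam hotel", "  Sam Hotel", "Win Hotel"],
    "Category": ["Pro", "Ordinary", "Ordinary", "Ordinary"],
})

# Strict match: capital S in the very first position only.
strict = restaurants[restaurants["RestaurantName"].str.startswith("S")]

# Normalised match: strip surrounding whitespace and ignore case.
normalised_name = restaurants["RestaurantName"].str.strip().str.lower()
relaxed = restaurants[normalised_name.str.startswith("s")]

print(len(strict), len(relaxed))
```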


#### Which is the most liked cuisine?

In [17]:
merged_data = df1.merge(df2, on='RestaurantID')
avg_food_rating_by_cuisine = merged_data.groupby('Cuisine')['Customer Rating-Food'].mean()
most_liked_cuisine = avg_food_rating_by_cuisine.idxmax()


In [18]:
print(f"Most liked cuisine is: {most_liked_cuisine}")


Most liked cuisine is: North Indian
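
"Most liked" is measured here as the highest mean food rating, and a mean based on few orders can be misleading. The sketch below uses toy data with made-up ratings and reports the mean alongside the number of ratings per cuisine, so the sample size is visible next to the score.

```python
import pandas as pd

# Toy merged orders (illustrative ratings only).
merged = pd.DataFrame({
    "Cuisine": ["North Indian", "South Indian", "South Indian", "South Indian", "Continental"],
    "Customer Rating-Food": [5, 4, 3, 5, 2],
})

# Mean rating and number of ratings per cuisine, best mean first.
summary = (
    merged.groupby("Cuisine")["Customer Rating-Food"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)
print(summary)
```

In this toy example the top cuisine rests on a single rating, which is exactly the situation the `count` column is meant to expose.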


#### Which zone has the most sales?

In [19]:
sales_by_zone = df1.merge(df2, on='RestaurantID').groupby('Zone')['Order Amount'].sum()
highest_sales_zone = sales_by_zone.idxmax()



In [20]:
print(f"The zone with the most sales is: {highest_sales_zone}")


The zone with the most sales is: Zone D


#### Which payment mode was used the most?

In [21]:
most_common_payment_mode = df1['Payment Mode'].value_counts().idxmax()


In [22]:
print(f"The most common payment mode is: {most_common_payment_mode}")


The most common payment mode is: Debit Card
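
Knowing the most common payment mode says little about how dominant it is. As a small illustration on toy data, `value_counts(normalize=True)` returns each mode's share of all orders instead of raw counts.

```python
import pandas as pd

# Toy payment modes, one entry per order (illustrative only).
modes = pd.Series(["Debit Card", "Credit Card", "Debit Card", "Cash on Delivery"])

# Fraction of orders paid with each mode; fractions sum to 1.
share = modes.value_counts(normalize=True)
print(share)
```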


#### Restaurants that received a delivery rating greater than 4

In [23]:
high_rated_delivery_restaurants = df1[df1['Customer Rating-Delivery'] > 4].merge(df2, on='RestaurantID')


In [24]:
print(high_rated_delivery_restaurants[['RestaurantName', 'Category']])


       RestaurantName  Category
0     Veer Restaurant  Ordinary
1     Veer Restaurant  Ordinary
2     Veer Restaurant  Ordinary
3     Veer Restaurant  Ordinary
4     Veer Restaurant  Ordinary
5          Dave Hotel  Ordinary
6          Dave Hotel  Ordinary
7          Dave Hotel  Ordinary
8          Dave Hotel  Ordinary
9          Dave Hotel  Ordinary
10         Dave Hotel  Ordinary
11         Dave Hotel  Ordinary
12         Dave Hotel  Ordinary
13          Sam Hotel  Ordinary
14          Sam Hotel  Ordinary
15          Sam Hotel  Ordinary
16                AMN  Ordinary
17                AMN  Ordinary
18                AMN  Ordinary
19                AMN  Ordinary
20                AMN  Ordinary
21          KSR Hotel       Pro
22          KSR Hotel       Pro
23          KSR Hotel       Pro
24          KSR Hotel       Pro
25            Zam Zam  Ordinary
26            Zam Zam  Ordinary
27            Zam Zam  Ordinary
28            Zam Zam  Ordinary
29            Zam Zam  Ordinary
30      
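
The output above has one row per order, so each restaurant appears many times. The question can also be read in two ways: restaurants with at least one order rated above 4, or restaurants whose average delivery rating is above 4. The sketch below uses toy data with made-up ratings and shows both readings. It is an illustration, not the notebook's original approach.

```python
import pandas as pd

# Toy merged orders (illustrative ratings only).
merged = pd.DataFrame({
    "RestaurantName": ["Dave Hotel", "Dave Hotel", "Sam Hotel", "Sam Hotel", "AMN"],
    "Category": ["Ordinary"] * 5,
    "Customer Rating-Delivery": [5, 5, 5, 2, 3],
})

# Reading 1: at least one order rated above 4, each restaurant listed once.
any_high = (
    merged[merged["Customer Rating-Delivery"] > 4][["RestaurantName", "Category"]]
    .drop_duplicates()
)

# Reading 2: average delivery rating above 4.
avg_rating = merged.groupby("RestaurantName")["Customer Rating-Delivery"].mean()
avg_high = avg_rating[avg_rating > 4]

print(any_high)
print(avg_high)
```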

#### Which restaurant had the maximum delivery time?

In [25]:
max_delivery_time_by_restaurant = df1.groupby('RestaurantID')['Delivery Time Taken (mins)'].max()
most_delivery_time_restaurant = max_delivery_time_by_restaurant.idxmax()

# Look up the name of the restaurant with the longest single delivery time
most_delivery_time_restaurant_name = df2[df2['RestaurantID'] == most_delivery_time_restaurant]['RestaurantName'].iloc[0]


In [26]:
print(f"The restaurant with the maximum delivery time is: {most_delivery_time_restaurant_name}")


The restaurant with the maximum delivery time is: ASR Restaurant
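
The printed answer names the restaurant but not the delivery time itself. As a hedged alternative on toy data, calling `idxmax()` on the column of a merged frame returns the row label of the slowest order, and `loc` then retrieves the whole row, name and time together.

```python
import pandas as pd

# Toy merged orders (illustrative times only).
merged = pd.DataFrame({
    "RestaurantName": ["ASR Restaurant", "Win Hotel", "ASR Restaurant"],
    "Delivery Time Taken (mins)": [30, 41, 50],
})

# Row of the single slowest delivery.
slowest = merged.loc[merged["Delivery Time Taken (mins)"].idxmax()]
print(slowest["RestaurantName"], slowest["Delivery Time Taken (mins)"])
```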


#### Customer name, restaurant name and category where the category is Ordinary and the restaurant name starts with D

In [27]:
merged_data = df1.merge(df2, on='RestaurantID')
desired_customers = merged_data.loc[(merged_data['Category'] == 'Ordinary') & (merged_data['RestaurantName'].str.startswith('D'))]


In [28]:
print(desired_customers[['Customer Name', 'RestaurantName', 'Category']])


    Customer Name RestaurantName  Category
220           Dev     Dave Hotel  Ordinary
221         Swamy     Dave Hotel  Ordinary
222       Revandh     Dave Hotel  Ordinary
223        Farhan     Dave Hotel  Ordinary
224         Vinny     Dave Hotel  Ordinary
225         Vinny     Dave Hotel  Ordinary
226         Srini     Dave Hotel  Ordinary
227       Revandh     Dave Hotel  Ordinary
228        Chinny     Dave Hotel  Ordinary
229      Veronica     Dave Hotel  Ordinary
230         Swamy     Dave Hotel  Ordinary
231       Sabeena     Dave Hotel  Ordinary
232         Meera     Dave Hotel  Ordinary
233         Gopal     Dave Hotel  Ordinary
234         Srini     Dave Hotel  Ordinary
235       Sabeena     Dave Hotel  Ordinary
236           Dev     Dave Hotel  Ordinary
237           Dev     Dave Hotel  Ordinary
238       Sabeena     Dave Hotel  Ordinary
239        Chinny     Dave Hotel  Ordinary
