# **Zomato Restaurants Analysis**
This project aims to understand customer preferences and restaurant trends to support key business decisions in the food industry. I analyze Zomato's restaurant dataset using `Python` for data cleaning and exploratory data analysis (EDA), `SQL` for answering business queries, and `Power BI` for building an interactive dashboard that highlights meaningful insights.

## 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sqlalchemy import create_engine
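
`create_engine` is imported here but not used in this section, which suggests the cleaned frame is later written to a database for the SQL business queries. The sketch below shows that round trip under stated assumptions: it uses the standard-library `sqlite3` in-memory database in place of a SQLAlchemy engine (the project's real database and engine URL are not shown), `restaurants` is a hypothetical table name, and the three rows are illustrative values taken from the preview.

```python
import sqlite3

import pandas as pd

# Toy stand-in for the cleaned Zomato frame (illustrative rows only).
toy = pd.DataFrame({
    "restaurant_name": ["Refuel", "Biryani Central", "The Bbq"],
    "area": ["Bannerghatta Road", "Marathahalli", "Bellandur"],
    "rating": [3.7, 2.7, 2.8],
})

# In-memory SQLite database; "restaurants" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
toy.to_sql("restaurants", conn, index=False)

# A typical business query: average rating per area, highest first.
result = pd.read_sql_query(
    "SELECT area, AVG(rating) AS avg_rating "
    "FROM restaurants GROUP BY area ORDER BY avg_rating DESC",
    conn,
)
conn.close()
print(result)
```

With a SQLAlchemy engine, the same `to_sql` / `read_sql_query` calls would take the engine object in place of `conn`.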

## 2. Load Dataset

In [2]:
df = pd.read_csv('../Data/Zomato-Restaurants-Dataset.csv')
df.head()

| | Unnamed: 0 | restaurant name | restaurant type | rate (out of 5) | num of ratings | avg cost (two people) | online_order | table booking | cuisines type | Unnamed: 9 | area | local address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | #FeelTheROLL | Quick Bites | 3.4 | 7 | 200 | No | No | Fast Food | 0 | Bellandur | Bellandur |
| 1 | 1 | #L-81 Cafe | Quick Bites | 3.9 | 48 | 400 | Yes | No | Fast Food, Beverages | 1 | Byresandra,Tavarekere,Madiwala | HSR |
| 2 | 2 | #refuel | Cafe | 3.7 | 37 | 400 | Yes | No | Cafe, Beverages | 2 | Bannerghatta Road | Bannerghatta Road |
| 3 | 3 | '@ Biryani Central | Casual Dining | 2.7 | 135 | 550 | Yes | No | Biryani, Mughlai, Chinese | 3 | Marathahalli | Marathahalli |
| 4 | 4 | '@ The Bbq | Casual Dining | 2.8 | 40 | 700 | Yes | No | BBQ, Continental, North Indian, Chinese, Bever... | 4 | Bellandur | Bellandur |
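
The `Unnamed: 0` column looks like an index that was written into the CSV on an earlier save. Assuming that is the cause (the notebook does not confirm it), passing `index_col=0` to `pd.read_csv` would absorb it at load time. The in-memory CSV below is hypothetical and only illustrates the mechanism; `Unnamed: 9` sits mid-table, so it would still need the explicit drop.

```python
import io

import pandas as pd

# Hypothetical CSV saved together with its index: the first header cell is empty.
csv_text = (
    ",restaurant name,rate (out of 5)\n"
    "0,Refuel,3.7\n"
    "1,The Bbq,2.8\n"
)

# Default read: the blank header becomes a data column named 'Unnamed: 0'.
plain = pd.read_csv(io.StringIO(csv_text))
print(list(plain.columns))    # ['Unnamed: 0', 'restaurant name', 'rate (out of 5)']

# index_col=0 turns that first column into the index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))  # ['restaurant name', 'rate (out of 5)']
```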


## 3. Data Cleaning

In [3]:
df.shape

(7105, 12)

In [4]:
df.columns

Index(['Unnamed: 0', 'restaurant name', 'restaurant type', 'rate (out of 5)',
       'num of ratings', 'avg cost (two people)', 'online_order',
       'table booking', 'cuisines type', 'Unnamed: 9', 'area',
       'local address'],
      dtype='object')

In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7105 entries, 0 to 7104
Data columns (total 12 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   Unnamed: 0             7105 non-null   int64  
 1   restaurant name        7105 non-null   object 
 2   restaurant type        7105 non-null   object 
 3   rate (out of 5)        7105 non-null   float64
 4   num of ratings         7105 non-null   int64  
 5   avg cost (two people)  7105 non-null   int64  
 6   online_order           7105 non-null   object 
 7   table booking          7105 non-null   object 
 8   cuisines type          7105 non-null   object 
 9   Unnamed: 9             7105 non-null   int64  
 10  area                   7105 non-null   object 
 11  local address          7105 non-null   object 
dtypes: float64(1), int64(4), object(7)
memory usage: 666.2+ KB


In [6]:
# Drop the two leftover index columns; in the preview both just count rows (0, 1, 2, ...)
df = df.drop(['Unnamed: 0', 'Unnamed: 9'], axis=1)
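
Before dropping `Unnamed: 0` and `Unnamed: 9`, it is worth confirming they carry no information beyond a row counter, since the preview only shows five rows. The helper `is_row_counter` below is hypothetical, and the toy frame uses illustrative values shaped like the preview; on the real data it would be run on `df` before the drop.

```python
import pandas as pd


def is_row_counter(col: pd.Series) -> bool:
    """Hypothetical helper: True if the column is exactly 0, 1, 2, ..., n-1."""
    return col.tolist() == list(range(len(col)))


# Toy frame shaped like the preview (illustrative values only).
toy = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "restaurant name": ["#FeelTheROLL", "#L-81 Cafe", "#refuel"],
    "Unnamed: 9": [0, 1, 2],
})

# Keep only the 'Unnamed' columns that are pure row counters.
leftovers = [c for c in toy.columns if c.startswith("Unnamed") and is_row_counter(toy[c])]
print(leftovers)  # ['Unnamed: 0', 'Unnamed: 9']
```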

In [7]:
df.columns = df.columns.str.replace(' ','_').str.lower()

In [8]:
df = df.rename(columns={
    'rate_(out_of_5)':'rating',
    'avg_cost_(two_people)':'two_cost'
    })
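
The two renaming cells can be checked on the column names alone. This sketch replays them on an empty frame with the post-drop column names; the only change is `errors="raise"` on `rename` (an addition, not in the notebook), which makes a typo in the mapping fail with a `KeyError` instead of being silently ignored.

```python
import pandas as pd

# Column names as they appear after dropping the two 'Unnamed' columns.
raw = pd.DataFrame(columns=[
    "restaurant name", "restaurant type", "rate (out of 5)", "num of ratings",
    "avg cost (two people)", "online_order", "table booking", "cuisines type",
    "area", "local address",
])

# Spaces -> underscores and lower-case, then shorten the two long names.
renamed = raw.copy()
renamed.columns = renamed.columns.str.replace(" ", "_").str.lower()
renamed = renamed.rename(
    columns={"rate_(out_of_5)": "rating", "avg_cost_(two_people)": "two_cost"},
    errors="raise",  # fail loudly if a key does not match an existing column
)
print(list(renamed.columns))
```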

In [9]:
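df['restaurant_name'] = (
    df['restaurant_name']
    .astype(str)
    .str.replace(r"[^A-Za-z0-9\s]", " ", regex=True)  # replace symbols (#, @, -, ') with a space
    .str.replace(r"\s+", " ", regex=True)             # collapse repeated whitespace
    .str.strip()                                      # trim leading/trailing spaces
    .str.title()                                      # title-case each word
)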
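
The name-cleaning chain has two side effects worth knowing. `.str.title()` discards intentional casing (`#FeelTheROLL` becomes `Feeltheroll`), and the ASCII-only character class also strips accented letters. The sketch wraps the same chain in a hypothetical `clean_names` function; `Café Noir` is an invented input used only to show the accent behaviour.

```python
import pandas as pd


def clean_names(names: pd.Series) -> pd.Series:
    """Hypothetical wrapper around the notebook's cleaning chain."""
    return (
        names.astype(str)
        .str.replace(r"[^A-Za-z0-9\s]", " ", regex=True)  # non-ASCII-alphanumerics -> space
        .str.replace(r"\s+", " ", regex=True)             # collapse whitespace runs
        .str.strip()
        .str.title()
    )


raw = pd.Series(["#FeelTheROLL", "#L-81 Cafe", "'@ The Bbq", "Café Noir"])
print(clean_names(raw).tolist())
# ['Feeltheroll', 'L 81 Cafe', 'The Bbq', 'Caf Noir']
```

If accented names occur in the data, a class such as `[^\w\s]` (which keeps Unicode letters in Python's `re`) would be one alternative, at the cost of also keeping underscores.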


In [10]:
df.isnull().sum()

restaurant_name    0
restaurant_type    0
rating             0
num_of_ratings     0
two_cost           0
online_order       0
table_booking      0
cuisines_type      0
area               0
local_address      0
dtype: int64
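
`isnull()` reports no missing values, yet the imputation cells below treat a `rating` or `two_cost` of 0 as missing. Counting zeros per column makes those hidden gaps visible. This is a minimal sketch on a toy frame with illustrative values; the notebook does not print the real zero counts.

```python
import pandas as pd

# Toy frame: zeros stand in for "not rated" / "cost unknown" (illustrative values).
toy = pd.DataFrame({
    "rating": [3.4, 0.0, 3.7],
    "two_cost": [200, 400, 0],
})

print(toy.isnull().sum().to_dict())  # no NaN anywhere
print((toy == 0).sum().to_dict())    # but one placeholder zero per column
```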

In [11]:
df.head()

Unnamed: 0,restaurant_name,restaurant_type,rating,num_of_ratings,two_cost,online_order,table_booking,cuisines_type,area,local_address
0,Feeltheroll,Quick Bites,3.4,7,200,No,No,Fast Food,Bellandur,Bellandur
1,L 81 Cafe,Quick Bites,3.9,48,400,Yes,No,"Fast Food, Beverages","Byresandra,Tavarekere,Madiwala",HSR
2,Refuel,Cafe,3.7,37,400,Yes,No,"Cafe, Beverages",Bannerghatta Road,Bannerghatta Road
3,Biryani Central,Casual Dining,2.7,135,550,Yes,No,"Biryani, Mughlai, Chinese",Marathahalli,Marathahalli
4,The Bbq,Casual Dining,2.8,40,700,Yes,No,"BBQ, Continental, North Indian, Chinese, Bever...",Bellandur,Bellandur


In [None]:
# Treat a rating of 0 as "not rated" and fill it with the median rating
df['rating'] = df['rating'].replace(0, np.nan).astype(float)

median_rating = df['rating'].median()

# Reassign instead of the chained inplace call, which raises a FutureWarning
# and will stop updating df in pandas 3.0
df['rating'] = df['rating'].fillna(median_rating)


In [34]:
# Treat a cost of 0 as unknown and fill it with the median cost for two
df['two_cost'] = df['two_cost'].replace(0, np.nan).astype(float)
median_two_cost = df['two_cost'].median()

# Reassign rather than use the chained inplace call (FutureWarning in pandas 2.x)
df['two_cost'] = df['two_cost'].fillna(median_two_cost)
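
The two imputation cells follow the same pattern, so they could be folded into one helper. `impute_zero_with_median` is a hypothetical name and the toy values are illustrative. The median is computed after zeros become NaN, so the placeholders do not drag it down. Median imputation also piles values onto the median and slightly shrinks the standard deviation, which is worth remembering when reading `describe()`.

```python
import numpy as np
import pandas as pd


def impute_zero_with_median(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: treat 0 as missing in `cols`, fill with each column's median."""
    out = df.copy()  # leave the caller's frame untouched
    for col in cols:
        s = out[col].replace(0, np.nan).astype(float)
        out[col] = s.fillna(s.median())  # median skips NaN, i.e. the former zeros
    return out


toy = pd.DataFrame({"rating": [3.4, 0.0, 3.7, 3.9], "two_cost": [200, 400, 0, 700]})
result = impute_zero_with_median(toy, ["rating", "two_cost"])
print(result)
```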


## 4. Exploratory Data Analysis


In [37]:
df.describe()

| | rating | num_of_ratings | two_cost |
|---|---|---|---|
| count | 7105.0 | 7105.0 | 7105.0 |
| mean | 3.514117 | 188.921042 | 539.161013 |
| std | 0.461028 | 592.171049 | 461.211329 |
| min | 1.8 | 1.0 | 40.0 |
| 25% | 3.2 | 16.0 | 300.0 |
| 50% | 3.5 | 40.0 | 400.0 |
| 75% | 3.8 | 128.0 | 600.0 |
| max | 4.9 | 16345.0 | 6000.0 |
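
The summary shows that `num_of_ratings` is heavily right-skewed: the mean (about 189) sits far above the median (40), and the maximum is 16,345. One common way to quantify this, not used in the notebook itself, is Tukey's IQR fences, which can be computed directly from the quartiles printed above. The helper name `iqr_fences` is hypothetical.

```python
def iqr_fences(q1: float, q3: float, k: float = 1.5) -> tuple[float, float]:
    """Tukey fences: values outside [q1 - k*IQR, q3 + k*IQR] are flagged as outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Quartiles read off the describe() output.
print(iqr_fences(16.0, 128.0))   # num_of_ratings -> (-152.0, 296.0)
print(iqr_fences(300.0, 600.0))  # two_cost       -> (-150.0, 1050.0)
print(iqr_fences(3.2, 3.8))      # rating, roughly (2.3, 4.7)
```

By these fences, a restaurant with more than 296 ratings or a cost for two above 1,050 would be flagged. Whether to cap, log-transform or keep such values is an analysis choice this section does not make. On the full column, the quartiles would come from `df['num_of_ratings'].quantile([0.25, 0.75])`.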
