The goal of this project is to explore and analyze restaurant data from Zomato's Bangalore listings with Python and draw business insights from it. The project covers data cleaning, transformation, visual exploration, and statistical analysis.

Importing Libraries

In [32]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

Load Dataset

In [33]:
df = pd.read_csv(r"C:\Users\abhir\Downloads\zomato.csv")

Initial Overview

Print a concise summary including non-null counts and data types

In [34]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 51717 entries, 0 to 51716
Data columns (total 17 columns):
 #   Column                       Non-Null Count  Dtype 
---  ------                       --------------  ----- 
 0   url                          51717 non-null  object
 1   address                      51717 non-null  object
 2   name                         51717 non-null  object
 3   online_order                 51717 non-null  object
 4   book_table                   51717 non-null  object
 5   rate                         43942 non-null  object
 6   votes                        51717 non-null  int64 
 7   phone                        50509 non-null  object
 8   location                     51696 non-null  object
 9   rest_type                    51490 non-null  object
 10  dish_liked                   23639 non-null  object
 11  cuisines                     51672 non-null  object
 12  approx_cost(for two people)  51371 non-null  object
 13  reviews_list                 51

Get the number of rows and columns in the dataset

In [35]:
df.shape

(51717, 17)

Check the count of missing values in each column

In [36]:
df.isnull().sum()

url                                0
address                            0
name                               0
online_order                       0
book_table                         0
rate                            7775
votes                              0
phone                           1208
location                          21
rest_type                        227
dish_liked                     28078
cuisines                          45
approx_cost(for two people)      346
reviews_list                       0
menu_item                          0
listed_in(type)                    0
listed_in(city)                    0
dtype: int64
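
Raw counts are easier to judge as a share of all rows. In the output above, dish_liked is missing in 28078 of 51717 rows (about 54%) and rate in 7775 (about 15%). The sketch below shows one way to compute such percentages; the `toy` frame and its values are made up for illustration and are not part of the dataset.

```python
import pandas as pd

# Hypothetical stand-in for the Zomato frame (values invented for illustration).
toy = pd.DataFrame({
    "rate": ["4.1/5", None, "3.8/5", None],
    "location": ["BTM", "HSR", None, "BTM"],
    "votes": [775, 0, 918, 88],
})

# isnull().mean() is the fraction of missing values per column;
# multiplying by 100 turns it into a percentage.
missing_pct = toy.isnull().mean().mul(100).round(2)
print(missing_pct.sort_values(ascending=False))
```

On the real data the same expression would be applied to `df` instead of `toy`.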

Display statistical summaries for the numeric columns (at this stage only votes is numeric, so describe() reports on that column alone)

In [37]:
df.describe()

,votes
count,51717.0
mean,283.697527
std,803.838853
min,0.0
25%,7.0
50%,41.0
75%,198.0
max,16832.0
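
By default `describe()` summarizes only numeric columns. To cover text columns as well, pandas accepts `include="all"`. The sketch below demonstrates this on a small made-up frame; `toy` and its values are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical two-column frame: one text column, one numeric column.
toy = pd.DataFrame({
    "name": ["Jalsa", "Onesta", "Onesta"],
    "votes": [775, 81, 60],
})

# include="all" adds count/unique/top/freq rows for non-numeric columns;
# cells that do not apply to a column are filled with NaN.
summary_all = toy.describe(include="all")
print(summary_all)
```

For the notebook's data, `df.describe(include="all")` would be the equivalent call.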


Check data types of all columns

In [38]:
df.dtypes

url                            object
address                        object
name                           object
online_order                   object
book_table                     object
rate                           object
votes                           int64
phone                          object
location                       object
rest_type                      object
dish_liked                     object
cuisines                       object
approx_cost(for two people)    object
reviews_list                   object
menu_item                      object
listed_in(type)                object
listed_in(city)                object
dtype: object

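Display number of unique values per column (this cell was run after the cleaning steps in the next section, so the output lists the cleaned columns such as cost, cost_per_person, and primary_cuisines)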

In [58]:
df.nunique()

name                7086
online_order           2
book_table             2
rate                  31
votes               2323
location              92
rest_type             88
dish_liked          5257
cuisines            2487
cost                  65
listed_in(type)        7
cost_per_person       65
primary_cuisines      88
dtype: int64

List the five most frequent restaurant names

In [59]:
df['name'].value_counts().head()

name
Cafe Coffee Day      82
Onesta               81
Empire Restaurant    68
Kanti Sweets         60
Just Bake            56
Name: count, dtype: int64

List the five most frequent locations

In [60]:
df['location'].value_counts().head()

location
BTM                      2236
Indiranagar              1658
Whitefield               1638
Koramangala 5th Block    1582
HSR                      1566
Name: count, dtype: int64

First five rows

In [39]:
df.head(5)

,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8/5,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7/5,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8/5,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


Data Cleaning and Pre-processing

Drop irrelevant columns that do not contribute to analysis

In [40]:
columns_to_drop = ['url', 'address', 'phone', 'reviews_list', 'menu_item', 'listed_in(city)']
df.drop(columns= columns_to_drop, inplace = True)

Remove duplicate rows to avoid skewed results

In [41]:
df.drop_duplicates(inplace = True)

Drop rows where critical columns have missing values

In [42]:
df.dropna(subset= ['rate', 'approx_cost(for two people)', 'location'], inplace = True)
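
It can help to record how many rows each of these steps removes. The following is a minimal sketch on a made-up frame; `toy`, its values, and the variable names are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical frame with one exact duplicate and two rows missing a critical field.
toy = pd.DataFrame({
    "name": ["A", "A", "B", "C"],
    "rate": ["4.1/5", "4.1/5", None, "3.8/5"],
    "location": ["BTM", "BTM", "HSR", None],
})

# duplicated() marks every repeat of an earlier, fully identical row.
n_dupes = int(toy.duplicated().sum())

# Non-inplace variants return new frames, so each stage can be compared.
deduped = toy.drop_duplicates()
cleaned = deduped.dropna(subset=["rate", "location"])

print(f"duplicates: {n_dupes}")
print(f"rows: {len(toy)} -> {len(deduped)} -> {len(cleaned)}")
```

Returning new frames instead of using `inplace=True` is a design choice made here so the intermediate sizes stay available; the notebook's in-place calls produce the same final rows.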

Fill missing values in less critical columns with default values

In [43]:
df['cuisines'] = df['cuisines'].fillna('Unknown')
df['rest_type'] = df['rest_type'].fillna('Others')
df['dish_liked'] = df['dish_liked'].fillna('Not specified')


Clean 'rate' column by removing '/5', stripping spaces, and converting to numeric

In [57]:

df['rate'] = df['rate'].astype(str).str.replace('/5', '', regex=False).str.strip()
df['rate'] = df['rate'].replace('-', np.nan)
df['rate'] = pd.to_numeric(df['rate'], errors='coerce')
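
The effect of this cleaning can be checked on a handful of raw strings. The sketch below applies the same chain to a made-up Series; the sample values are assumptions chosen to cover the '/5' suffix, stray spaces, 'NEW', '-', and a missing entry.

```python
import pandas as pd

# Hypothetical raw ratings (invented values covering the cases handled above).
raw = pd.Series(["4.1/5", "3.8 /5", "NEW", "-", None])

# Same steps as the cell above: drop the '/5' suffix, trim spaces,
# then coerce anything that is not a number to NaN.
clean = raw.astype(str).str.replace("/5", "", regex=False).str.strip()
clean = pd.to_numeric(clean, errors="coerce")

print(clean)
print("non-numeric entries turned into NaN:", int(clean.isna().sum()))
```

Because `errors="coerce"` converts unparseable strings to NaN rather than removing rows, a follow-up such as `df.dropna(subset=['rate'])` would be needed if those rows should be excluded from later analysis.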



Remove commas from the 'approx_cost(for two people)' column, convert it to float, and rename it to 'cost'

In [48]:
df['approx_cost(for two people)']= df['approx_cost(for two people)'].astype(str).str.replace(",", "").astype(float)
df.rename(columns= {'approx_cost(for two people)':'cost'}, inplace = True)
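
A direct `astype(float)` raises an error if any value still contains unexpected text. As an alternative sketch (not the approach used in the cell above), `pd.to_numeric` with `errors="coerce"` tolerates such values by turning them into NaN. The sample strings below are made up for illustration.

```python
import pandas as pd

# Hypothetical cost strings, including a thousands separator and a bad entry.
raw_cost = pd.Series(["800", "1,200", "300", "n/a"])

# Strip the comma literally (regex=False), then coerce leftovers to NaN.
cost = pd.to_numeric(raw_cost.str.replace(",", "", regex=False), errors="coerce")

print(cost)
```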

Add new column with cost per person

In [63]:
df['cost_per_person'] = df['cost']/2

Extract the first listed cuisine as the primary one

In [61]:
df['primary_cuisines'] = df['cuisines'].str.split(',').str[0]
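
Splitting on ',' keeps any whitespace around the first entry. The sketch below adds a `.str.strip()` as a precaution; the sample strings are hypothetical, and whether the real column contains such padding is not confirmed by the output shown here.

```python
import pandas as pd

# Hypothetical cuisine strings; the last one has padding around the first entry.
cuisines = pd.Series(["North Indian, Mughlai, Chinese", "Cafe", " Thai , Chinese"])

# Take the first comma-separated entry and trim surrounding spaces.
primary = cuisines.str.split(",").str[0].str.strip()

print(primary.tolist())
```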

Convert 'Yes/No' to boolean in online order and table booking

In [50]:
df['online_order']= df['online_order'].map({'Yes':True, 'No':False})
df['book_table']= df['book_table'].map({'Yes':True, 'No':False})
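
`Series.map` with a dictionary returns NaN for any value that is not a key, so it is worth checking for unmapped entries. The same behavior means that re-running the cell on an already converted column would turn every value into NaN, which matters in a notebook where cells are executed out of order. A minimal sketch on made-up values:

```python
import pandas as pd

yes_no = {"Yes": True, "No": False}

# Hypothetical flags; the lowercase 'yes' is not a key in the mapping.
flags = pd.Series(["Yes", "No", "yes", "No"])
mapped = flags.map(yes_no)

# Values that failed to map show up as NaN in the result.
unmapped = flags[mapped.isna()].unique()
print("unmapped values:", list(unmapped))

# Mapping a second time finds no matching keys (True/False are not 'Yes'/'No').
remapped = mapped.map(yes_no)
print("all NaN after second pass:", bool(remapped.isna().all()))
```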

Standardize column names: lowercase, strip spaces, replace spaces with underscores

In [53]:
df.columns= df.columns.str.strip().str.lower().str.replace(" ", "_")
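
This step only touches spaces and letter case, so a name like listed_in(type) keeps its parentheses. If names that are safe for attribute access are wanted, a stricter variant could replace every run of non-alphanumeric characters. The sketch below is one possible approach, shown on made-up column names; the regex is an assumption, not part of the original notebook.

```python
import pandas as pd

# Hypothetical column names, including one with parentheses and one with padding.
cols = pd.Index(["name", "listed_in(type)", " Approx Cost "])

# The notebook's approach: trim, lowercase, spaces to underscores.
basic = cols.str.strip().str.lower().str.replace(" ", "_")

# Stricter variant: collapse any non-alphanumeric run to '_' and trim the ends.
strict = basic.str.replace(r"[^0-9a-z]+", "_", regex=True).str.strip("_")

print(list(basic))
print(list(strict))
```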

In [54]:
df.head()

,name,online_order,book_table,rate,votes,location,rest_type,dish_liked,cuisines,cost,listed_in(type),cost_per_person,primary_cuisines
0,Jalsa,True,True,4.1,775,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800.0,Buffet,400.0,North Indian
1,Spice Elephant,True,False,4.1,787,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800.0,Buffet,400.0,Chinese
2,San Churro Cafe,True,False,3.8,918,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800.0,Buffet,400.0,Cafe
3,Addhuri Udupi Bhojana,False,False,3.7,88,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300.0,Buffet,150.0,South Indian
4,Grand Village,False,False,3.8,166,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600.0,Buffet,300.0,North Indian


We cleaned the dataset by removing unnecessary columns and duplicate rows, dropping rows with missing values in key fields, and filling the remaining gaps with default labels. The rate column was cleaned by stripping the '/5' suffix, and non-numeric values such as 'NEW' or '-' were converted to NaN. The approx_cost(for two people) column was stripped of commas, converted to float, and renamed to cost. We also derived two new columns, cost_per_person and primary_cuisines, for better insights. Lastly, we converted online_order and book_table to booleans and standardized all column names for consistency.