# Zomato Bangalore Dataset Analysis & Visualization

## 4. Data Cleaning & Pre-processing

### 🔹 Importing Required Libraries
We begin by importing the essential Python libraries for data analysis and cleaning:
- `pandas` for data manipulation
- `numpy` for numerical operations


In [4]:
import numpy as np 
import pandas as pd


### 🔹 Load Dataset

The dataset is loaded using `pd.read_csv()` and displayed to understand its structure, dimensions, and first few records.


In [5]:
df = pd.read_csv(r"C:\Users\abhir\Downloads\zomato.csv")
df.head()

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,approx_cost(for two people),reviews_list,menu_item,listed_in(type),listed_in(city)
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,Yes,Yes,4.1/5,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,Yes,No,4.1/5,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,Yes,No,3.8/5,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,No,No,3.7/5,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,No,No,3.8/5,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari


### 🔹 4.1 Drop irrelevant columns

The columns `url`, `address`, `phone`, `reviews_list`, `menu_item`, and `listed_in(city)` are not used in this analysis. Dropping them reduces the dataset's size and complexity.

In [3]:
columns_to_drop = ['url', 'address', 'phone', 'reviews_list', 'menu_item', 'listed_in(city)']
df.drop(columns= columns_to_drop, inplace = True)
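
Because notebook cells are often re-run, dropping columns that are already gone raises a `KeyError`. The sketch below, on a made-up toy frame (the names `toy` and `slim` are illustrative only), shows one way to make the step safe to repeat by passing `errors="ignore"`.

```python
import pandas as pd

# Toy frame standing in for the Zomato data (values are made up).
toy = pd.DataFrame({
    "name": ["A", "B"],
    "url": ["u1", "u2"],
    "phone": ["p1", "p2"],
    "rate": ["4.1/5", "3.8/5"],
})

# 'address' is deliberately absent from the toy frame.
columns_to_drop = ["url", "address", "phone"]

# errors="ignore" skips labels that are not present instead of raising KeyError.
slim = toy.drop(columns=columns_to_drop, errors="ignore")
print(list(slim.columns))
```

Returning a new frame instead of using `inplace=True` also keeps the original available for comparison.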

### 🔹 4.2 Remove duplicate rows to avoid skewed results

Duplicate rows can skew counts and averages. Note that `drop_duplicates()` only removes rows that are identical in every remaining column.

In [4]:
df.drop_duplicates(inplace= True)
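
It can help to record how many rows the step actually removes. A minimal sketch on made-up rows (the variable names are illustrative):

```python
import pandas as pd

# Made-up rows: the first two are identical in every column.
toy = pd.DataFrame({
    "name": ["Jalsa", "Jalsa", "Spice Elephant"],
    "location": ["Banashankari", "Banashankari", "Banashankari"],
    "votes": [775, 775, 787],
})

n_before = len(toy)
deduped = toy.drop_duplicates()  # compares all columns by default
n_removed = n_before - len(deduped)
print(f"removed {n_removed} duplicate row(s)")
```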

### 🔹 4.3 Drop Rows with Missing Values in Critical Columns

Missing values can cause problems in calculations and model training. Rows with missing entries in the critical columns (`rate`, `approx_cost(for two people)`, `location`) are removed.

In [5]:
df.dropna(subset= ['rate', 'approx_cost(for two people)', 'location'], inplace = True)
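
Before dropping, it is often useful to see how many values are missing per column. The sketch below uses a made-up toy frame to show that `subset` restricts the check to the listed columns only.

```python
import numpy as np
import pandas as pd

# Made-up data with gaps in different columns.
toy = pd.DataFrame({
    "rate": ["4.1/5", np.nan, "3.8/5"],
    "location": ["Banashankari", "Whitefield", np.nan],
    "dish_liked": [np.nan, "Pasta", "Momos"],
})

# Missing values per column, before anything is removed.
missing = toy.isna().sum()
print(missing)

# Only 'rate' and 'location' are treated as critical here, so the
# missing 'dish_liked' in the first row does not cause a drop.
kept = toy.dropna(subset=["rate", "location"])
```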

### 🔹 4.4 Fill Missing Values in Less Critical Columns

For the non-critical columns `cuisines`, `rest_type`, and `dish_liked`, missing values are replaced with the placeholders `'Unknown'`, `'Others'`, and `'Not specified'` respectively.

In [6]:
df['cuisines'] = df['cuisines'].fillna('Unknown')
df['rest_type'] = df['rest_type'].fillna('Others')
df['dish_liked'] = df['dish_liked'].fillna('Not specified')

### 🔹 4.5 Clean and Convert `rate` Column

The `rate` column contains strings like `'4.1/5'` or `'-'`.  
Steps:
1. Remove the `/5` part.
2. Strip spaces.
3. Replace `'-'` with NaN (since it's not a valid rating).
4. Convert to a numeric type with `pd.to_numeric(..., errors='coerce')`, which also turns any remaining non-numeric entries into NaN.


In [7]:
df['rate'] = df['rate'].astype(str).str.replace('/5', '', regex=False).str.strip()
df['rate'] = df['rate'].replace('-', np.nan)
df['rate'] = pd.to_numeric(df['rate'], errors='coerce')
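
The same three steps can be traced on a handful of values. This is a sketch on made-up input; the `'NEW'` entry is an assumed example of a non-numeric rating, included only to show what `errors='coerce'` does.

```python
import numpy as np
import pandas as pd

# Made-up raw ratings; "NEW" is an assumed non-numeric placeholder.
raw = pd.Series(["4.1/5", "3.8 /5", "-", "NEW", np.nan])

cleaned = raw.astype(str).str.replace("/5", "", regex=False).str.strip()
cleaned = cleaned.replace("-", np.nan)
# Anything that cannot be parsed as a number becomes NaN.
cleaned = pd.to_numeric(cleaned, errors="coerce")
print(cleaned)
```

Only the first two entries survive as numbers; the rest become NaN and can be handled by a later filter.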

### 🔹 4.6 Clean and Rename `approx_cost(for two people)`

This step removes thousands separators (commas), converts the column to `float`, and renames it to `cost`.

In [8]:
df['approx_cost(for two people)']= df['approx_cost(for two people)'].astype(str).str.replace(",", "").astype(float)
df.rename(columns= {'approx_cost(for two people)':'cost'}, inplace = True)

### 🔹 4.7 Add new column with cost per person

Dividing `cost` by 2 provides an individual meal price estimate.

In [9]:
df['cost_per_person'] = df['cost'] / 2
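
The cost cleaning and the per-person estimate can be checked together on a few made-up values. The toy frame and its contents are assumptions for illustration.

```python
import pandas as pd

# Made-up costs stored as strings, one with a thousands separator.
toy = pd.DataFrame({"approx_cost(for two people)": ["800", "1,200", "300"]})

cost = (
    toy["approx_cost(for two people)"]
    .astype(str)
    .str.replace(",", "", regex=False)  # "1,200" -> "1200"
    .astype(float)
)

# Swap the long column name for the cleaned 'cost' column.
toy = toy.assign(cost=cost).drop(columns="approx_cost(for two people)")

# Cost for two people halved gives a rough per-person estimate.
toy["cost_per_person"] = toy["cost"] / 2
print(toy)
```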

### 🔹 4.8 Extract the first listed cuisine as the primary one

Extracting the first cuisine listed gives us the main food category each restaurant is known for.

In [10]:
df['primary_cuisines'] = df['cuisines'].str.split(',').str[0]
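
A short sketch of the split on made-up cuisine strings. The extra `.str.strip()` is an optional addition, not part of the cell above, that guards against stray whitespace around the first entry.

```python
import pandas as pd

# Made-up cuisine strings, including a single-cuisine entry and a placeholder.
cuisines = pd.Series(["North Indian, Mughlai, Chinese", " Cafe , Italian", "Unknown"])

# Split on commas, keep the first piece, and trim surrounding whitespace.
primary = cuisines.str.split(",").str[0].str.strip()
print(primary.tolist())
```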

### 🔹 4.9 Convert 'Yes/No' to boolean in online order and table booking

For `online_order` and `book_table`,  
we map `'Yes'` → `True` and `'No'` → `False` to facilitate logical/boolean filtering.


In [11]:
df['online_order'] = df['online_order'].map({'Yes': True, 'No': False})
df['book_table'] = df['book_table'].map({'Yes': True, 'No': False})
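
One caveat with `Series.map` and a dictionary: any value that is not a key in the dictionary becomes NaN. The sketch below uses made-up flags (the lowercase `'yes'` is an assumed inconsistency, not something observed in the dataset) and shows a normalised variant.

```python
import pandas as pd

# Made-up flags; the lowercase "yes" is an assumed inconsistency.
flags = pd.Series(["Yes", "No", "yes", "Yes"])

# Values missing from the mapping ("yes") come back as NaN.
mapped = flags.map({"Yes": True, "No": False})
n_unmapped = int(mapped.isna().sum())

# Normalising case and whitespace first avoids the silent NaN.
normalized = flags.str.strip().str.lower().map({"yes": True, "no": False})
print(n_unmapped, normalized.tolist())
```

Checking `n_unmapped` after the mapping is a cheap way to confirm that every value was covered.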


### 🔹 4.10 Normalize column names: lowercase and underscores instead of spaces

Making all column names lowercase and separating words with underscores:
- Improves typing convenience
- Avoids case-sensitivity issues


In [12]:
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")
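
The effect of the chained string methods can be seen on a few made-up column names. Note that only spaces are replaced, so parentheses such as those in `listed_in(type)` are left as they are.

```python
import pandas as pd

# Empty toy frame with made-up, inconsistently formatted column names.
toy = pd.DataFrame(columns=[" Online Order", "listed_in(type)", "Cost Per Person "])

toy.columns = (
    toy.columns.str.strip()                      # trim leading/trailing spaces
    .str.lower()                                 # lowercase everything
    .str.replace(" ", "_", regex=False)          # spaces -> underscores
)
print(list(toy.columns))
```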

### 🔹 4.11 Remove rows with zero or negative cost values

Any restaurant entry showing `0` or negative `cost` values is unrealistic and likely due to data entry errors.  
Removing them ensures more accurate cost-based analysis.


In [13]:
df = df[df['cost'] > 0]  # assign back so the filter actually takes effect
df

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,reviews_list,menu_item,listed_in(type),listed_in(city),cost_per_person,primary_cuisines
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,400.0,North Indian
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,400.0,Chinese
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,400.0,Cafe
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,150.0,South Indian
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,300.0,North Indian
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51712,https://www.zomato.com/bangalore/best-brews-fo...,"Four Points by Sheraton Bengaluru, 43/3, White...",Best Brews - Four Points by Sheraton Bengaluru...,False,False,3.6,27,080 40301477,Whitefield,Bar,,Continental,1500.0,"[('Rated 5.0', ""RATED\n Food and service are ...",[],Pubs and bars,Whitefield,750.0,Continental
51713,https://www.zomato.com/bangalore/vinod-bar-and...,"Number 10, Garudachar Palya, Mahadevapura, Whi...",Vinod Bar And Restaurant,False,False,,0,+91 8197675843,Whitefield,Bar,,Finger Food,600.0,[],[],Pubs and bars,Whitefield,300.0,Finger Food
51714,https://www.zomato.com/bangalore/plunge-sherat...,Sheraton Grand Bengaluru Whitefield Hotel & Co...,Plunge - Sheraton Grand Bengaluru Whitefield H...,False,False,,0,,Whitefield,Bar,,Finger Food,2000.0,[],[],Pubs and bars,Whitefield,1000.0,Finger Food
51715,https://www.zomato.com/bangalore/chime-sherato...,Sheraton Grand Bengaluru Whitefield Hotel & Co...,Chime - Sheraton Grand Bengaluru Whitefield Ho...,False,True,4.3,236,080 49652769,"ITPL Main Road, Whitefield",Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500.0,"[('Rated 4.0', 'RATED\n Nice and friendly pla...",[],Pubs and bars,Whitefield,1250.0,Finger Food
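
Boolean indexing returns a new DataFrame and leaves the original untouched unless the result is assigned back. A minimal sketch on made-up data, with illustrative variable names:

```python
import pandas as pd

# Made-up costs, including a zero and a negative value.
toy = pd.DataFrame({"name": ["A", "B", "C"], "cost": [800.0, 0.0, -50.0]})

# Filtering without assignment only produces a view for display.
preview = toy[toy["cost"] > 0]
rows_after_preview = len(toy)   # still 3: toy itself is unchanged

# Assigning the result back is what removes the rows.
toy = toy[toy["cost"] > 0]
rows_after_assign = len(toy)
print(rows_after_preview, rows_after_assign)
```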


### 🔹 4.12 Remove unrealistic rating values (e.g., greater than 5)

This step filters the dataset to keep only rows where the rating is **5 or below**, removing any invalid or unrealistic ratings. Because comparisons with NaN evaluate to `False`, rows whose rating is missing are removed as well.


In [14]:
df = df[df['rate'] <= 5]
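
The sketch below, on made-up ratings, contrasts this filter with a variant that keeps unrated rows. Which behaviour is wanted depends on the analysis; the variant is shown only as an option.

```python
import numpy as np
import pandas as pd

# Made-up ratings: one missing, one out of range.
toy = pd.DataFrame({"rate": [4.1, np.nan, 5.3, 3.8]})

# Drops the out-of-range 5.3 and, as a side effect, the NaN row.
strict = toy[toy["rate"] <= 5]

# Keeps unrated rows and drops only the out-of-range value.
lenient = toy[toy["rate"].isna() | (toy["rate"] <= 5)]
print(len(strict), len(lenient))
```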

### 🔹 4.13 Remove rows where votes are missing or non-positive

This step keeps only the rows where the `votes` value is greater than zero, removing entries with missing or non-positive vote counts.


In [15]:
df = df[df['votes'] > 0]  # assign back so the filter actually takes effect
df

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,reviews_list,menu_item,listed_in(type),listed_in(city),cost_per_person,primary_cuisines
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,400.0,North Indian
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,400.0,Chinese
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,400.0,Cafe
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,150.0,South Indian
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,300.0,North Indian
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
51709,https://www.zomato.com/bangalore/the-farm-hous...,"136, SAP Labs India, KIADB Export Promotion In...",The Farm House Bar n Grill,False,False,3.7,34,+91 9980121279\n+91 9900240646,Whitefield,"Casual Dining, Bar",,"North Indian, Continental",800.0,"[('Rated 4.0', 'RATED\n Ambience- Big and spa...",[],Pubs and bars,Whitefield,400.0,North Indian
51711,https://www.zomato.com/bangalore/bhagini-2-whi...,"139/C1, Next To GR Tech Park, Pattandur Agraha...",Bhagini,False,False,2.5,81,080 65951222,Whitefield,"Casual Dining, Bar","Biryani, Andhra Meal","Andhra, South Indian, Chinese, North Indian",800.0,"[('Rated 4.0', 'RATED\n A fine place to chill...",[],Pubs and bars,Whitefield,400.0,Andhra
51712,https://www.zomato.com/bangalore/best-brews-fo...,"Four Points by Sheraton Bengaluru, 43/3, White...",Best Brews - Four Points by Sheraton Bengaluru...,False,False,3.6,27,080 40301477,Whitefield,Bar,,Continental,1500.0,"[('Rated 5.0', ""RATED\n Food and service are ...",[],Pubs and bars,Whitefield,750.0,Continental
51715,https://www.zomato.com/bangalore/chime-sherato...,Sheraton Grand Bengaluru Whitefield Hotel & Co...,Chime - Sheraton Grand Bengaluru Whitefield Ho...,False,True,4.3,236,080 49652769,"ITPL Main Road, Whitefield",Bar,"Cocktails, Pizza, Buttermilk",Finger Food,2500.0,"[('Rated 4.0', 'RATED\n Nice and friendly pla...",[],Pubs and bars,Whitefield,1250.0,Finger Food


### 🔹 4.14 Filter Sparse Locations

Locations with fewer than 10 listings are removed so that location-based insights rest on a reasonable number of restaurants.


In [16]:
location_counts = df['location'].value_counts()
valid_locations = location_counts[location_counts >= 10].index
df = df[df['location'].isin(valid_locations)]
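
The same pattern on a made-up frame, with the threshold lowered so the effect is visible. The `groupby(...).transform("count")` form is an equivalent alternative, not the method used in the cell above.

```python
import pandas as pd

# Made-up locations: 3, 2 and 1 listings respectively.
toy = pd.DataFrame({"location": ["Whitefield"] * 3 + ["BTM"] * 2 + ["Rare Place"]})

min_listings = 2  # the notebook uses 10; lowered here for the toy data

# value_counts + isin, as in the cell above.
counts = toy["location"].value_counts()
valid = counts[counts >= min_listings].index
filtered = toy[toy["location"].isin(valid)]

# Equivalent alternative: attach each row's group size and filter on it.
group_sizes = toy.groupby("location")["location"].transform("count")
alt = toy[group_sizes >= min_listings]
print(sorted(filtered["location"].unique()))
```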

### 🔹 4.15 Preview of first five records of cleaned data

In [17]:
df.head(5)

Unnamed: 0,url,address,name,online_order,book_table,rate,votes,phone,location,rest_type,dish_liked,cuisines,cost,reviews_list,menu_item,listed_in(type),listed_in(city),cost_per_person,primary_cuisines
0,https://www.zomato.com/bangalore/jalsa-banasha...,"942, 21st Main Road, 2nd Stage, Banashankari, ...",Jalsa,True,True,4.1,775,080 42297555\r\n+91 9743772233,Banashankari,Casual Dining,"Pasta, Lunch Buffet, Masala Papad, Paneer Laja...","North Indian, Mughlai, Chinese",800.0,"[('Rated 4.0', 'RATED\n A beautiful place to ...",[],Buffet,Banashankari,400.0,North Indian
1,https://www.zomato.com/bangalore/spice-elephan...,"2nd Floor, 80 Feet Road, Near Big Bazaar, 6th ...",Spice Elephant,True,False,4.1,787,080 41714161,Banashankari,Casual Dining,"Momos, Lunch Buffet, Chocolate Nirvana, Thai G...","Chinese, North Indian, Thai",800.0,"[('Rated 4.0', 'RATED\n Had been here for din...",[],Buffet,Banashankari,400.0,Chinese
2,https://www.zomato.com/SanchurroBangalore?cont...,"1112, Next to KIMS Medical College, 17th Cross...",San Churro Cafe,True,False,3.8,918,+91 9663487993,Banashankari,"Cafe, Casual Dining","Churros, Cannelloni, Minestrone Soup, Hot Choc...","Cafe, Mexican, Italian",800.0,"[('Rated 3.0', ""RATED\n Ambience is not that ...",[],Buffet,Banashankari,400.0,Cafe
3,https://www.zomato.com/bangalore/addhuri-udupi...,"1st Floor, Annakuteera, 3rd Stage, Banashankar...",Addhuri Udupi Bhojana,False,False,3.7,88,+91 9620009302,Banashankari,Quick Bites,Masala Dosa,"South Indian, North Indian",300.0,"[('Rated 4.0', ""RATED\n Great food and proper...",[],Buffet,Banashankari,150.0,South Indian
4,https://www.zomato.com/bangalore/grand-village...,"10, 3rd Floor, Lakshmi Associates, Gandhi Baza...",Grand Village,False,False,3.8,166,+91 8026612447\r\n+91 9901210005,Basavanagudi,Casual Dining,"Panipuri, Gol Gappe","North Indian, Rajasthani",600.0,"[('Rated 4.0', 'RATED\n Very good restaurant ...",[],Buffet,Banashankari,300.0,North Indian


### 🔹 Exporting the Cleaned Data to CSV

In [19]:
df.to_csv('cleaned_zomato_dataset.csv', index=False)  # .csv extension added so the file is recognised as CSV
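
A quick way to sanity-check an export is a round trip: write, read back, and compare the shape and columns. The sketch below uses an in-memory buffer and a made-up one-row frame so that it does not touch the disk.

```python
import io

import pandas as pd

# Made-up one-row frame standing in for the cleaned data.
toy = pd.DataFrame({"name": ["Jalsa"], "rate": [4.1], "online_order": [True]})

buffer = io.StringIO()             # in-memory stand-in for a file on disk
toy.to_csv(buffer, index=False)    # index=False avoids an extra unnamed column on reload
buffer.seek(0)

reloaded = pd.read_csv(buffer)
print(reloaded.shape, list(reloaded.columns))
```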

## 🔹 Interpretation Summary

By the end of cleaning, we:
- Eliminated irrelevant & duplicate data
- Dropped rows missing critical fields and filled placeholders in less critical ones
- Standardized formats and data types
- Added useful new columns
- Kept only valid, high-quality rows

The dataset is now **analysis-ready**.
