# Sales Performance Analysis  
## Zomato Bangalore Restaurants Dataset

### Project Objective
The objective of this project is to analyze Zomato data on Bangalore restaurants to understand
sales performance across locations, cuisines, and service availability (online ordering and table booking).

### Dataset
Source: Zomato Bangalore Restaurants (Public Kaggle Dataset)

### Note
The dataset contains no actual sales figures, so revenue and profit metrics are analytically derived proxies intended for business analysis, not reported financials.
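This section does not state how the metrics are derived. Purely as an illustration, one possible proxy treats the number of ratings as a stand-in for visit volume and the average cost for two as the spend per visit. Everything in the sketch below is an assumption, not the project's method: the helper name `estimate_revenue_proxy`, the one-rating-per-visit premise, and the flat 20% margin are hypothetical, and the toy rows are made up. It uses the cleaned column names produced later in the notebook.

```python
import pandas as pd


def estimate_revenue_proxy(df: pd.DataFrame, margin: float = 0.20) -> pd.DataFrame:
    """Add hypothetical revenue/profit proxy columns (illustrative assumption only).

    Assumes each rating stands for one visit by two people, so
    revenue_proxy = num_of_ratings * avg_cost_two_people, and profit_proxy
    applies a flat, assumed margin to that revenue.
    """
    out = df.copy()  # leave the caller's frame untouched
    out["revenue_proxy"] = out["num_of_ratings"] * out["avg_cost_two_people"]
    out["profit_proxy"] = out["revenue_proxy"] * margin
    return out


# Toy rows with made-up values
toy = pd.DataFrame({
    "restaurant_name": ["A", "B"],
    "num_of_ratings": [40, 128],
    "avg_cost_two_people": [400.0, 600.0],
})
result = estimate_revenue_proxy(toy)
print(result[["restaurant_name", "revenue_proxy", "profit_proxy"]])
```

Any real analysis should replace both assumptions with whatever derivation the project actually adopts; the point here is only that proxies are computed column-wise from existing fields.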


## Step 1: Data Loading & Initial Inspection
In this step, we load the dataset and perform basic inspection to understand its structure,
data types, and data quality issues.
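Step 1 mentions data types, but the inspection cells that follow never print them. A compact per-column summary can show dtype, missing count, and distinct count in one table, complementing `df.describe()` and `df.isnull().sum()`. This is a sketch: `inspection_summary` is a hypothetical helper name and the toy frame is made up.

```python
import pandas as pd


def inspection_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype, missing count, and distinct count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isnull().sum(),
        "unique": df.nunique(dropna=True),
    })


# Toy frame with made-up values, using a subset of the raw column names
toy = pd.DataFrame({
    "restaurant name": ["A", "B", "C"],
    "rate (out of 5)": [3.5, None, 4.1],
    "online_order": ["Yes", "No", "Yes"],
})
summary = inspection_summary(toy)
print(summary)
print("duplicate rows:", toy.duplicated().sum())
```

On the real data this would simply be `inspection_summary(df)` right after loading.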


In [18]:
import pandas as pd
import numpy as np


In [19]:
# Raw Kaggle export; this absolute path is machine-specific, so update it before re-running
df = pd.read_csv(r'C:\Users\swath\OneDrive\Desktop\github project\zomato.csv')


In [20]:
df.shape


(7105, 12)

In [21]:
df.columns


Index(['Unnamed: 0.1', 'Unnamed: 0', 'restaurant name', 'restaurant type',
       'rate (out of 5)', 'num of ratings', 'avg cost (two people)',
       'online_order', 'table booking', 'cuisines type', 'area',
       'local address'],
      dtype='object')

In [22]:
df.describe(include='all')


| | Unnamed: 0.1 | Unnamed: 0 | restaurant name | restaurant type | rate (out of 5) | num of ratings | avg cost (two people) | online_order | table booking | cuisines type | area | local address |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7105.0 | 7105.0 | 7105 | 7105 | 7037.0 | 7105.0 | 7048.0 | 7105 | 7105 | 7105 | 7105 | 7105 |
| unique | | | 7105 | 81 | | | | 2 | 2 | 2175 | 30 | 90 |
| top | | | #FeelTheROLL | Quick Bites | | | | Yes | No | North Indian, Chinese | Byresandra,Tavarekere,Madiwala | Whitefield |
| freq | | | 1 | 2840 | | | | 3727 | 6361 | 421 | 798 | 459 |
| mean | 3552.0 | 3552.0 | | | 3.514253 | 188.921042 | 540.286464 | | | | | |
| std | 2051.181164 | 2051.181164 | | | 0.463249 | 592.171049 | 462.902305 | | | | | |
| min | 0.0 | 0.0 | | | 1.8 | 1.0 | 40.0 | | | | | |
| 25% | 1776.0 | 1776.0 | | | 3.2 | 16.0 | 300.0 | | | | | |
| 50% | 3552.0 | 3552.0 | | | 3.5 | 40.0 | 400.0 | | | | | |
| 75% | 5328.0 | 5328.0 | | | 3.8 | 128.0 | 600.0 | | | | | |


In [23]:
df.isnull().sum()


Unnamed: 0.1              0
Unnamed: 0                0
restaurant name           0
restaurant type           0
rate (out of 5)          68
num of ratings            0
avg cost (two people)    57
online_order              0
table booking             0
cuisines type             0
area                      0
local address             0
dtype: int64

In [24]:
df.duplicated().sum()


0

## Initial Data Observations
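- The dataset contains 7,105 restaurant records from Bangalore across 12 columns, with no duplicate rows.
- Two leftover index columns (`Unnamed: 0` and `Unnamed: 0.1`) carry no analytical information and need to be removed.
- A small number of values are missing: 68 ratings and 57 average-cost entries.
- The number of ratings is highly right-skewed (mean ≈ 189, median 40, standard deviation ≈ 592), indicating that a few restaurants receive far more ratings than the rest.
- No direct sales or revenue data is available; business metrics will be derived analytically.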
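The skew observation can be quantified rather than eyeballed. The sketch below (hypothetical helper `skew_report`, made-up toy counts) reports the mean-to-median ratio and pandas' sample skewness, and checks how much a `log1p` transform, one common choice for count data, reduces the skew. On the real data it would be called as `skew_report(df['num of ratings'])`, using the column name before renaming.

```python
import numpy as np
import pandas as pd


def skew_report(s: pd.Series) -> dict:
    """Summarize how right-skewed a numeric column is, before and after log1p."""
    return {
        "mean": s.mean(),
        "median": s.median(),
        "mean_to_median": s.mean() / s.median(),
        "skew": s.skew(),                      # sample skewness of the raw values
        "skew_log1p": np.log1p(s).skew(),      # skewness after log(1 + x)
    }


# Toy rating counts (made up): many small values plus a few very large ones
toy_counts = pd.Series([5, 12, 16, 25, 40, 40, 60, 128, 900, 4000])
report = skew_report(toy_counts)
print(report)
```

A mean far above the median, together with a large positive skew that shrinks after `log1p`, is the usual signature of a long right tail like the one in `num of ratings`.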


In [25]:
df = df.drop(columns=['Unnamed: 0', 'Unnamed: 0.1'])


In [26]:
# Normalize column names: trim, lowercase, spaces -> underscores, drop parentheses
df.columns = (
    df.columns
    .str.strip()
    .str.lower()
    .str.replace(' ', '_', regex=False)
    .str.replace('(', '', regex=False)
    .str.replace(')', '', regex=False)
)


In [27]:
df.columns


Index(['restaurant_name', 'restaurant_type', 'rate_out_of_5', 'num_of_ratings',
       'avg_cost_two_people', 'online_order', 'table_booking', 'cuisines_type',
       'area', 'local_address'],
      dtype='object')

In [28]:
df.isnull().sum()


restaurant_name         0
restaurant_type         0
rate_out_of_5          68
num_of_ratings          0
avg_cost_two_people    57
online_order            0
table_booking           0
cuisines_type           0
area                    0
local_address           0
dtype: int64

In [29]:
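# Rating is a core metric, so the 68 rows without one are dropped rather than imputed;
# .copy() gives an independent frame for the column assignment in the next cell
df = df.dropna(subset=['rate_out_of_5']).copy()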


In [30]:
# Cost is right-skewed (mean ≈ 540 vs. median 400), so the median is a more robust fill value
median_cost = df['avg_cost_two_people'].median()
df['avg_cost_two_people'] = df['avg_cost_two_people'].fillna(median_cost)


In [31]:
df.isnull().sum()


restaurant_name        0
restaurant_type        0
rate_out_of_5          0
num_of_ratings         0
avg_cost_two_people    0
online_order           0
table_booking          0
cuisines_type          0
area                   0
local_address          0
dtype: int64

In [32]:
df.shape


(7037, 10)

## Data Cleaning Summary
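- Removed the two leftover index columns (`Unnamed: 0`, `Unnamed: 0.1`).
- Standardized column names to lowercase snake_case without parentheses.
- Dropped the 68 records with missing ratings (7,105 → 7,037 rows).
- Imputed the remaining missing average-cost values with the median.
- The dataset now has 10 columns and no missing values, and is ready for feature engineering.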
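For reuse, for example when re-running on a refreshed download, the four steps above could be wrapped in a single function. This is a sketch: `clean_zomato` is a hypothetical name, and `errors="ignore"` plus the explicit `.copy()` are defensive additions not present in the notebook cells. The toy frame is made up.

```python
import pandas as pd


def clean_zomato(raw: pd.DataFrame) -> pd.DataFrame:
    """Apply the four cleaning steps from the summary as one reusable function."""
    # 1. Drop the leftover index columns (ignored if already absent)
    df = raw.drop(columns=["Unnamed: 0", "Unnamed: 0.1"], errors="ignore")

    # 2. Standardize names: trim, lowercase, spaces -> underscores, drop parentheses
    df.columns = (
        df.columns.str.strip()
        .str.lower()
        .str.replace(" ", "_", regex=False)
        .str.replace("(", "", regex=False)
        .str.replace(")", "", regex=False)
    )

    # 3. Drop rows without a rating; copy so the next assignment acts on an independent frame
    df = df.dropna(subset=["rate_out_of_5"]).copy()

    # 4. Fill missing cost with the median of the rows that remain
    df["avg_cost_two_people"] = df["avg_cost_two_people"].fillna(
        df["avg_cost_two_people"].median()
    )
    return df


# Toy raw frame with made-up values and the original column names
toy = pd.DataFrame({
    "Unnamed: 0.1": [0, 1, 2, 3],
    "Unnamed: 0": [0, 1, 2, 3],
    "restaurant name": ["A", "B", "C", "D"],
    "rate (out of 5)": [3.5, None, 4.0, 3.2],
    "avg cost (two people)": [300.0, 500.0, None, 700.0],
})
clean = clean_zomato(toy)
print(clean)
```

As in the notebook, the median is computed after dropping unrated rows, so in the toy example row `C` is filled with the median of 300 and 700, i.e. 500.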


In [34]:
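from pathlib import Path

# Write the cleaned data; create data/processed first so to_csv does not fail if it is missing
out_path = Path(r'C:\Users\swath\OneDrive\Desktop\github project\data\processed\zomato_cleaned.csv')
out_path.parent.mkdir(parents=True, exist_ok=True)
df.to_csv(out_path, index=False)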
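After saving, it can be worth reloading the file once to confirm nothing changed on the way to disk. A sketch, with `save_and_verify` as a hypothetical helper: it creates missing parent folders and guards against a real `read_csv` behavior, namely that strings such as `"NA"` or `"null"` in text columns are parsed as missing values by default. The demo writes made-up rows to a temporary directory; in the notebook the path would be the processed-data path used for `zomato_cleaned.csv`.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_and_verify(df: pd.DataFrame, path: Path) -> pd.DataFrame:
    """Write df to CSV (creating parent folders), then reload it and sanity-check the result."""
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_csv(path, index=False)
    reloaded = pd.read_csv(path)
    if reloaded.shape != df.shape:
        raise ValueError(f"shape changed on reload: {df.shape} -> {reloaded.shape}")
    # read_csv treats strings such as "NA" or "null" as missing by default,
    # so a clean frame can gain NaNs on reload; flag that instead of passing silently
    if reloaded.isnull().to_numpy().any():
        raise ValueError("missing values appeared after reloading the CSV")
    return reloaded


# Demo in a throwaway directory with made-up rows
with tempfile.TemporaryDirectory() as tmp:
    toy = pd.DataFrame({"restaurant_name": ["A", "B"], "rate_out_of_5": [3.5, 4.0]})
    reloaded = save_and_verify(toy, Path(tmp) / "data" / "processed" / "toy_cleaned.csv")
    print(reloaded.shape)  # (2, 2)
```

If a restaurant name legitimately matches one of those strings, passing `keep_default_na=False` (or a custom `na_values`) to `pd.read_csv` when reloading avoids the false missing value.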
