<h1 style="text-align: center;">Swiggy Data Analysis</h1>


<h3>Overview of the Project:</h3>

The Swiggy Data Analysis project explores Swiggy's restaurant dataset, which records, among other features, restaurant names, cities, average ratings, total ratings counts, cuisines, cost for two, delivery times, addresses, localities, and veg/non-veg labels. The primary goal is to analyse this data in depth and identify patterns, trends, and performance indicators that could benefit stakeholders such as restaurants, delivery partners, and the Swiggy platform itself.

<h5>Let's Dive Into Analysis</h5>

In [2]:
#importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re                               # regex pattern matching

import warnings
warnings.filterwarnings("ignore")       # Ignoring warnings

import os                               # Allows interaction with the operating system


<h5>Overview of the Data:</h5>

In [3]:
# loading the data; pd.read_csv already returns a DataFrame
data_df = pd.read_csv('Cleaned Swiggy_dataset - Swiggy_dataset.csv.csv')


In [4]:
# viewing the first few rows of the data
data_df.head()

Unnamed: 0,Type,ID,Name,Uuid,City,Type.1,Avg Ratings,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Min Delivery Time,Max Delivery Time,Address,Locality,Veg/Non-Veg
0,F,37933,Faasos - Wraps & Rolls,6fe9caf1-02a7-4e66-83bb-1b4ff296b683,Ahmedabad,Vastrapur,4.2,500+ ratings,['Combo' 'Snacks' 'Beverages' 'Desserts' 'Indi...,₹200 FOR TWO,44,44,44,Shop No 2 Hotel Shahi Palace Vastrapur Lake Ah...,Hotel Shahi Palace,Non-Veg
1,F,81814,Burger King,10083576-d32d-4a0e-8a82-3236ef342a19,Ahmedabad,Ellisbridge,4.2,1000+ ratings,['American' 'Fast Food'],₹350 FOR TWO,33,33,33,Shop # 5 Gr Flr Third Eye 2Panchavati Circle O...,3Rd Eye Ii Ellis Bridge Cg Road,Non-Veg
2,F,107046,Mahalaxmi Pav Bhaji,fbfe3bfa-03fd-4708-b913-06e2c9ee9639,Ahmedabad,Ellisbridge,4.3,500+ ratings,['North Indian'],₹200 FOR TWO,28,28,28,Gf 9/10 Dev Complexnear Parimal Charasta Opp. ...,C G Road,Veg
3,F,108879,Jay Jalaram Parotha House,994e73e1-7c7c-4ad9-87e7-ecfecd4e36fa,Ahmedabad,Ellisbridge,3.9,100+ ratings,['North Indian'],₹250 FOR TWO,29,29,29,Capital Commercial Centre Near Uco Bank Ashram...,Ashram Road,Veg
4,F,140314,Jalaram Parotha House,e6ba1c33-66b6-45a2-8860-6ccde126da9a,Ahmedabad,Paldi,3.9,20+ ratings,['North Indian' 'Gujarati'],₹450 FOR TWO,28,28,28,1214 Sahjanand Trade Centre Opposite Kothawala...,Paldi,Veg


In [5]:
# checking the data info
data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5412 entries, 0 to 5411
Data columns (total 16 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Type                 5412 non-null   object 
 1   ID                   5412 non-null   int64  
 2   Name                 5412 non-null   object 
 3   Uuid                 5409 non-null   object 
 4   City                 5412 non-null   object 
 5   Type.1               5412 non-null   object 
 6   Avg Ratings          5412 non-null   float64
 7   Total Ratings Count  5412 non-null   object 
 8   Cuisines             5412 non-null   object 
 9   Cost for two         5412 non-null   object 
 10  Delivery Time        5412 non-null   int64  
 11  Min Delivery Time    5412 non-null   int64  
 12  Max Delivery Time    5412 non-null   int64  
 13  Address              5408 non-null   object 
 14  Locality             5408 non-null   object 
 15  Veg/Non-Veg          5410 non-null   object 

In [6]:
# checking the shape of the data
print("This data has", data_df.shape[0],"rows and",data_df.shape[1],"columns")

This data has 5412 rows and 16 columns


In [7]:
# checking for missing values
print("Total values in the dataset\n",data_df.count())
print()
print("Missing values in the dataset")
print(data_df.isnull().sum())
print("-"*30)

Total values in the dataset
 Type                   5412
ID                     5412
Name                   5412
Uuid                   5409
City                   5412
Type.1                 5412
Avg Ratings            5412
Total Ratings Count    5412
Cuisines               5412
Cost for two           5412
Delivery Time          5412
Min Delivery Time      5412
Max Delivery Time      5412
Address                5408
Locality               5408
Veg/Non-Veg            5410
dtype: int64

Missing values in the dataset
Type                   0
ID                     0
Name                   0
Uuid                   3
City                   0
Type.1                 0
Avg Ratings            0
Total Ratings Count    0
Cuisines               0
Cost for two           0
Delivery Time          0
Min Delivery Time      0
Max Delivery Time      0
Address                4
Locality               4
Veg/Non-Veg            2
dtype: int64
------------------------------


In [8]:
# dropping the rows that contain null values
data_df.dropna(inplace=True)

In [9]:
# Dropping the columns that are not needed for the analysis
print("Total values after filtering the data")
data_df.drop(['Type','ID','Uuid','Type.1','Min Delivery Time','Max Delivery Time'],axis=1,inplace=True)
data_df.head()


Total values after filtering the data


Unnamed: 0,Name,City,Avg Ratings,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Address,Locality,Veg/Non-Veg
0,Faasos - Wraps & Rolls,Ahmedabad,4.2,500+ ratings,['Combo' 'Snacks' 'Beverages' 'Desserts' 'Indi...,₹200 FOR TWO,44,Shop No 2 Hotel Shahi Palace Vastrapur Lake Ah...,Hotel Shahi Palace,Non-Veg
1,Burger King,Ahmedabad,4.2,1000+ ratings,['American' 'Fast Food'],₹350 FOR TWO,33,Shop # 5 Gr Flr Third Eye 2Panchavati Circle O...,3Rd Eye Ii Ellis Bridge Cg Road,Non-Veg
2,Mahalaxmi Pav Bhaji,Ahmedabad,4.3,500+ ratings,['North Indian'],₹200 FOR TWO,28,Gf 9/10 Dev Complexnear Parimal Charasta Opp. ...,C G Road,Veg
3,Jay Jalaram Parotha House,Ahmedabad,3.9,100+ ratings,['North Indian'],₹250 FOR TWO,29,Capital Commercial Centre Near Uco Bank Ashram...,Ashram Road,Veg
4,Jalaram Parotha House,Ahmedabad,3.9,20+ ratings,['North Indian' 'Gujarati'],₹450 FOR TWO,28,1214 Sahjanand Trade Centre Opposite Kothawala...,Paldi,Veg


In [10]:
# Dropping duplicate rows, if any
data_df.drop_duplicates(inplace=True)

In [11]:
# Renaming columns
data_df.rename(columns={'Name':'Restaurant Name','Avg Ratings':'Avg Rating'},inplace=True)

In [12]:
# Checking the new cleaned data
data_df.head()

Unnamed: 0,Restaurant Name,City,Avg Rating,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Address,Locality,Veg/Non-Veg
0,Faasos - Wraps & Rolls,Ahmedabad,4.2,500+ ratings,['Combo' 'Snacks' 'Beverages' 'Desserts' 'Indi...,₹200 FOR TWO,44,Shop No 2 Hotel Shahi Palace Vastrapur Lake Ah...,Hotel Shahi Palace,Non-Veg
1,Burger King,Ahmedabad,4.2,1000+ ratings,['American' 'Fast Food'],₹350 FOR TWO,33,Shop # 5 Gr Flr Third Eye 2Panchavati Circle O...,3Rd Eye Ii Ellis Bridge Cg Road,Non-Veg
2,Mahalaxmi Pav Bhaji,Ahmedabad,4.3,500+ ratings,['North Indian'],₹200 FOR TWO,28,Gf 9/10 Dev Complexnear Parimal Charasta Opp. ...,C G Road,Veg
3,Jay Jalaram Parotha House,Ahmedabad,3.9,100+ ratings,['North Indian'],₹250 FOR TWO,29,Capital Commercial Centre Near Uco Bank Ashram...,Ashram Road,Veg
4,Jalaram Parotha House,Ahmedabad,3.9,20+ ratings,['North Indian' 'Gujarati'],₹450 FOR TWO,28,1214 Sahjanand Trade Centre Opposite Kothawala...,Paldi,Veg


In [13]:
# Grouping the restaurants by name
data_df.groupby('Restaurant Name').count()

Unnamed: 0_level_0,City,Avg Rating,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Address,Locality,Veg/Non-Veg
Restaurant Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
10 Downing Street,1,1,1,1,1,1,1,1,1
10 Downing street,1,1,1,1,1,1,1,1,1
1131 BAR+KITCHEN,1,1,1,1,1,1,1,1,1
1441 Pizzeria,5,5,5,5,5,5,5,5,5
154 Breakfast Club,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...
star santosh Dhaba,1,1,1,1,1,1,1,1,1
swad bodol,1,1,1,1,1,1,1,1,1
temptationz cakes & more,1,1,1,1,1,1,1,1,1
wow chinese,1,1,1,1,1,1,1,1,1


<p>As we can see, '10 Downing Street' and '10 Downing street' are the same restaurant but appear in separate rows because of the differing case, so we are going to replace 'street' with 'Street'.</p>

In [16]:
# Replacing 'street' with 'Street' using pattern matching
data_df['Restaurant Name'] = data_df['Restaurant Name'].str.replace('street','Street',regex=True)
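
The replacement above fixes one known pair. Other names may also differ only by letter case, so a quick scan can help before hard-coding more replacements. The following is a minimal sketch using only the standard library; `find_case_variants` and `sample_names` are hypothetical names introduced for illustration and are not part of the original notebook.

```python
from collections import defaultdict

def find_case_variants(names):
    """Group names that differ only by letter case or extra whitespace."""
    groups = defaultdict(set)
    for name in names:
        # Normalise spacing and case to build a comparison key
        key = " ".join(name.split()).casefold()
        groups[key].add(name)
    # Keep only the keys that map to more than one distinct spelling
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}

# Hypothetical sample; in the notebook the input would be the 'Restaurant Name' column
sample_names = ["10 Downing Street", "10 Downing street", "1441 Pizzeria", "Burger King"]
variants = find_case_variants(sample_names)
print(variants)  # {'10 downing street': ['10 Downing Street', '10 Downing street']}
```

In the notebook this could be called as `find_case_variants(data_df['Restaurant Name'])`. Whether two spellings really refer to the same restaurant still needs a manual check.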

In [19]:
# Again grouping the data
data_df.groupby('Restaurant Name').count()

Unnamed: 0_level_0,City,Avg Rating,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Address,Locality,Veg/Non-Veg
Restaurant Name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1
10 Downing Street,2,2,2,2,2,2,2,2,2
1131 BAR+KITCHEN,1,1,1,1,1,1,1,1,1
1441 Pizzeria,5,5,5,5,5,5,5,5,5
154 Breakfast Club,1,1,1,1,1,1,1,1,1
1944 The Hocco Kitchen,1,1,1,1,1,1,1,1,1
...,...,...,...,...,...,...,...,...,...
star santosh Dhaba,1,1,1,1,1,1,1,1,1
swad bodol,1,1,1,1,1,1,1,1,1
temptationz cakes & more,1,1,1,1,1,1,1,1,1
wow chinese,1,1,1,1,1,1,1,1,1


In [30]:
# Checking the unique average ratings and the dtype of the column

data_df['Avg Rating'].unique(),data_df['Avg Rating'].dtype

(array([4.2, 4.3, 3.9, 4.1, 4.4, 3.7, 4. , 4.5, 3.8, 3. , 4.6, 2.8, 3.3,
        4.8, 3.5, 2.9, 2.4, 2.6, 3.2, 3.6, 4.7, 3.4, 2.7, 3.1, 4.9, 5. ,
        2.5, 2.2, 2.3, 2. ]),
 dtype('float64'))

In [51]:
# Checking whether there is any null value in the 'Total Ratings Count' column

data_df['Total Ratings Count'].isnull().value_counts(),data_df['Total Ratings Count'].dtype

(Total Ratings Count
 False    5400
 Name: count, dtype: int64,
 dtype('O'))

In [53]:
# Trying to convert the 'Total Ratings Count' column to float
# (this fails because the values are strings such as '500+ ratings')
data_df['Total Ratings Count'].astype('float')

ValueError: could not convert string to float: '500+ ratings'
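
The conversion fails because each value is a label such as '500+ ratings' rather than a bare number. One possible workaround, sketched below under the assumption that every label starts with a plain integer, is to extract the digits first. `parse_ratings_count` is a hypothetical helper, and the label 'no ratings yet' is an invented example of a value without digits.

```python
import re

def parse_ratings_count(label):
    """Return the lower bound encoded in a label such as '500+ ratings'.

    Returns None when the label contains no digits.
    """
    match = re.search(r"\d+", label)
    return int(match.group()) if match else None

# The first three labels appear in the data; the last one is a made-up edge case
labels = ["500+ ratings", "1000+ ratings", "20+ ratings", "no ratings yet"]
counts = [parse_ratings_count(label) for label in labels]
print(counts)  # [500, 1000, 20, None]
```

With pandas, the helper could be applied as `data_df['Total Ratings Count'].map(parse_ratings_count)`. Because the labels are lower bounds ('500+'), the result is a bucket floor rather than an exact count.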

In [35]:
# Checking the unique value for locality

data_df['Locality'].unique()

array(['Hotel Shahi Palace', '3Rd Eye Ii Ellis Bridge Cg Road',
       'C G Road', ..., 'Sudama Chowk', 'Jahangirpura', 'Mota Varachha'],
      dtype=object)

In [41]:
# Checking the unique values and null values for cuisines
data_df['Cuisines'].unique(), data_df['Cuisines'].isnull().value_counts()

(array(["['Combo' 'Snacks' 'Beverages' 'Desserts' 'Indian']",
        "['American' 'Fast Food']", "['North Indian']", ...,
        "['Fast Food' 'Chinese' 'Italian' 'Beverages' 'Desserts']",
        "['Indian' 'Continental' 'Chinese']",
        "['Indian' 'Chinese' 'South Indian' 'Punjabi' 'Fast Food']"],
       dtype=object),
 Cuisines
 False    5400
 Name: count, dtype: int64)

As we can see, there are no errors in the 'Locality' and 'Cuisines' columns, so there is no need to perform any cleaning operation on them.
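
The 'Cuisines' values are stored as strings that look like lists, for example "['American' 'Fast Food']". If a later step needs individual cuisines, they could be split out as in the sketch below. `parse_cuisines` and `rows` are hypothetical names, and the pattern assumes that cuisine names contain no apostrophes.

```python
import re
from collections import Counter

def parse_cuisines(text):
    """Split a string such as "['American' 'Fast Food']" into a list of names."""
    # Capture everything between each pair of single quotes
    return re.findall(r"'([^']+)'", text)

# Sample values copied from the output above
rows = ["['American' 'Fast Food']", "['North Indian']", "['North Indian' 'Gujarati']"]
cuisine_counts = Counter(name for row in rows for name in parse_cuisines(row))

print(parse_cuisines(rows[0]))          # ['American', 'Fast Food']
print(cuisine_counts["North Indian"])   # 2
```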

In [44]:
# This column contains the approximate cost for two people.
# Checking its unique values and whether it contains any null values

data_df['Cost for two'].unique(), data_df['Cost for two'].isnull().value_counts()

(array(['₹200 FOR TWO', '₹350 FOR TWO', '₹250 FOR TWO', '₹450 FOR TWO',
        '₹500 FOR TWO', '₹400 FOR TWO', '₹300 FOR TWO', '₹700 FOR TWO',
        '₹600 FOR TWO', '₹800 FOR TWO', '₹900 FOR TWO', '₹199 FOR TWO',
        '₹1200 FOR TWO', '₹150 FOR TWO', '₹550 FOR TWO', '₹0 FOR TWO',
        '₹100 FOR TWO', '₹180 FOR TWO', '₹750 FOR TWO', '₹650 FOR TWO',
        '₹120 FOR TWO', '₹1000 FOR TWO', '₹160 FOR TWO', '₹280 FOR TWO',
        '₹850 FOR TWO', '₹1100 FOR TWO', '₹149 FOR TWO', '₹80 FOR TWO',
        '₹220 FOR TWO', '₹193 FOR TWO', '₹499 FOR TWO', '₹251 FOR TWO',
        '₹299 FOR TWO', '₹126 FOR TWO', '₹99 FOR TWO', '₹260 FOR TWO',
        '₹170 FOR TWO', '₹15 FOR TWO', '₹270 FOR TWO', '₹326 FOR TWO',
        '₹225 FOR TWO', '₹290 FOR TWO', '₹130 FOR TWO', '₹2500 FOR TWO',
        '₹1600 FOR TWO', '₹1500 FOR TWO', '₹275 FOR TWO', '₹330 FOR TWO',
        '₹50 FOR TWO', '₹1300 FOR TWO', '₹599 FOR TWO', '₹197 FOR TWO',
        '₹256 FOR TWO', '₹1250 FOR TWO', '₹375 FOR TWO', '₹140 
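
The 'Cost for two' values are strings of the form '₹200 FOR TWO', so they cannot be used directly in numeric comparisons. A minimal sketch of extracting the amount is shown below; `parse_cost_for_two` is a hypothetical helper that assumes the amount is a whole number placed right after the rupee sign.

```python
import re

def parse_cost_for_two(label):
    """Extract the rupee amount from a label such as '₹200 FOR TWO'."""
    match = re.search(r"₹\s*(\d+)", label)
    return int(match.group(1)) if match else None

# Sample values copied from the output above
labels = ["₹200 FOR TWO", "₹1200 FOR TWO", "₹0 FOR TWO"]
costs = [parse_cost_for_two(label) for label in labels]
print(costs)  # [200, 1200, 0]
```

With pandas, this could be applied as `data_df['Cost for two'].map(parse_cost_for_two)`. Amounts such as ₹0 appear in the data and may deserve a separate check before they are included in averages.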

In [47]:
# After removing or modifying rows, resetting the index is important

data_df.reset_index(drop=True,inplace=True)

In [48]:
data_df

Unnamed: 0,Restaurant Name,City,Avg Rating,Total Ratings Count,Cuisines,Cost for two,Delivery Time,Address,Locality,Veg/Non-Veg
0,Faasos - Wraps & Rolls,Ahmedabad,4.2,500+ ratings,['Combo' 'Snacks' 'Beverages' 'Desserts' 'Indi...,₹200 FOR TWO,44,Shop No 2 Hotel Shahi Palace Vastrapur Lake Ah...,Hotel Shahi Palace,Non-Veg
1,Burger King,Ahmedabad,4.2,1000+ ratings,['American' 'Fast Food'],₹350 FOR TWO,33,Shop # 5 Gr Flr Third Eye 2Panchavati Circle O...,3Rd Eye Ii Ellis Bridge Cg Road,Non-Veg
2,Mahalaxmi Pav Bhaji,Ahmedabad,4.3,500+ ratings,['North Indian'],₹200 FOR TWO,28,Gf 9/10 Dev Complexnear Parimal Charasta Opp. ...,C G Road,Veg
3,Jay Jalaram Parotha House,Ahmedabad,3.9,100+ ratings,['North Indian'],₹250 FOR TWO,29,Capital Commercial Centre Near Uco Bank Ashram...,Ashram Road,Veg
4,Jalaram Parotha House,Ahmedabad,3.9,20+ ratings,['North Indian' 'Gujarati'],₹450 FOR TWO,28,1214 Sahjanand Trade Centre Opposite Kothawala...,Paldi,Veg
...,...,...,...,...,...,...,...,...,...,...
5395,Sai Pavbhaji & Chinese,Surat,3.9,20+ ratings,['South Indian' 'Fast Food' 'Chinese'],₹200 FOR TWO,65,Shop -7 Palak Residency Cross Road Amroli Kosa...,Sarthana,Non-Veg
5396,Laziz Pizza - Platinum Point,Surat,3.9,50+ ratings,['Pizzas' 'Fast Food'],₹400 FOR TWO,66,G-9 Sunshine Complex Opposite Cng Pump Sudama ...,Sudama Chowk,Veg
5397,Lk Master Non-Veg,Surat,4.2,100+ ratings,['Gujarati'],₹350 FOR TWO,75,Doctor Park Rd Parijat Nagar Jahangir Pura Sur...,Jahangirpura,Non-Veg
5398,SHREE ANNAPOORNA FAMILY RESTAURANT,Surat,4.6,20+ ratings,['Indian' 'Chinese' 'South Indian' 'Punjabi' '...,₹250 FOR TWO,62,Golden City Rd Mota Varachha Surat Gujarat 394...,Mota Varachha,Non-Veg


In [50]:
# Checking how many columns we have after cleaning the data
data_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5400 entries, 0 to 5399
Data columns (total 10 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Restaurant Name      5400 non-null   object 
 1   City                 5400 non-null   object 
 2   Avg Rating           5400 non-null   float64
 3   Total Ratings Count  5400 non-null   object 
 4   Cuisines             5400 non-null   object 
 5   Cost for two         5400 non-null   object 
 6   Delivery Time        5400 non-null   int64  
 7   Address              5400 non-null   object 
 8   Locality             5400 non-null   object 
 9   Veg/Non-Veg          5400 non-null   object 
dtypes: float64(1), int64(1), object(8)
memory usage: 422.0+ KB


In [54]:
# Saving the cleaned data; index=False keeps the row index out of the file
data_df.to_csv('./swiggy_cleaned_data.csv', index=False)