![](https://media.giphy.com/media/7JzHsh3UTip20/giphy.gif)

## What is in this Kernel?

* [Overview of the data](#1)
* [Preprocessing](#2)
    * [Removing Duplicates](#3)
* [Exploratory Data Analysis](#4)
    * [Top 10 Restaurants in Ahmedabad & Gandhinagar (in #Outlets)](#5)
    * [Types of Restaurants in Ahmedabad & Gandhinagar](#6)
    * [Number of Restaurants in each area](#7)
    * [Most common Cuisines in Ahmedabad](#8)
    * [Most common cuisines in Gandhinagar](#9)
    * [Average Cost for two persons](#10)
    * [Average Cost for each Establishment type](#11)
    * [Average cost per locality](#12)
    * [Top Highlights of Restaurants](#13)
    * [Digital Payments](#14)
    * [Delivery service](#15)
    * [Aggregate rating, Rating text, Votes & Photo count Analysis](#16)
    * [Which are the top Restaurants in Ahmedabad?](#17)
        * [Top Quick Bites Restaurants in Ahmedabad](#18)
        * [Top Casual Dining Restaurants in Ahmedabad](#19)
        * [Top Cafes in Ahmedabad](#20)
        * [Top Dessert Parlours in Ahmedabad](#21)
    * [Which are the top Restaurants in Gandhinagar?](#22)
        * [Top Quick Bites Restaurants in Gandhinagar](#23)
        * [Top Food Courts in Gandhinagar](#24)
        * [Top Sweet shops in Gandhinagar](#25)
    * [Locality & Cuisine wise Top Restaurants](#26)
        * [Top North Indian Restaurants in Bodakdev](#27)
        * [Top Fast Food Restaurants in Satellite](#28)
    

#### Please Upvote ✌! If you like my work!!

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

# [Overview of the data]()<a id="1"></a><br>

In [None]:
data = pd.read_csv('../input/zomato_restaurants_in_India.csv')

In [None]:
# We only want data of Ahmedabad & Gandhinagar
data = data[(data.city=="Ahmedabad") | (data.city=="Gandhinagar")]
data.shape

In [None]:
data.head()

# [Preprocessing]()<a id="2"></a><br>

## [Removing duplicate entries]()<a id="3"></a><br>

In [None]:
# Checking for redundant data
data["res_id"].nunique()

We can see that most of the data is redundant 👯‍♂️  
Our next step is to remove these duplicate entries
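
A minimal sketch of that check and fix on a toy frame. The rows are made up for illustration; only the `res_id` column name comes from the dataset, and the variable names are hypothetical.

```python
import pandas as pd

# Toy frame (made-up rows) in which res_id 101 appears twice
toy = pd.DataFrame({
    "res_id": [101, 102, 101, 103],
    "name": ["A", "B", "A", "C"],
})

n_rows = len(toy)
n_unique = toy["res_id"].nunique()
n_redundant = n_rows - n_unique  # rows that repeat an earlier res_id

# Keep the first occurrence of each res_id; returns a new frame
deduped = toy.drop_duplicates(subset="res_id", keep="first")
```

Comparing the row count with `nunique()` before dropping shows how many rows are about to disappear.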

In [None]:
data.drop_duplicates(["res_id"],keep='first',inplace=True)
data.shape

In [None]:
data.set_index("res_id",inplace=True)

In [None]:
data.info()

# [Exploratory Data Analysis]()<a id="4"></a><br>

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
sns.countplot(x='city',data=data)
plt.title("Restaurants count")

Gandhinagar has fewer restaurants than Ahmedabad

## [Top 10 Restaurants in Ahmedabad & Gandhinagar (in #Outlets)]()<a id="5"></a><br>

In [None]:
#plt.figure(figsize=(35,15))
#plt.figure(figsize=(8,8))
# Count outlets per restaurant name once, then plot the ten largest
top_A = data[data["city"]=="Ahmedabad"]["name"].value_counts()[:10]
sns.barplot(y=top_A.index,x=top_A.values)
plt.title("Top 10 Restaurants of Ahmedabad",fontsize=10)

* **Domino's Pizza** leads in number of outlets in Ahmedabad  
We will analyze this in more detail later

In [None]:
# Count outlets per restaurant name once, then plot the ten largest
top_G = data[data["city"]=="Gandhinagar"]["name"].value_counts()[:10]
sns.barplot(y=top_G.index,x=top_G.values)
plt.title("Top 10 Restaurants of Gandhinagar",fontsize=10)

* **Kabhi B, Domino's Pizza, Radhe Sweets, Vipul Dudhiya** Sweets lead in number of outlets in Gandhinagar  
We will analyze this in more detail later

## [Types of Restaurants in Ahmedabad & Gandhinagar]()<a id="6"></a><br>

In [None]:
Ahmedabad = data[data["city"]=="Ahmedabad"]
Gandhinagar = data[data["city"]=="Gandhinagar"]

plt.figure(figsize=(20,8))
plt.subplot(1,2,1)
sns.barplot(y=Ahmedabad["establishment"].value_counts().index,x=Ahmedabad["establishment"].value_counts().values)
plt.title("Establishment Counts (Ahmedabad)")

plt.subplot(1,2,2)
sns.barplot(y=Gandhinagar["establishment"].value_counts().index,x=Gandhinagar["establishment"].value_counts().values)
plt.title("Establishment Counts (Gandhinagar)")

* Most of the restaurants in both cities are of type **"Quick Bites"**  
* Ahmedabad has 240+ restaurants for **Casual Dining**
* Nearly **20 Food courts** in Gandhinagar

We will analyze this in more detail later

## [Number of Restaurants in each area]()<a id="7"></a><br>

In [None]:
# We only want the area (e.g. "Bodakdev", not "SBR Social, Bodakdev")
test = Ahmedabad.copy()
for i in test.index:
    test.loc[i,"locality"] = str(test.loc[i,"locality"]).split(', ')[-1] if str(test.loc[i,"locality"]).split(', ')[-1] != 'Gandhinagar' else str(test.loc[i,"locality"]).split(', ')[-2]
test["locality"].value_counts().index

In [None]:
Ahmedabad = test.copy()
#Gandhinagar_new = FetchArea(Gandhinagar).copy()

In [None]:
test = Gandhinagar.copy()
for i in test.index:
    test.loc[i,"locality"] = str(test.loc[i,"locality"]).split(', ')[-1] if str(test.loc[i,"locality"]).split(', ')[-1] != 'Gandhinagar' else str(test.loc[i,"locality"]).split(', ')[-2]
test["locality"].value_counts().index

In [None]:
Gandhinagar = test.copy()

In [None]:
plt.figure(figsize=(30,20))
plt.subplot(1,2,1)
sns.barplot(x=Ahmedabad['locality'].value_counts().values,y=Ahmedabad['locality'].value_counts().index)
plt.title("#Restaurants in each Area (Ahmedabad)")

plt.subplot(1,2,2)
sns.barplot(x=Gandhinagar['locality'].value_counts().values,y=Gandhinagar['locality'].value_counts().index)
plt.title("#Restaurants in each Area (Gandhinagar)")

* **Bodakdev** in Ahmedabad has 160+ restaurants (a food lovers' spot)
* In Gandhinagar, **Airport Gandhinagar Highway** has the most restaurants
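
The area extraction used above (take the last comma-separated part, unless it is the city name `Gandhinagar`) can also be sketched without a row-by-row loop. The helper `extract_area` and the sample strings are hypothetical; unlike the loop, this version falls back to the last part when there is nothing before `Gandhinagar`.

```python
import pandas as pd

def extract_area(locality: pd.Series) -> pd.Series:
    """Return the last comma-separated part, skipping a trailing 'Gandhinagar'."""
    parts = locality.astype(str).str.split(", ")
    return parts.apply(
        lambda p: p[-2] if p[-1] == "Gandhinagar" and len(p) > 1 else p[-1]
    )

# Made-up locality strings in the style of the dataset
toy = pd.Series([
    "SBR Social, Bodakdev",
    "Satellite",
    "Infocity, Gandhinagar",
])
areas = extract_area(toy)
```

Assuming the column holds plain strings, something like `Ahmedabad["locality"] = extract_area(Ahmedabad["locality"])` would replace the loop.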

## [Most common Cuisines in Ahmedabad]()<a id="8"></a><br>

In [None]:
Ahmedabad['cuisines'].value_counts()[:10]

In [None]:
# To see which cuisines are higher in counts, we will make a map to count values
test = Ahmedabad.copy()
Cuisines_Count = {}
for cuisines in test['cuisines']:
    for c in str(cuisines).split(', '):
        if c in Cuisines_Count:
            Cuisines_Count[c] = Cuisines_Count[c] + 1
        else:
            Cuisines_Count[c] = 1

In [None]:
sortedC = sorted(Cuisines_Count.items(),key=lambda kv:kv[1])[::-1]
import collections
Cuisines_Count = collections.OrderedDict(sortedC)
# Reference : https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value

In [None]:
Cuisines_Count_A = Cuisines_Count
plt.figure(figsize=(7,7))
sns.barplot(y=[str(x) for x in Cuisines_Count.keys()][:10],x=[int(x) for x in Cuisines_Count.values()][:10])
plt.title("10 most common Cuisines in Ahmedabad")

* **North Indian** & **Fast Food** are the most common cuisines in Ahmedabad
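
The counting step can be written more compactly with `collections.Counter`. This is only a sketch on made-up rows; the names `cuisine_rows` and `counts` are hypothetical and it assumes the same `", "`-separated format as the `cuisines` column.

```python
from collections import Counter

# Made-up rows in the same ", "-separated format as the cuisines column
cuisine_rows = [
    "North Indian, Fast Food",
    "Fast Food",
    "North Indian, Chinese",
    "North Indian",
]

# One count per individual cuisine, not per combination of cuisines
counts = Counter(c for row in cuisine_rows for c in str(row).split(", "))

# most_common returns (cuisine, count) pairs sorted by count, descending
top2 = counts.most_common(2)
```

`most_common(10)` would give the ten pairs to plot, so the separate sort and `OrderedDict` step is not needed.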

## [Most common cuisines in Gandhinagar]()<a id="9"></a><br>

In [None]:
test = Gandhinagar.copy()
Cuisines_Count = {}
for cuisines in test['cuisines']:
    for c in str(cuisines).split(', '):
        if c in Cuisines_Count:
            Cuisines_Count[c] = Cuisines_Count[c] + 1
        else:
            Cuisines_Count[c] = 1
            
sortedC = sorted(Cuisines_Count.items(),key=lambda kv:kv[1])[::-1]
import collections
Cuisines_Count = collections.OrderedDict(sortedC)

Cuisines_Count_G = Cuisines_Count
plt.figure(figsize=(7,7))
sns.barplot(y=[str(x) for x in Cuisines_Count.keys()][:10],x=[int(x) for x in Cuisines_Count.values()][:10])
plt.title("10 most common Cuisines in Gandhinagar")

* Similar case in Gandhinagar: **North Indian & Fast Food** are the most common cuisines  

## [Average Cost for two persons]()<a id="10"></a><br>

In [None]:
data["average_cost_for_two"].describe()

Here the minimum average cost for two is 0,  
which is not possible! It means that these values are missing  

We will assign these restaurants an "average cost for two" based on their area and establishment type
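
As a rough sketch of this idea on a toy frame (made-up numbers, hypothetical variable names), zeros can be treated as missing and filled with the mean of the restaurants that share the same establishment type and locality. This uses a `groupby(...).transform("mean")`, which is a swapped-in technique and not the loop used in the cells that follow.

```python
import pandas as pd

# Toy data (made up): a cost of 0 stands for a missing value
toy = pd.DataFrame({
    "establishment": ["Casual Dining"] * 3 + ["Quick Bites"] * 2,
    "locality": ["Bodakdev"] * 3 + ["Satellite"] * 2,
    "average_cost_for_two": [400, 600, 0, 200, 0],
})

# Turn zeros into NaN so they are ignored when averaging
cost = toy["average_cost_for_two"].mask(toy["average_cost_for_two"] == 0)

# Mean of the known costs within each (establishment, locality) group
group_mean = cost.groupby([toy["establishment"], toy["locality"]]).transform("mean")

# Fill only the missing entries with their group's mean
toy["average_cost_for_two"] = cost.fillna(group_mean)
```

A group in which every cost is 0 would stay `NaN` here and would need a fallback, for example the city-wide mean.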

In [None]:
ind = data[data["average_cost_for_two"]==0].index.values
lis = []
for i in ind:
    if [data.loc[i,"establishment"],data.loc[i,"locality"]] not in lis:
        lis.append([data.loc[i,"establishment"],data.loc[i,"locality"]])
lis

In [None]:
import math
avg_cost = []
for [e,l] in lis:
    # Mean over restaurants of the same establishment type and locality that report a cost
    known = data[(data["establishment"]==e) & (data["locality"]==l) & (data["average_cost_for_two"]!=0)]["average_cost_for_two"]
    avg_cost.append(math.ceil(known.mean()))
avg_cost

In [None]:
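# Key the lookup by (establishment, locality) so one locality's estimate does not overwrite another's
d = {}
for ([e,l],c) in zip(lis,avg_cost):
    d[(e,l)] = c
d

In [None]:
for i in data[data["average_cost_for_two"]==0].index.values:
    est = data.loc[i,"establishment"]
    loc = data.loc[i,"locality"]
    data.loc[i,"average_cost_for_two"] = d[(est,loc)]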

In [None]:
plt.figure(figsize=(16,8))
plt.subplot(1,2,1)
sns.distplot(data[data["city"]=="Gandhinagar"]["average_cost_for_two"],rug=True)
plt.title("Average cost for 2 (Gandhinagar)")

plt.subplot(1,2,2)
sns.distplot(data[data["city"]=="Ahmedabad"]["average_cost_for_two"],rug=True)
plt.title("Average cost for 2 (Ahmedabad)")

* Most restaurants in both cities have an average cost for two of **~250**

##  [Average Cost for each Establishment type]()<a id="11"></a><br>

In [None]:
#ahm = Ahmedabad.copy()
plt.figure(figsize=(25,10))
plt.subplot(1,2,1)
sns.barplot(y="establishment",x="average_cost_for_two",data=Ahmedabad)
plt.title("Average cost for two per Establishment (Ahmedabad)")

#gndh = data[data["city"]=="Gandhinagar"]
plt.subplot(1,2,2)
sns.barplot(y="establishment",x="average_cost_for_two",data=Gandhinagar)
plt.title("Average cost for two per Establishment (Gandhinagar)")

* **Fine Dining** costs **1750+** in Ahmedabad, whereas in Gandhinagar it costs **1300+**

## [Average cost per locality]()<a id="12"></a><br>

In [None]:
# Finding Locality wise average cost in Ahmedabad
localityXAvgCost_A = {}
for loc in Ahmedabad["locality"].unique():
    localityXAvgCost_A[loc] = Ahmedabad[Ahmedabad["locality"]==loc]["average_cost_for_two"].mean()
# Sorting Dictionary
sorted_LA_A = sorted(localityXAvgCost_A.items(),key=lambda kv:kv[1])[::-1]
localityXAvgCost_A = collections.OrderedDict(sorted_LA_A)

plt.figure(figsize=(15,15))
sns.barplot(y=[str(x) for x in localityXAvgCost_A.keys()],x=[int(x) for x in localityXAvgCost_A.values()])
plt.title("Average cost per Locality (Ahmedabad)")

* **Ashram Road** in Ahmedabad has the **highest (1100+)** average cost for two persons
* Whereas **Bodakdev**, with the most restaurants, has an average cost of **600**
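
The per-locality average can also be sketched with a single `groupby`. The toy numbers are made up (chosen only to echo the figures above) and the name `avg_by_locality` is hypothetical.

```python
import pandas as pd

# Made-up rows; only the column names come from the dataset
toy = pd.DataFrame({
    "locality": ["Ashram Road", "Bodakdev", "Bodakdev", "Ashram Road"],
    "average_cost_for_two": [1200, 500, 700, 1000],
})

# Mean cost per locality, most expensive first
avg_by_locality = (
    toy.groupby("locality")["average_cost_for_two"]
       .mean()
       .sort_values(ascending=False)
)
```

The resulting Series is already sorted, so its index and values can go straight into `sns.barplot`.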

In [None]:
#plt.figure(figsize=(8,8))
sns.barplot(y="locality",x="average_cost_for_two",data=Gandhinagar)
plt.title("Average cost per Locality (Gandhinagar)")

* **Sector 25**, with only one restaurant, has the highest cost of **~1100**
* **Airport Gandhinagar**, with the most restaurants, has a cost of **~450**

## [Top Highlights of Restaurants]()<a id="13"></a><br>

In [None]:
# Plots top 20(default) highlights of a City
def Highlights(data,city,top=20):    
    highlight_Counts = {}
    for highlights in data["highlights"]:
        for highlight in str(highlights).split(', '):
            if highlight[0]=='[':
                highlight = highlight[1:]
            if highlight[-1]==']':
                highlight = highlight[:-1]

            if highlight in highlight_Counts:
                highlight_Counts[highlight] += 1
            else:
                highlight_Counts[highlight] = 1

    sorted_highlights_a = sorted(highlight_Counts.items(),key= lambda x : x[1])[::-1]
    highlight_Counts = collections.OrderedDict(sorted_highlights_a)

    plt.figure(figsize=(10,10))
    sns.barplot(y=[str(x) for x in highlight_Counts.keys()][:top],x=[int(x) for x in highlight_Counts.values()][:top])
    plt.title("Top " +str(top) + " Highlights of Restaurants in " + city)
    
    return highlight_Counts

In [None]:
highlights_Ahmedabad = Highlights(Ahmedabad,"Ahmedabad")

* Unsurprisingly, **Cash** payment is on top
* **Takeaway** is available in most of the restaurants
* The number of **Pure Veg** restaurants is very large (as most people in the city are pure veg)
* We will explore Delivery, Digital Payments and other highlights further

In [None]:
highlights_Gandhinagar = Highlights(Gandhinagar,"Gandhinagar")

* Same results as Ahmedabad

## [Digital Payments]()<a id="14"></a><br>

As people move towards a cashless system, let's see how many restaurants accept digital payments  
This could be a credit card, debit card or any other digital payment method
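
A minimal sketch of this check, assuming each `highlights` value is a string that looks like a Python list (for example `"['Cash', 'Credit Card']"`). The helper `accepts_digital`, the set `DIGITAL` and the sample rows are hypothetical.

```python
import ast

# Highlights that we treat as a digital payment option
DIGITAL = {"Digital Payments Accepted", "Credit Card", "Debit Card", "Sodexo"}

def accepts_digital(highlights: str) -> bool:
    """True if the list-like string contains at least one digital payment highlight."""
    try:
        items = ast.literal_eval(str(highlights))
    except (ValueError, SyntaxError):
        return False  # e.g. a missing value such as "nan"
    if not isinstance(items, (list, tuple)):
        return False
    return any(item in DIGITAL for item in items)

# Made-up rows in the assumed format
rows = [
    "['Cash', 'Credit Card', 'Takeaway Available']",
    "['Cash', 'Delivery']",
    "['Debit Card']",
]
share = sum(accepts_digital(r) for r in rows) / len(rows)
```

Parsing the string into a real list avoids having to match the quotes and brackets by hand.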

In [None]:
total_res_A = Ahmedabad.shape[0]
Digital_Payments_A = 0 

for highlights in Ahmedabad["highlights"]:
    if "'Digital Payments Accepted'" in str(highlights).split(', '):
        Digital_Payments_A +=1
    elif "'Credit Card'" in str(highlights).split(', '):
        Digital_Payments_A +=1
    elif "'Debit Card'" in str(highlights).split(', '):
        Digital_Payments_A +=1
    elif "'Sodexo'" in str(highlights).split(', '):
        Digital_Payments_A +=1


# Data to plot
labels = ["Yes","No"]
sizes = [Digital_Payments_A,total_res_A-Digital_Payments_A]

# Plot
plt.figure(figsize=(8,8))
plt.pie(sizes,labels=labels,startangle=90,autopct='%.1f%%',colors=["red","yellow"],wedgeprops={ 'linewidth' : 3,'edgecolor' : "black" })
plt.title("Digital Payment (Ahmedabad)")

* Here we can see that more than 60% of the restaurants in Ahmedabad have a digital payment option

In [None]:
total_res_G = Gandhinagar.shape[0]
Digital_Payments_G = 0 

for highlights in Gandhinagar["highlights"]:
    if "'Digital Payments Accepted'" in str(highlights).split(', '):
        Digital_Payments_G +=1
    elif "'Credit Card'" in str(highlights).split(', '):
        Digital_Payments_G +=1
    elif "'Debit Card'" in str(highlights).split(', '):
        Digital_Payments_G +=1
    elif "'Sodexo'" in str(highlights).split(', '):
        Digital_Payments_G +=1


# Data to plot
labels = ["Yes","No"]
sizes = [Digital_Payments_G,total_res_G-Digital_Payments_G]

# Plot
plt.figure(figsize=(8,8))
plt.pie(sizes,labels=labels,shadow=True,startangle=90,autopct='%.1f%%',wedgeprops={ 'linewidth' : 3,'edgecolor' : "black" })
plt.title("Digital Payment (Gandhinagar)")

* In Gandhinagar, 74% of restaurants accept digital payments

## [Delivery service]()<a id="15"></a><br>

In [None]:
plt.figure(figsize=(15,7))
plt.subplot(1,2,1)
plt.pie([highlights_Ahmedabad["'Delivery'"],total_res_A-highlights_Ahmedabad["'Delivery'"]],
       startangle=90,autopct='%.1f%%',
       labels=["Yes","No"])
plt.title("Delivery (Ahmedabad)")

plt.subplot(1,2,2)
plt.pie([highlights_Gandhinagar["'Delivery'"],total_res_G-highlights_Gandhinagar["'Delivery'"]],
       startangle=90,autopct='%.1f%%',
       labels=["Yes","No"])
plt.title("Delivery (Gandhinagar)")

## [Aggregate rating, Rating text, Votes & Photo count Analysis]()<a id="16"></a><br>

In [None]:
# Ahmedabad & Gandhinagar ratings distributions
plt.figure(figsize=(20,40))

plt.subplot(4,2,1)
sns.distplot(Ahmedabad["aggregate_rating"])
plt.title("Ahmedabad aggregate ratings")

plt.subplot(4,2,2)
sns.distplot(Gandhinagar["aggregate_rating"])
plt.title("Gandhinagar aggregate ratings")

plt.subplot(4,2,3)
sns.countplot(x=Ahmedabad["rating_text"])
plt.title("Ahmedabad rating text counts")
plt.xticks(rotation=45)

plt.subplot(4,2,4)
sns.countplot(x=Gandhinagar["rating_text"])
plt.title("Gandhinagar rating text counts")

plt.subplot(4,2,5)
sns.distplot(Ahmedabad["votes"])
plt.title("Ahmedabad #votes distribution")

plt.subplot(4,2,6)
sns.distplot(Gandhinagar["votes"])
plt.title("Gandhinagar #votes distribution")

plt.subplot(4,2,7)
sns.distplot(Ahmedabad["photo_count"])
plt.title("Ahmedabad #Photos distribution")

plt.subplot(4,2,8)
sns.distplot(Gandhinagar["photo_count"])
plt.title("Gandhinagar #Photos distribution")

* Most ratings are around **3.5-4** in **both cities**
* Restaurants rated **"Good" & "Very Good"** are more numerous in **Ahmedabad**
* Whereas in Gandhinagar, **"Average"**-rated restaurants are more numerous
* **Votes & Photo** count distributions are heavily **skewed**, with most values near **"0"**
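
One way to put a number on that skew, sketched on made-up vote counts (the values and variable names are hypothetical): a positive sample skewness and a mean well above the median both indicate a long right tail with most values near zero.

```python
import pandas as pd

# Made-up vote counts: mostly small values plus one very popular restaurant
votes = pd.Series([0, 0, 0, 1, 2, 3, 5, 400])

skewness = votes.skew()        # sample skewness; positive means a long right tail
mean_votes = votes.mean()
median_votes = votes.median()
```

The same calls on `Ahmedabad["votes"]` or `Gandhinagar["photo_count"]` would give a quick numeric check of what the plots show.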

## [Which are the top Restaurants in Ahmedabad?]()<a id="17"></a><br>

#### I have created a Function to find and plot top Restaurants 

In [None]:
def BestRestaurants(city,esta):
    data = city[city["establishment"]==esta]["name"].value_counts()[:10]
    ratings = {}
    #print(data.index.values)
    for name in data.index.values:
        ratings[name] = city[(city["establishment"]==esta) & (city["name"]==name)]["aggregate_rating"].mean()
    #print(ratings)
    sorted_ratings = sorted(ratings.items(),key=lambda x:x[1])[::-1]
    ratings = collections.OrderedDict(sorted_ratings)
    #print(ratings)
    plt.figure(figsize=(8,8))
    sns.barplot(y=[str(x) for x in ratings.keys()],x=[float(x) for x in ratings.values()])
    plt.title("Top 10 " + esta + " Restaurants in " + str(city["city"].values[0]))
    plt.xlabel("Average Ratings")
    plt.ylabel("Restaurant Names")
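
The ranking logic of `BestRestaurants` (take the names with the most outlets for an establishment type, then order them by mean rating) can be sketched without the plotting part. The helper `rank_by_rating` and the toy rows are hypothetical and only illustrate the idea.

```python
import pandas as pd

def rank_by_rating(df, establishment, n=10):
    """Mean rating of the n names with the most outlets, best rated first."""
    subset = df[df["establishment"] == establishment]
    top_names = subset["name"].value_counts().head(n).index
    return (
        subset[subset["name"].isin(top_names)]
        .groupby("name")["aggregate_rating"]
        .mean()
        .sort_values(ascending=False)
    )

# Made-up rows; establishment strings follow the "['...']" format used above
toy = pd.DataFrame({
    "name": ["A", "A", "B", "C"],
    "establishment": ["['Quick Bites']"] * 3 + ["['Casual Dining']"],
    "aggregate_rating": [4.0, 3.0, 4.5, 5.0],
})
ranking = rank_by_rating(toy, "['Quick Bites']")
```

Note that with this logic "top" means best rated among the chains with the most outlets, not best rated overall.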

### [Top Quick Bites Restaurants in Ahmedabad]()<a id="18"></a><br> 

In [None]:
BestRestaurants(Ahmedabad,"['Quick Bites']")

* **La Pino'z Pizza, New Freez land & Shakti - The Sandwich Shop** are the top 3 Quick Bites restaurants in Ahmedabad

### [Top Casual Dining Restaurants in Ahmedabad]()<a id="19"></a><br>

In [None]:
BestRestaurants(Ahmedabad,"['Casual Dining']")

### [Top Cafes in Ahmedabad]()<a id="20"></a><br> 

In [None]:
BestRestaurants(Ahmedabad,"['Café']")

### [Top Dessert Parlours in Ahmedabad]()<a id="21"></a><br> 

In [None]:
BestRestaurants(Ahmedabad,"['Dessert Parlour']")

## [Which are the top Restaurants in Gandhinagar?]()<a id="22"></a><br>

### [Top Quick Bites Restaurants in Gandhinagar]()<a id="23"></a><br>

In [None]:
BestRestaurants(Gandhinagar,"['Quick Bites']")

### [Top Food Courts in Gandhinagar]()<a id="24"></a><br>

In [None]:
BestRestaurants(Gandhinagar,"['Food Court']")

### [Top Sweet shops in Gandhinagar]()<a id="25"></a><br>

In [None]:
BestRestaurants(Gandhinagar,"['Sweet Shop']")

## [Locality & Cuisine wise Top Restaurants]()<a id="26"></a><br>

#### I have developed a function to do this!!

In [None]:
from statistics import mean
def BestRestaurantsLocal(city,locality,cuisine):
    data = city[city["locality"]==locality]
    ratings = {}
    #print(data.index.values)
    for i in data.index.values:
        if cuisine in str(data.loc[i,"cuisines"]).split(', '):
            if data.loc[i,"name"] not in ratings:
                ratings[data.loc[i,"name"]] = [data.loc[i,"aggregate_rating"]]
            else:
                ratings[data.loc[i,"name"]].append(data.loc[i,"aggregate_rating"])
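    # Replace each list of ratings with its mean (builds a new dict instead of mutating during iteration)
    ratings = {str(name): mean(values) for name, values in ratings.items()}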
    sorted_ratings = sorted(ratings.items(),key=lambda x:x[1])[::-1]
    ratings = collections.OrderedDict(sorted_ratings)
    #print(ratings)
    plt.figure(figsize=(8,8))
    if len(ratings)>10:
        sns.barplot(y=[str(x) for x in ratings.keys()][:10],x=[float(x) for x in ratings.values()][:10])
        #plt.title("Top 10 " + cuisine + " Restaurants in " + locality)
        #plt.xlabel("Average Ratings")
        #plt.ylabel("Restaurant Names")
    else:
        sns.barplot(y=[str(x) for x in ratings.keys()],x=[float(x) for x in ratings.values()])
    plt.title("Top 10 " + cuisine + " Restaurants in " + locality)
    plt.xlabel("Average Ratings")
    plt.ylabel("Restaurant Names")

### [Top North Indian Restaurants in Bodakdev]()<a id="27"></a><br> 

In [None]:
BestRestaurantsLocal(Ahmedabad,'Bodakdev','North Indian')

### [Top Fast Food Restaurants in Satellite]()<a id="28"></a><br>

In [None]:
BestRestaurantsLocal(Ahmedabad,'Satellite','Fast Food')