# Exploratory Data Analysis on Zomato Bangalore Dataset

### The goal of analyzing the Zomato dataset is to understand the factors that affect where different types of restaurants are established in Bengaluru, and the aggregate rating of each restaurant.

### The first agenda of this project is:

### Perform "Exploratory Data Analysis (EDA)" on the Zomato Dataset.

## Importing Libraries and Dataset

In [107]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import plotly.graph_objs as go
import plotly.offline as py
import warnings 
warnings.filterwarnings('ignore')
plt.style.use('ggplot')

print('Libraries Imported.')

### Importing Zomato Bangalore Dataset

In [108]:
zomato_data = pd.read_csv("../input/zomato-bangalore-dataset/zomato.csv")

In [109]:
zomato_data.head()

## About the features in the dataset :

1. `url` = contains the URL of the restaurant on the Zomato website

2. `address` = contains the address of the restaurant in Bengaluru

3. `name` = contains the name of the restaurant

4. `online_order` = whether online ordering is available in the restaurant or not

5. `book_table` = table book option available or not

6. `rate` = contains the overall rating of the restaurant out of 5

7. `votes` = contains the total number of ratings for the restaurant as of the date the data was collected

8. `phone` = contains the phone number of the restaurant

9. `location` = contains the neighborhood in which the restaurant is located

10. `rest_type` = restaurant type

11. `dish_liked` = dishes people liked in the restaurant

12. `cuisines` = food styles, separated by a comma

13. `approx_cost(for two people)` = contains the approximate cost for a meal for two people

14. `reviews_list` = list of tuples containing reviews for the restaurant; each tuple consists of two values


## Performing EDA on the dataset to answer some basic questions 

Q: What is the size of data we are dealing with?

In [110]:
zomato_data.shape

Q: What are the different features in the dataset?

In [111]:
zomato_data.columns

Q: What is the data type of each feature?

In [112]:
zomato_data.dtypes

Q: Are there null values in the dataset? If yes, how many?

In [113]:
zomato_data.isna().sum()
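
Raw null counts are easier to judge as a share of all rows. The following is a minimal sketch on a made-up frame (the `toy` data is invented for illustration and is not taken from the Zomato file); the same expression should apply to `zomato_data`.

```python
import pandas as pd

# Toy frame standing in for zomato_data (values are invented for illustration)
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "rate": ["4.1/5", None, "NEW", None],
    "dish_liked": [None, None, None, "Pasta"],
})

# isna().mean() gives the fraction of missing values per column;
# multiply by 100 to express it as a percentage
null_share = toy.isna().mean().mul(100).round(1).sort_values(ascending=False)
print(null_share)
```

Columns with a very high share of missing values may be better excluded from an analysis than imputed.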

Removing some columns that are not of much use in EDA

In [114]:
df = zomato_data.drop(['url','phone'],axis=1)

Q: Are there duplicate restaurant entries in the dataset?

In [115]:
df.duplicated().sum()

Removing the Duplicate Restaurants

In [116]:
df.drop_duplicates(inplace=True)
df.duplicated().sum()
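
Note that `duplicated()` without arguments only flags rows that are identical in every column. If the same restaurant can appear in several rows that differ in a single column, a `subset` of key columns may be a better test. A small sketch, assuming `name` and `address` identify a restaurant (that choice of key and the `toy` data are assumptions made for illustration):

```python
import pandas as pd

# Toy frame: restaurant "A" is listed twice under different service types
toy = pd.DataFrame({
    "name": ["A", "A", "B"],
    "address": ["x", "x", "y"],
    "Type": ["Delivery", "Dine-out", "Delivery"],
})

# No row is identical in every column
full_dupes = toy.duplicated().sum()

# One row repeats an earlier (name, address) pair
key_dupes = toy.duplicated(subset=["name", "address"]).sum()

print(full_dupes, key_dupes)
```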

Renaming the Columns appropriately

In [117]:
df = df.rename(columns={'approx_cost(for two people)':'Cost', 'listed_in(type)':'Type', 'listed_in(city)':'City'})

## Univariate Analysis

Q: What are the 20 most famous restaurant chains in Bengaluru?

In [118]:
chains=df['name'].value_counts()[:20]
plt.figure(figsize=(20,12))
sns.barplot(x=chains, y=chains.index, palette='Set2')
plt.title("20 most famous restaurant chains in Bengaluru")
plt.xlabel("Number of Outlets")
plt.ylabel("Restaurant Chains");
for index,value in enumerate(chains):
    plt.text(value+0.5,index,str(value),color='black')

Q: What is the proportion of restaurants that offer table booking option?

In [119]:
sns.countplot(x='book_table', data=df)
fig=plt.gcf()
fig.set_size_inches(6,6)
plt.title('Table Booking Availability')
plt.xlabel('Options')
plt.ylabel('Count');

**Insight:**

**Most restaurants do not provide online table booking**
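
The count plot shows absolute counts; the proportion asked for in the question can be read off with `value_counts(normalize=True)`. A minimal sketch on a made-up column (the values are invented and only stand in for `df['book_table']`):

```python
import pandas as pd

# Toy column standing in for df['book_table'] (values are invented)
book_table = pd.Series(["No", "No", "Yes", "No"], name="book_table")

# normalize=True returns fractions of the total instead of raw counts
share = book_table.value_counts(normalize=True)
print(share)
```

The same call on `online_order` would give the share of restaurants that accept online orders.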

Q: What is the proportion of restaurants that offer Online Ordering?

In [120]:
sns.countplot(x='online_order', data=df)
fig=plt.gcf()
fig.set_size_inches(6,6)
plt.title("Online Ordering Availability")
plt.xlabel('Options')
plt.ylabel('Count');

**Insight**

**Most restaurants offer the option of online ordering and delivery**

Q: What is the distribution of `rate` feature?

In [121]:
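#cleaning rate feature
#remove the '/5' suffix; str.strip('/5') would also strip a trailing 5 from the rating itself (e.g. '4.5/5' -> '4.')
df['rate'] = df['rate'].astype(str).str.replace('/5', '', regex=False).str.strip()
#'NEW', '-' and missing values cannot be parsed and become NaN
df['rate'] = pd.to_numeric(df['rate'], errors='coerce', downcast='float')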
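
One subtlety when parsing rating strings of the form `4.5/5`: `str.strip(chars)` treats its argument as a set of characters to remove from both ends, not as a suffix. The plain-Python sketch below illustrates the difference; `parse_rate` is a hypothetical helper written for this example, not part of the notebook.

```python
# strip("/5") removes any '/' or '5' characters from both ends,
# so the trailing 5 of the rating itself is lost with the suffix
raw = "4.5/5"
stripped = raw.strip("/5")             # '4.'
safe = raw.replace("/5", "").strip()   # '4.5'
print(float(stripped), float(safe))


def parse_rate(value):
    """Hypothetical helper: rating as float, or None for 'NEW', '-' or missing values."""
    if not isinstance(value, str):
        return None
    text = value.replace("/5", "").strip()
    try:
        return float(text)
    except ValueError:
        return None


print([parse_rate(v) for v in ["4.1/5", "3.5 /5", "NEW", "-", None]])
```

Removing the suffix with `replace` keeps ratings that end in 5 intact, which matters for any histogram of the cleaned values.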

In [122]:
#plotting rate
plt.figure(figsize=(10,8))
sns.histplot(df['rate'],bins=20,color='orange',kde=True)
plt.xlabel('Rating of Restaurants(out of 5)',fontsize=16)
plt.ylabel('Frequency',fontsize=10)
plt.title('Distribution of Rating of Restaurants(on a scale of 1-5)');

**Insight**

**We can infer from the plot above that most of the ratings lie between 3.5 and 4.5**
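
A visual impression like this can be backed by a number. The sketch below computes the share of rated restaurants that fall inside the 3.5 to 4.5 band, using a made-up series in place of `df['rate']` (the values are invented for illustration):

```python
import pandas as pd

# Toy ratings standing in for df['rate']; None marks an unrated restaurant
rate = pd.Series([3.1, 3.6, 3.9, 4.2, 4.6, None])

# between() is inclusive on both ends and returns False for missing values
in_band = rate.between(3.5, 4.5)

# Share among restaurants that actually have a rating
share = in_band.sum() / rate.notna().sum()
print(f"{share:.0%} of rated restaurants fall between 3.5 and 4.5")
```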

Q: What is the proportion of different restaurant types in the dataset?

In [163]:
ax=sns.countplot(x='Type',data=df,saturation=1)
fig = plt.gcf()
fig.set_size_inches(8,8)
ax.set_xlabel('Service Type',labelpad=20)
ax.set_ylabel('')
ax.set_xticklabels(ax.get_xticklabels(),rotation=-40,ha='center')
ax.set_title("Distribution of Type of Services");

**Insight**

**The two main types of services are Delivery and Dine Out**

## Bivariate Analysis

Q: What is the relationship between Cost and Rating? Does online ordering availability explain some of this variation?

In [123]:
#cleaning column
df['Cost']=df['Cost'].str.replace(",","").astype(float)

In [124]:
plt.figure(figsize=(15,7))
sns.scatterplot(x="rate",y="Cost",hue="online_order",data=df)
plt.title("Distribution of the Cost for two Vs. Ratings in Parallel with Online Ordering Availability")
plt.xlabel("Restaurant Ratings",fontsize=12)
plt.ylabel('Cost for two');

**Insight :**

**Cost tends to be high for many restaurants that do not provide online ordering, which suggests that they may be luxury dine-in restaurants**
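
A grouped summary is one way to check whether the pattern seen in the scatter plot holds on average. A minimal sketch on a made-up frame (the `toy` values are invented; the column names follow the renamed `df`):

```python
import pandas as pd

# Toy frame standing in for df (values are invented for illustration)
toy = pd.DataFrame({
    "online_order": ["Yes", "Yes", "No", "No", "No"],
    "Cost": [400.0, 600.0, 800.0, 1500.0, 3000.0],
})

# The median is less sensitive than the mean to a few very expensive restaurants
summary = toy.groupby("online_order")["Cost"].agg(["median", "max"])
print(summary)
```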

Q: What is the relationship between Cost for two and Type of Restaurant in the dataset?

In [146]:
plt.figure(figsize=(14,10))
sns.histplot(x='Cost',bins=20,kde=True,hue='Type',data=df)
plt.xlim((0,3000))
plt.xlabel('Cost for two')
plt.title('Distribution of Cost for two for each type of restaurant');

In [190]:
plot_data=df.groupby('location').Cost.mean()
plot_data=plot_data.sort_values(ascending=False)
fig, ax = plt.subplots(figsize=(10,20))
ax=sns.barplot(x=plot_data.values,y=plot_data.index)
for index,value in enumerate(plot_data):
    plt.text(value,index,str(round(value)),color='black',fontsize=9)
ax.set_title('Relationship between Location of Restaurant and average Cost for two')
ax.set_xlabel('Average cost for two')
ax.grid(True);

**Insight**

**Most of the restaurants have an approximate cost between 0 and 1000 for two people**
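
The share of restaurants at or below a cost threshold can be computed directly rather than estimated from a plot. A minimal sketch with a made-up series in place of `df['Cost']` (the values are invented for illustration):

```python
import pandas as pd

# Toy costs standing in for df['Cost']; None marks a missing value
cost = pd.Series([300.0, 500.0, 800.0, 1200.0, None])

# Fraction of restaurants with a known cost of at most 1000 for two
share_le_1000 = (cost.dropna() <= 1000).mean()

# The median gives a single typical value (missing values are skipped)
median_cost = cost.median()
print(share_le_1000, median_cost)
```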