# Taste the problems of Zomato


This notebook works with a sample of Zomato data to answer stakeholder questions.
It identifies patterns in the data and develops insights that point to meaningful solutions.


***Questions to Answer***

- **Do a greater number of restaurants provide online delivery as opposed to offline services?**
- **Which types of restaurants are the most favored by the general public?**
- **What price range is preferred by couples for their dinner at restaurants?**

Import the required libraries:

In [None]:
# %pip install numpy pandas seaborn matplotlib prettytable

In [10]:
import pandas as pd
import numpy as np 
import seaborn as sns
import matplotlib.pyplot as plt
import prettytable
prettytable.DEFAULT='DEFAULT'

In [12]:
# Load the data. read_csv never returns None; it raises if the file is missing.
try:
    df = pd.read_csv('D:\\Drive into Analysis\\Zomato Data Analysis\\Zomato-data-.csv')
    print(df.head(5))
except FileNotFoundError:
    print('Failed to load the data from the given path')

                    name online_order book_table   rate  votes  \
0                  Jalsa          Yes        Yes  4.1/5    775   
1         Spice Elephant          Yes         No  4.1/5    787   
2        San Churro Cafe          Yes         No  3.8/5    918   
3  Addhuri Udupi Bhojana           No         No  3.7/5     88   
4          Grand Village           No         No  3.8/5    166   

   approx_cost(for two people) listed_in(type)  
0                          800          Buffet  
1                          800          Buffet  
2                          800          Buffet  
3                          300          Buffet  
4                          600          Buffet  


In [None]:
# Check for null values 
df.isna().sum()

name                           0
online_order                   0
book_table                     0
rate                           0
votes                          0
approx_cost(for two people)    0
listed_in(type)                0
dtype: int64

In [16]:
# Let's check the data types
df.dtypes

name                           object
online_order                   object
book_table                     object
rate                           object
votes                           int64
approx_cost(for two people)     int64
listed_in(type)                object
dtype: object

In [20]:
# rate is stored as text such as '4.1/5', so keep only the part before the '/'
def handle(value):
    # '4.1/5' -> '4.1' (still a string; converted to float in the next cell)
    return str(value).split('/')[0]
df['rate']=df['rate'].apply(handle)
df.head(2)

             name  online_order  book_table  rate  votes  approx_cost(for two people)  listed_in(type)
0           Jalsa           Yes         Yes   4.1    775                          800           Buffet
1  Spice Elephant           Yes          No   4.1    787                          800           Buffet


In [23]:
# Convert rate from text to float
df['rate']=df['rate'].astype(float)
df.dtypes

name                            object
online_order                    object
book_table                      object
rate                           float64
votes                            int64
approx_cost(for two people)      int64
listed_in(type)                 object
dtype: object
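
The two cells above strip the `/5` suffix with a row-by-row `apply` and then cast to float. A vectorized sketch of the same cleaning is shown here on a small made-up sample rather than the real file. The `'NEW'` entry is a hypothetical placeholder, included only to show how `pd.to_numeric(..., errors='coerce')` would turn an unparseable rating into `NaN` instead of raising; the sample data loaded above contains no such value.

```python
import pandas as pd

# Made-up sample in the same 'x.y/5' format as the rate column.
# 'NEW' is a hypothetical non-numeric placeholder, not taken from the data.
sample = pd.Series(['4.1/5', '3.8/5', 'NEW'], name='rate')

# Keep the text before the '/' for every row at once.
numerator = sample.str.split('/').str[0]

# Convert to float; anything that cannot be parsed becomes NaN.
rate = pd.to_numeric(numerator, errors='coerce')

print(rate.dtype)
print(rate.tolist())
```

If this approach were used on the real column, a follow-up `isna().sum()` would show how many ratings failed to parse.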

In [24]:
# There are no null values in the dataset, so let's look at the summary statistics

df.describe(include='all')

                   name  online_order  book_table      rate       votes  approx_cost(for two people)  listed_in(type)
count               148           148         148     148.0       148.0                        148.0              148
unique              145             2           2       NaN         NaN                          NaN                4
top     San Churro Cafe            No          No       NaN         NaN                          NaN           Dining
freq                  2            90         140       NaN         NaN                          NaN              110
mean                NaN           NaN         NaN  3.633108  264.810811                   418.243243              NaN
std                 NaN           NaN         NaN  0.402271  653.676951                   223.085098              NaN
min                 NaN           NaN         NaN       2.6         0.0                        100.0              NaN
25%                 NaN           NaN         NaN       3.3        6.75                        200.0              NaN
50%                 NaN           NaN         NaN       3.7        43.5                        400.0              NaN
75%                 NaN           NaN         NaN       3.9      221.75                        600.0              NaN


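The statistical summary above makes the following points clear:
- Ratings cluster tightly: the standard deviation is 0.40 around a mean of 3.63.
- Votes are widely spread (standard deviation 653.7 against a mean of 264.8), while the approximate cost shows a moderate spread (standard deviation 223.1 against a mean of 418.2).
- There are 145 unique restaurant names.
- The total row count is 148.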
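
One way to put a number on "tight" versus "wide" spread is the coefficient of variation (standard deviation divided by mean). The sketch below is an illustration, not part of the original analysis. It copies the mean and std values from the `describe()` output above into a small hypothetical frame named `summary`, so it runs without the CSV file.

```python
import pandas as pd

# Mean and standard deviation copied from the describe() output above.
summary = pd.DataFrame(
    {
        'mean': [3.633108, 264.810811, 418.243243],
        'std': [0.402271, 653.676951, 223.085098],
    },
    index=['rate', 'votes', 'approx_cost(for two people)'],
)

# Coefficient of variation: spread relative to the size of the mean.
summary['cv'] = summary['std'] / summary['mean']

print(summary['cv'].round(2))
```

A ratio well below 1 (as for the ratings) means values sit close to the mean. A ratio above 1 (as for the votes) points to a long tail, which agrees with the vote median of 43.5 lying far below the mean.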


In [None]:
# Information about data
df.info()

In [31]:
# Let's rename the columns appropriately
df.rename(columns={'name':'Restaurant','online_order':'Online','book_table':'Bookings','rate':'Ratings(5)','votes':'Vote','approx_cost(for two people)':'Average Cost( For 2)','listed_in(type)':'Type'},inplace=True)
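
By default, `DataFrame.rename` silently skips any key in the mapping that does not match an existing column, so a typo in a long mapping like the one above can go unnoticed. The sketch below uses a one-row stand-in frame (`demo`, a hypothetical name) and a deliberately misspelled key to show how passing `errors='raise'` might be used to surface such a mistake.

```python
import pandas as pd

# One-row stand-in with two of the original column names.
demo = pd.DataFrame({'name': ['Jalsa'], 'votes': [775]})

# The key 'vote' is deliberately misspelled (the column is 'votes').
mapping = {'name': 'Restaurant', 'vote': 'Vote'}

# Default behaviour: the unknown key is skipped without any warning.
silent = demo.rename(columns=mapping)
print(list(silent.columns))

# With errors='raise', the same typo surfaces as a KeyError.
try:
    demo.rename(columns=mapping, errors='raise')
    caught = False
except KeyError:
    caught = True
print(caught)
```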

In [36]:
# Check whether the columns were renamed
df.head(5)

              Restaurant  Online  Bookings  Ratings(5)  Vote  Average Cost( For 2)    Type
0                  Jalsa     Yes       Yes         4.1   775                   800  Buffet
1         Spice Elephant     Yes        No         4.1   787                   800  Buffet
2        San Churro Cafe     Yes        No         3.8   918                   800  Buffet
3  Addhuri Udupi Bhojana      No        No         3.7    88                   300  Buffet
4          Grand Village      No        No         3.8   166                   600  Buffet
