## <u>Introduction</u>

#### This notebook surveys and analyzes Dineout restaurants in the Indian market. A region-based analysis is performed to determine regional performance and customer behaviour. 
#### The project evaluates Indian restaurants on multiple attributes and abstracts the key elements a restaurant needs to hold a strong position in a competitive market. Regions are also compared to highlight scope for improvement.

## <u>Table of Contents</u>
*  [1. Analyzing Dataframe](#1)
*  [2. How are restaurants distributed across India?](#2)
*  [3. How are average ratings distributed across India?](#3)
*  [4. How is cost distributed across India?](#4)
*  [5. How are votes distributed across India?](#5)
*  [6. How is the overall performance of restaurants across different states?](#6)
*  [7. What are top cuisines in India?](#7)
*  [8. How are the cuisines distributed among states?](#8)
*  [9. What are top restaurant locations in Maharashtra, Delhi and Karnataka?](#9)
*  [10. References](#10)

# Libraries

In [None]:
import pandas as pd
import numpy as np
import plotly.figure_factory as ff
from plotly.offline import iplot

import plotly.express as px
from plotly.subplots import make_subplots

import plotly.graph_objects as go

<a id='1'></a>
# Analyzing Dataframe 

In [None]:
# Reading dataframe 
df = pd.read_csv('../input/dineout-restaurants-in-india/dineout_restaurants.csv')
df.head()

In [None]:
# Evaluating dataframe
print('* Size of dataframe: {}\n'.format(df.shape))
print('* Datatype of columns are:')
df.info()  # info() prints its report directly and returns None

In [None]:
df.describe()

In [None]:
df['City'].value_counts()

**Cities** can be categorized in terms of **State**. 
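
A minimal sketch of this grouping on made-up data follows. The names `toy`, `toy_mapping` and `unmapped` are illustrative assumptions, not part of the notebook. It uses `Series.map`, which yields `NaN` for any city missing from the mapping, whereas `Series.replace` leaves such a city name in place, where it is easy to miss.

```python
import pandas as pd

# Illustrative subset of a city-to-state mapping (not the full mapping)
toy_mapping = {"Mumbai": "Maharashtra", "Pune": "Maharashtra", "Jaipur": "Rajasthan"}

toy = pd.DataFrame({"City": ["Mumbai", "Pune", "Jaipur", "Shimla"]})

# map() returns NaN for cities that are absent from the mapping
toy["State"] = toy["City"].map(toy_mapping)

# Cities that still need an entry in the mapping
unmapped = toy.loc[toy["State"].isna(), "City"].unique().tolist()
print(unmapped)
```

Checking `unmapped` before any state-level aggregation is one way to confirm that every city received a state.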

## Adding State Column

In [None]:
# Mapping each city to its state or region
city_to_state = {
    'Bangalore': 'Karnataka', 'Delhi': 'Delhi NCR', 'Mumbai': 'Maharashtra',
    'Kolkata': 'Bengal', 'Hyderabad': 'Telangana', 'Ahmedabad': 'Gujarat',
    'Chennai': 'Tamil Nadu', 'Pune': 'Maharashtra', 'Jaipur': 'Rajasthan',
    'Chandigarh': 'Punjab', 'Indore': 'Madhya Pradesh', 'Gurgaon': 'Delhi NCR',
    'Noida': 'Delhi NCR', 'Vadodara': 'Gujarat', 'Lucknow': 'Uttar Pradesh',
    'Agra': 'Uttar Pradesh', 'Nagpur': 'Maharashtra', 'Surat': 'Gujarat',
    'Ludhiana': 'Punjab', 'Goa': 'Goa', 'Ghaziabad': 'Delhi NCR',
    'Udaipur': 'Rajasthan', 'Kochi': 'Kerala',
}
df['State'] = df['City'].replace(city_to_state)
df['State'].value_counts()

**Kochi** has **just two restaurants**.
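
A city with so few restaurants can distort city-level averages. The cleanup can also be written as a threshold rule instead of a hard-coded city name. The sketch below uses made-up data; `drop_sparse_cities` and `min_restaurants` are hypothetical names introduced only for illustration.

```python
import pandas as pd

def drop_sparse_cities(frame, min_restaurants=5, city_col="City"):
    """Drop rows that belong to cities with fewer than `min_restaurants` rows."""
    counts = frame[city_col].value_counts()
    keep = counts[counts >= min_restaurants].index
    return frame[frame[city_col].isin(keep)].reset_index(drop=True)

# Toy data: 'Delhi' appears five times, 'Kochi' only twice
toy = pd.DataFrame({"City": ["Delhi"] * 5 + ["Kochi"] * 2})

cleaned = drop_sparse_cities(toy, min_restaurants=3)
print(cleaned["City"].unique().tolist())
```

The threshold itself is a judgment call; the value used here is arbitrary.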

## Removing Kochi 

In [None]:
# Dropping the rows that belong to Kochi
kochi_df = df[df['City']=='Kochi']
df = df.drop(kochi_df.index)
df['City'].value_counts()

## Distribution of restaurant ratings, cost and votes in India 

In [None]:
fig = ff.create_distplot([df.Rating],['Rating'],bin_size=0.1)
fig.update_layout(title_text='Distribution of Restaurant Ratings', 
                  title_font_color = 'mediumturquoise', title_x = 0.47,
                  font_family="sans-serif",
                  title_font_size=20)

iplot(fig, filename='Basic Distplot')

In [None]:
fig = ff.create_distplot([df.Cost],['Cost'],bin_size=100)
fig.update_layout(title_text='Distribution of Restaurant Cost', 
                  title_font_color = 'mediumturquoise', title_x = 0.5,
                  font_family="sans-serif",
                  title_font_size=20)
iplot(fig, filename='Basic Distplot')

In [None]:
fig = ff.create_distplot([df.Votes],['Votes'],bin_size=200)
fig.update_layout(title_text='Distribution of Restaurant Votes', 
                  title_font_color = 'mediumturquoise', title_x = 0.5,
                  font_family="sans-serif",
                  title_font_size=20)
iplot(fig, filename='Basic Distplot')

The above distributions **do not** provide analysis in **terms of states or cities**. Region-wise restaurant performance is evaluated in the following sections. 
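
As a rough illustration of what a region-wise view adds, the sketch below summarizes a small made-up table per state with `groupby` and named aggregation. The data and the output column names (`restaurants`, `avg_rating`, `avg_cost`, `avg_votes`) are assumptions for this example only.

```python
import pandas as pd

# Toy data with the same kind of columns as the restaurant table
toy = pd.DataFrame({
    "State": ["Goa", "Goa", "Punjab", "Punjab", "Punjab"],
    "Rating": [4.0, 4.2, 3.8, 4.0, 4.2],
    "Cost": [800, 1200, 500, 700, 900],
    "Votes": [100, 300, 50, 150, 100],
})

# One row per state: restaurant count plus the mean of each numeric attribute
summary = toy.groupby("State").agg(
    restaurants=("Rating", "size"),
    avg_rating=("Rating", "mean"),
    avg_cost=("Cost", "mean"),
    avg_votes=("Votes", "mean"),
)
print(summary)
```

Replacing `"State"` with `"City"` in the `groupby` call gives the city-level view.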

<a id='2'></a>
# <u> Question #1: How are restaurants distributed across India? </u>

In [None]:
# Forming dataframes in terms of cities and states
city_restnts = df.groupby('City').sum(numeric_only=True)
state_restnt = df.groupby('State').sum(numeric_only=True)

# List of states
restnt_state = df['State'].value_counts()
restnt_state

In [None]:
fig = px.bar(x = restnt_state.index, y=restnt_state)
fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'States', yaxis_title = 'Total Restaurants', 
                  title_text='Restaurant Distribution Across States', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

In [None]:
restnt_city = df['City'].value_counts().sort_values(ascending = True) 

fig = px.bar(y = restnt_city.index, x=restnt_city, color=restnt_city, orientation = 'h',
            labels = {
                'color': 'Total' +'<br>'+ 'Restaurants'
            }) # color continuous scale
fig.update_layout(yaxis_title = 'Cities', xaxis_title = 'Total Restaurants', 
                  title_text='Restaurant Distribution Across Cities', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

<a id='3'></a>
# <u>Question #2: How are average ratings distributed across India?</u>  

In [None]:
df.head()

## #2.1 State-Wise Distribution  

In [None]:
# Forming state-wise dataframe (averaging numeric columns only)
df_state = df.groupby('State').mean(numeric_only=True)
df_state.reset_index(level=0, inplace=True)
df_state

In [None]:
fig = px.bar(df_state, x = 'State', y='Rating')
fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'States', yaxis_title = 'Average Rating', 
                  title_text='Rating Distribution Across States', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))
fig.show()

<center>The bar graph shows that <b>rating variation is small</b> for different states. </center>
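
The visual impression of a small variation can be backed with a number. The sketch below computes the absolute and relative spread of per-state average ratings. The values are made up and the names `state_means`, `spread` and `rel_spread` are illustrative assumptions.

```python
import pandas as pd

# Made-up per-state average ratings
state_means = pd.Series({"Goa": 4.1, "Punjab": 4.0, "Kerala": 3.9}, name="Rating")

# Absolute spread: gap between the best and worst state average
spread = state_means.max() - state_means.min()

# Relative spread: the same gap as a fraction of the overall mean
rel_spread = spread / state_means.mean()

print(round(spread, 2), round(rel_spread, 3))
```

A relative spread of a few percent would support reading the bars as nearly equal.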

### Comparing Ratings with Polar Bar Plot

In [None]:
labels = df_state['State']
x1 = df_state['Rating']

num_slices = len(x1)
theta = [(i+1.5)*360/num_slices for i in range(num_slices)]
r=x1
width = [360 / num_slices for _ in range(num_slices)]


barpolar_plots = [go.Barpolar(r=[r], theta=[t], width=[w], name=n)
for r, t, w, n in zip(r, theta, width, labels)]

fig = go.Figure(barpolar_plots)

fig.update_layout(#     template='ggplot2',
                    polar = dict(
                        radialaxis = dict(range=[3.5, 4.25], showticklabels=True),
                        angularaxis = dict(showticklabels=False, ticks='')
                        ),
                    title_text='Comparison of Ratings Across States', 
                    title_x=0.45,
                    font=dict(
                      family="Courier New, monospace",
                      size=12,
                  )
)
fig.show()

## #2.2 City-Wise Distribution

In [None]:
# Forming city-wise dataframe (averaging numeric columns only)
df_city = df.groupby('City').mean(numeric_only=True)
df_city.reset_index(level=0, inplace=True)
df_city

In [None]:
fig = px.bar(df_city, x = 'City', y='Rating')
fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'Cities', yaxis_title = 'Average Rating', 
                  title_text='Rating Distribution Across Cities', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))
fig.show()

<center>The bar graph shows that <b>rating variation is small</b> for different cities. </center>

### Comparing Ratings with Polar Bar Plot

In [None]:
labels = df_city['City']
x1 = df_city['Rating']

num_slices = len(x1)
theta = [(i+1.5)*360/num_slices for i in range(num_slices)]
r=x1
width = [360 / num_slices for _ in range(num_slices)]

barpolar_plots = [go.Barpolar(r=[r], theta=[t], width=[w], name=n)
for r, t, w, n in zip(r, theta, width, labels)]

fig = go.Figure(barpolar_plots)


fig.update_layout(
                    polar = dict(
                        radialaxis = dict(range=[3.5, 4.33], showticklabels=True),
                        angularaxis = dict(showticklabels=False, ticks='')
                        ),
                    title_text='Comparison of Ratings Across Cities', 
                    title_x=0.47,
                    font=dict(
                      family="Courier New, monospace",
                      size=12,
                  )
)
fig.show()

<center>The polar bar plot confirms that <b>rating variation is small</b> for different cities. </center>

<a id='4'></a>
# <u>Question #3: How is cost distributed across India?</u>  

## #3.1 State-wise Distribution

In [None]:
df_state

In [None]:
# Cost distribution across states
df_state.sort_values(by=['Cost'], inplace=True)

fig = px.bar(df_state, x = 'Cost', y='State', color = 'Cost', orientation = 'h',
            labels = {
                'Cost': 'Average' +'<br>'+ 'Cost'
            })
fig.update_layout(yaxis_title = 'States', xaxis_title = 'Average Cost', 
                  title_text='Average Cost Distribution Across States', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

## #3.2 City-wise Distribution

In [None]:
df_city

In [None]:
# Cost distribution across cities
df_city.sort_values(by=['Cost'], inplace=True)
df_city
fig = px.bar(df_city, x = 'Cost', y='City', color = 'Cost', orientation = 'h',
            labels = {
                'Cost': 'Average' +'<br>'+ 'Cost'
            })
fig.update_layout(yaxis_title = 'Cities', xaxis_title = 'Average Cost', 
                  title_text='Average Cost Distribution Across Cities', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

<a id='5'></a>
# <u>Question #4: How are votes distributed across India?</u>  

## #4.1 State-wise Distribution 

In [None]:
df_state

In [None]:
# Votes distribution across states
df_state.sort_values(by=['Votes'], inplace=True)

fig = px.bar(df_state, x = 'Votes', y='State', color = 'Votes', orientation = 'h',
            labels = {
                'Votes': 'Average' +'<br>'+ 'Votes'
            })
fig.update_layout(yaxis_title = 'States', xaxis_title = 'Average Votes', 
                  title_text='Votes Distribution Across States', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

## #4.2 City-wise Distribution

In [None]:
# Votes distribution across cities
df_city.sort_values(by=['Votes'], inplace=True)

fig = px.bar(df_city, x = 'Votes', y='City', color = 'Votes', orientation = 'h',
            labels = {
                'Votes': 'Average' +'<br>'+ 'Votes'
            })
fig.update_layout(yaxis_title = 'Cities', xaxis_title = 'Average Votes', 
                  title_text='Votes Distribution Across Cities', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  )
                  )
fig.show()

<a id='6'></a>
# <u>Question #5: How is the overall performance of restaurants across different states?</u>

## #5.1 Adding Attributes to State Dataframe

In [None]:
restnt_state

In [None]:
# Extracting total restaurants in each state and forming its dataframe
a = restnt_state.index
b = restnt_state
df_state_restnts = pd.DataFrame(list(zip(a,b)))
df_state_restnts.columns = ['State', 'Total Restaurants']
df_state_restnts = df_state_restnts.set_index('State')
display(df_state_restnts)

In [None]:
df_state

In [None]:
# Taking State column in dataframe as index
df_state = df_state.set_index('State')
df_state

In [None]:
# Matching the row order of df_state_restnts with df_state
df_state_restnts = df_state_restnts.reindex(df_state.index)
df_state_restnts

In [None]:
# Adding total restaurants column to state dataframe  
df_state['Total Restaurants'] = df_state_restnts['Total Restaurants']
df_state

In [None]:
# Normalizing columns with integer values
df_state_normalized = df_state.copy()
columns = ['Rating', 'Votes', 'Cost', 'Total Restaurants']

# apply normalization techniques
for column in columns:
    df_state_normalized[column] = (df_state_normalized[column] / df_state_normalized[column].abs().max())

# view normalized data
df_state_normalized.reset_index(level=0, inplace=True)
display(df_state_normalized)

## #5.2 Comparing Attributes of all States
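
The comparison relies on max-abs scaling, where each column is divided by its largest absolute value so that every attribute tops out at 1. A minimal sketch on made-up data follows; `max_abs_scale` is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

def max_abs_scale(frame, columns):
    """Return a copy of `frame` with each listed column divided by its largest absolute value."""
    scaled = frame.copy()
    for column in columns:
        scaled[column] = scaled[column] / scaled[column].abs().max()
    return scaled

# Toy data: two regions with very different value ranges per attribute
toy = pd.DataFrame({"Rating": [4.0, 3.0], "Votes": [200, 50]}, index=["A", "B"])

scaled = max_abs_scale(toy, ["Rating", "Votes"])
print(scaled)
```

This scaling does not stretch the minimum to 0, so a column whose values are close together, such as average rating, stays clustered near 1 after scaling.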

In [None]:
# Comparing attributes of all states using polar scatter plots

fig = make_subplots(rows=6, cols=2, specs=[[{'type': 'polar'}]*2]*6, column_widths=[0.45, 0.45])

for index, state in enumerate(df_state_normalized['State']):
    # Filling the 6x2 grid row by row (subplot rows and columns are 1-based)
    row, col = divmod(index, 2)
    row, col = row + 1, col + 1
      
    fig.add_trace(go.Scatterpolar(
          name = df_state_normalized['State'][index],
          r = [df_state_normalized['Rating'][index], df_state_normalized['Votes'][index], df_state_normalized['Cost'][index], df_state_normalized['Total Restaurants'][index]],
          theta=['Rating', 'Votes', 'Cost', 'Total Restaurants'],
          fill = 'toself'    
        ), row, col)

fig.update_layout(height=2000, width=900, title_text="Comparison of Restaurants in Different States of India", title_x=0.5, title_font_color = '#4B0082')
fig.show()

<a id='7'></a>
# Question #6: What are top cuisines in India?  

## #6.1 Forming Cuisines Dataframe

In [None]:
df.head()

In [None]:
cuisines = df['Cuisine'].str.split(',').explode().unique().tolist()

In [None]:
# Forming cuisine dataframe
columns = ['Cuisine', 'Total Restaurants', 'Rating']
rows = []

for cuisine in cuisines:
    # Restaurants whose cuisine string contains this cuisine (literal, case-insensitive match)
    mask = df['Cuisine'].str.contains(cuisine, case=False, na=False, regex=False)
    df_filtered = df[mask]
    total_restnt = len(df_filtered.index)

    avg_rating = df_filtered['Rating'].sum()/total_restnt
    rows.append({'Cuisine': cuisine, 'Total Restaurants': total_restnt, 'Rating': avg_rating})

# DataFrame.append was removed in pandas 2.0, so the rows are collected first
df_cuisine = pd.DataFrame(rows, columns=columns)

In [None]:
df_cuisine.head(15)

In [None]:
df_cuisine.shape

## #6.2 Identifying Top Cuisines

In [None]:
fig = go.Figure(data=[
    go.Bar(name='Total Restaurants', x=df_cuisine['Cuisine'], y=df_cuisine['Total Restaurants'])
])
fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'Cuisines', yaxis_title = 'Total Restaurants', 
                  title_text='Cuisine Distribution Across Restaurants', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))
fig.show()

<center> <b>Many cuisines</b> are served in <b>very few restaurants</b>. </center> 
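
One way to see this long tail directly is to count how many restaurants list each cuisine. The sketch below does so on made-up data. The names `toy`, `counts` and `rare` and the threshold of 3 are illustrative assumptions, and stripping whitespace from each cuisine name is a choice made for this example.

```python
import pandas as pd

# Toy data: comma-separated cuisine strings, one row per restaurant
toy = pd.DataFrame({"Cuisine": [
    "North Indian,  Chinese",
    "Chinese",
    "Italian,  North Indian",
    "Chinese,  Italian",
]})

# One row per (restaurant, cuisine) pair, with surrounding whitespace removed
counts = (
    toy["Cuisine"]
    .str.split(",")
    .explode()
    .str.strip()
    .value_counts()
)

# Cuisines listed by fewer than 3 restaurants
rare = counts[counts < 3]
print(counts)
```

On the real data, the length of `rare` relative to `counts` would show how many cuisines sit in the long tail.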

### Filtering cuisines

In [None]:
# Taking cuisines that are served in more than 300 restaurants
df_cuisine = df_cuisine[df_cuisine['Total Restaurants'] > 300]
df_cuisine.shape

In [None]:
fig = go.Figure(data=[
    go.Bar(name='Total Restaurants', x=df_cuisine['Cuisine'], y=df_cuisine['Total Restaurants'])
])

fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'Cuisines', yaxis_title = 'Total Restaurants', 
                  title_text='Distribution of Top Cuisines Across Restaurants', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))
fig.show()

* Cuisines dataframe consists of <b>duplicate values</b>. 
* <b>Multi-Cuisine</b> is not a <b>valid category</b>. 

In [None]:
# Printing some duplicate categories
df_cuisine.Cuisine[0], df_cuisine.Cuisine[13], df_cuisine.Cuisine[0], df_cuisine.Cuisine[64]

A leading double space before some cuisine names is producing the **duplicates**.
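
As an alternative illustration of the cleanup, the sketch below strips the whitespace and merges duplicate rows in a single `groupby`, combining ratings as an average weighted by restaurant count. The data is made up and the helper column `weighted` is an assumption introduced for this example.

```python
import pandas as pd

# Toy cuisine table: 'Chinese' appears twice because of a leading double space
toy = pd.DataFrame({
    "Cuisine": ["Chinese", "  Chinese", "Italian"],
    "Total Restaurants": [300, 100, 50],
    "Rating": [4.0, 4.4, 3.9],
})

# Normalizing names so both spellings fall into the same group
toy["Cuisine"] = toy["Cuisine"].str.strip()

# Rating times count, so that summing and dividing gives a weighted average
toy["weighted"] = toy["Rating"] * toy["Total Restaurants"]

merged = toy.groupby("Cuisine", as_index=False)[["Total Restaurants", "weighted"]].sum()
merged["Rating"] = merged["weighted"] / merged["Total Restaurants"]
merged = merged.drop(columns="weighted")
print(merged)
```

Unlike a pairwise merge, this form does not depend on how many times a name is repeated or on where the repeated rows sit in the table.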

In [None]:
#  Resetting index and removing double space 
df_cuisine = df_cuisine.reset_index(drop = True)   
df_cuisine.Cuisine = df_cuisine.Cuisine.str.replace('  ', '')

# Verifying double space removal
df_cuisine.Cuisine[5], df_cuisine.Cuisine[13], df_cuisine.Cuisine[0], df_cuisine.Cuisine[3]

In [None]:
# Flagging rows whose cuisine name is a duplicate
duplicate_cuisine = df_cuisine.duplicated(subset = ['Cuisine'])

In [None]:
duplicate_cuisines = []
duplicate_cuisines = df_cuisine.loc[duplicate_cuisine]['Cuisine']
duplicate_cuisines

In [None]:
# Identifying indices of duplicate cuisines
duplicate_indices = []
for cuisine in duplicate_cuisines:
    # Exact match, so a cuisine that merely starts with the same text is not picked up
    matches = df_cuisine.index[df_cuisine['Cuisine'] == cuisine]
    duplicate_indices.extend(matches.tolist())
duplicate_indices

In [None]:
# Merging each duplicate pair into its first row and removing the second row
# (assumes every duplicated cuisine appears exactly twice in duplicate_indices)
for first, second in zip(duplicate_indices[0::2], duplicate_indices[1::2]):
    total_restnt_1 = df_cuisine.loc[first, 'Total Restaurants']
    avg_rating_1 = df_cuisine.loc[first, 'Rating']
    total_restnt_2 = df_cuisine.loc[second, 'Total Restaurants']
    avg_rating_2 = df_cuisine.loc[second, 'Rating']

    # Updating attributes in the first (original) row
    df_cuisine.loc[first, 'Total Restaurants'] = total_restnt_1 + total_restnt_2
    df_cuisine.loc[first, 'Rating'] = ((total_restnt_1*avg_rating_1) + (total_restnt_2*avg_rating_2))/(total_restnt_1 + total_restnt_2)

    # Removing the second (duplicate) row
    df_cuisine = df_cuisine.drop(second)

In [None]:
df_cuisine = df_cuisine.reset_index(drop = True)
df_cuisine

In [None]:
# Dropping Multi-Cuisine by name rather than by a hard-coded row position
df_cuisine = df_cuisine[df_cuisine['Cuisine'] != 'Multi-Cuisine']
df_cuisine = df_cuisine.reset_index(drop = True)
df_cuisine

In [None]:
# Plotting cuisine with total restaurants
fig = go.Figure(data=[
    go.Bar(name='Total Restaurants', x=df_cuisine['Cuisine'], y=df_cuisine['Total Restaurants'])
])

fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'Cuisines', yaxis_title = 'Total Restaurants', 
                  title_text='Cuisine Distribution Across Restaurants', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))

fig.show()

## #6.3 Plotting Cuisines with Ratings

In [None]:
# Plotting rating with cuisines 
fig = go.Figure(data=[
    go.Bar(name='Rating', x=df_cuisine['Cuisine'], y=df_cuisine['Rating']),
])

fig.update_traces(marker_color ='rgb(12, 128, 128)', opacity=1)
fig.update_layout(xaxis_title = 'Cuisines', yaxis_title = 'Average Rating', 
                  title_text='Rating Distribution of Top Cuisines', 
                  title_x=0.5,
                  font=dict(
                      family="Courier New, monospace",
                      size=12,
                      color='rgb(12, 128, 128)'
                  ))

fig.show()

In [None]:
# Analysing with polar plot 
labels = df_cuisine['Cuisine']
x1 = df_cuisine['Rating']
num_slices = len(x1)
theta = [(i+1.5)*360/num_slices for i in range(num_slices)]
r=x1
width = [360 / num_slices for _ in range(num_slices)]

barpolar_plots = [go.Barpolar(r=[r], theta=[t], width=[w], name=n)
for r, t, w, n in zip(r, theta, width, labels)]

fig = go.Figure(barpolar_plots)

fig.update_layout(
                    polar = dict(
                        radialaxis = dict(range=[3.8, 4.25], showticklabels=True),
                        angularaxis = dict(showticklabels=False, ticks='')
                        ),
                    title_text='Comparison of Ratings of Different Cuisines', 
                    title_x=0.46,
                    font=dict(
                      family="Courier New, monospace",
                      size=12,
                  )
)

fig.show()

<a id='8'></a>
# Question #7: How are the cuisines distributed among states? 

In [None]:
df.head()

In [None]:
# x = df[df['Cuisine'] == 'Multi-Cuisine']
# x.head()
# state_cuisines

## #7.1 Declaring Function for Obtaining Cuisine Information

In [None]:
def cuisine_info(state):
    """Return restaurant count, total votes and average rating per cuisine for one state."""
    # Forming state dataframe
    df_state = df[df['State'] == state].copy()

    # Splitting the comma-separated cuisine strings into individual names
    state_cuisines = df_state['Cuisine'].str.split(',').explode().unique().tolist()

    # Removing the double-space prefix, then dropping duplicates and 'Multi-Cuisine'
    state_cuisines_clean = sorted({word.replace('  ', '') for word in state_cuisines})
    state_cuisines_clean = [c for c in state_cuisines_clean if c != 'Multi-Cuisine']

    # Forming cuisine dataframe for the state
    rows = []
    for cuisine in state_cuisines_clean:
        # Restaurants whose cuisine string contains this cuisine (literal, case-insensitive match)
        mask = df_state['Cuisine'].str.contains(cuisine, case=False, na=False, regex=False)
        df_filtered = df_state[mask]

        total_restnt = len(df_filtered.index)
        # Sum of votes across the matching restaurants
        total_votes = int(df_filtered['Votes'].sum())
        avg_rating = df_filtered['Rating'].sum()/total_restnt

        rows.append({'Cuisine': cuisine, 'Total Restaurants': total_restnt, 'Total Votes': total_votes, 'Rating': avg_rating})

    # DataFrame.append was removed in pandas 2.0, so the rows are collected first
    return pd.DataFrame(rows, columns=['Cuisine', 'Total Restaurants', 'Total Votes', 'Rating'])

## #7.2 Forming Individual Cuisine Dataframes for all States

In [None]:
def top_cuisines(cuisine_df, min_votes=50):
    """Keep cuisines with 'Total Votes' above `min_votes`, sorted by rating (best first)."""
    top = cuisine_df[cuisine_df['Total Votes'] > min_votes].copy()
    top = top.sort_values(by='Rating', ascending=False).reset_index(drop=True)
    top['Total Votes'] = top['Total Votes'].astype('str') + ' votes'
    return top


# Maharashtra cuisine dataframe and its top cuisines
cuisine_maharashtra = cuisine_info('Maharashtra')
top_cuisine_maharashtra = top_cuisines(cuisine_maharashtra)

# Delhi NCR cuisine dataframe and its top cuisines
cuisine_delhi = cuisine_info('Delhi NCR')
top_cuisine_delhi = top_cuisines(cuisine_delhi)

# Karnataka cuisine dataframe and its top cuisines
cuisine_karnataka = cuisine_info('Karnataka')
top_cuisine_karnataka = top_cuisines(cuisine_karnataka)

# Bengal cuisine dataframe and its top cuisines
cuisine_bengal = cuisine_info('Bengal')
top_cuisine_bengal = top_cuisines(cuisine_bengal)

# Telangana cuisine dataframe and its top cuisines
cuisine_telangana = cuisine_info('Telangana')
top_cuisine_telangana = top_cuisines(cuisine_telangana)

# Gujarat cuisine dataframe and its top cuisines
cuisine_gujarat = cuisine_info('Gujarat')
top_cuisine_gujarat = top_cuisines(cuisine_gujarat)

# Tamil Nadu cuisine dataframe and its top cuisines
cuisine_tamil = cuisine_info('Tamil Nadu')
top_cuisine_tamil = top_cuisines(cuisine_tamil)

# Punjab cuisine dataframe and its top cuisines
cuisine_punjab = cuisine_info('Punjab')
top_cuisine_punjab = top_cuisines(cuisine_punjab)

# Rajasthan cuisine dataframe and its top cuisines
cuisine_rajasthan = cuisine_info('Rajasthan')
top_cuisine_rajasthan = top_cuisines(cuisine_rajasthan)

# Madhya Pradesh cuisine dataframe and its top cuisines
cuisine_madhya = cuisine_info('Madhya Pradesh')
top_cuisine_madhya = top_cuisines(cuisine_madhya)

# Uttar Pradesh cuisine dataframe and its top cuisines
cuisine_uttar = cuisine_info('Uttar Pradesh')
top_cuisine_uttar = top_cuisines(cuisine_uttar)

# # Goa cuisine dataframe
# cuisine_goa = pd.DataFrame()
# cuisine_goa = cuisine_info('Goa')
# cuisine_goa[cuisine_goa['Total Restaurants']>50].head(25)

## #7.2 Printing State-wise Cuisine Table  

In [None]:
# Plotting a treemap of top cuisines for every state
top_cuisines_by_state = {
    'Maharashtra': top_cuisine_maharashtra,
    'Delhi': top_cuisine_delhi,
    'Karnataka': top_cuisine_karnataka,
    'Bengal': top_cuisine_bengal,
    'Telangana': top_cuisine_telangana,
    'Gujarat': top_cuisine_gujarat,
    'Tamil Nadu': top_cuisine_tamil,
    'Punjab': top_cuisine_punjab,
    'Rajasthan': top_cuisine_rajasthan,
    'Madhya Pradesh': top_cuisine_madhya,
    'Uttar Pradesh': top_cuisine_uttar,
}

for state, top_cuisine in top_cuisines_by_state.items():
    # The State column is added in place; the treemap path and the later concat both use it
    top_cuisine['State'] = state
    fig = px.treemap(top_cuisine, 
                     path=['State', 'Cuisine', 'Total Votes'], 
                     values='Rating',
                     color='Rating'
                    )
    fig.update_layout(title_text = 'Favourite Cuisines in {}'.format(state),
                      title_font_color = '#4B0082',
                      title_x = 0.5
                     )
    fig.show()

In [None]:
frames = [top_cuisine_uttar, top_cuisine_madhya, top_cuisine_rajasthan, top_cuisine_punjab, top_cuisine_tamil, top_cuisine_gujarat, top_cuisine_telangana, top_cuisine_bengal, top_cuisine_karnataka, top_cuisine_delhi, top_cuisine_maharashtra]
top_cuisine_india = pd.concat(frames)
display(top_cuisine_india)
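
`pd.concat` keeps the index of every input frame by default, so the consolidated table can contain repeated index labels. A small sketch on made-up frames (`frame_a` and `frame_b` are illustrative names, not from the notebook) shows the difference that `ignore_index=True` makes:

```python
import pandas as pd

# Two toy frames standing in for per-state cuisine tables (values are made up)
frame_a = pd.DataFrame({'State': ['Goa', 'Goa'], 'Cuisine': ['Seafood', 'Goan']})
frame_b = pd.DataFrame({'State': ['Kerala', 'Kerala'], 'Cuisine': ['Kerala', 'Seafood']})

# Default: each frame keeps its own index, so labels 0 and 1 appear twice
stacked = pd.concat([frame_a, frame_b])

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([frame_a, frame_b], ignore_index=True)

print(stacked.index.tolist())     # [0, 1, 0, 1]
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```

Repeated labels do not affect the treemap, but a unique index is safer if rows are later selected with `.loc`.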

## #7.3 Plotting Consolidated Cuisine Table for India 

In [None]:
top_cuisine_india['Country'] = 'India'
fig = px.treemap(top_cuisine_india, 
                 path=['Country', 'State', 'Cuisine', 'Total Votes'], 
                 values='Rating',
                 color='Rating'
                )
fig.update_layout( title_text = 'State-wise Favourite Cuisines in India',
                  title_font_color = '#4B0082',
                  title_x = 0.5
                 )
fig.show()

<a id='9'></a>
# Question #8: What are top restaurant locations in Maharashtra, Delhi and Karnataka? 

In [None]:
df.head()

## #8.1 Forming Individual Dataframes for Maharashtra, Delhi and Karnataka

In [None]:
df_maharashtra = df[df['State'] == 'Maharashtra']
df_delhi = df[df['State'] == 'Delhi NCR']
df_karnataka = df[df['State'] == 'Karnataka']
df_maharashtra
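
The three filters follow the same pattern, so they could also be collected in a dictionary keyed by state name. This is only a sketch on a toy frame; `toy`, `states_of_interest` and `state_frames` are illustrative names, and the rows are made up.

```python
import pandas as pd

# Toy frame with the same 'State' column as df (rows are made up)
toy = pd.DataFrame({
    'State': ['Maharashtra', 'Delhi NCR', 'Karnataka', 'Maharashtra', 'Goa'],
    'Locality': ['Bandra', 'Saket', 'Indiranagar', 'Koregaon Park', 'Panaji'],
})

states_of_interest = ['Maharashtra', 'Delhi NCR', 'Karnataka']

# One filtered frame per state, keyed by state name
state_frames = {state: toy[toy['State'] == state] for state in states_of_interest}

print({state: len(frame) for state, frame in state_frames.items()})
# {'Maharashtra': 2, 'Delhi NCR': 1, 'Karnataka': 1}
```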

## #8.2 Defining Function to Return Votes in a Locality

In [None]:
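def total_votes(locality):
    """Return the total votes of all restaurants in the given locality."""
    # Filters the full dataframe, so a locality name shared by several
    # states is summed across all of them
    df_x = df[df['Locality'] == locality]
    return df_x['Votes'].sum()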
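
Because `total_votes` filters the full dataframe, a locality name that occurs in more than one state would be summed across those states. A minimal sketch of a variant that takes the frame as an argument is shown below; `total_votes_in` and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd


def total_votes_in(data, locality):
    """Sum the 'Votes' column over the rows of `data` in the given locality."""
    return data.loc[data['Locality'] == locality, 'Votes'].sum()


# Toy data: the same locality name appears in two states (values are made up)
toy = pd.DataFrame({
    'State': ['Maharashtra', 'Maharashtra', 'Delhi NCR'],
    'Locality': ['Civil Lines', 'Civil Lines', 'Civil Lines'],
    'Votes': [100, 50, 30],
})

toy_maharashtra = toy[toy['State'] == 'Maharashtra']

print(total_votes_in(toy, 'Civil Lines'))              # 180, all states
print(total_votes_in(toy_maharashtra, 'Civil Lines'))  # 150, Maharashtra only
```

Passing the state-level frame keeps the vote totals consistent with the restaurant counts, which are taken per state.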

## #8.3 Obtaining Votes for all Localities in Maharashtra

In [None]:
# List of all localities
maharashtra_locations = df_maharashtra['Locality'].value_counts().index.tolist()

# Obtaining total votes 
total_votes_list = []
for locality in maharashtra_locations:
    total_votes_list.append(total_votes(locality))
    
# Locality-wise total restaurants in Maharashtra 
maharashtra_location_counts = df_maharashtra['Locality'].value_counts()

# Zipping required lists and forming dataframe
list_of_tuples = list(zip(maharashtra_locations, maharashtra_location_counts, total_votes_list))
maharashtra_locations_df = pd.DataFrame(list_of_tuples, columns = ['Location', 'Total Restaurants', 'Total Votes'])
maharashtra_locations_df
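
The same table could also be built in one step with `groupby` and named aggregation. The sketch below runs on a made-up stand-in for `df_maharashtra`; note one difference from the cell above, which is an assumption of this sketch: votes are summed within the state frame only, not over the full dataframe.

```python
import pandas as pd

# Toy stand-in for df_maharashtra (values are made up)
toy_state = pd.DataFrame({
    'Locality': ['Bandra', 'Bandra', 'Powai', 'Bandra', 'Powai'],
    'Votes': [120, 80, 40, 10, 60],
})

locations_df = (
    toy_state.groupby('Locality')
    # Named aggregation: output column = (input column, function)
    .agg(**{'Total Restaurants': ('Votes', 'size'),
            'Total Votes': ('Votes', 'sum')})
    # Most restaurants first, matching the order of value_counts()
    .sort_values('Total Restaurants', ascending=False)
    .reset_index()
    .rename(columns={'Locality': 'Location'})
)
print(locations_df)
```

In the toy data, Bandra ends up with 3 restaurants and 210 votes, and Powai with 2 restaurants and 100 votes.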

## #8.4 Adding Attributes to the Localities Dataframe

In [None]:
location_rating_list = []
location_cost_list = []

for location in maharashtra_locations_df['Location']:
    df_location = df[df['Locality'] == location]

    # Calculating average rating for this locality only
    location_rating_list.append(df_location['Rating'].mean())

    # Calculating average cost for this locality only
    location_cost_list.append(df_location['Cost'].mean())

# Adding attributes to the dataframe
maharashtra_locations_df['Rating'] = location_rating_list
maharashtra_locations_df['Cost'] = location_cost_list

# .copy() gives an independent frame before a column is overwritten
top_locations_maharashtra = maharashtra_locations_df[maharashtra_locations_df['Total Votes']>150].copy()
top_locations_maharashtra['Total Votes'] = top_locations_maharashtra['Total Votes'].astype('str') + ' votes'
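
A per-locality average should use only the rows of that locality. As a cross-check, the averages can be computed with `groupby` and attached with `merge`. This is a sketch on made-up data; `toy`, `toy_locations`, `averages` and `top` are illustrative names, not objects from the notebook.

```python
import pandas as pd

# Toy restaurant rows and a toy localities table (values are made up)
toy = pd.DataFrame({
    'Locality': ['Bandra', 'Bandra', 'Powai', 'Powai'],
    'Rating': [4.0, 3.0, 5.0, 4.0],
    'Cost': [800, 1200, 500, 700],
})
toy_locations = pd.DataFrame({
    'Location': ['Bandra', 'Powai'],
    'Total Votes': [200, 100],
})

# Mean rating and cost computed separately for each locality
averages = (
    toy.groupby('Locality')[['Rating', 'Cost']]
    .mean()
    .reset_index()
    .rename(columns={'Locality': 'Location'})
)

# Attach the averages to the localities table by name
toy_locations = toy_locations.merge(averages, on='Location', how='left')

# .copy() makes the filtered result independent before a column is overwritten
top = toy_locations[toy_locations['Total Votes'] > 150].copy()
top['Total Votes'] = top['Total Votes'].astype(str) + ' votes'
print(top)
```

Only Bandra passes the 150-vote threshold here, with a mean rating of 3.5 and a mean cost of 1000.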

## #8.5 Similarly Obtaining Dataframes for Delhi and Karnataka

### Obtaining Dataframe for Delhi

In [None]:
# Obtaining total votes for all localities in Delhi

# List of all localities
delhi_locations = df_delhi['Locality'].value_counts().index.tolist()

# Obtaining total votes
total_votes_list = []
for locality in delhi_locations:
    total_votes_list.append(total_votes(locality))
    
# Locality-wise total restaurants in Delhi 
delhi_location_counts = df_delhi['Locality'].value_counts()

# Zipping required lists and forming dataframe
list_of_tuples = list(zip(delhi_locations, delhi_location_counts, total_votes_list))
delhi_locations_df = pd.DataFrame(list_of_tuples, columns = ['Location', 'Total Restaurants', 'Total Votes'])
delhi_locations_df

In [None]:
# Adding attributes to the localities dataframe
location_rating_list = []
location_cost_list = []

for location in delhi_locations_df['Location']:
    df_location = df[df['Locality'] == location]

    # Calculating average rating for this locality only
    location_rating_list.append(df_location['Rating'].mean())

    # Calculating average cost for this locality only
    location_cost_list.append(df_location['Cost'].mean())

# Adding attributes to the dataframe
delhi_locations_df['Rating'] = location_rating_list
delhi_locations_df['Cost'] = location_cost_list
# delhi_locations_df.head(20)

# .copy() gives an independent frame before a column is overwritten
top_locations_delhi = delhi_locations_df[delhi_locations_df['Total Votes']>150].copy()
top_locations_delhi['Total Votes'] = top_locations_delhi['Total Votes'].astype('str') + ' votes'

### Obtaining dataframe for Karnataka

In [None]:
# Obtaining total votes for all localities in Karnataka

# List of all localities
karnataka_locations = df_karnataka['Locality'].value_counts().index.tolist()

# Obtaining total votes
total_votes_list = []
for locality in karnataka_locations:
    total_votes_list.append(total_votes(locality))
    
# Locality-wise total restaurants in Karnataka 
karnataka_location_counts = df_karnataka['Locality'].value_counts()

# Zipping required lists and forming dataframe
list_of_tuples = list(zip(karnataka_locations, karnataka_location_counts, total_votes_list))
karnataka_locations_df = pd.DataFrame(list_of_tuples, columns = ['Location', 'Total Restaurants', 'Total Votes'])
karnataka_locations_df

In [None]:
# Adding attributes to the localities dataframe
location_rating_list = []
location_cost_list = []

for location in karnataka_locations_df['Location']:
    df_location = df[df['Locality'] == location]

    # Calculating average rating for this locality only
    location_rating_list.append(df_location['Rating'].mean())

    # Calculating average cost for this locality only
    location_cost_list.append(df_location['Cost'].mean())

# Adding attributes to the dataframe
karnataka_locations_df['Rating'] = location_rating_list
karnataka_locations_df['Cost'] = location_cost_list
# karnataka_locations_df.head(20)

# .copy() gives an independent frame before a column is overwritten
top_locations_karnataka = karnataka_locations_df[karnataka_locations_df['Total Votes']>150].copy()
top_locations_karnataka['Total Votes'] = top_locations_karnataka['Total Votes'].astype('str') + ' votes'

In [None]:
top_locations_karnataka.head()

## #8.6 Plotting Treemaps

In [None]:
fig = px.treemap(top_locations_maharashtra, 
                 path=['Location', 'Total Votes'], 
                 values='Rating',
                 color='Rating',
                 labels = {'Total Votes': 'Votes'}
                )
fig.update_layout( title_text = 'Top Localities in Maharashtra',
                  title_font_color = '#4B0082',
                  title_x = 0.5,
                 )
fig.show()




fig = px.treemap(top_locations_delhi, 
                 path=['Location', 'Total Votes'], 
                 values='Rating',
                 color='Rating',
                 labels = {'Total Votes': 'Votes'}
                )
fig.update_layout( title_text = 'Top Localities in Delhi',
                  title_font_color = '#4B0082',
                  title_x = 0.5,
                 )
fig.show()



fig = px.treemap(top_locations_karnataka, 
                 path=['Location', 'Total Votes'], 
                 values='Rating',
                 color='Rating',
                 labels = {'Total Votes': 'Votes'}
                )
fig.update_layout( title_text = 'Top Localities in Karnataka',
                  title_font_color = '#4B0082',
                  title_x = 0.5,
                 )
fig.show()


<a id='10'></a>
# References
The following notebooks and tutorials helped me develop this notebook. Their authors' work is very much appreciated:

* EDA Inspiration: https://www.kaggle.com/kurazh/eda-game-sales
* Notebook Inspiration: https://www.kaggle.com/andreshg/timeseries-analysis-a-complete-guide
* Plotly Visualizations: https://www.kaggle.com/thebrownviking20/intermediate-visualization-tutorial-using-plotly
* Statistical Visualizations: https://www.kaggle.com/subinium/basic-of-statistical-viz-plotly-seaborn
* Polar Bar Plot: https://towardsdatascience.com/improving-plotlys-polar-bar-charts-43f6eec867b7
* Radar Plot: https://www.kaggle.com/ivannatarov/plotly-for-beginners-polar-charts-image
* Tree Maps: https://towardsdatascience.com/treemap-basics-with-python-777e5ed173d0
* Plotly Reference: https://plotly.com/python/