INDIAN FOOD - DATA ANALYSIS AND VISUALIZATION

![picture](https://thumbs.dreamstime.com/b/chicken-jalfrazy-indian-food-recipe-spices-wooden-table-92742377.jpg)

About the dataset:

The dataset describes Indian dishes: the name of each dish, the ingredients used, diet type, preparation time, cook time, flavour, course, state and region.

The dataset contains 255 rows and 9 columns. The region column has 1 missing value, and some columns contain -1 as a value. The 75 rows with missing or invalid values are removed here, which leaves 180 rows and 9 columns after cleaning.

Contents:

[Import required libraries]

[Import the dataset]

[Data Exploration]

[Data Cleaning]

[Data Analysis and Visualization]

Import the required libraries

In [None]:
import pandas as pd
import numpy as np   
import matplotlib.pyplot as plt  
import seaborn as sns 
from wordcloud import WordCloud 


Import the dataset

In [None]:
df = pd.read_csv('../input/indian-food-101/indian_food.csv', na_values=['.','?','*','  '])
df

DATA EXPLORATION

In [None]:
df.shape

In [None]:
df.info()

In [None]:
df.region.unique()

In [None]:
df.flavor_profile.unique()

In [None]:
df.state.unique()

In [None]:
df.name.unique()

In [None]:
df.head()

In [None]:
df.tail()

In [None]:
df.describe()

In [None]:
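# Correlate only the numeric columns; recent pandas versions raise an error on text columns
df.select_dtypes('number').corr()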

DATA CLEANING

In [None]:
df.duplicated().sum()

In [None]:
df.isnull().sum()

In [None]:
df.dropna(inplace=True)

In [None]:
df.isnull().sum()

In [None]:
df.shape

Some unwanted values remain in the data, so it needs another cleaning pass.

Some rows contain -1 as a value; these rows need to be removed.
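Before filtering, it can help to count how many -1 values each column holds. The sketch below is one possible way to do that. The `toy` frame is made-up illustration data (not rows from the dataset), and it assumes the numeric columns store -1 as an integer while the text columns store the string '-1'.

```python
import pandas as pd

# Hypothetical rows that only mimic a few of the dataset's columns.
toy = pd.DataFrame({
    "name": ["dish_a", "dish_b", "dish_c", "dish_d"],
    "prep_time": [10, -1, 15, 20],
    "cook_time": [25, 30, -1, 40],
    "flavor_profile": ["sweet", "-1", "spicy", "spicy"],
    "state": ["Punjab", "-1", "Kerala", "Goa"],
})

# Convert every cell to text so the integer -1 and the string '-1'
# can be matched with a single comparison.
is_sentinel = toy.astype(str).eq("-1")

# Number of -1 values in each column.
sentinel_counts = is_sentinel.sum()
print(sentinel_counts)

# Number of rows that contain at least one -1 value.
rows_with_sentinel = int(is_sentinel.any(axis=1).sum())
print(rows_with_sentinel)
```

Running the same two lines on `df` instead of `toy` would show which columns are responsible for the rows that get dropped.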

In [None]:
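# prep_time and cook_time are numeric: keep rows with a value of at least 1, which drops the -1 entries
df = df[df['prep_time'] >= 1]
df = df[df['cook_time'] >= 1]
# The text columns are compared against the string '-1'
df = df[df['flavor_profile'] != '-1']
df = df[df['course'] != '-1']
df = df[df['state'] != '-1']
df = df[df['region'] != '-1']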
df

**Now the data is cleaned and ready to be used for analysis.**

DATA ANALYSIS AND VISUALIZATION

In [None]:
import plotly.graph_objects as go

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(
    x = df['state'],
    marker_color='chocolate',
    opacity=1
))

fig.update_layout(
    title_text='STATE_DISTRIBUTION',
    xaxis_title_text='STATE',
    yaxis_title_text='COUNT', 
    bargap=0.05, 
    xaxis =  {'showgrid': False },
    yaxis = {'showgrid': False },
    width=600,
    height=600
)

fig.show()

The graph above shows the distribution of the number of dishes across states.

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(
    x = df['diet'],
    marker_color='blue',
    opacity=1
))

fig.update_layout(
    title_text='DIET_DISTRIBUTION',
    xaxis_title_text='DIET',
    yaxis_title_text='COUNT', 
    bargap=0.8, 
    xaxis =  {'showgrid': False },
    yaxis = {'showgrid': False },
    width=600,
    height=600
)

fig.show()

There are more vegetarian dishes than non-vegetarian dishes.

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(
    x = df['flavor_profile'],
    marker_color='firebrick',
    opacity=1
))

fig.update_layout(
    title_text='FLAVOR_DISTRIBUTION',
    xaxis_title_text='flavor_profile',
    yaxis_title_text='COUNT', 
    bargap=0.8,
    xaxis =  {'showgrid': False },
    yaxis = {'showgrid': False },
    width=600,
    height=600
)

fig.show()

There are 73 sweet, 102 spicy, 4 bitter and 1 sour dishes.
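Counts like these can also be read directly as numbers instead of from the bars. A minimal sketch using `value_counts` follows; the `flavors` series is a made-up sample, and in the notebook the same call would be applied to `df['flavor_profile']`.

```python
import pandas as pd

# Hypothetical sample values, not the real column.
flavors = pd.Series(
    ["sweet", "spicy", "spicy", "bitter", "spicy", "sweet"],
    name="flavor_profile",
)

# value_counts returns one count per distinct value, largest first.
flavor_counts = flavors.value_counts()
print(flavor_counts)
```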

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(
    x = df['state'],
    marker_color='forestgreen',
    opacity=1
))

fig.update_layout(
    title_text='STATE_DISTRIBUTION',
    xaxis_title_text='STATE',
    yaxis_title_text='COUNT', 
    bargap=0.8, 
    xaxis =  {'showgrid': False },
    yaxis = {'showgrid': False },
    width=600,
    height=600
)

fig.show()

The graph above shows the number of dishes per state.

In [None]:
fig = go.Figure()
fig.add_trace(go.Histogram(
    x = df['region'],
    marker_color='gold',
    opacity=1
))

fig.update_layout(
    title_text='REGION_DISTRIBUTION',
    xaxis_title_text='REGION',
    yaxis_title_text='COUNT', 
    bargap=0.8, 
    xaxis =  {'showgrid': False },
    yaxis = {'showgrid': False },
    width=600,
    height=600
)

fig.show()

The graph above shows the number of dishes per region.

In [None]:
import plotly.express as px
fig = px.line(df, x="diet", y="flavor_profile", title='Flavour_profile v/s Diet',
    width=600,
    height=600)
fig.show()

Non-vegetarian dishes are mostly spicy.

Vegetarian dishes cover all four flavours: sweet, spicy, bitter and sour.

In [None]:
import plotly.express as px
fig = px.line(df, x="diet", y="course", title='Course v/s Diet',
    width=600,
    height=600)
fig.show()

Non-vegetarian dishes are mostly starters and main courses.

Vegetarian dishes are mostly desserts, main courses and snacks.
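Statements about which courses dominate each diet are easier to check against a table of shares than against a line plot. The sketch below shows one way to build such a table with `pd.crosstab`. The `toy` frame is invented for illustration; in the notebook the columns would come from `df`.

```python
import pandas as pd

# Hypothetical rows; only the two columns needed for the comparison.
toy = pd.DataFrame({
    "diet": ["vegetarian", "vegetarian", "vegetarian",
             "non vegetarian", "non vegetarian", "vegetarian"],
    "course": ["dessert", "main course", "snack",
               "main course", "starter", "dessert"],
})

# normalize="index" turns each row into shares that sum to 1,
# so each diet shows what fraction of its dishes falls in each course.
course_share = pd.crosstab(toy["diet"], toy["course"], normalize="index")
print(course_share.round(2))
```

Using `normalize="index"` keeps the two diets comparable even though the dataset has far more vegetarian dishes than non-vegetarian ones.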

In [None]:
fig = px.scatter(df, x="flavor_profile", y="diet" ,color='diet',title="Flavour V/S Diet",
    width=600,
    height=600)
fig.show()

sweet dishes - vegetarian 

spicy dishes - non vegetarian and vegetarian

bitter dishes - vegetarian 

sour dishes - vegetarian 
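A summary like the one above can be produced from the data rather than read off the scatter plot. The sketch below groups by flavour and lists the diet types seen in each group. The `toy` frame is made-up illustration data; the notebook would use `df` instead.

```python
import pandas as pd

# Hypothetical rows with the two columns of interest.
toy = pd.DataFrame({
    "flavor_profile": ["sweet", "spicy", "spicy", "bitter", "sour", "sweet"],
    "diet": ["vegetarian", "vegetarian", "non vegetarian",
             "vegetarian", "vegetarian", "vegetarian"],
})

# For each flavour, join the distinct diet values into one string.
diets_by_flavor = (
    toy.groupby("flavor_profile")["diet"]
       .agg(lambda s: ", ".join(sorted(s.unique())))
)

for flavor, diets in diets_by_flavor.items():
    print(f"{flavor} dishes - {diets}")
```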

In [None]:
fig = px.scatter(df, x="name", y="state" ,color='state',title="State V/S Name",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of dishes across states.

In [None]:
fig = px.scatter(df, x="name", y="region" ,color='region',title="Region V/S Name",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of dishes across regions.

In [None]:
fig = px.scatter(df, x="name", y="flavor_profile" ,color='flavor_profile',title="Flavour V/S Name",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of dishes across flavours.

In [None]:
fig = px.scatter(df, x="name", y="diet" ,color='diet',title="Diet V/S Name",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of dishes across diet types.

In [None]:
fig = px.scatter(df, x="state", y="diet" ,color='diet',title="Diet V/S State",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of diet types across states.

In [None]:
fig = px.scatter(df, x="region", y="diet" ,color='diet',title="Diet V/S Region",
    width=600,
    height=600)
fig.show()

The graph above shows the distribution of diet types across regions.

In [None]:
fig = px.scatter(df, x="flavor_profile", y="prep_time" ,color='prep_time',title="Prep_time V/S Flavour",
    width=600,
    height=600)
fig.show()

The graph above shows how much preparation time the dishes of each flavour take.
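To put numbers on the preparation time per flavour, the values can be aggregated with `groupby`. This is only a sketch: the `toy` frame is invented, and the choice of count, median and max as summary statistics is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical preparation times in minutes.
toy = pd.DataFrame({
    "flavor_profile": ["sweet", "sweet", "spicy", "spicy", "spicy", "bitter"],
    "prep_time": [10, 40, 15, 20, 25, 5],
})

# One row per flavour with the number of dishes, the median and
# the longest preparation time, sorted by median (longest first).
prep_summary = (
    toy.groupby("flavor_profile")["prep_time"]
       .agg(["count", "median", "max"])
       .sort_values("median", ascending=False)
)
print(prep_summary)
```

The count column matters here because flavours with very few dishes (such as bitter and sour in this dataset) give unreliable medians.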

In [None]:

fig = px.scatter(df, x="course", y="ingredients" ,color='course',title="Ingredients V/S Course",
    width=600,
    height=600)

fig.show()

The graph above shows which ingredients the dishes in each course require.

In [None]:
fig = px.scatter(df, x="course", y="diet", color="course", hover_data=['prep_time']  ,  size_max=50 ,
    width=600,
    height=600)
fig.show()

The graph above relates preparation time, diet type and course.

In [None]:
fig = px.scatter(df, x="course", y="diet", color="course", hover_data=['cook_time']  ,  size_max=50,
    width=600,
    height=600)
fig.show()

The graph above relates cook time, diet type and course.
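In the scatter plot the cook time is only visible on hover, so a small table can make the same relationship easier to read. The sketch below uses `pivot_table` to compute the mean cook time for each combination of course and diet. The `toy` frame is made-up data, and using the mean is an assumption; the median would work the same way.

```python
import pandas as pd

# Hypothetical cook times in minutes.
toy = pd.DataFrame({
    "course": ["dessert", "dessert", "main course", "main course", "snack"],
    "diet": ["vegetarian", "vegetarian", "vegetarian",
             "non vegetarian", "vegetarian"],
    "cook_time": [20, 40, 30, 60, 15],
})

# Rows are courses, columns are diet types, cells hold the mean cook time.
# Combinations that do not occur in the data show up as NaN.
mean_cook = toy.pivot_table(
    index="course", columns="diet", values="cook_time", aggfunc="mean"
)
print(mean_cook)
```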

In [None]:
fig = px.scatter(df, x="flavor_profile", y="diet", color="flavor_profile",
                 hover_data=['prep_time'] ,  size_max=50,
    width=600,
    height=600)
fig.show()

The graph above relates preparation time, diet type and flavour.

In [None]:
fig = px.scatter(df, x="flavor_profile", y="diet", color="flavor_profile",
                  hover_data=['cook_time'] ,  size_max=50,
    width=600,
    height=600)
fig.show()

The graph above relates cook time, diet type and flavour.

In [None]:
fig = px.scatter(df, x="name", y="diet", color="state",
                  hover_data=['state'] ,  size_max=40,
    width=600,
    height=600)
fig.show()

The graph above shows the name of each dish together with its state and diet type.

Dishes Word art 

In [None]:
wordCloud = WordCloud(
    background_color='lightgrey',
    max_font_size = 50).generate(' '.join(df['name']))
plt.figure(figsize=(14,10))
plt.axis('off')
plt.imshow(wordCloud)
plt.show()

Ingredients word art

In [None]:
wordCloud = WordCloud(
    background_color='lightgrey',
    max_font_size = 50).generate(' '.join(df['ingredients']))
plt.figure(figsize=(14,10))
plt.axis('off')
plt.imshow(wordCloud)
plt.show()

State word art

In [None]:
wordCloud = WordCloud(
    background_color='lightgrey',
    max_font_size = 50).generate(' '.join(df['state']))
plt.figure(figsize=(14,10))
plt.axis('off')
plt.imshow(wordCloud)
plt.show()

In [None]:
pie_df = df.diet.value_counts().reset_index()
pie_df.columns = ['diet','count']
fig = px.pie(pie_df, values='count', names='diet', title='Proportion of Vegetarian and Non-Vegetarian dishes',
             color_discrete_sequence=['green', 'red'],
    width=500,
    height=500)
fig.show()

The pie chart above shows that there are more vegetarian dishes than non-vegetarian dishes.