# **Indian Food Analysis**

The dataset consists of about 255 Indian dishes and 9 columns associated with each of them.

The 9 columns are as follows:

name : name of the dish

ingredients : main ingredients used

diet : type of diet - either vegetarian or non vegetarian

prep_time : preparation time

cook_time : cooking time

flavor_profile : flavor profile includes whether the dish is spicy, sweet, bitter, etc

course : course of meal - starter, main course, dessert, etc

state : state where the dish is famous or where it originated

region : region to which the state belongs

In [64]:
# Necessary libraries
import pandas as pd
import numpy as np
import plotly.express as px
from plotly.offline import init_notebook_mode
import matplotlib.pyplot as plt
%matplotlib inline
from wordcloud import WordCloud , ImageColorGenerator

# **Loading the data**

In [6]:
# Mount Google Drive in Colab
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [7]:
data = pd.read_csv("/content/drive/MyDrive/Great learning DA project/indian_food.csv")
data

Unnamed: 0,name,ingredients,diet,prep_time,cook_time,flavor_profile,course,state,region
0,Balu shahi,"Maida flour, yogurt, oil, sugar",vegetarian,45,25,sweet,dessert,West Bengal,East
1,Boondi,"Gram flour, ghee, sugar",vegetarian,80,30,sweet,dessert,Rajasthan,West
2,Gajar ka halwa,"Carrots, milk, sugar, ghee, cashews, raisins",vegetarian,15,60,sweet,dessert,Punjab,North
3,Ghevar,"Flour, ghee, kewra, milk, clarified butter, su...",vegetarian,15,30,sweet,dessert,Rajasthan,West
4,Gulab jamun,"Milk powder, plain flour, baking powder, ghee,...",vegetarian,15,40,sweet,dessert,West Bengal,East
...,...,...,...,...,...,...,...,...,...
250,Til Pitha,"Glutinous rice, black sesame seeds, gur",vegetarian,5,30,sweet,dessert,Assam,North East
251,Bebinca,"Coconut milk, egg yolks, clarified butter, all...",vegetarian,20,60,sweet,dessert,Goa,West
252,Shufta,"Cottage cheese, dry dates, dried rose petals, ...",vegetarian,-1,-1,sweet,dessert,Jammu & Kashmir,North
253,Mawa Bati,"Milk powder, dry fruits, arrowroot powder, all...",vegetarian,20,45,sweet,dessert,Madhya Pradesh,Central


In [8]:
data.head()

Unnamed: 0,name,ingredients,diet,prep_time,cook_time,flavor_profile,course,state,region
0,Balu shahi,"Maida flour, yogurt, oil, sugar",vegetarian,45,25,sweet,dessert,West Bengal,East
1,Boondi,"Gram flour, ghee, sugar",vegetarian,80,30,sweet,dessert,Rajasthan,West
2,Gajar ka halwa,"Carrots, milk, sugar, ghee, cashews, raisins",vegetarian,15,60,sweet,dessert,Punjab,North
3,Ghevar,"Flour, ghee, kewra, milk, clarified butter, su...",vegetarian,15,30,sweet,dessert,Rajasthan,West
4,Gulab jamun,"Milk powder, plain flour, baking powder, ghee,...",vegetarian,15,40,sweet,dessert,West Bengal,East


In [9]:
data.columns

Index(['name', 'ingredients', 'diet', 'prep_time', 'cook_time',
       'flavor_profile', 'course', 'state', 'region'],
      dtype='object')

# **Checking null values**

In [10]:
data.isnull().any()

name              False
ingredients       False
diet              False
prep_time         False
cook_time         False
flavor_profile    False
course            False
state             False
region             True
dtype: bool

Only the region column has null values in the entire data set.

**Check how many null values there are in each column**

In [11]:
data.isnull().sum()

name              0
ingredients       0
diet              0
prep_time         0
cook_time         0
flavor_profile    0
course            0
state             0
region            1
dtype: int64

The region column has a null value in only one row.

In [12]:
data=data.replace(-1,np.nan)
data=data.replace('-1',np.nan)

Every -1 in the data, whether stored as a number or as the string '-1', is replaced with a null. The notebook treats -1 as a placeholder for a missing value.
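The effect of this replacement can be checked on a small frame. The sketch below is only an illustration: the `sample` rows are made up (not taken from the dataset), and it repeats the same two `replace` calls before counting nulls per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; -1 appears once as a number and once as a string
sample = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C"],
    "prep_time": [45, -1, 15],
    "flavor_profile": ["sweet", "-1", "spicy"],
})

# Same two-step replacement as in the notebook cell
cleaned = sample.replace(-1, np.nan)
cleaned = cleaned.replace("-1", np.nan)

# Count the nulls that each column now holds
null_counts = cleaned.isnull().sum()
print(null_counts)
```

Note that a numeric column that receives a null is converted to float, which is why the times show up as 45.0, 25.0 and so on after this step.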

In [13]:
data.head()

Unnamed: 0,name,ingredients,diet,prep_time,cook_time,flavor_profile,course,state,region
0,Balu shahi,"Maida flour, yogurt, oil, sugar",vegetarian,45.0,25.0,sweet,dessert,West Bengal,East
1,Boondi,"Gram flour, ghee, sugar",vegetarian,80.0,30.0,sweet,dessert,Rajasthan,West
2,Gajar ka halwa,"Carrots, milk, sugar, ghee, cashews, raisins",vegetarian,15.0,60.0,sweet,dessert,Punjab,North
3,Ghevar,"Flour, ghee, kewra, milk, clarified butter, su...",vegetarian,15.0,30.0,sweet,dessert,Rajasthan,West
4,Gulab jamun,"Milk powder, plain flour, baking powder, ghee,...",vegetarian,15.0,40.0,sweet,dessert,West Bengal,East


In [14]:
data.isnull().sum()

name               0
ingredients        0
diet               0
prep_time         30
cook_time         28
flavor_profile    29
course             0
state             24
region            14
dtype: int64

Because every -1 is now a null, the null counts go up: region rises from 1 to 14, and prep_time, cook_time, flavor_profile and state, which had no nulls before, now have 30, 28, 29 and 24 nulls respectively.

In [15]:
data.shape

(255, 9)

There are 255 rows and 9 columns

In [16]:
pie_data = data.diet.value_counts().reset_index()

In [17]:
pie_data.columns = ['diet','count']
fig = px.pie(pie_data, values='count', names='diet', title='Proportion of Vegetarian and Non-Vegetarian dishes',
             color_discrete_sequence=['green', 'red'])
fig.show()

The vegetarian count is 226 out of 255 dishes and the non-vegetarian count is 29, so the large majority of dishes in the data set are **vegetarian**.
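The same split can be expressed as percentages. The sketch below rebuilds a stand-in `diet` column from the two counts quoted above (226 and 29) rather than loading the CSV, and uses `value_counts(normalize=True)` to get shares instead of raw counts.

```python
import pandas as pd

# Stand-in for data.diet, rebuilt from the counts reported above
diet = pd.Series(["vegetarian"] * 226 + ["non vegetarian"] * 29, name="diet")

# normalize=True returns fractions of the total; scale to percent and round
diet_share = (diet.value_counts(normalize=True) * 100).round(1)
print(diet_share)
```

On the real data, `data.diet.value_counts(normalize=True)` would give the same shares, assuming the column has no nulls (as the null check above indicates).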

# **Which flavors are the most and least common**

In [63]:
# Count how many dishes fall under each flavor profile
flav_data = data.flavor_profile.value_counts().reset_index()
flav_data.columns = ['flavor_profile', 'count']
fig = px.bar(flav_data, x='flavor_profile', y='count', title='Number of dishes per flavor profile',
             color_discrete_sequence=['green'])
fig.show()

This chart plots the number of dishes per flavor profile, not a time: **spicy** is the most common flavor and **sour** is the least common.
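`value_counts` only counts how many dishes carry each flavor. To compare preparation time across flavors, one option is to group by `flavor_profile` and average `prep_time`. The sketch below is a minimal illustration on a made-up sample (the values are hypothetical, not results from the dataset); `avg_prep_time` is a column name chosen here.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, including nulls like those left by the -1 replacement
sample = pd.DataFrame({
    "flavor_profile": ["sweet", "sweet", "spicy", "spicy", "sour", np.nan],
    "prep_time": [10.0, 20.0, 30.0, np.nan, 5.0, 15.0],
})

# groupby drops rows with a null flavor; mean() skips null prep times
avg_prep = (
    sample.groupby("flavor_profile")["prep_time"]
    .mean()
    .sort_values(ascending=False)
    .reset_index(name="avg_prep_time")
)
print(avg_prep)
```

The resulting frame could be passed to `px.bar` with `x='flavor_profile'` and `y='avg_prep_time'` in the same way as the charts in this notebook.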

# **Now see which dishes take the longest and least time to cook**

In [49]:
cooking_time= data[['cook_time','name']]

The two columns cook_time and name are selected from the main data set.

In [50]:
cooking_time.head()

Unnamed: 0,cook_time,name
0,25.0,Balu shahi
1,30.0,Boondi
2,60.0,Gajar ka halwa
3,30.0,Ghevar
4,40.0,Gulab jamun


In [51]:
cooking_time=cooking_time.sort_values(['cook_time'],ascending=True)

The dishes are sorted in ascending order of cooking time to see which ones take the least time to cook.

In [52]:
cooking_time.head()

Unnamed: 0,cook_time,name
109,2.0,Pani puri
111,5.0,Papad
11,5.0,Lassi
147,5.0,Papadum
212,6.0,Lilva Kachori


In [54]:
ten_cook_quickly=cooking_time.head(10)

In [55]:
ten_cook_quickly

Unnamed: 0,cook_time,name
109,2.0,Pani puri
111,5.0,Papad
11,5.0,Lassi
147,5.0,Papadum
212,6.0,Lilva Kachori
78,10.0,Chapati
169,10.0,Bajri no rotlo
195,10.0,Koshimbir
207,10.0,Surnoli
190,10.0,Keri no ras


The 10 dishes with the shortest cooking times are separated from the main data set.
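As an alternative to sorting and then calling `head`, pandas offers `nsmallest` and `nlargest`, which pick the extreme rows in one step. The sketch below uses a handful of name and cook_time pairs that appear in this notebook's output; `cooking_sample` is a stand-in frame, not the full data set.

```python
import pandas as pd

# A few rows copied from the notebook output (cook_time in minutes)
cooking_sample = pd.DataFrame({
    "name": ["Balu shahi", "Pani puri", "Papad", "Lassi", "Shrikhand", "Biryani"],
    "cook_time": [25.0, 2.0, 5.0, 5.0, 720.0, 120.0],
})

# Rows with the smallest and largest cooking times; ties keep their original order
quickest = cooking_sample.nsmallest(3, "cook_time")
slowest = cooking_sample.nlargest(2, "cook_time")

print(quickest)
print(slowest)
```

On the full data, `cooking_time.nsmallest(10, 'cook_time')` should select the same ten dishes as the sort followed by `head(10)`, although rows tied on cook_time may come out in a different order.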

In [25]:
#cook_data = ten_cook_quickly.cook_time.value_counts().reset_index()

In [39]:
# Horizontal bar chart of the 10 dishes with the shortest cooking times
fig = px.bar(ten_cook_quickly, x='cook_time', y='name', title='10 quickest dishes by cooking time',
             color_discrete_sequence=['green'])
fig.show()

According to this chart, Pani puri takes the least time to cook, at 2 minutes.

In [56]:
cooking_time_longest=cooking_time.sort_values(['cook_time'],ascending=False)

The dishes are sorted in descending order of cooking time to see which ones take the longest time to cook.

In [59]:
tencooking_time_longest=cooking_time_longest.head(10)

In [60]:
tencooking_time_longest

Unnamed: 0,cook_time,name
62,720.0,Shrikhand
114,120.0,Pindi chana
27,120.0,Malapua
75,120.0,Biryani
130,90.0,Idli
115,90.0,Rajma chaval
128,90.0,Dosa
34,90.0,Rasgulla
142,90.0,Kuzhakkattai
144,90.0,Masala Dosa


In [61]:
# Horizontal bar chart of the 10 dishes with the longest cooking times
fig = px.bar(tencooking_time_longest, x='cook_time', y='name', title='10 slowest dishes by cooking time',
             color_discrete_sequence=['green'])
fig.show()

According to this chart, Shrikhand takes by far the longest time to cook (720 minutes). Pindi chana, Malapua and Biryani follow at 120 minutes each.

# **Interpretation**
Most of the dishes in this data set are vegetarian (226 of 255); only 29 are non-vegetarian.
Looking at flavor, spicy is the most common flavor profile and sour is the least common.
Among individual dishes, Pani puri takes the least time to cook (2 minutes) and Shrikhand takes the longest (720 minutes).