# Analysing Indian Recipes

Food: the one thing no one can deny liking. As a foodie, I love food, the spicier the merrier, and nothing is more spice-loaded than my very own Indian food.

![Indian Food Buffet](https://comps.canstockphoto.com/indian-food-stock-photo_csp9792544.jpg)

Thanks to the effort of [Neha Prabhavalkar](https://www.kaggle.com/nehaprabhavalkar) we have a dataset containing recipes of various Indian cuisines along with ingredients. Let's get cooking!

# Data Preparation and Cleaning

Let's dissect the dataset and see what we have in our kitchen before we start cooking.



In [None]:
import numpy as np
import pandas as pd

In [None]:
indian_recipes = "../input/indian-food-101/indian_food.csv"
df_indian_recipes = pd.read_csv(indian_recipes)
print("shape", df_indian_recipes.shape, sep=": ")

print("column types",df_indian_recipes.dtypes, sep=":\n")

We have 255 recipes with 9 columns. Each column stores the following detail about a recipe:

* *name* : name of the dish
* *ingredients* : main ingredients used
* *diet* : type of diet - either vegetarian or non vegetarian
* *prep_time* : preparation time
* *cook_time* : cooking time
* *flavor_profile* : flavor profile includes whether the dish is spicy, sweet, bitter, etc
* *course* : course of meal - starter, main course, dessert, etc
* *state* : state where the dish is famous or originated
* *region* : region to which the state belongs

Note: A value of -1 in any column indicates a missing (NaN) value.

Let's replace -1 with NaN so we have a clear picture of the data, and then see how many distinct values each column has.

In [None]:
# np.nan is used because the np.NaN alias was removed in NumPy 2.0
df_indian_recipes.replace(-1, np.nan, inplace = True)
df_indian_recipes.replace("-1", np.nan, inplace = True)
df_indian_recipes.nunique()
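As an alternative to replacing values after loading, the -1 markers could be treated as missing while the file is parsed. The sketch below assumes a tiny in-memory CSV (the rows and the names `sample_csv` and `df_sample` are made up for illustration, not taken from the dataset) and uses the `na_values` parameter of `pd.read_csv`.

```python
import io

import pandas as pd

# Toy CSV standing in for indian_food.csv; the rows are illustrative only
sample_csv = io.StringIO(
    "name,prep_time,cook_time,state\n"
    "Dish A,10,25,Punjab\n"
    "Dish B,-1,30,-1\n"
    "Dish C,15,-1,Kerala\n"
)

# na_values makes read_csv treat the literal text "-1" as missing in every column
df_sample = pd.read_csv(sample_csv, na_values=["-1"])

# Count the missing values per column
missing_per_column = df_sample.isna().sum()
print(missing_per_column)
```

This keeps the cleaning step next to the loading step, at the cost of also hiding any column where -1 might be a legitimate value.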

In [None]:
df_indian_recipes.head()

In [None]:
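ingredients = set()
# skip missing rows so the string "nan" is not counted as an ingredient
for item in df_indian_recipes['ingredients'].dropna():
    # strip whitespace so " sugar" and "sugar" count as the same ingredient
    ingredients.update(part.strip() for part in item.lower().split(","))

print("Total unique ingredients in dataset", len(ingredients), sep=": ")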
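Going one step beyond the unique count, the same comma-splitting idea can tally how often each ingredient appears. This is a minimal sketch on a made-up three-row frame (the name `toy` and its contents are assumptions for illustration), using `collections.Counter` and normalising case and whitespace before counting.

```python
from collections import Counter

import pandas as pd

# Toy data; the real column is df_indian_recipes['ingredients']
toy = pd.DataFrame({"ingredients": ["Sugar, Ghee, Milk", "milk, rice,sugar", None]})

counts = Counter()
for item in toy["ingredients"].dropna():
    # lower-case and strip each piece so spelling variants collapse together
    counts.update(part.strip().lower() for part in item.split(",") if part.strip())

print("unique ingredients:", len(counts))
print(counts.most_common(2))
```

Without the `strip()` call, `" sugar"` and `"sugar"` would be counted as two different ingredients.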

In [None]:
print("Are there any NA values in any column", df_indian_recipes.isna().sum(), sep=":\n")

# Exploratory Analysis and Visualization

Let's explore which dish to cook.

Let's begin by importing `matplotlib.pyplot` and `seaborn`.

In [None]:
import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Let's check which state contributes the most dishes to the dataset.

In [None]:
recipe_by_state = df_indian_recipes.groupby('state').size().to_frame(name = "count").reset_index()
sns.barplot(x = 'count', y='state', data = recipe_by_state )

plt.title("Recipes by State")
# counts are on the x-axis and states on the y-axis
plt.xlabel("Count of recipes")
plt.ylabel("State")

plt.show()

What's the veg vs non-veg ratio?

In [None]:
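# name the columns explicitly; the default names from reset_index() changed in pandas 2.0
df_diet_type = df_indian_recipes['diet'].value_counts().rename_axis('diet').reset_index(name='count')
plt.pie(df_diet_type['count'], labels = df_diet_type['diet'], autopct='%1.1f%%')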
plt.title("Vegetarian vs Non-Vegetarian recipes in dataset")
plt.show()

Let's review the types of dishes in the dataset.

In [None]:
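# name the columns explicitly; the default names from reset_index() changed in pandas 2.0
df_course = df_indian_recipes['course'].value_counts().rename_axis('course').reset_index(name='count')
sns.barplot(x = 'count', y = 'course', data = df_course)

plt.title("Recipes by Course")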
plt.show()

Eating is the quickest part of any meal. Let's see how much time these dishes take to make.

In [None]:
df_cook_time = (df_indian_recipes.prep_time + df_indian_recipes.cook_time).to_frame('total_time').reset_index()
plt.hist(df_cook_time['total_time'],np.arange(5,150,10))

plt.title("Cooking time")
plt.ylabel("Number of recipes")
plt.xlabel("Time in minutes")

plt.show()
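One thing worth keeping in mind with explicit bin edges: `np.arange(5,150,10)` stops at 145 minutes, so any value outside the 5 to 145 range is left out of the histogram. The sketch below uses made-up total times (the array `total_times` is an assumption, not the dataset's values) and `np.histogram` to show how many points fall outside the bins.

```python
import numpy as np

# Toy total times in minutes; the last two lie beyond the final bin edge
total_times = np.array([10, 20, 25, 40, 145, 300, 730])

# Same edges as the plot: 5, 15, ..., 145 (15 edges, 14 bins)
bin_edges = np.arange(5, 150, 10)
counts, edges = np.histogram(total_times, bins=bin_edges)

# Values that did not land in any bin
n_outside = len(total_times) - counts.sum()
print("binned:", counts.sum(), "outside the bins:", n_outside)
```

If the long-running recipes matter for the analysis, widening the range or adding a final catch-all bin would be one way to keep them visible.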

# Asking and Answering Questions

Let's swap the chef's apron for a lab coat and grill the dataset.



Which type of dish takes the most time to cook?

In [None]:
# work on a copy so the new column does not leak into the original frame
df_temp = df_indian_recipes.copy()
df_temp['total_time'] = df_temp.prep_time + df_temp.cook_time
df_temp.sort_values('total_time',ascending = False).head()[['name','course','total_time']]
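Listing the slowest individual dishes is one reading of the question; another is to average the total time per course. The following is a hedged sketch on a made-up frame (the name `toy`, the dish names and the times are all illustrative assumptions) that groups by `course` and ranks the mean `total_time`.

```python
import pandas as pd

# Toy rows standing in for the recipe table
toy = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C", "Dish D"],
    "course": ["dessert", "dessert", "main course", "snack"],
    "prep_time": [10, 20, 15, 5],
    "cook_time": [30, 40, 45, 10],
})
toy["total_time"] = toy["prep_time"] + toy["cook_time"]

# Mean total time per course, slowest course first
mean_time_by_course = (
    toy.groupby("course")["total_time"].mean().sort_values(ascending=False)
)
print(mean_time_by_course)
```

On the real data the same pattern would apply to `df_indian_recipes`, with missing times skipped by `mean()` by default.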

Which Uttar Pradesh dish has the most ingredients?

In [None]:
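# .copy() avoids a SettingWithCopyWarning when the new column is added below
df_up_dishes = df_indian_recipes[df_indian_recipes['state'] == "Uttar Pradesh"][['name','ingredients']].copy()

def count_ingredient(ingredient_text):
    # ingredients are stored as one comma-separated string per dish
    return len(ingredient_text.split(","))

df_up_dishes['ingredient_count'] = df_up_dishes['ingredients'].apply(count_ingredient)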

In [None]:
df_up_dishes.sort_values('ingredient_count', ascending = False).head()

![](https://upload.wikimedia.org/wikipedia/commons/thumb/4/48/Jalebi_-_Closeup_View_of_Jalebis.JPG/320px-Jalebi_-_Closeup_View_of_Jalebis.JPG)

Which is the most time-consuming vegetarian main course?

In [None]:
df_temp[(df_temp['diet'] == 'vegetarian') & (df_temp['course'] == 'main course')].sort_values("total_time", ascending= False).head()

Pindi chana has definitely taken me quite a long time to prepare.
![](https://snappygoat.com/b/9b7a98ab492ff24ebb95dbc9291763593ca55351)

Which is the least time-consuming non-vegetarian main course?

In [None]:
# ascending = True puts the quickest dishes first
df_temp[(df_temp['diet'] == 'non vegetarian') & (df_temp['course'] == 'main course')].sort_values('total_time', ascending = True).head()
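For "quickest" style questions, `DataFrame.nsmallest` is a compact alternative to sorting and taking the head. The sketch below runs on a made-up frame (the name `toy`, the dish names and the times are illustrative assumptions), drops rows with an unknown total time, and picks the two fastest non-vegetarian main courses.

```python
import numpy as np
import pandas as pd

# Toy rows; one dish has an unknown total time
toy = pd.DataFrame({
    "name": ["Dish A", "Dish B", "Dish C", "Dish D", "Dish E"],
    "diet": ["vegetarian", "non vegetarian", "non vegetarian",
             "non vegetarian", "non vegetarian"],
    "course": ["main course", "main course", "main course",
               "main course", "snack"],
    "total_time": [30, 90, 45, np.nan, 20],
})

# Keep only non-vegetarian main courses
mask = (toy["diet"] == "non vegetarian") & (toy["course"] == "main course")

# Drop unknown times explicitly, then take the two smallest
quickest = toy[mask].dropna(subset=["total_time"]).nsmallest(2, "total_time")
print(quickest[["name", "total_time"]])
```

Dropping the missing times first makes it explicit that dishes with unknown timing are not being ranked.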

Which is the most time-consuming dessert to prepare?

In [None]:
df_temp[(df_temp['course'] == 'dessert')].sort_values('total_time', ascending = False).head()

# References and Future Work

All this food will make any data scientist happy, but let's not stop at the first round of the buffet. Do review the other [great notebooks](https://www.kaggle.com/nehaprabhavalkar/indian-cuisine-analysis/data) on the same dataset.