# Recipe Rescuer 
by Team Pie-thon - Amy Kruzick, Jacqueline Vital, and Sagar Ali


#### *A kitchen assistant that suggests recipes based on what's already in your pantry.*

### Overview and Motivation

Have you ever stared hungrily into your pantry or fridge with no idea what to eat? This is a (not very) serious issue affecting millions of people globally. The Recipe Rescuer system suggests recipes for you to cook based on ingredients that you already have on hand. It presents the top 3 recipes matching the ingredients the user enters, ranked by the number of additional ingredients that would have to be purchased. Anyone who has ever needed a small push to start cooking a meal will benefit from a system like this.

The goals of this project were:

1. To create a recipe suggestion service that works from ingredients already in the user's home, drawing on recipes collected from dozens of well-respected cooking websites.
2. To reduce food waste by encouraging people to make the most out of ingredients (especially perishable ones) in their home.

### Related Work

Researchers at the University of Michigan performed a similar analysis in 2012. You can view their research paper [here](https://arxiv.org/pdf/1111.3919v3.pdf). The paper analyzes different ingredients in order to find the relationships between them. We used it to get a better understanding of where to start our own analysis.

### Initial Questions

The primary question we hoped to answer was, "Can you generate unique recipe suggestions based on a list of what ingredients are in your kitchen?" We wanted the answer to be as useful as possible, so we focused on making sure that the suggested recipes required a minimal amount of shopping for extra ingredients.

We also wanted to visualize our network of recipes. While we initially planned to use Plotly's 3D network graphs to do this, we determined that NetworkX better suited our needs. After initially graphing the network of a small subset of recipes, we quickly realized that graphing the entire network would result in an unreadable image. We modified our approach to this question by choosing to graph the subset of the network that contained the recipes that were suggested to a user.

As the project progressed, we decided that we also wanted to discover what the most versatile ingredients are - the ones that occur the most frequently across all of the recipes. We cared about answering this question primarily because it would allow us to tell people which ingredients they should always have on hand. Additionally, this allowed us to suggest recipes that use these food staples. Finally, we were also able to graph the most versatile ingredient and all of its immediate connections to demonstrate how important that ingredient is to cooking.

We were also curious about whether or not we could place ingredients into categories using Latent Dirichlet Allocation (LDA). This was an additional question that we attempted to answer near the end of the system's development. The extracted ingredient groupings could easily be identified as sweet or savory, but more detailed categorization wasn't possible.

### Our Data

The recipe data for this project comes from a recipe database called Open Recipes on GitHub (https://github.com/fictivekin/openrecipes). It contains approximately 173,000 recipes from dozens of well-known cooking websites. This data was an excellent find, but it required a great deal of cleanup to prepare it for our analysis.

Since there was no identifiable pattern to how the ingredients were listed, we had to search through every listed ingredient in every recipe to extract the food item (ex. low fat no GMO organic milk -> milk). We did this by creating a list of 1600 food items and checking whether any of them was a substring of the listed ingredient. If none of the food items matched, the listed ingredient was still kept as-is so that the data would not be completely lost.
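
The substring check described above can be sketched roughly as follows. This is a minimal illustration, not the project's cleaning script: the function name `extract_food_item`, the tiny `FOOD_ITEMS` list, and the choice to prefer the longest match when several food items match are all assumptions made for the example. It is written for Python 3, unlike the Python 2 cells later in this document.

```python
def extract_food_item(listed_ingredient, food_items):
    """Return the known food item contained in a raw ingredient string.

    Falls back to the lowercased raw string when nothing matches, so the
    entry is not lost. Preferring the longest match is an assumption.
    """
    text = listed_ingredient.lower().strip()
    matches = [item for item in food_items if item in text]
    if not matches:
        return text
    return max(matches, key=len)


# Hypothetical stand-in for the 1600-item food list
FOOD_ITEMS = ["milk", "butter", "peanut butter", "flour"]

print(extract_food_item("low fat no GMO organic milk", FOOD_ITEMS))  # milk
print(extract_food_item("2 tbsp creamy Peanut Butter", FOOD_ITEMS))  # peanut butter
print(extract_food_item("1 pinch saffron", FOOD_ITEMS))              # 1 pinch saffron
```

Checking every listed ingredient against every food item is quadratic in the worst case, which is consistent with the long running time reported for the full dataset.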

This process took a great deal of time. The algorithm took approximately 25 hours to run, even though we split the data into subsets and divided them among the three of us. After running the data through this algorithm, we added finishing touches using regex patterns.

After the cleaning process, we were left with only approximately 90,000 recipe entries, since many of the entries had incorrect data (ex. no food items were listed in the ingredient column).

### Exploratory Data Analysis

While the large dataset was being cleaned, we cleaned approximately 200 of the recipes by hand so that we could begin our analysis. All of the analysis shown here was first performed on that small dataset and was finalized after testing it with the full cleaned dataset.

First, we imported the csv file containing the clean recipes into a dataframe:

In [1]:
# Converts the CSV file to a pandas DataFrame
import pandas as pd

clean_df =  pd.read_csv('Clean_Recipes.csv')
clean_df.head()

| Unnamed: 0.2 | Unnamed: 0 | Unnamed: 0.1 | Unnamed: 0.1.1 | url | ingredients | name | after | Regex |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | http://thepioneerwoman.com/cooking/2013/03/dro... | Biscuits,3 cups All-purpose Flour,2 Tablespoon... | Drop Biscuits and Sausage Gravy | biscuits,all-purpose flour,baking powder,salt,... | butter,whole milk,sausage,butermilk,biscuits,b... |
| 1 | 1 | 1 | 1 | http://thepioneerwoman.com/cooking/2013/03/hot... | 12 whole Dinner Rolls Or Small Sandwich Buns (... | Hot Roast Beef Sandwiches | sandwich,beef,provolone,mayonnaise,onion,poppy... | poppy seeds,sandwich,beef,onion,provolone,hors... |
| 2 | 2 | 2 | 2 | http://www.101cookbooks.com/archives/moroccan-... | Dressing:,1 tablespoon cumin seeds,1/3 cup / 8... | Morrocan Carrot and Chickpea Salad | dressing,cumin seeds,olive oil,lemon juice,hon... | olive oil,mint,carrots,cayenne pepper,honey,le... |
| 3 | 3 | 3 | 3 | http://thepioneerwoman.com/cooking/2013/03/mix... | Biscuits,3 cups All-purpose Flour,2 Tablespoon... | Mixed Berry Shortcake | biscuits,all-purpose flour,baking powder,sugar... | butter,yogurt,biscuits,almond,sugar,baking pow... |
| 4 | 4 | 4 | 4 | http://www.101cookbooks.com/archives/pomegrana... | For each bowl: ,a big dollop of Greek yogurt,2... | Pomegranate Yogurt Bowl | for each bowl,yogurt,pomegranate juice,honey,q... | pomegranate juice,honey,quinoa,for each bowl,s... |


Next, we built a network of all the recipes using NetworkX. Every node in the network is a unique ingredient; that is, no two nodes represent the same ingredient. An edge between 2 nodes indicates that those two ingredients appear together in at least one recipe.

As shown by the code below, there were **15,850** unique ingredients represented in the 90,000 recipes in the dataset with **398,428** edges between them. After initially graphing this network using Matplotlib, we determined that the visualization was far too densely connected to be truly useful.

In [4]:
# Creates network of all the recipes using NetworkX
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib qt

all_recipes = nx.Graph()
recipes = []

for index,row in clean_df.iterrows():
    ingredient_list = row[7]
    recipes.append([index,row[5], row[7], row[3]])
    ingredient_list = ingredient_list.replace('[','').replace(']','')
    ingredients = ingredient_list.split(",")
    
    #remove extra whitespaces:
    for ingr in range(0, len(ingredients)):
        ingredients[ingr] = ingredients[ingr].strip(" ")
        
    for ingr in ingredients: #add nodes for all unique ingredients
        if not (ingr in all_recipes):
            all_recipes.add_node(ingr) 
    for ingr in range(0, len(ingredients)-1): #add edges between co-ingredients
        for index in range(ingr+1, len(ingredients)):
            all_recipes.add_edge(ingredients[ingr], ingredients[index])

print "There are",all_recipes.number_of_nodes(), "unique ingredients in the dataset and there are",all_recipes.number_of_edges(),"connections between them."

There are 15850 unique ingredients in the dataset and there are 398428 connections between them.
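
For readers without NetworkX installed, the same co-occurrence structure can be illustrated with plain dictionaries. The sketch below is an assumption-laden toy (the function name `build_cooccurrence` and the two sample recipes are made up for illustration, and it targets Python 3), but it follows the same rule as the cell above: one node per unique ingredient and one undirected edge per pair of ingredients that share a recipe.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence(recipes):
    """Map each ingredient to the set of ingredients it shares a recipe with."""
    graph = defaultdict(set)
    for ingredients in recipes:
        unique = sorted(set(ingredients))
        for item in unique:
            graph[item]  # touch the key so every ingredient gets a node
        for a, b in combinations(unique, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


# Hypothetical toy data, not taken from the real dataset
toy_recipes = [
    ["flour", "butter", "salt"],
    ["butter", "salt", "garlic"],
]
graph = build_cooccurrence(toy_recipes)
num_nodes = len(graph)
# Each undirected edge is stored twice (once per endpoint), so halve the sum
num_edges = sum(len(neighbors) for neighbors in graph.values()) // 2
print(num_nodes, num_edges)  # 4 5
```

The pair `butter`/`salt` occurs in both toy recipes but is counted as a single edge, which is also how an undirected `nx.Graph` treats repeated `add_edge` calls.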


Next, we created our recipe suggestion algorithm. The basic steps of the algorithm are the following:

1. For every recipe, count how many ingredients the user would have to buy and how many ingredients of theirs are utilized.
2. Eliminate recipes that the user doesn't have any ingredients for.
3. Sort the remaining recipes according to how many ingredients the user is missing.
4. Find the recipe that has the least number of ingredients missing.
5. Find another recipe that has the least number of ingredients missing, ensuring that it is not the same recipe as in step 4.
6. Find the final recipe that has the least number of ingredients missing, ensuring that it is not the same recipe as in step 4 or 5.
7. Return the 3 recipes.
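
As a compact illustration of these steps (not the project's `suggestRecipes` function, which is shown further down), the ranking can be sketched with sets and `heapq.nsmallest`. The function name `suggest_top`, the `min_used` parameter, the name-based tie-breaking, and the toy recipes are all assumptions for this Python 3 example.

```python
import heapq


def suggest_top(user_ingredients, recipes, k=3, min_used=1):
    """Return up to k (name, missing) pairs with the fewest missing ingredients.

    `recipes` maps a recipe name to its list of ingredients. Recipes that use
    fewer than `min_used` of the user's ingredients are skipped.
    """
    have = set(user_ingredients)
    candidates = []
    for name, ingredients in recipes.items():
        needed = set(ingredients)
        used = needed & have
        if len(used) < min_used:
            continue
        missing = sorted(needed - have)
        candidates.append((name, missing))
    # Fewest missing first; ties broken by name for deterministic output
    return heapq.nsmallest(k, candidates, key=lambda c: (len(c[1]), c[0]))


# Hypothetical toy data
toy_recipes = {
    "garlic bread": ["bread", "butter", "garlic"],
    "omelette": ["eggs", "butter", "salt"],
    "pancakes": ["flour", "eggs", "milk", "butter"],
    "fruit salad": ["apple", "banana"],
}
result = suggest_top(["butter", "eggs", "salt", "garlic"], toy_recipes)
for name, missing in result:
    print(name, missing)
```

With `min_used=1` this matches step 2 above; the notebook's own function additionally requires at least 4 of the user's ingredients to be used, which would correspond to `min_used=4` in this sketch.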

The code for the suggestRecipes function can be seen below.

In [5]:
import heapq

# Main suggestor function:
def suggestRecipes(ingredients, recipe_list):
    num_missing = [0]*len(recipe_list) #number of ingredients still missing
    num_used = [0]*len(recipe_list) #number of user ingredients included
    ingr_count = [0]*len(recipe_list) #total ingredients in the recipe
    
    #iterate over every recipe and update the above counts
    for r in range(0, len(recipe_list)):
        recipe_list[r][2] = recipe_list[r][2].replace('[','').replace(']','')
        recipe = recipe_list[r][2].split(",")
        for ingr in range(0, len(recipe)):
            recipe[ingr] = recipe[ingr].strip(" ")
        for ingr in recipe:
            ingr_count[r] += 1
            if not(ingr in ingredients):
                num_missing[r] += 1
            else:
                num_used[r]+=1
    
    #eliminate any recipes that you have NO ingredients for
    for i in range(0,len(num_missing)):
        if num_missing[i] == ingr_count[i]:
            num_missing[i] = 100000000
    
    #find 10000 recipes with smallest # of ingredients missing (ensures we stay in bounds)
    recommended = heapq.nsmallest(10000,num_missing)
    r_index = 0
    
    #locate 3 recipes that have the fewest ingredients missing, given that at least 4 ingredients must be used
    first_index = num_missing.index(recommended[r_index])
    first_recipe = [recipe_list[num_missing.index(recommended[0])],recommended[0]]
    
    first_used_ingrs = num_used[first_index] #look up by recipe index, not by missing count

    while (first_used_ingrs < 4):
        for i in range (0, len(recipe_list)):
            if((num_missing[i] == recommended[r_index]) and num_used[i] >= 4):
                first_recipe = [recipe_list[i],recommended[r_index]]
                first_index = recipe_list.index(first_recipe[0])
                first_used_ingrs = num_used[i]
                break
        r_index += 1
    
    second_recipe=[]
    third_recipe=[]
    
    #ensure that you don't accidentally repeat a recipe!
    second_index = num_missing.index(recommended[r_index])
    if (second_index != first_index): #if 2nd recipe != 1st
        second_used_ingrs = num_used[second_index] #look up by recipe index, not by missing count
        if (second_used_ingrs >= 4):
            for i in range(0, len(recipe_list)):
                if (i != first_index):
                    if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                        second_recipe = [recipe_list[i], recommended[r_index]]
                        
        while (second_used_ingrs < 4): #make sure 4 or more ingredients are used
            for i in range(0, len(recipe_list)):
                if (i != first_index):
                    if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                        second_recipe = [recipe_list[i],recommended[r_index]]
                        second_used_ingrs = num_used[i]
                        second_index = i
                        break
            r_index += 1

        #repeat for 3rd recipe
        third_index = num_missing.index(recommended[r_index])
        if ((third_index != first_index) and (third_index != second_index)):
            third_used_ingrs = num_used[third_index] #look up by recipe index, not by missing count
            while (third_used_ingrs < 4): #make sure 4 or more ingredients used
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
                            third_used_ingrs = num_used[i]
                            break
                r_index += 1
        else:
            third_used_ingrs = num_used[third_index] #look up by recipe index, not by missing count
            if (third_used_ingrs >= 4):
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
            
            while (third_used_ingrs < 4): #make sure 4 or more ingredients used
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
                            third_used_ingrs = num_used[i]
                            break
                r_index += 1
            
    else: #ensure 2nd recipe is unique
        second_used_ingrs = num_used[second_index] #look up by recipe index, not by missing count
        if (second_used_ingrs >= 4):
            for i in range(0, len(recipe_list)):
                if (i != first_index):
                    if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                        second_recipe = [recipe_list[i], recommended[r_index]]
                        
        while (second_used_ingrs < 4): #make sure 4 or more ingredients are used
            for i in range(0, len(recipe_list)):
                if (i != first_index):
                    if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                        second_recipe = [recipe_list[i],recommended[r_index]]
                        second_used_ingrs = num_used[i]
                        break
            r_index += 1
        
        second_index = recipe_list.index(second_recipe[0])
        third_index = num_missing.index(recommended[r_index])
        
        if ((third_index != first_index) and (third_index != second_index)):
            third_used_ingrs = num_used[third_index] #look up by recipe index, not by missing count
            if (third_used_ingrs >= 4):
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
                        
            while (third_used_ingrs < 4): #make sure 4 or more ingredients used
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
                            third_used_ingrs = num_used[i]
                            break
                r_index += 1
                
        else:
            third_used_ingrs = num_used[third_index] #look up by recipe index, not by missing count
            if (third_used_ingrs >= 4):
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
            
            while (third_used_ingrs < 4): #make sure 4 or more ingredients used
                for i in range(0, len(recipe_list)):
                    if ((i != first_index) and (i != second_index)):
                        if ((num_missing[i] == recommended[r_index]) and (num_used[i] >= 4)):
                            third_recipe = [recipe_list[i], recommended[r_index]]
                            third_used_ingrs = num_used[i]
                            break
                r_index += 1
    
    #finally, return the top 3 recommended recipes
    return [first_recipe,second_recipe, third_recipe]

Additionally, we created helper functions to list the ingredients that are still missing from a recipe and to print all of the suggestions to the user:

In [6]:
# Helper functions:

def listMissingIngredients(user_ingr,recipe):
    still_missing = []
    recipe = recipe.replace('[','').replace(']','')
    recipe_ingr = recipe.split(',')
    for i in range(0, len(recipe_ingr)):
        recipe_ingr[i] = recipe_ingr[i].strip(" ")
    for ingr in recipe_ingr:
        if ingr not in user_ingr:
            still_missing.append(ingr)
    return still_missing


def printSuggested(user_ingr,suggested):
    # replace "&amp;" with "&" to clean up appearance
    if ("&amp;" in suggested[0][0][1]):
        suggested[0][0][1] = suggested[0][0][1].replace("&amp;",'&')
    if ("&amp;" in suggested[1][0][1]):
        suggested[1][0][1] = suggested[1][0][1].replace("&amp;",'&')
    if ("&amp;" in suggested[2][0][1]):
        suggested[2][0][1] = suggested[2][0][1].replace("&amp;",'&')
        
    #print out the top recipes:
    print "\n\n******* RECOMMENDED RECIPES WITH THESE INGREDIENTS *******"
    print "Top suggested recipe: ", suggested[0][0][1]
    missing_ingr = listMissingIngredients(user_ingr, suggested[0][0][2])
    if (suggested[0][1] == 0):
        print "You don't need any extra ingredients!"
    else:
        print "You still need", suggested[0][1],"ingredients:"
    for ingr in missing_ingr:
        print "\t",ingr
    print "Full instructions can be found at:", suggested[0][0][3]

    print "\nSecond suggested recipe: ", suggested[1][0][1]
    missing_ingr = listMissingIngredients(user_ingr, suggested[1][0][2])
    if (suggested[1][1] == 0):
        print "You don't need any extra ingredients!"
    else:
        print "You still need", suggested[1][1],"ingredients:"
    for ingr in missing_ingr:
        print "\t",ingr
    print "Full instructions can be found at:", suggested[1][0][3]

    print "\nThird suggested recipe: ", suggested[2][0][1]
    missing_ingr = listMissingIngredients(user_ingr, suggested[2][0][2])
    if (suggested[2][1] == 0):
        print "You don't need any extra ingredients!"
    else:
        print "You still need", suggested[2][1],"ingredients:"
    for ingr in missing_ingr:
        print "\t",ingr
    print "Full instructions can be found at:", suggested[2][0][3]

To obtain the user's ingredients, we allow them to type in what ingredients they would like to use. After hitting enter, the above functions are run and the user is presented with the top 3 suggested recipes. 

In [11]:
print "Please enter 4 or more ingredients you'd like to use, separated by commas.\nHit enter when finished.\n"
user_ingredients = raw_input().split(",")
for i in range(0, len(user_ingredients)):
    user_ingredients[i] = user_ingredients[i].strip(" ")

print "\nSearching for recipes..."
suggested_recipes = suggestRecipes(user_ingredients, recipes)
printSuggested(user_ingredients,suggested_recipes)

Please enter 4 or more ingredients you'd like to use, separated by commas.
Hit enter when finished.

butter, chicken, salt, wine, pepper, beef

Searching for recipes...


******* RECOMMENDED RECIPES WITH THESE INGREDIENTS *******
Top suggested recipe:  Roasted Beef Tenderloin
You still need 2 ingredients:
	whole peppercorns
	olive oil
Full instructions can be found at: http://thepioneerwoman.com/cooking/2007/07/roasted_beef_te/

Second suggested recipe:  Roast Turkey with Giblet Gravy
You still need 2 ingredients:
	flour
	giblets
Full instructions can be found at: http://www.epicurious.com/recipes/food/views/Roast-Turkey-with-Giblet-Gravy-51120010

Third suggested recipe:  Oven Brown Rice
You still need 2 ingredients:
	brown rice
	garlic
Full instructions can be found at: http://allrecipes.com/Recipe/Oven-Brown-Rice/Detail.aspx


The recommender system tells the user the names of the recipes, what additional ingredients they would need to purchase (if any), and provides a link to the full instructions.
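
Ingredient matching in `suggestRecipes` is an exact string comparison (`ingr in ingredients`), so input such as `Butter` would not match `butter`. A possible normalization step is sketched below. It assumes the cleaned ingredient names are lowercase, as in the sample rows shown earlier, and the function name `parse_ingredients` is hypothetical. It is written for Python 3.

```python
def parse_ingredients(raw_text):
    """Split comma-separated input into lowercase, trimmed ingredient names."""
    parsed = []
    for piece in raw_text.split(","):
        # Lowercase, trim, and collapse runs of inner whitespace
        name = " ".join(piece.lower().split())
        # Skip empty entries and duplicates while keeping the input order
        if name and name not in parsed:
            parsed.append(name)
    return parsed


print(parse_ingredients("Butter,  chicken , salt,, Olive   Oil, butter"))
# ['butter', 'chicken', 'salt', 'olive oil']
```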

As mentioned previously, we also present the user with a graphical representation of the recommended recipes. This visualization shows the high connectivity of a tiny subset of the overall recipe network. The more connections that a node has, the more green it appears. The fewer connections that a node has, the more white it appears.

In [13]:
import numpy as np

def drawGraph(g,showLabels):
    pos = nx.spring_layout(g)
        
    degrees = g.degree()
    node_colors = np.asarray([degrees[n] for n in g.nodes()])
    
    #make background color gray
    fig = plt.figure()
    ax = fig.add_subplot(111)
    #recolor nodes based on degree
    nx.draw(g,pos, node_color=node_colors, cmap=plt.cm.Greens,font_color="#bbbbbb",with_labels=showLabels)
    fig.set_facecolor("#666666")
    plt.show()

#show graph of just the top 3 recipes
fitted_graph = nx.Graph()
top_recipes = [suggested_recipes[0][0][2],suggested_recipes[1][0][2],suggested_recipes[2][0][2]]

for recipe in top_recipes:
    ingr_list = recipe.split(",")
    for ingr in range(0, len(ingr_list)):
        ingr_list[ingr] = ingr_list[ingr].strip(" ")
    
    for ingr in ingr_list:
        if not (ingr in fitted_graph): #add nodes for all unique ingredients
            fitted_graph.add_node(ingr)
    for ingr in range(0, len(ingr_list)-1): #add edges between co-ingredients
        for index in range(ingr+1, len(ingr_list)):
            fitted_graph.add_edge(ingr_list[ingr],ingr_list[index])
        
drawGraph(fitted_graph,True)

<img src="user_ingr.png">


We also analyzed the recipe network to determine which ingredients are the most useful. These are the ingredients that have the most connections to other ingredients, which indicates that they are combined with the widest variety of other foods and appear across many recipes. Take a look at the top 30 below. You might be wondering what the best recipes would be with those ingredients - we've answered that, too.

In [14]:
#use network of all recipes to find the top 30 MVP ingredients
from operator import itemgetter
      
neighbors = []
for node in all_recipes.nodes():
    neighbors.append([node, len(all_recipes.neighbors(node)), all_recipes.neighbors(node)])

sorted_neighbors = sorted(neighbors, key=itemgetter(1),reverse=True)

top_ingredients = list(sorted_neighbors[0:30])

ingr_names = []
for ingr in top_ingredients:
    ingr_names.append(ingr[0])
    
print "The 30 most common ingredients used in recipes are:"
for i in range(0, len(ingr_names)):
    print (i+1), "-", ingr_names[i]
    
top_recipes = suggestRecipes(ingr_names, recipes)
printSuggested(ingr_names,top_recipes)

The 30 most common ingredients used in recipes are:
1 - salt
2 - butter
3 - olive oil
4 - garlic
5 - sugar
6 - water
7 - onion
8 - pepper
9 - black pepper
10 - eggs
11 - egg
12 - milk
13 - flour
14 - tomatoes
15 - cheese
16 - parsley
17 - lemon
18 - vegetable oil
19 - onions
20 - lemon juice
21 - ginger
22 - cinnamon
23 - thyme
24 - brown sugar
25 - all purpose flour
26 - cream
27 - chicken
28 - vanilla
29 - vinegar
30 - celery


******* RECOMMENDED RECIPES WITH THESE INGREDIENTS *******
Top suggested recipe:  Mini Dutch Babies
You don't need any extra ingredients!
Full instructions can be found at: http://www.aspicyperspective.com/2010/06/multi-purpose-meal.html

Second suggested recipe:  Vanilla Ice Cream Base
You don't need any extra ingredients!
Full instructions can be found at: http://www.epicurious.com/recipes/food/views/Vanilla-Ice-Cream-Base-51172800

Third suggested recipe:  Cream Cheese Arepas
You don't need any extra ingredients!
Full instructions can be found at: http://al

Next, we took a look at the most common ingredient - **salt**. It's directly connected to **7,906** other ingredients! We graphed its node and 500 of its immediate connections to visualize how important it is to cooking.

In [17]:
#Let's look at what the most common ingredient and its neighbors look like:

most_common = nx.Graph()
most_common.add_node(top_ingredients[0][0]) #add node for top ingredient
for neighbor in top_ingredients[0][2][0:500]:
    most_common.add_edge(top_ingredients[0][0],neighbor)
    
print top_ingredients[0][0],"is the most commonly used ingredient. It's directly connected to", len(top_ingredients[0][2]),"other ingredients!"
drawGraph(most_common, False)

salt is the most commonly used ingredient. It's directly connected to 7906 other ingredients!


<img src="mvp_ingr.png">
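
Note that a node's degree counts the distinct ingredients it co-occurs with, which is related to, but not the same as, the number of recipes that contain it. The toy sketch below computes both measures so the difference is visible. The function name `degree_and_frequency` and the sample recipes are assumptions for illustration, and the code targets Python 3.

```python
from collections import Counter, defaultdict
from itertools import combinations


def degree_and_frequency(recipes):
    """Return (degree, frequency) dicts for every ingredient.

    degree[i]    = number of distinct ingredients that share a recipe with i
    frequency[i] = number of recipes that contain i
    """
    neighbors = defaultdict(set)
    frequency = Counter()
    for ingredients in recipes:
        unique = set(ingredients)
        frequency.update(unique)
        for a, b in combinations(unique, 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    degree = {item: len(neighbors[item]) for item in frequency}
    return degree, frequency


# Hypothetical toy data: salt is frequent but always paired with butter
toy_recipes = [
    ["salt", "butter"],
    ["salt", "butter"],
    ["salt", "butter"],
    ["vanilla", "sugar", "cream", "milk"],
]
degree, frequency = degree_and_frequency(toy_recipes)
print(degree["salt"], frequency["salt"])        # 1 3
print(degree["vanilla"], frequency["vanilla"])  # 3 1
```

In a real dataset the two measures tend to move together, since an ingredient that appears in many recipes usually meets many different partners, but ranking by one is not guaranteed to reproduce the ranking by the other.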

Finally, Recipe Rescuer also performs topic modeling using Latent Dirichlet Allocation (LDA). The topics found represent clusters of ingredients that co-occur with one another across recipes. In our case, we extracted 20 topics and the 8 most characteristic ingredients from each topic. We found that the ingredients in each extracted topic were either sweet or savory. The results of our LDA analysis on the data can be seen in the image below. 
<img src ="LDA.PNG">

### Final Analysis & Discussion

The primary goal of Recipe Rescuer was to create a recipe suggestion service that works from ingredients already in the user's home, drawing on recipes collected from dozens of well-respected cooking websites. We achieved this goal by building a network of ingredients from 90,000 recipes. A user can enter a handful of ingredients they would like to use, and Recipe Rescuer finds and visualizes the 3 recipes that require the fewest additional ingredients. The network is also used to determine what the most commonly used ingredients are across all of the recipes. Finally, LDA was used to group ingredients into clusters that could be identified as sweet or savory.

We learned that there are many diverse ingredients and recipes available on the internet. We also learned that staple ingredients such as salt, butter, and eggs can be combined in many different ways with just a few other ingredients to create many different meals. Through our network analysis, we learned how different ingredients are connected. Using the results of Recipe Rescuer, we realized how easy it can be to cook meals using just a few simple ingredients that you already have lying around.


If you would like to visit our website, which includes all of our discussion here and all of our visualizations, please go to [this link](https://github.tamu.edu/pages/sagar794/RecipeRescuer/).

If you would like to view a summary video of Recipe Rescuer, please go to [this link](https://www.youtube.com/watch?v=cK_lCIINXWM&feature=youtu.be).