# FOOD RECIPE RECOMMENDATION ENGINE

## Part 2d: Content-Based Recommender

In [1]:
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from src.content_based import preprocess, get_keywords, recommend  # project helpers used below
import warnings
warnings.filterwarnings("ignore")

### Load data

In [2]:
# Load recipes
recipes = pd.read_feather("./data/recipes.feather")
recipes.drop(columns=["level_0", "index"], inplace=True)  # drop leftover index columns
recipes = recipes.sample(n=10000, random_state=42).reset_index(drop=True) # Use a sample of 10,000 recipes

# Load interactions
interactions = pd.read_feather("./data/interactions.feather")
interactions = interactions[interactions["recipe_id"].isin(recipes["id"])]

### Preprocess text data

In [3]:
recipes["tags"] = recipes["tags"].apply(lambda x: " ".join(x))
recipes["steps"] = recipes["steps"].apply(lambda x: " ".join(x))
recipes["ingredients"] = recipes["ingredients"].apply(lambda x: " ".join(x))

# Get bag of words
cols = ["name", "tags", "steps", "ingredients"]
recipes = preprocess(recipes, cols)
recipes["bag_of_words"] = recipes["name"] + " " + recipes["tags"] + " " + recipes["steps"] + " " + recipes["ingredients"]
words = recipes[["name", "bag_of_words"]].copy()  # copy so that adding columns later does not write to a view of recipes
words.head()

| | name | bag_of_words |
|---|---|---|
| 0 | crab spinach casserole | crab spinach casserole 60 minutes or less time... |
| 1 | curried beef | curried beef weeknight time to make course mai... |
| 2 | delicious steamed whole artichokes | delicious steamed whole artichokes 60 minutes ... |
| 3 | pork tenderloin with hot mustard sauce | pork tenderloin with hot mustard sauce 60 minu... |
| 4 | mixed barbecue sauce | mixed barbecue sauce 15 minutes or less time t... |
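
The `preprocess` helper comes from `src.content_based` and its body is not shown in this notebook. As a rough illustration only, a text-normalization step of this kind might look like the sketch below. The function name `normalize_text` and the exact cleaning rules are assumptions, not the project's actual implementation.

```python
import re


def normalize_text(text: str) -> str:
    """Hypothetical cleaner: lowercase, drop punctuation, collapse whitespace."""
    text = text.lower()
    # Replace anything that is not a lowercase letter, digit, or whitespace with a space
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()


print(normalize_text("Preheat oven to 275F; grease & flour a 9 x 13 pan."))
# preheat oven to 275f grease flour a 9 x 13 pan
```

Normalizing every text column the same way keeps the concatenated bag of words free of tokens that differ only in case or punctuation.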


### Get keywords

In [4]:
words["keywords"] = words["bag_of_words"].apply(get_keywords)
words = words[["name", "keywords"]]
words.set_index("name", inplace=True)
words.head()

| name | keywords |
|---|---|
| crab spinach casserole | baking dish top make course main ingredient pr... |
| curried beef | less heat oil tender meat thickened vegetable ... |
| delicious steamed whole artichokes | bring may put opinion heart done salt eat pull... |
| pork tenderloin with hot mustard sauce | pork tenderloin serve sauce combine horseradis... |
| mixed barbecue sauce | tender less time mixed barbecue sauce 15 minut... |
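
`get_keywords` is also imported from `src.content_based`, so its logic is not visible here. The sketch below shows one simple way a keyword step could work: drop common stop words and keep each remaining word once. The name `keywords_sketch` and the tiny stop-word list are hypothetical, and the real helper may use a different method and ordering.

```python
# Hypothetical stand-in for get_keywords; the real helper may differ.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "or", "with", "for", "on", "is", "it"}


def keywords_sketch(text: str) -> str:
    """Keep the first occurrence of every non-stop-word token, in order."""
    seen = set()
    kept = []
    for token in text.split():
        if token in STOP_WORDS or token in seen:
            continue
        seen.add(token)
        kept.append(token)
    return " ".join(kept)


print(keywords_sketch("grease and flour a pan and preheat the oven to 350"))
# grease flour pan preheat oven 350
```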


In [5]:
# Vectorize
vectorizer = CountVectorizer()
count_matrix = vectorizer.fit_transform(words["keywords"])

### Calculate similarities

In [6]:
# Get cosine similarities
similarities = cosine_similarity(count_matrix, count_matrix)
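
Cosine similarity between two count vectors is their dot product divided by the product of their lengths, so it depends on which words are shared rather than on how long each text is. The sketch below computes the same quantity by hand with NumPy on a small made-up count matrix. The helper name `cosine_sim_matrix` is an assumption for illustration.

```python
import numpy as np


def cosine_sim_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0          # leave all-zero rows as zeros instead of dividing by 0
    unit = X / norms                 # scale every row to unit length
    return unit @ unit.T             # dot products of unit vectors are cosines


# Toy count matrix: 3 documents, 6 vocabulary words
toy_counts = np.array([
    [1, 0, 0, 1, 0, 1],
    [1, 0, 1, 0, 0, 2],
    [0, 1, 0, 0, 1, 0],
])
sim = cosine_sim_matrix(toy_counts)
print(np.round(sim, 3))
# [[1.    0.707 0.   ]
#  [0.707 1.    0.   ]
#  [0.    0.    1.   ]]
```

The first two rows share two words and score about 0.707, while the third row shares none and scores 0.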

### Make recommendations

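We simulate a user browsing one recipe by sampling a single row from the interactions. With `random_state=42`, the sampled recipe happens to be a banana cake. Remove `, random_state=42` from the `interactions.sample` call to get a different recipe on each run.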

In [7]:
# Simulate user browsing one recipe
current_recipe_id = interactions.sample(1, random_state=42)["recipe_id"] # Feel free to remove random_state
current_recipe_id = int(current_recipe_id.iloc[0])  # extract the single sampled id as a plain int
current_recipe = recipes[recipes["id"] == current_recipe_id]
current_recipe

| | name | id | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients | bag_of_words |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6458 | best ever banana cake with cream cheese frosting | 67256 | 75 | 82367 | 2003-07-24 | weeknight time to make course main ingredient ... | [503.5, 31.0, 222.0, 15.0, 11.0, 61.0, 25.0] | 18 | preheat oven to 275f grease and flour a 9 x 13... | this is one of (if not) the best banana cake i... | bananas lemon juice flour baking soda salt but... | 13 | best ever banana cake with cream cheese frosti... |


Our recommender returns four other cakes and a banana toffee blondie. All five are baked desserts, and two of them also feature banana, so it is easy to see what they share with the current recipe and why our user might like them.

In [8]:
# Content-based recommendations
name = current_recipe["name"].iloc[0]  # "best ever banana cake with cream cheese frosting" when random_state=42
recommend(words, recipes, name, similarities)

| | name | id | minutes | contributor_id | submitted | tags | nutrition | n_steps | steps | description | ingredients | n_ingredients |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 521 | chocolate sauerkraut cake | 8348 | 65 | 179133 | 2000-03-16 | time to make course main ingredient preparatio... | [283.4, 18.0, 106.0, 12.0, 8.0, 35.0, 14.0] | 16 | sift together the flour cocoa baking powder ba... | i adopted this recipe from the recipezaar acco... | unbleached flour baking cocoa baking powder ba... | 16 |
| 1276 | banana toffee blondies | 66089 | 45 | 64642 | 2003-07-04 | 60 minutes or less time to make course main in... | [234.2, 14.0, 98.0, 7.0, 5.0, 28.0, 11.0] | 12 | preheat the oven to 350 grease and flour a 9 b... | a lovely recipe from the food section of the h... | all purpose flour baking powder salt butter br... | 10 |
| 5934 | chocolate chip banana snack cake | 55843 | 50 | 57787 | 2003-03-07 | 60 minutes or less time to make course prepara... | [400.8, 23.0, 154.0, 9.0, 10.0, 44.0, 21.0] | 13 | preheat oven to 350 degrees grease and flour a... | this cake is so moist you don't need to frost ... | all purpose flour baking powder baking soda sa... | 11 |
| 7765 | retro orange kiss me cake | 117795 | 65 | 177443 | 2005-04-18 | time to make course preparation for large grou... | [252.5, 15.0, 93.0, 12.0, 8.0, 9.0, 12.0] | 18 | preheat oven to 350 degrees grease and flour a... | this cake took the country by storm in 1950 wh... | orange raisins walnuts all purpose flour sugar... | 12 |
| 9576 | beer spice cake | 15177 | 65 | 21705 | 2001-12-04 | weeknight time to make course main ingredient ... | [363.1, 26.0, 107.0, 12.0, 9.0, 39.0, 15.0] | 10 | preheat oven to 375 degrees f grease and flour... | a nice quick spice cake. i usually make it at ... | butter brown sugar egg all purpose flour bakin... | 12 |
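
The `recommend` helper lives in `src.content_based`, so the lookup it performs is not shown here. A minimal sketch of the usual idea, assuming a list of recipe names aligned with the rows of the similarity matrix, is to read the row for the current recipe, sort it in descending order, and skip the recipe itself. The function `top_n_similar` and the toy data are hypothetical.

```python
import numpy as np


def top_n_similar(names, similarities, name, n=5):
    """Return the n (name, score) pairs most similar to `name`, excluding itself."""
    idx = names.index(name)                               # row of the current recipe
    scores = np.asarray(similarities[idx], dtype=float)
    order = np.argsort(-scores, kind="stable")            # highest similarity first
    order = [i for i in order if i != idx][:n]            # drop the recipe itself
    return [(names[i], float(scores[i])) for i in order]


# Toy data: names[i] labels row i and column i of the matrix
names = ["banana cake", "banana bread", "beef curry", "carrot cake"]
toy_similarities = np.array([
    [1.0, 0.7, 0.0, 0.8],
    [0.7, 1.0, 0.1, 0.5],
    [0.0, 0.1, 1.0, 0.0],
    [0.8, 0.5, 0.0, 1.0],
])
print(top_n_similar(names, toy_similarities, "banana cake", n=2))
# [('carrot cake', 0.8), ('banana bread', 0.7)]
```

Unlike this sketch, the notebook's `recommend` returns full recipe rows, as the output above shows.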


### Conclusion

The content-based recommender takes an NLP approach: it compares recipes by the words in their names, tags, steps, and ingredients. In the example above, it returned recipes that are clearly similar to the one being viewed. A major advantage of a content-based recommender is that it does not need users to have rated or even viewed any recipes, so it works for any recipe that exists on the platform.

However, this approach assumes that a user will like recipes whose content resembles the recipe they are currently viewing, and it ignores the user's historical behavior. In practice, people may not mind (or may even prefer) seeing more diverse recipes, since few people cook the same kind of food over and over. In that case, instead of recommending the most similar recipes, we can recommend recipes that sit further down the similarity ranking.
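
One way to realize the "further down the ranking" idea is to skip a fixed number of the closest matches before taking recommendations. The sketch below assumes a single row of similarity scores. The function name `diverse_picks` and the `skip` parameter are hypothetical, and a sensible `skip` value would need tuning on real data.

```python
import numpy as np


def diverse_picks(scores, idx, n=5, skip=10):
    """Rank items by similarity to item `idx`, skip the top `skip`, return the next n indices."""
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="stable")   # most similar first
    order = order[order != idx]                  # never recommend the recipe itself
    return order[skip:skip + n].tolist()


# Toy similarity row for recipe 0 against seven recipes (including itself)
row = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
print(diverse_picks(row, idx=0, n=2, skip=2))
# [3, 4]
```

With `skip=0` this reduces to the ordinary most-similar ranking, so the parameter trades relevance for variety.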