# FOOD RECIPE RECOMMENDATION ENGINE

## Part 2b: Memory-Based Collaborative Filtering

In [1]:
import pandas as pd
import numpy as np
from src.memory_based import filter_ratings, get_rating_matrix, get_recipe, item_recommend, user_recommend

### Load data

In [2]:
# Load recipes
recipes = pd.read_feather("./data/recipes.feather")
recipes = recipes.drop(columns=["level_0", "index"])
                          
# Load interactions
interactions = pd.read_feather("./data/interactions.feather")

### User-based collaborative filtering

To save memory, we keep only recipes with more than 30 ratings and users who have given more than 100 ratings, then build a rating matrix between users and recipes. One limitation of user-based collaborative filtering is that the rating matrix is very sparse for large datasets, because a given user has usually rated only a small fraction of the items. This sparsity makes the model prone to capturing noise rather than meaningful signal. Still, it is useful to see what similar users like with the information we do have.
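To make the filtering and pivoting step concrete, here is a minimal sketch on toy data. The implementations of `filter_ratings` and `get_rating_matrix` are not shown in this notebook, so `build_rating_matrix` and `sparsity` are hypothetical stand-ins, and treating a stored `0` as "unrated" is an assumption.

```python
import pandas as pd

def build_rating_matrix(ratings, min_recipe_ratings, min_user_ratings):
    """Hypothetical stand-in: filter by rating counts, then pivot to users x recipes."""
    # Keep recipes with more than `min_recipe_ratings` ratings
    recipe_counts = ratings.groupby("recipe_id")["rating"].transform("count")
    ratings = ratings[recipe_counts > min_recipe_ratings]
    # Keep users who have given more than `min_user_ratings` ratings
    user_counts = ratings.groupby("user_id")["rating"].transform("count")
    ratings = ratings[user_counts > min_user_ratings]
    # Rows are users, columns are recipes; unrated pairs are filled with 0
    return ratings.pivot_table(
        index="user_id", columns="recipe_id", values="rating", fill_value=0
    )

def sparsity(matrix):
    """Fraction of user-recipe pairs stored as 0 (assumed to mean 'unrated')."""
    return float((matrix == 0).to_numpy().mean())

# Toy interactions: user 4 has a single rating and is filtered out
toy = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3, 3, 4],
    "recipe_id": [10, 20, 10, 30, 20, 30, 10],
    "rating":    [5, 4, 5, 3, 2, 4, 1],
})
toy_matrix = build_rating_matrix(toy, min_recipe_ratings=1, min_user_ratings=1)
toy_sparsity = sparsity(toy_matrix)
```

Even in this tiny example, a third of the cells are empty. On the real data the fraction is far higher, which is why the thresholds above matter.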

In [3]:
# Get a sample of interactions
filtered_ratings = filter_ratings(interactions)
filtered_ratings.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
462,28649,33096,2002-07-29,5,This was very simple and very refreshing. Thi...
463,30298,33096,2002-07-30,4,Light and refreshing! I used a reduced fat gr...
465,22973,33096,2003-08-11,5,"Merlot,\r\n I took the ingredients for making..."
466,37449,33096,2003-08-31,5,So easy and so good! My husband and son scarfe...
467,89831,33096,2004-03-15,5,Merlot...this is the second time that I made y...


In [4]:
# Create user-item rating matrix (users as rows, recipes as columns)
rating_matrix = get_rating_matrix(filtered_ratings, "user_id", "recipe_id", n1=30, n2=100)
rating_matrix.head()

Unnamed: 0,user_id,153,198,246,346,376,432,445,456,519,...,501041,501408,505862,508302,513216,514423,514791,515167,517764,517863
0,1533,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,1535,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,1634,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,1676,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,1792,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


For user-based collaborative filtering, we pick a random user and find their closest neighbors using cosine similarity between rating vectors. We then take the `user_id`s of these neighbors and find the recipes they have rated that our user of interest has not. Next, we average the neighbors' ratings for each of these recipes and recommend the top-rated ones to our user of interest. You can try it out yourself below.
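The steps above can be sketched on a toy matrix as follows. This is not the code behind `user_recommend`, which lives in `src.memory_based` and is not shown here; `cosine_similarity_matrix` and `recommend_for_user` are hypothetical helpers, and averaging only over neighbors who actually rated a recipe is an assumption.

```python
import numpy as np
import pandas as pd

def cosine_similarity_matrix(matrix):
    """Pairwise cosine similarity between the rows of a 2-D array."""
    values = np.asarray(matrix, dtype=float)
    norms = np.linalg.norm(values, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero rows
    unit = values / norms
    return unit @ unit.T

def recommend_for_user(user_id, user_item, k_neighbors=2, n_recipes=5):
    """Sketch of user-based CF on a users x recipes DataFrame (0 = unrated)."""
    sims = pd.DataFrame(
        cosine_similarity_matrix(user_item),
        index=user_item.index,
        columns=user_item.index,
    )
    # Closest neighbors, excluding the user of interest
    neighbors = sims[user_id].drop(user_id).nlargest(k_neighbors).index
    # Recipes the user of interest has not rated
    unrated = user_item.columns[user_item.loc[user_id] == 0]
    neighbor_ratings = user_item.loc[neighbors, unrated]
    # Assumption: average only over neighbors who actually rated each recipe
    mean_ratings = neighbor_ratings.where(neighbor_ratings > 0).mean(axis=0)
    return mean_ratings.dropna().sort_values(ascending=False).head(n_recipes)

# Toy users x recipes matrix: rows are user ids, columns are recipe ids
toy = pd.DataFrame(
    [[5, 4, 0, 0],
     [5, 5, 4, 0],
     [4, 5, 0, 2],
     [0, 0, 5, 5]],
    index=[1, 2, 3, 4],
    columns=[10, 20, 30, 40],
)
recs = recommend_for_user(1, toy)
```

Here users 3 and 2 are the nearest neighbors of user 1, so recipe 30 (rated 4 by user 2) ranks ahead of recipe 40 (rated 2 by user 3).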

In [5]:
# User-based recommendations
np.random.seed(42) # For reproducibility; feel free to comment this out
user_id = np.random.choice(filtered_ratings["user_id"])
user_recommend(user_id, rating_matrix, recipes)

The top 5 recipes recommended for user 353579 are:


Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
42106,Chicago Italian Beef,30484,1090,11158,2002-06-05,"[weeknight, time-to-make, course, main-ingredi...","[1519.4, 90.0, 27.0, 85.0, 243.0, 113.0, 39.0]",9,[cook first 5 ingredients in a crockpot on low...,"you'll be sorry you didn't grow up in chicago,...","[rump roast, beef consomme, good seasonings it...",7
57323,Cookie Jar Peanut Butter Cookies,90136,27,35701,2004-04-26,"[30-minutes-or-less, time-to-make, course, pre...","[102.2, 8.0, 29.0, 3.0, 3.0, 9.0, 3.0]",10,"[heat oven to 350, sift together the flour , b...",my favorite peanut butter cookies! crispy on t...,"[all-purpose flour, baking powder, baking soda...",10
82017,Fannie Farmer's Classic Baked Macaroni Cheese,135350,40,148316,2005-08-29,"[60-minutes-or-less, time-to-make, course, mai...","[836.2, 80.0, 11.0, 37.0, 55.0, 162.0, 21.0]",17,"[preheat oven to 400f, cook and drain macaroni...","to me fannie farmer's recipe is the only ""real...","[macaroni, butter, flour, milk, cream, salt, f...",9
118160,Kittencal's Italian Melt In Your Mouth Meatballs,69173,50,89831,2003-08-20,"[60-minutes-or-less, time-to-make, course, mai...","[1312.6, 129.0, 8.0, 108.0, 214.0, 174.0, 8.0]",5,"[mix all ingredients together in a large bowl,...",cooking the meatballs in simmering pasta sauce...,"[ground beef, egg, parmesan cheese, breadcrumb...",10
119696,Lasagna Cheese Soup,275854,75,407338,2008-01-03,"[time-to-make, course, main-ingredient, cuisin...","[477.6, 45.0, 36.0, 60.0, 76.0, 69.0, 4.0]",8,"[in large 5 quart saucepan , brown ground beef...","hearty, cheesy and just plain good!! the crus...","[ground beef, onion, chicken broth, diced toma...",15


### Item-based collaborative filtering

We can also find recipes that are similar in terms of their ratings and recommend them to a user based on the recipes they already like. This time, we compute the cosine similarity between recipes rather than between users and return the closest matches. One use case: when a user has rated a particular recipe, we can recommend similar recipes at the bottom of the page.
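As a rough illustration of the item-based variant, the sketch below ranks recipes by the cosine similarity of their rating vectors to a given recipe. `similar_recipes` is a hypothetical helper, not the `item_recommend` function from `src.memory_based`, and the toy matrix is made up.

```python
import numpy as np
import pandas as pd

def similar_recipes(recipe_id, item_user, k=5):
    """Return the k recipes whose rating vectors are closest to `recipe_id`."""
    values = item_user.to_numpy(dtype=float)
    target = item_user.loc[recipe_id].to_numpy(dtype=float)
    norms = np.linalg.norm(values, axis=1) * np.linalg.norm(target)
    # Recipes with no ratings get similarity 0 instead of a division error
    sims = np.divide(
        values @ target, norms, out=np.zeros(len(values)), where=norms > 0
    )
    # Drop the query recipe itself before taking the top k
    return pd.Series(sims, index=item_user.index).drop(recipe_id).nlargest(k)

# Toy recipes x users matrix: rows are recipe ids, columns are user ids
toy_item_user = pd.DataFrame(
    [[5, 5, 4, 0],
     [4, 5, 5, 0],
     [0, 4, 0, 5],
     [0, 0, 2, 5]],
    index=[10, 20, 30, 40],
    columns=[1, 2, 3, 4],
)
top = similar_recipes(10, toy_item_user, k=2)
```

Recipe 20 comes out on top because the same users rated it almost identically to recipe 10; nothing about the recipes' contents is used.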

In [6]:
# Create item-user rating matrix (recipes as rows, users as columns)
rating_matrix = get_rating_matrix(filtered_ratings, "recipe_id", "user_id", n1=30, n2=100)
rating_matrix.head()

Unnamed: 0,recipe_id,1533,1535,1634,1676,1792,1891,2178,2310,2312,...,2000498330,2000943999,2001047423,2001102678,2001297534,2001330613,2001356926,2001362355,2001436530,2001453193
0,153,0.0,5.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,198,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,246,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,376,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [7]:
# Item-based recommendations
np.random.seed(42) # For reproducibility; feel free to comment this out
user_id = np.random.choice(filtered_ratings["user_id"])
recipe_id = get_recipe(filtered_ratings, user_id)
item_recommend(user_id, recipe_id, rating_matrix, recipes, k=5)

User 353579 is currently viewing: Kittencal's Chicken Crescent Roll Casserole
The top 5 recipes recommended for user 353579 are:


Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
45001,Chicken Or Turkey Tetrazzini Casserole,190135,75,89831,2006-10-13,"[time-to-make, course, main-ingredient, prepar...","[722.2, 62.0, 14.0, 47.0, 78.0, 105.0, 16.0]",12,"[set oven to 350 degrees f, butter a 13 x 9-in...",three flavors of canned soup takes the place o...,"[thin spaghetti, butter, onion, garlic, jalape...",16
73393,Dr Pepper Taco Soup,186693,30,257941,2006-09-18,"[30-minutes-or-less, time-to-make, course, mai...","[2205.6, 124.0, 443.0, 461.0, 268.0, 141.0, 87.0]",8,"[brown meat and onions in large stock pot , dr...",the stuff that dreams are made of...don't let ...,"[ground beef, white onion, tomato sauce, corn,...",14
106256,Home Style Beef N Noodles W Mushrooms Onions,138239,135,113941,2005-09-21,"[time-to-make, course, main-ingredient, cuisin...","[911.3, 45.0, 13.0, 37.0, 175.0, 67.0, 24.0]",28,"[try to find two 4 lb, chuck roasts , and cut ...","wish you could smell my kitchen right now, i'm...","[boneless beef chuck roast, fresh coarse groun...",14
118271,Kittencal's Salisbury Steak,118373,80,89831,2005-04-21,"[time-to-make, course, main-ingredient, prepar...","[286.3, 25.0, 9.0, 25.0, 41.0, 34.0, 4.0]",14,[in a bowl mix together all ingredients for th...,"this is total comfort food at it's best, i hav...","[ground beef, fresh garlic, egg, dry onion sou...",15
154654,Paula Deen's Corn Salad,230035,10,116864,2007-05-24,"[15-minutes-or-less, time-to-make, course, mai...","[325.8, 31.0, 22.0, 30.0, 20.0, 37.0, 9.0]",2,"[mix first 5 ingredients and chill, stir in co...",this was a delightful salad. tasty and crunch...,"[whole kernel corn, cheddar cheese, mayonnaise...",6


### Conclusion

User-based and item-based recommendations are both forms of memory-based collaborative filtering, and the two techniques have much in common. In both, we build a user-item (or item-user) rating matrix and find recipes to recommend by computing cosine similarities between its rows, using `rating` as the only signal. The underlying assumption is that users with similar ratings have similar tastes, or that a user will like recipes whose rating patterns resemble those of recipes they already like. This assumption often holds, but not always. Memory-based collaborative filtering does not take into account any intrinsic properties of the recipes that may have attracted users in the first place. If someone likes a dish because it contains mushrooms, we should try to recommend other mushroom dishes. Memory-based collaborative filtering cannot do this, because ratings alone do not tell us which dishes contain mushrooms and which do not.