<h1> Recommendations </h1>

My first thought is to try and make some sort of cosine similarity system to match each user to recipes where they have the highest similarity scores with that recipes ingredients. 

In [1]:
# Importing the variables I need to use from previous notebook + libraries
%store -r mealdf
%store -r userdf
import pandas as pd
import numpy as np

In [2]:
# Using sklearn to employ cosine similarity
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer



In order to get similarity scores, we need to employ TF-IDF (term frequency - inverse document frequency). I learned some of this at my internship working on some NLP last summer. TF-IDF will give a score for each ingredient measuring its importance in a document. By making a document matrix using TF-IDF for both datasets, we will be able to compare them using cosine similarity and thus make meal recommendations.

In [3]:
# Setting up the document-term matrices for both datasets

meal_ingredients = [] # initialize an empty list for meal ingredients
# iterate through each element of the ingredients column in the meal dataset
for ingredients in mealdf['ingredients'] :
    meal_ingredients.extend(ingredients) # adding all of the ingredients to the empty list

unique_ingredients = set(meal_ingredients) # making a set of unique ingredients

# it's only 'stringredients' because I like terrible puns...
# joining ingredients of each meal into a single string and separating by spaces
mealdf['stringredients'] = mealdf['ingredients'].apply(lambda x : ' '.join(x))

# using the tfidf module preprocesses the ingredients (lists of strings)
tfidf = TfidfVectorizer(stop_words='english', vocabulary = unique_ingredients)
doc_mat_meals = tfidf.fit_transform(mealdf['stringredients']) # setting up document matrix for meal data

userdf['liked_stringredients'] = userdf['ingredients'].apply(lambda x: ' '.join(x))
doc_mat_users = tfidf.transform(userdf['liked_stringredients']) # setting up document matrix for user data



In [4]:
# calculating similarity scores using cosine similarity across the two different document term matrices
similarity_scores = cosine_similarity(doc_mat_users, doc_mat_meals)

In [5]:
# creating a dataframe that allows us to visualize the similarity scores comparing the users and the meals
similarity_df = pd.DataFrame(similarity_scores, columns = mealdf['recipeName'], index = userdf['user_name'])
similarity_df

recipeName,Revolutionary Mac & Cheese,Chicago Chicken,Pork Chops with Balsamic Glaze,Chicken Avocado Burgers,Country Fried Steak Recipe With Gravy,Best Basic Burger,Easy Garlic and Lemon Shrimp,Crockpot BBQ Beer Chicken,BBQ Chicken Quesadillas,Black Pepper Steak,...,Easy Thai Red Curry Shrimp,Thai Coconut Curry Soup,Thai Chicken with Basil Stir Fry,Thai Mango and Chicken Curry,Larb Gai Thai Chicken Skillet,Fish Sticks with Thai Peanut Sauce,Grain-free Thai Chicken Meatballs with Coconut Red Curry Sauce {Paleo & Gluten-Free},Thai Coconut Chicken Red Lentil Soup,Thai Mushroom Curry,Thai Red Curry Mussels
user_name,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Apple John,0.023973,0.142430,0.024170,0.052136,0.051356,0.019094,0.015908,0.062247,0.059654,0.021644,...,0.012887,0.022430,0.026948,0.033204,0.137468,0.009792,0.007167,0.023275,0.021943,0.013500
Shanelly Bazaldua,0.000000,0.164345,0.000000,0.437344,0.000000,0.000000,0.000000,0.131098,0.077777,0.000000,...,0.000000,0.047240,0.056754,0.110435,0.060196,0.000000,0.000000,0.049020,0.072983,0.000000
Peyton Joseph,0.040113,0.073982,0.008544,0.138437,0.000000,0.006750,0.006655,0.066917,0.090034,0.015930,...,0.005391,0.036633,0.028969,0.044997,0.038666,0.004096,0.027293,0.049806,0.027537,0.010319
Harshani Dharmadasa,0.020418,0.121305,0.020585,0.044403,0.043739,0.016262,0.013549,0.053015,0.050806,0.018434,...,0.010975,0.031848,0.022951,0.028279,0.133319,0.008340,0.028312,0.033048,0.018689,0.011498
Koma Gandy,0.011551,0.104341,0.016567,0.100245,0.049491,0.029746,0.007665,0.082532,0.057214,0.069735,...,0.017216,0.067887,0.080505,0.056907,0.077656,0.043827,0.044398,0.051525,0.069154,0.030244
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Melvin Skochdopole,0.032441,0.097203,0.032790,0.106793,0.057586,0.017325,0.043241,0.068904,0.053357,0.062715,...,0.049117,0.079502,0.090564,0.062133,0.095946,0.032025,0.064257,0.085990,0.079258,0.020766
Stacey Peterson,0.021313,0.081095,0.040161,0.103086,0.050797,0.024148,0.013253,0.018545,0.037508,0.066375,...,0.010736,0.056597,0.066051,0.007273,0.127151,0.030216,0.060899,0.045983,0.000000,0.016212
Mark Cartier,0.020039,0.147364,0.023148,0.124754,0.045344,0.029173,0.029960,0.116721,0.091694,0.070157,...,0.033816,0.076651,0.091174,0.087474,0.105627,0.038015,0.041060,0.078979,0.055272,0.031034
Tasha Plonka,0.039602,0.138302,0.028904,0.115332,0.066629,0.033691,0.036510,0.105558,0.066583,0.058598,...,0.039097,0.067057,0.073068,0.071741,0.147556,0.041995,0.037928,0.063306,0.045977,0.034250


In [6]:
# Forming a dictionary that stores each user from the user data set and their top 5 meals based on similarity score df

N = 5

# create a dictionary to store the top N recommendations for each user
user_recommendations = {}

# iterate through each user and get their top 5 recommendations
for i, user in enumerate(userdf['user_name']):
    scores = list(enumerate(similarity_scores[:, i])) # creating a list of tuples, with an index and similarity score 
    scores.sort(key = lambda x: x[1], reverse = True) # sort scores in descending order, define the key as index 
    top_N = [mealdf.iloc[score[0]]['recipeName'] for score in scores[:N]] # get the top 5 recommendations 
    user_recommendations[user] = top_N # storing the recommendations in the dictionary

In [7]:
applejohn = user_recommendations['Apple John']
applejohn

['Dijon Pork Loin',
 'Duck Buffalo Wings',
 'Old-Fashioned Stuffed Baked Clams',
 'Crawfish Etouffee',
 'Grouper with Tomatillo-and-Green Chile Chutney']