**RECOMMENDATION TOOLS GROUP PROJECT**



Philip Borchert


MBD 2022-2023


NANDINI GANTAYAT, PRIYA YADAV AND YASHWANTH THONUKUNURU

**Project Description**

Our project builds a recommender system for Food.com customers. It recommends each user's top 2 ingredients and the recipes that contain them. It ranks those recipes by nutritional values so that healthier options come first, and it suggests alternatives for each recommended ingredient. To do this, we use collaborative filtering with the implicit Python library. The library is built for sparse matrices, provides a fast implementation of the Alternating Least Squares (ALS) algorithm and includes evaluation metrics for measuring model performance.

In [None]:
%pip install implicit



In [None]:
import numpy as np
import pandas as pd
from implicit.nearest_neighbours import bm25_weight
from implicit.als import AlternatingLeastSquares
from implicit.evaluation import train_test_split, ranking_metrics_at_k

**Project Setup**


Our project analyzes two datasets: metadata and train. Together they contain information about Food.com customers, their feedback, recipes, ingredients, nutritional values, ratings and other related details. We use these datasets to extract insights and build a recommendation system.

**We read the metadata dataset into a dataframe. It contains information about recipes: the recipe name, recipe ID, cooking time in minutes, contributor ID, tags, nutrition values, cooking steps, description and the list of ingredients. The format of the dataset is as follows:**

name : String

id : String

minutes : Integer

contributor_id : String

tags : List(String)

nutrition : List(Float)

steps : List(String)

description : String

ingredients : List(String)



**We read the train dataset into a dataframe. It contains the ratings and reviews that Food.com users gave to recipes. The format of the dataset is as follows:**


user_id: a string representing the unique identifier for the user who rated a recipe

recipe_id: a string representing the unique identifier for the recipe that was rated

date: the date on which the rating was given (read in as a string)

rating: an integer representing the rating given by the user for the recipe, on a scale of 1 to 6

review: a string containing the text review given by the user for the recipe






**Preprocessing**

After reading the metadata, we rename its 'id' column to 'recipe_id' so that it matches the train dataset. We then use map() to apply eval() to the 'nutrition' column. This converts the string representation of a list of floats into an actual list of floats.

Finally, we create one column per nutritional value (calories, fat, sugar, sodium, protein, saturated fat and carbs). tolist() turns the 'nutrition' column into a list of lists. pd.DataFrame() builds a new dataframe from that list with the same index, and we assign this dataframe to the new columns of metadata. In the code, the protein column is named 'protien'.
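The sketch below runs the same parsing steps on two hypothetical rows, with values made up for illustration. It swaps eval() for ast.literal_eval, which is our assumption and not what the notebook runs. literal_eval only accepts Python literals, so a malformed or malicious string in the CSV cannot execute code. The column names, including 'protien', follow the notebook.

```python
import ast

import pandas as pd

# Toy stand-in for metadata.csv: 'nutrition' is stored as the text of a Python list
metadata = pd.DataFrame({
    "id": ["R1", "R2"],
    "nutrition": ["[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",
                  "[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]"],
})
metadata = metadata.rename(columns={"id": "recipe_id"})

# ast.literal_eval parses literals only, so it cannot run arbitrary code
metadata["nutrition"] = metadata["nutrition"].map(ast.literal_eval)

# One new column per list position, aligned on the original index
nutrient_cols = ["calories", "fat", "sugar", "sodium", "protien", "saturated_fat", "carbs"]
metadata[nutrient_cols] = pd.DataFrame(metadata["nutrition"].tolist(), index=metadata.index)
print(metadata[["recipe_id", "calories", "protien", "carbs"]])
```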

In [None]:
metadata = pd.read_csv('/content/metadata.csv')
metadata = metadata.rename(columns={'id': 'recipe_id'})
metadata['nutrition'] = metadata['nutrition'].map(eval)
metadata[['calories', 'fat', 'sugar', 'sodium', 'protien', 'saturated_fat', 'carbs']] = pd.DataFrame(metadata['nutrition'].tolist(), index=metadata.index)
metadata.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 231637 entries, 0 to 231636
Data columns (total 19 columns):
 #   Column          Non-Null Count   Dtype  
---  ------          --------------   -----  
 0   name            231636 non-null  object 
 1   recipe_id       231637 non-null  object 
 2   minutes         231637 non-null  int64  
 3   contributor_id  231637 non-null  object 
 4   submitted       231637 non-null  object 
 5   tags            231637 non-null  object 
 6   nutrition       231637 non-null  object 
 7   n_steps         231637 non-null  int64  
 8   steps           231637 non-null  object 
 9   description     226658 non-null  object 
 10  ingredients     231637 non-null  object 
 11  n_ingredients   231637 non-null  int64  
 12  calories        231637 non-null  float64
 13  fat             231637 non-null  float64
 14  sugar           231637 non-null  float64
 15  sodium          231637 non-null  float64
 16  protien         231637 non-null  float64
 17  saturated_

In [None]:
train = pd.read_csv('/content/train.csv')
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 165226 entries, 0 to 165225
Data columns (total 5 columns):
 #   Column     Non-Null Count   Dtype 
---  ------     --------------   ----- 
 0   user_id    165226 non-null  object
 1   recipe_id  165226 non-null  object
 2   date       165226 non-null  object
 3   rating     165226 non-null  int64 
 4   review     165226 non-null  object
dtypes: int64(1), object(4)
memory usage: 6.3+ MB


We merge metadata and train on their common column, 'recipe_id'. The resulting merged_df dataframe contains recipe_id, ingredients, the nutritional columns (calories, fat, sugar, sodium, protein, saturated fat and carbs), user_id and rating. merge() performs an inner join by default, so only rows whose recipe_id appears in both dataframes are kept. merged_df gives a consolidated view of recipe information and user ratings, and we use it to build the recommendation system.
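The toy frames below, with hypothetical IDs, show the effect of the default inner join. The second merge uses how='outer' with indicator=True. This is an optional check that is not part of the notebook: it labels each row as coming from both frames or from only one of them. In our data, merged_df has the same 165,226 rows as train, which suggests that every rated recipe found a match in metadata.

```python
import pandas as pd

# Toy recipe metadata and ratings (hypothetical IDs)
recipes = pd.DataFrame({"recipe_id": ["R1", "R2", "R3"], "calories": [120.0, 340.5, 88.0]})
ratings = pd.DataFrame({"user_id": ["U1", "U2", "U1"],
                        "recipe_id": ["R1", "R1", "R9"],   # R9 has no metadata row
                        "rating": [6, 5, 4]})

# merge() defaults to how='inner': only recipe_ids present in both frames survive
merged = recipes.merge(ratings, on="recipe_id")
print(merged)

# Optional audit: an outer join with indicator=True shows where each key came from
audit = recipes.merge(ratings, on="recipe_id", how="outer", indicator=True)
print(audit["_merge"].value_counts())
```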

In [None]:
merged_df = metadata[['recipe_id', 'ingredients', 'calories', 'fat', 'sugar', 'sodium', 'protien', 'saturated_fat', 'carbs']].merge(train[['user_id', 'recipe_id', 'rating']], on = 'recipe_id')
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 165226 entries, 0 to 165225
Data columns (total 11 columns):
 #   Column         Non-Null Count   Dtype  
---  ------         --------------   -----  
 0   recipe_id      165226 non-null  object 
 1   ingredients    165226 non-null  object 
 2   calories       165226 non-null  float64
 3   fat            165226 non-null  float64
 4   sugar          165226 non-null  float64
 5   sodium         165226 non-null  float64
 6   protien        165226 non-null  float64
 7   saturated_fat  165226 non-null  float64
 8   carbs          165226 non-null  float64
 9   user_id        165226 non-null  object 
 10  rating         165226 non-null  int64  
dtypes: float64(7), int64(1), object(3)
memory usage: 15.1+ MB


We create a new DataFrame, df, by selecting only the user_id and ingredients columns from merged_df.
eval() converts the ingredients column from the string representation of a list into an actual list of ingredients.
df is then "exploded" so that each row holds a single ingredient, which creates one row per ingredient in each original list.
Finally, we build a pivot table from the exploded df. Each row is a user and each column is an ingredient. Each cell counts how often that ingredient appears across the recipes the user rated, with 0 where it never appears. This pivot table is the basis of the ingredient recommendation system.
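The sketch below runs the explode-and-count step on toy data with hypothetical users and ingredients. It uses pd.crosstab, a swapped-in alternative that we assume gives the same counts as the notebook's pivot_table call. crosstab counts the exploded rows for each (user, ingredient) pair and fills missing pairs with 0.

```python
import pandas as pd

# Toy user-recipe rows; 'ingredients' already holds lists, as after the eval() step
df = pd.DataFrame({
    "user_id": ["U1", "U1", "U2"],
    "ingredients": [["onion", "cumin"], ["onion", "beef broth"], ["cumin"]],
})

# explode() emits one row per list element and repeats the other columns
long_df = df.explode("ingredients")

# Count (user, ingredient) pairs; pairs that never occur get 0
counts = pd.crosstab(long_df["user_id"], long_df["ingredients"])
print(counts)
```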

In [None]:
df = merged_df[['user_id', 'ingredients']].copy()  # explicit copy avoids pandas' SettingWithCopyWarning
df['ingredients'] = df['ingredients'].map(eval)
df = df.explode('ingredients')
pivot_df = pd.pivot_table(df, index='user_id', columns='ingredients', values='ingredients', aggfunc=len, fill_value=0)


In [None]:
pivot_df.shape

(11346, 9958)

In this step, we use merged_df to create a recipe-level dataset. We group it by recipe ID and compute each recipe's average rating and number of ratings, which helps identify popular, well-rated recipes. We then attach one row per recipe from merged_df (the first occurrence of each recipe_id) to add the list of ingredients and the nutritional information.

To make the data easier to work with, we convert the 'ingredients' column from its string form into a Python set, which makes it fast to check whether a recipe contains a given ingredient. The result is a recipe-level dataset with the columns 'mean_rating', 'num_rating', 'recipe_id', 'ingredients', 'calories', 'fat', 'sugar', 'sodium', 'protien', 'saturated_fat' and 'carbs'. We use it to find recipes that contain the recommended ingredients and to rank them by nutritional value, so that we can recommend healthier recipes.
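The sketch below runs the recipe-level aggregation on three hypothetical rating rows. It mirrors the notebook's groupby/NamedAgg and merge calls. For safety, we assume ast.literal_eval in place of eval to parse the ingredient strings.

```python
import ast

import pandas as pd

# Toy rows of merged_df: one row per (user, recipe) rating; ingredients stored as text
merged_df = pd.DataFrame({
    "recipe_id": ["R1", "R1", "R2"],
    "ingredients": ["['onion', 'cumin']", "['onion', 'cumin']", "['beef broth']"],
    "calories": [120.0, 120.0, 340.5],
    "rating": [6, 4, 5],
})

# Per-recipe average rating and number of ratings (index = recipe_id)
recipe_df = merged_df.groupby("recipe_id").agg(
    mean_rating=pd.NamedAgg(column="rating", aggfunc="mean"),
    num_rating=pd.NamedAgg(column="rating", aggfunc="count"),
)

# Attach one row of recipe attributes per recipe (first occurrence of each recipe_id)
first_rows = merged_df.loc[~merged_df["recipe_id"].duplicated(), ["recipe_id", "ingredients", "calories"]]
recipe_df = recipe_df.merge(first_rows, left_index=True, right_on="recipe_id")

# Sets make "does this recipe contain X?" checks cheap
recipe_df["ingredients"] = recipe_df["ingredients"].map(lambda s: set(ast.literal_eval(s)))
print(recipe_df)
```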

In [None]:
recipe_df = merged_df.groupby('recipe_id').agg(
mean_rating = pd.NamedAgg(column='rating', aggfunc='mean'),
num_rating = pd.NamedAgg(column='rating', aggfunc='count')
)

recipe_df = recipe_df.merge(merged_df.loc[~merged_df['recipe_id'].duplicated(), ['recipe_id', 'ingredients', 'calories', 'fat', 'sugar', 'sodium', 'protien', 'saturated_fat', 'carbs']], left_index=True, right_on='recipe_id')
recipe_df['ingredients'] = recipe_df['ingredients'].map(lambda x: set(eval(x)))
recipe_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 62517 entries, 163227 to 68572
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   mean_rating    62517 non-null  float64
 1   num_rating     62517 non-null  int64  
 2   recipe_id      62517 non-null  object 
 3   ingredients    62517 non-null  object 
 4   calories       62517 non-null  float64
 5   fat            62517 non-null  float64
 6   sugar          62517 non-null  float64
 7   sodium         62517 non-null  float64
 8   protien        62517 non-null  float64
 9   saturated_fat  62517 non-null  float64
 10  carbs          62517 non-null  float64
dtypes: float64(8), int64(1), object(2)
memory usage: 5.7+ MB


To use the data in our recommendation system, we create two mapping tables. The first, user_map, maps each user ID to a sequential integer ID starting from zero, which is the user's row position in pivot_df. These integer IDs index directly into the user-ingredient matrix, which is more efficient than working with the original string IDs.

The second, ingredient_map, maps each ingredient name to a sequential integer ID, which is its column position in pivot_df. Ingredient names are strings and cannot be used as matrix indices, so the integer IDs represent each ingredient instead.

We use these mapping tables in the later stages of model training and evaluation. The final recommendation system also uses them to translate the model's output back into user IDs and ingredient names.
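pivot_df is a dense table of 11,346 × 9,958 (about 113 million) cells, and most of them are zero. The sketch below shows how sequential IDs can index straight into a sparse matrix instead. It assumes pd.factorize and scipy.sparse.csr_matrix as swapped-in tools, whereas the notebook builds its maps from pivot_df. Passing sort=True keeps the IDs in sorted order, like the pivot table's index and columns. The toy rows are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy exploded (user, ingredient) rows, as produced by the explode() step
long_df = pd.DataFrame({
    "user_id": ["U7", "U7", "U3", "U7"],
    "ingredient": ["onion", "cumin", "onion", "onion"],
})

# factorize assigns codes 0..n-1; sort=True orders them like a pivot table's labels
user_codes, user_labels = pd.factorize(long_df["user_id"], sort=True)
ingr_codes, ingr_labels = pd.factorize(long_df["ingredient"], sort=True)

user_map = pd.DataFrame({"userid": user_labels, "id": range(len(user_labels))})
ingredient_map = pd.DataFrame({"ingredient": ingr_labels, "id": range(len(ingr_labels))})

# The codes index a sparse user x ingredient count matrix; duplicate pairs are summed
X = csr_matrix((np.ones(len(long_df)), (user_codes, ingr_codes)),
               shape=(len(user_labels), len(ingr_labels)))
print(X.toarray())
```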

In [None]:
user_map = pd.DataFrame({'userid': pivot_df.index, 'id': range(len(pivot_df.index))})
ingredient_map = pd.DataFrame({'ingredient': pivot_df.columns, 'id': range(len(pivot_df.columns))})

Next, we transform the user-ingredient matrix in pivot_df into a weighted matrix with BM25, a weighting scheme commonly used in text search.

BM25 is a popular method for weighting term frequencies in information retrieval. It takes three things into account: how often a term occurs in a document, how many documents in the corpus contain the term (inverse document frequency), and how long the document is compared with the average. implicit's bm25_weight treats the rows of its input as documents and the columns as terms. Because we pass pivot_df.T, the documents here are the ingredients and the terms are the users. K1=100 controls how quickly repeated counts saturate, and B=0.8 sets the strength of the length normalization. The result, weighted_df, is a scipy sparse matrix (not a dataframe) that holds the BM25 weight of each ingredient-user pair.
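To make the weighting concrete, here is a dense NumPy sketch of the BM25 formula as we understand implicit's bm25_weight applies it. Rows are documents and columns are terms, with idf = log(N) - log(1 + document frequency). Each count is saturated by K1 and normalized by its row's length relative to the average. The helper bm25_rows and the toy counts are hypothetical. The toy matrix follows the layout of pivot_df.T, with ingredients as rows and users as columns.

```python
import numpy as np

def bm25_rows(M, K1=100.0, B=0.8):
    """BM25-weight a dense count matrix whose rows are documents and columns are terms."""
    M = np.asarray(M, dtype=float)
    n_docs = M.shape[0]
    # Inverse document frequency per term (column): log(N) - log(1 + rows containing it)
    doc_freq = (M > 0).sum(axis=0)
    idf = np.log(n_docs) - np.log1p(doc_freq)
    # Length normalisation per document (row), relative to the average row total
    row_sums = M.sum(axis=1)
    length_norm = (1.0 - B) + B * row_sums / row_sums.mean()
    # Saturated count times idf; with K1 > 0 and B < 1 the denominator is positive
    weighted = M * (K1 + 1.0) / (K1 * length_norm[:, None] + M) * idf[None, :]
    # Keep absent pairs at exactly zero
    return np.where(M > 0, weighted, 0.0)

# Toy counts laid out like pivot_df.T: 4 ingredients (rows) x 3 users (columns)
counts = np.array([[3, 0, 1],
                   [0, 2, 0],
                   [1, 0, 0],
                   [0, 0, 5]])
W = bm25_rows(counts)
print(np.round(W, 3))
```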

The weighted matrix is transposed back to users × ingredients and split into training and testing sets. implicit's train_test_split randomly assigns roughly 80% of the non-zero user-ingredient entries to the training set and the rest to the testing set. It uses random_state=42 so that the split is reproducible.

X_train and X_test are stored in compressed sparse row (CSR) format, an efficient way to store large sparse matrices. They contain the BM25 weights of the user-ingredient pairs in the training and testing sets, respectively.
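The sketch below randomly splits the stored entries of a sparse matrix, which is the idea behind implicit's train_test_split. It is a simplification, and the helper split_nonzeros is hypothetical. It uses NumPy's default_rng, so it will not reproduce implicit's exact split for random_state=42.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def split_nonzeros(X, train_percentage=0.8, seed=42):
    """Randomly send each stored (row, col) entry of X to either train or test."""
    coo = coo_matrix(X)
    rng = np.random.default_rng(seed)
    in_train = rng.random(coo.nnz) < train_percentage
    train = csr_matrix((coo.data[in_train], (coo.row[in_train], coo.col[in_train])),
                       shape=coo.shape)
    test = csr_matrix((coo.data[~in_train], (coo.row[~in_train], coo.col[~in_train])),
                      shape=coo.shape)
    return train, test

# Toy weighted user x ingredient matrix with 6 stored entries
X = csr_matrix(np.array([[1.0, 0.0, 2.0],
                         [0.0, 3.0, 0.0],
                         [4.0, 5.0, 6.0]]))
X_train, X_test = split_nonzeros(X, train_percentage=0.8, seed=42)
print(X_train.nnz, X_test.nnz)
```

Every entry ends up in exactly one of the two matrices, so X_train + X_test reconstructs X.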



In [None]:
weighted_df = bm25_weight(pivot_df.T, K1=100, B=0.8)
X = weighted_df.T.tocsr()
X_train, X_test = train_test_split(X, train_percentage=0.8, random_state= 42)

We use the Alternating Least Squares (ALS) algorithm to train a matrix factorization model on the training set (X_train). We create the model with the AlternatingLeastSquares class and three hyperparameters. factors=300 sets the number of latent factors. regularization=0.05 controls overfitting. alpha=2.0 scales the confidence placed on the observed user-ingredient entries.

We then train the model with fit(). Training alternates between two steps: solving for the user factors with the item factors held fixed, and solving for the item factors with the user factors held fixed. Together, these steps minimize the confidence-weighted squared error between predicted and observed preferences plus a regularization penalty. implicit runs 15 iterations by default. After training, the model has learned latent factors for every user and every ingredient.
(The product of these user and item factor matrices best approximates the observed user-ingredient matrix, and the factors are used to score the ingredients each user has not used yet.)


This type of model is common in collaborative filtering recommenders. It predicts how strongly a user would interact with an item they have not interacted with yet, based on their past behavior and the behavior of similar users.
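Below is a dense NumPy sketch of implicit-feedback ALS on a toy matrix, for illustration only. We assume the textbook formulation of Hu, Koren and Volinsky. The preference p is 1 where a count is non-zero and 0 elsewhere, the confidence is c = 1 + alpha * count, and each step solves a regularized weighted least-squares problem for one side with the other held fixed. implicit's internal confidence scaling and solver (conjugate gradient by default) differ in detail. The helper als_implicit is hypothetical.

```python
import numpy as np

def als_implicit(R, factors=2, regularization=0.05, alpha=2.0, iterations=15, seed=42):
    """Dense implicit-feedback ALS: returns user factors and item factors."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = (R > 0).astype(float)   # binary preference: 1 if the user interacted at all
    C = 1.0 + alpha * R         # confidence: grows with the interaction strength
    X = rng.normal(scale=0.1, size=(n_users, factors))
    Y = rng.normal(scale=0.1, size=(n_items, factors))
    reg = regularization * np.eye(factors)
    for _ in range(iterations):
        # Fix Y, solve a weighted ridge regression for each user
        for u in range(n_users):
            Cu = np.diag(C[u])
            X[u] = np.linalg.solve(Y.T @ Cu @ Y + reg, Y.T @ Cu @ P[u])
        # Fix X, solve a weighted ridge regression for each item
        for i in range(n_items):
            Ci = np.diag(C[:, i])
            Y[i] = np.linalg.solve(X.T @ Ci @ X + reg, X.T @ Ci @ P[:, i])
    return X, Y

# Toy users x ingredients counts
R = np.array([[3, 1, 0, 0],
              [2, 0, 0, 1],
              [0, 0, 4, 2]], dtype=float)
user_factors, item_factors = als_implicit(R)
scores = user_factors @ item_factors.T   # predicted preference for every pair
print(np.round(scores, 2))
```

Observed pairs carry more confidence, so their predicted scores are pulled toward 1, while unobserved pairs drift toward 0.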



In [None]:
model = AlternatingLeastSquares(factors=300, regularization=0.05, alpha=2.0)
model.fit(X_train)


Next, we evaluate the model with ranking metrics at K, which measure how well the model ranks items for each user. ranking_metrics_at_k takes four arguments: the trained ALS model, the training matrix X_train (used to filter out items the user already interacted with in training), the test matrix X_test, and K, the number of top-ranked items to evaluate.

The function returns precision, mean average precision (MAP), normalized discounted cumulative gain (NDCG) and AUC. These metrics measure how well the model's top-ranked items match the items that users actually interacted with in the held-out test set, which makes them a key check on the effectiveness of a recommendation system.


ranking_metrics_at_k(model, X_train, X_test, K=2) computes these metrics at cutoff K=2, so it evaluates the top 2 recommendations made for each user.

[We also tried K=3, 4 and 5 to recommend the top 3, 4 or 5 ingredients. With more than two ingredients, however, we no longer found recipes that contained all of them, so after trial and error we settled on K=2.]
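The sketch below gives textbook definitions of precision@K and average precision@K for one user, to show what these metrics measure. ranking_metrics_at_k averages over all test users and may normalize slightly differently. The helper names and the held-out ingredients are hypothetical.

```python
def precision_at_k(recommended, relevant, k):
    """Share of the top-k recommendations that appear in the held-out set."""
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k

def average_precision_at_k(recommended, relevant, k):
    """Average of precision@rank over the ranks (up to k) where a hit occurs."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / min(k, len(relevant))

# Hypothetical held-out ingredients for one user
held_out = {"onions", "garlic"}
recommended = ["cumin", "onions", "garlic"]
print(precision_at_k(recommended, held_out, k=2))          # 0.5
print(average_precision_at_k(recommended, held_out, k=2))  # 0.25
```

AP rewards hits that appear earlier in the ranking, so moving "onions" to first place would raise it even though precision@2 stays the same.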

In [None]:
ranking_metrics_at_k(model, X_train, X_test, K=2)


{'precision': 0.3916629907366564,
 'map': 0.35662315900873093,
 'ndcg': 0.40392977715568995,
 'auc': 0.5220302820637847}

We randomly select a user to test the model. test_user is drawn from user_map with sample(1), which returns a one-row DataFrame. squeeze() then turns that row into a Series that holds the user's original ID (userid) and sequential ID (id). We use this user to generate personalized ingredient and recipe recommendations. Because no random_state is passed, a different user is drawn each time the cell runs.

In [None]:
test_user = user_map.sample(1).squeeze()
test_user

userid    U4901702
id            4839
Name: 4839, dtype: object

Next, we generate personalized ingredient recommendations for the selected test user. We call recommend() on the trained model with the user's sequential ID and their row of X_train. It returns the IDs and scores of the top N ingredients, with N=2 in our case. Because filter_already_liked_items=False, the model can recommend ingredients that the user has already interacted with.
Finally, we place the recommended IDs and scores in a pandas DataFrame together with two flags. already_liked_train shows whether each ingredient already appears in the user's training data, and already_liked_whole shows whether it appears anywhere in the full matrix X.
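In matrix factorization, each item's score for a user is the dot product of the user's and the item's latent factors, and the top N scores are returned. The sketch below illustrates this. The helper recommend_from_factors and the 2-factor toy matrices are hypothetical and are not implicit's code. The already-liked flag uses np.isin, the current NumPy name for np.in1d.

```python
import numpy as np

def recommend_from_factors(user_factors, item_factors, user_idx, seen_items, N=2,
                           filter_already_liked_items=False):
    """Return the top-N item indices and scores for one user (dot-product scoring)."""
    scores = item_factors @ user_factors[user_idx]
    if filter_already_liked_items:
        # Push items the user has already interacted with to the bottom of the ranking
        scores = scores.copy()
        scores[np.asarray(seen_items, dtype=int)] = -np.inf
    top = np.argsort(-scores)[:N]
    return top, scores[top]

# Hypothetical 2-factor model: 2 users, 4 items
user_factors = np.array([[1.0, 0.0], [0.0, 1.0]])
item_factors = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.0]])
seen = [0, 3]  # items user 0 already interacted with

ids, scores = recommend_from_factors(user_factors, item_factors, 0, seen, N=2)
already_liked = np.isin(ids, seen)
print(ids, scores, already_liked)
```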

In [None]:
user_idx = test_user['id']
ids, scores = model.recommend(user_idx, X_train[user_idx], N=2, filter_already_liked_items=False)
# np.isin replaces np.in1d, which is deprecated in recent NumPy releases
pd.DataFrame({"ingredient": ingredient_map.loc[ids, 'ingredient'].values,
              "score": scores,
              "already_liked_train": np.isin(ids, X_train[user_idx].indices),
              "already_liked_whole": np.isin(ids, X[user_idx].indices)})

,ingredient,score,already_liked_train,already_liked_whole
0,beef broth,1.070998,True,True
1,cumin,1.021546,True,True


After obtaining ingredient recommendations for the user, we use the similar_items method of the AlternatingLeastSquares model to find the ingredients most similar to each recommended one. This helps us suggest replacements for, or additions to, the user's preferred ingredients.

model.similar_items(ids, 3) returns, for each recommended ingredient, the 3 most similar ingredients in the learned latent-factor space. The most similar item is always the ingredient itself, so the call yields 2 alternatives per ingredient. sim_item_ids and sim_score hold the IDs of these items and their similarity scores, with one row per recommended ingredient.

To make the output readable, we map the IDs back to ingredient names through ingredient_map, using loc indexing on each column of sim_item_ids. Column 0 is the recommended ingredient itself, and columns 1 and 2 are its two closest replacements.
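The sketch below illustrates item-to-item similarity. It assumes, based on our understanding of implicit's similar_items, that similarity is measured as cosine similarity between item factor vectors. Every vector has a cosine similarity of 1 with itself, so the queried item always comes first, which is why the first column of sim_item_ids repeats the recommended ingredient. The helper and the toy factors are hypothetical.

```python
import numpy as np

def similar_items_cosine(item_factors, item_idx, N=3):
    """Top-N items by cosine similarity to item_idx; the item itself is included."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = item_factors @ item_factors[item_idx] / (norms * norms[item_idx])
    top = np.argsort(-sims)[:N]
    return top, sims[top]

# Hypothetical 2-factor item vectors for 4 ingredients
item_factors = np.array([[1.0, 0.1],
                         [0.9, 0.2],
                         [0.1, 1.0],
                         [0.8, 0.0]])
ids, sims = similar_items_cosine(item_factors, 0, N=3)
print(ids, np.round(sims, 3))
```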

In [None]:
sim_item_ids, sim_score = model.similar_items(ids, 3)
# Column 0 of sim_item_ids is each recommended ingredient itself; columns 1 and 2 are its nearest neighbours
pd.DataFrame({"recommended_ingredient": ingredient_map.loc[sim_item_ids[:, 0], 'ingredient'].values,
              "replacement_ingredient_1": ingredient_map.loc[sim_item_ids[:, 1], 'ingredient'].values,
              "replacement_ingredient_2": ingredient_map.loc[sim_item_ids[:, 2], 'ingredient'].values})

,recommended_ingredient,replacement_ingredient_1,replacement_ingredient_2
0,beef broth,onions,beef stew meat
1,cumin,chili powder,oregano


After generating the top 2 recommended ingredients for the test user, we extract their names for further analysis. To do this, we use loc on ingredient_map, the table we created to map ingredient names to their IDs in the model's matrix.

ingredient_map.loc[ids, 'ingredient'] selects the rows whose IDs are in ids and returns a pandas Series of the top 2 ingredient names, indexed by their IDs.

We then convert this Series into a set of unique ingredient names with the built-in set() and store it in top2_ingredients. We use this set to find the recipes that contain the recommended ingredients.

In [None]:
top2_ingredients = set(ingredient_map.loc[ids, 'ingredient'])

Next, we create a DataFrame called recommended_recipes that contains the recipes that include the user's top 2 recommended ingredients. top2_ingredients is the set of those two ingredient names, and set.difference() compares it with the ingredient set of each recipe in recipe_df.

For each recipe, the lambda counts how many of the top 2 ingredients are missing from the recipe's ingredients. recipe_df is then filtered down to the recipes where that count is 0.

In other words, the recommended_recipes DataFrame contains only those recipes that contain both of the user's top 2 recommended ingredients.
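The recipe filter can be checked on a toy recipe_df with hypothetical recipes. The set-difference test used in the notebook and set.issubset select the same rows, but issubset states the intent more directly.

```python
import pandas as pd

# Hypothetical recipes with their ingredient sets
recipe_df = pd.DataFrame({
    "recipe_id": ["R1", "R2", "R3"],
    "ingredients": [{"beef broth", "cumin", "onions"}, {"cumin", "rice"}, {"beef broth", "cumin"}],
})
top2_ingredients = {"beef broth", "cumin"}

# Notebook style: no top-2 ingredient may be missing from the recipe
via_difference = recipe_df["ingredients"].map(lambda x: len(top2_ingredients.difference(x)) == 0)
# Equivalent: the top-2 set is a subset of the recipe's ingredients
via_subset = recipe_df["ingredients"].map(top2_ingredients.issubset)

recommended_recipes = recipe_df[via_subset]
print(recommended_recipes["recipe_id"].tolist())  # ['R1', 'R3']
```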

In [None]:
recommended_recipes = recipe_df[recipe_df['ingredients'].map(lambda x: len(top2_ingredients.difference(x))==0)]
recommended_recipes

,mean_rating,num_rating,recipe_id,ingredients,calories,fat,sugar,sodium,protien,saturated_fat,carbs
145489,6.0,1,R1243305,"{flour, lean boneless pork, ground cinnamon, g...",328.4,11.0,27.0,27.0,62.0,12.0,11.0
145583,6.0,1,R1342396,"{cilantro, cooked rice, fresh parsley, spicy s...",516.0,9.0,6.0,38.0,38.0,5.0,31.0
70212,5.184211,38,R1723657,"{diced tomatoes, oil, onions, lean ground beef...",705.7,29.0,144.0,90.0,94.0,33.0,28.0
28546,6.0,1,R1958350,"{red chili powder, lemons, cornmeal, onion, le...",420.2,45.0,8.0,37.0,62.0,50.0,2.0
122314,6.0,2,R2281343,"{tomato sauce, dried basil, salsa, seasoning s...",127.7,17.0,13.0,20.0,4.0,16.0,2.0
73413,6.0,1,R2996867,"{green onion, flour, tomato paste, ground beef...",806.8,81.0,22.0,104.0,137.0,100.0,4.0
151526,6.0,2,R310522,"{instant minced onion, diced tomatoes, 96% lea...",323.4,8.0,14.0,16.0,57.0,9.0,14.0
92693,6.0,2,R3217358,"{chuck roast, diced tomatoes, chili powder, re...",17.4,0.0,5.0,13.0,1.0,0.0,1.0
124585,6.0,1,R348953,"{fresh garlic, diced tomatoes, salsa, boneless...",337.6,19.0,32.0,64.0,86.0,27.0,5.0
124555,6.0,1,R3572892,"{parmesan cheese, au jus sauce, flour, brown s...",569.5,54.0,25.0,40.0,47.0,68.0,13.0


Next, we sort the recommended recipes by the nutritional values that we consider most important for a healthy meal, so that the healthier recipes are recommended first.
The recipes are sorted by calories (ascending), then by protein (descending), and then by sugar, sodium, fat and carbs (all ascending). sort_values applies these keys lexicographically, so each later column only breaks ties among the columns before it.

Because we sort by nutritional values, the recommended recipes still match the user's taste preferences, and the healthier options appear first.
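The toy example below uses hypothetical recipes to show how the multi-column sort behaves. Protein only decides the order between recipes that have identical calories.

```python
import pandas as pd

# Hypothetical recipes: A and B tie on calories
recipes = pd.DataFrame({
    "recipe_id": ["A", "B", "C"],
    "calories": [300.0, 300.0, 120.5],
    "protien": [10.0, 40.0, 5.0],
})

# Lowest calories first; protein (descending) only breaks the A/B tie
ranked = recipes.sort_values(["calories", "protien"], ascending=[True, False])
print(ranked["recipe_id"].tolist())  # ['C', 'B', 'A']
```

Because calories is a continuous value, exact ties are rare in the real data, so the calorie key largely determines the final order.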

In [None]:

recommended_recipes.sort_values(['calories', 'protien', 'sugar', 'sodium', 'fat', 'carbs'], ascending=[True, False, True, True, True, True ])

,mean_rating,num_rating,recipe_id,ingredients,calories,fat,sugar,sodium,protien,saturated_fat,carbs
92693,6.0,2,R3217358,"{chuck roast, diced tomatoes, chili powder, re...",17.4,0.0,5.0,13.0,1.0,0.0,1.0
14003,5.0,1,R6878421,"{beef bouillon granules, garlic salt, tomato s...",24.4,1.0,7.0,29.0,3.0,0.0,1.0
122314,6.0,2,R2281343,"{tomato sauce, dried basil, salsa, seasoning s...",127.7,17.0,13.0,20.0,4.0,16.0,2.0
148101,4.5,2,R757762,"{minced garlic cloves, jalapenos, bay leaves, ...",237.5,20.0,18.0,9.0,41.0,22.0,2.0
151526,6.0,2,R310522,"{instant minced onion, diced tomatoes, 96% lea...",323.4,8.0,14.0,16.0,57.0,9.0,14.0
145489,6.0,1,R1243305,"{flour, lean boneless pork, ground cinnamon, g...",328.4,11.0,27.0,27.0,62.0,12.0,11.0
124585,6.0,1,R348953,"{fresh garlic, diced tomatoes, salsa, boneless...",337.6,19.0,32.0,64.0,86.0,27.0,5.0
140342,6.0,1,R9339302,"{minced garlic cloves, tomato juice, cajun sea...",348.3,28.0,28.0,28.0,35.0,31.0,9.0
161233,6.0,1,R961200,"{green chili peppers, diced tomatoes, garlic c...",351.8,23.0,17.0,32.0,33.0,23.0,12.0
59222,5.686275,51,R4176875,"{chuck roast, beef broth, chili powder, chipot...",364.4,22.0,18.0,41.0,98.0,31.0,2.0


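**Coverage, Diversity and Serendipity Metrics**

Ingredient coverage is the share of recipes in recipe_df that contain all of a user's top-k recommended ingredients. Diversity is 1 minus the share of recommended ingredient pairs in which one ingredient is among the other's most similar ingredients. Serendipity is 1 minus the share of distinct ingredients in the recommended recipes that are the recommended ingredients themselves.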

In [70]:
def IngredientCoverage(user_id, top_k, X_train, model, ingredient_map):
    # Get top k recommended item IDs for the given user
    ids, _ = model.recommend(user_id, X_train[user_id], N=top_k, filter_already_liked_items=False)
    
    # Get the names of the recommended ingredients
    top_k_ingredients = set(ingredient_map.loc[ids, 'ingredient'])
    
    # Calculate the coverage as the percentage of recipes that contain all of the recommended ingredients
    num_recipes = len(recipe_df)
    num_matching_recipes = sum(recipe_df['ingredients'].map(lambda x: top_k_ingredients.issubset(x)))
    coverage = num_matching_recipes / num_recipes
    
    return coverage

user_id = 10
top_k = 2
coverage = IngredientCoverage(user_id=user_id, top_k=top_k, X_train=X_train, model=model, ingredient_map=ingredient_map)

print("Ingredient coverage for user {}: {:.2%}".format(user_id, coverage))


Ingredient coverage for user 10: 0.00%


In [61]:

import itertools

# Diversity of a list of recommended ingredients: 1 minus the share of pairs in which
# the second ingredient is among the first one's N most similar ingredients
def Diversity(recommended_ingredients, model, N):
    n = 0
    total = 0

    for ing1, ing2 in itertools.combinations(recommended_ingredients, 2):
        id1 = ingredient_map.loc[ingredient_map['ingredient'] == ing1, 'id'].iloc[0]
        id2 = ingredient_map.loc[ingredient_map['ingredient'] == ing2, 'id'].iloc[0]
        # similar_items returns (ids, scores) and ranks the ingredient itself first, so ask for N + 1
        similar_ids, _ = model.similar_items(id1, N=N + 1)
        if id2 in similar_ids[1:]:
            total += 1
        n += 1
    # With fewer than two ingredients there are no pairs to compare
    return 1 - total / n if n else 0.0

diversity_score = Diversity(list(top2_ingredients), model, 2)
print("Diversity score:", diversity_score)



Diversity score: 1.0


In [67]:
# import numpy as np

# # Calculate serendipity score for a list of recommended ingredients
# def Serendipity(recommended_ingredients, model, X):
#     n = 0
#     total = 0
    
#     # Find recommended recipes
#     top2_ingredients = set(recommended_ingredients)
#     recommended_recipes = recipe_df[recipe_df['ingredients'].map(lambda x: len(top2_ingredients.difference(x))==0)]
    
#     # Get ratings of recommended recipes
#     recommended_ratings = merged_df[merged_df['recipe_id'].isin(recommended_recipes.index)]['rating']
    
#     # Calculate mean rating of recommended recipes
#     mean_rating = recommended_ratings.mean()
    
#     # Get ratings of all other recipes
#     other_ratings = merged_df[~merged_df['recipe_id'].isin(recommended_recipes.index)]['rating']
    
#     # Calculate mean rating of all other recipes
#     other_mean_rating = other_ratings.mean()
    
#     # Calculate serendipity score
#     if np.isnan(mean_rating) or np.isnan(other_mean_rating):
#         return 0
#     else:
#         return abs(mean_rating - other_mean_rating)
    
# serendipity_score = Serendipity(list(top2_ingredients), model, 2)
# print("Serendipity score:", serendipity_score)



In [64]:
# Get all ingredients in the recommended recipes
# (iterating over a DataFrame yields its column labels, so loop over the 'ingredients' column)
all_recommended_ingredients = set(ingredient for ingredients in recommended_recipes['ingredients'] for ingredient in ingredients)

# Calculate serendipity
serendipity = 1 - len(top2_ingredients.intersection(all_recommended_ingredients)) / len(all_recommended_ingredients)
print('Serendipity:', serendipity)


**REFERENCES**

https://benfred.github.io/implicit/tutorial_lastfm.html

ChatGPT

https://github.com/ZeeTsing/Recipe_reco/blob/master/3_recommendation_with_SVD.ipynb


Thank You
