Note: Stage 3 is re-ranking. The top 50 candidates are reordered by serendipity score, and the top 20 are kept as the final output.

## **Import Libraries**

In [29]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
import random

In [30]:
df = pd.read_csv('./dataset/final_candidates.csv')

In [31]:
top_50_df = pd.read_csv('./dataset/top_50_recipes.csv')

## **Diversity Score, Serendipity Score**

Enhance diversity and serendipity:

1. Calculate diversity scores: for each item, average its TF-IDF cosine similarity to every item in the top 50.
2. Calculate serendipity scores: blend the relevance and diversity scores with a weight `alpha`, as `(1 - alpha) * relevance + alpha * diversity`.
3. Rank items by serendipity: sort by the combined score in descending order and keep the top N recommendations.
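
The three steps above can be sketched on a toy example before running them on the real data. The similarity matrix, relevance values, and `alpha` below are made-up numbers for illustration only, not values from the dataset.

```python
import numpy as np

# Toy pairwise cosine similarities for three hypothetical items
similarity = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Hypothetical relevance scores, assumed to lie in [0, 1]
relevance = np.array([0.9, 0.6, 0.3])
alpha = 0.5  # equal weight on both terms

# Step 1: per-item score as the mean of each similarity row
mean_similarity = similarity.mean(axis=1)

# Step 2: weighted blend of relevance and the similarity-based score
serendipity = (1 - alpha) * relevance + alpha * mean_similarity

# Step 3: indices sorted by descending score, keeping the top 2
top_indices = np.argsort(serendipity)[::-1][:2]
print(top_indices)
```

With `alpha = 0`, the ranking reduces to sorting by relevance alone; with `alpha = 1`, relevance is ignored entirely.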

In [33]:
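# Function to calculate diversity scores
# Note: this returns each item's mean cosine similarity to all items (itself included),
# so a higher value means the item is more similar to the rest of the list, not more diverse.
def calculate_diversity_scores(similarity_matrix):
    diversity_scores = np.sum(similarity_matrix, axis=1) / similarity_matrix.shape[1]
    return diversity_scores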
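
Because the function above averages similarities, a recipe that resembles many others receives a higher score. If the intent is to reward recipes that differ from the rest of the list, one possible variant is sketched below. The name `calculate_dissimilarity_scores` is hypothetical and not part of the notebook; it excludes each item's similarity to itself and returns one minus the remaining average. Swapping it in would change the final ranking, so treat it as an option to evaluate rather than a drop-in fix.

```python
import numpy as np

def calculate_dissimilarity_scores(similarity_matrix):
    """Hypothetical variant: 1 minus the mean similarity to the *other* items."""
    sim = np.asarray(similarity_matrix, dtype=float)
    n = sim.shape[0]
    if n < 2:
        # A single item has no other items to differ from
        return np.zeros(n)
    # Drop the diagonal (self-similarity) before averaging
    off_diagonal_sum = sim.sum(axis=1) - np.diag(sim)
    return 1.0 - off_diagonal_sum / (n - 1)

# Toy matrix: items 0 and 1 are near-duplicates, item 2 is unrelated to both
toy_similarity = np.array([
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
toy_scores = calculate_dissimilarity_scores(toy_similarity)
print(toy_scores)
```

In the toy matrix, the unrelated item receives the highest score, which is the behavior usually expected from a diversity term.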

In [34]:
# Function to calculate serendipity scores
def calculate_serendipity_scores(relevance_scores, diversity_scores, alpha=0.5):
    serendipity_scores = (1 - alpha) * relevance_scores + alpha * diversity_scores
    return serendipity_scores

In [35]:
# Function to rank items by serendipity
def rank_items_by_serendipity(serendipity_scores, top_n=20):
    ranked_indices = np.argsort(serendipity_scores)[::-1]
    return ranked_indices[:top_n]

In [36]:
# Compute TF-IDF vectors for the top 50 foods
tfidf_vectorizer = TfidfVectorizer(stop_words='english')
tfidf_matrix_top_50 = tfidf_vectorizer.fit_transform(top_50_df['Combined'].fillna(''))

In [37]:
# Compute cosine similarity matrix for the top 50 foods
cosine_sim_top_50 = cosine_similarity(tfidf_matrix_top_50, tfidf_matrix_top_50)

In [38]:
# Calculate diversity scores for the top 50 foods
diversity_scores_top_50 = calculate_diversity_scores(cosine_sim_top_50)

In [39]:
# Use the relevance scores from the neural network
relevance_scores_top_50 = top_50_df['relevance_score'].values

In [40]:
# Calculate serendipity scores
alpha = 0.5  # Weight for combining relevance and diversity (tune as needed)
serendipity_scores_top_50 = calculate_serendipity_scores(relevance_scores_top_50, diversity_scores_top_50, alpha)
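
The weighted sum only balances the two terms as `alpha` suggests when both are on a comparable scale. The TF-IDF cosine similarities lie in [0, 1], but the range of `relevance_score` is not shown in this notebook. If it turns out to differ, a min-max rescaling like the sketch below could be applied to both arrays before combining them. The helper `min_max_scale` is an illustrative assumption, not part of the original pipeline.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly to [0, 1]; a constant array maps to all zeros."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # Avoid division by zero when every value is identical
        return np.zeros_like(values)
    return (values - values.min()) / span

# Made-up relevance values on an arbitrary scale
raw_relevance = np.array([2.0, 4.0, 6.0])
scaled_relevance = min_max_scale(raw_relevance)
print(scaled_relevance)
```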

## **Ranking the Final Recommendations**

Combine and Finalize Recommendations: Extract and display the final recommendations based on the serendipity ranking.

In [41]:
# Rank items by serendipity
top_n = 20  # Number of final recommendations
ranked_indices = rank_items_by_serendipity(serendipity_scores_top_50, top_n)

In [42]:
# Extract the final recommendations DataFrame
final_recommendations_df = top_50_df.iloc[ranked_indices]
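
Selecting rows with `iloc` keeps the ranked order but drops the scores that produced it. As an optional alternative, the scores can be attached as a column before sorting, so the exported file records why each recipe was placed where it is. The sketch below uses a small made-up DataFrame; the column name `serendipity_score` is an assumption and not part of the original dataset.

```python
import numpy as np
import pandas as pd

# Made-up candidates standing in for the top 50 recipes
candidates = pd.DataFrame({
    "RecipeId": [101, 102, 103],
    "relevance_score": [0.9, 0.6, 0.3],
})
serendipity_scores = np.array([0.75, 0.60, 0.32])
top_n = 2

# Attach the scores, sort in descending order, and keep the top N rows
ranked = (
    candidates.assign(serendipity_score=serendipity_scores)
    .sort_values("serendipity_score", ascending=False)
    .head(top_n)
)
print(ranked)
```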

In [49]:
# Summarize the final recommendations (columns, non-null counts, dtypes)
final_recommendations_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 20 entries, 1 to 17
Data columns (total 19 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   RecipeId                    20 non-null     int64  
 1   Calories                    20 non-null     float64
 2   FatContent                  20 non-null     float64
 3   SaturatedFatContent         20 non-null     float64
 4   CholesterolContent          20 non-null     float64
 5   SodiumContent               20 non-null     float64
 6   CarbohydrateContent         20 non-null     float64
 7   FiberContent                20 non-null     float64
 8   SugarContent                20 non-null     float64
 9   ProteinContent              20 non-null     float64
 10  NameClean                   20 non-null     object 
 11  RecipeIngredientPartsClean  20 non-null     object 
 12  RecipeInstructionsClean     20 non-null     object 
 13  Combined                    20 non-null   

## **Export the Final Recommendations**

In [50]:
final_recommendations_df.to_csv('./dataset/final_recommendations.csv', index=False)