### Project Overview

Exploring the dynamics of user sentiments, behavior patterns, and recipe interactions, this project delves into the 'Recipe Reviews and User Feedback Dataset' to derive insights, develop a personalized recipe recommendation system, and enhance the culinary experience on online platforms.

## Background

The "Recipe Reviews and User Feedback Dataset" is a comprehensive repository of data encompassing various aspects of recipe reviews and user interactions. It includes essential information such as the recipe name, its ranking on the top 100 recipes list, a unique recipe code, and user details like user ID, user name, and an internal user reputation score. Each review comment is uniquely identified with a comment ID and comes with additional attributes, including the creation timestamp, reply count, and the number of up-votes and down-votes received. Users' sentiment towards recipes is quantified on a 1 to 5 star rating scale, with a score of 0 denoting an absence of rating. The dataset offers a window into the dynamics of recipe reviews and user feedback within the culinary website domain.

## Business Problem

The Bay Bistro food company wants to enhance user engagement and satisfaction on its recipe platform by providing personalized recipe recommendations to users. The existing platform has a vast collection of recipes, but users often struggle to discover new recipes that match their preferences.
Miles group has been tasked with addressing this challenge by developing a recommendation system that analyzes user interactions and feedback to suggest relevant recipes tailored to each user's tastes and preferences.

### Objectives
**Detailed Sentiment Analysis**: Utilize the star ratings and review comments to conduct a nuanced sentiment analysis, exploring the relationship between user sentiment and review attributes such as up-votes/down-votes and reply counts.

**User Behavior Analysis**:
Understand user preferences and behavior by analyzing recipe reviews, ratings, and interactions.
Identify popular recipes and trending ingredients based on user feedback.

**Personalized Recipe Recommendations:**
Develop a recommendation algorithm to suggest recipes tailored to each user's taste and preferences.
Utilize collaborative filtering and content-based filtering techniques to enhance personalized recommendations.

**User Interface and Experience Design:**
Develop an intuitive and user-friendly interface for users to easily browse recipes, read reviews, and receive recommendations.


## Data Understanding
The data set contains the following columns:
1. recipe name: {name of the recipe the comment was posted on}
2. recipe number: {placement of the recipe on the top 100 recipes list}
3. recipe code: {unique id of the recipe used by the site}
4. comment id: {unique id of the comment}
5. user id: {unique id of the user who left the comment}
6. user name: {name of the user}
7. user reputation: {internal score of the site, roughly quantifying the past behavior of the user}
8. create at: {time at which the comment was posted as a Unix timestamp}
9. reply count: {number of replies to the comment}
10. thumbs up: {number of up-votes the comment has received}
11. thumbs down: {number of down-votes the comment has received}
12. stars: {the score on a 1 to 5 scale that the user gave to the recipe. A score of 0 means that no score was given}
13. best score: {score of the comment, likely used by the site to help determine the order in which comments appear}
14. text: {the text content of the comment}
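
Because a `stars` value of 0 marks a missing rating rather than a low one, averages taken over the raw column would be pulled downward. The following is a minimal sketch of that effect, using a small made-up frame that only borrows the `stars` column name; the values are illustrative and not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample; only the column name mirrors the dataset.
sample = pd.DataFrame({"stars": [5, 0, 4, 0, 3]})

# Treat 0 as "no rating given" so it does not count as a very low score.
rated = sample[sample["stars"] > 0]

naive_mean = sample["stars"].mean()  # zeros are counted as ratings
rated_mean = rated["stars"].mean()   # unrated comments are ignored
print(naive_mean, rated_mean)
```

Whether unrated comments should be dropped or kept depends on the analysis; for sentiment work on the comment text they may still be useful.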

In [1]:
!pip install surprise

Collecting surprise
  Obtaining dependency information for surprise from https://files.pythonhosted.org/packages/61/de/e5cba8682201fcf9c3719a6fdda95693468ed061945493dea2dd37c5618b/surprise-0.1-py2.py3-none-any.whl.metadata
  Downloading surprise-0.1-py2.py3-none-any.whl.metadata (327 bytes)
Collecting scikit-surprise (from surprise)
  Downloading scikit-surprise-1.1.3.tar.gz (771 kB)

  error: subprocess-exited-with-error
  
  python setup.py bdist_wheel did not run successfully.
  exit code: 1
  
  [101 lines of output]
  C:\Users\odhiambo rodgers bon\AppData\Local\Temp\pip-install-9b726nmj\scikit-surprise_ed61f9f92b6c4d26b93323e4dfe9f377\setup.py:65: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!
  
          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************
  
  !!
    dist.Distribution().fetch_build_eggs(["numpy>=1.17.3"])
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-311
  creating build\lib.win-amd64-cpython-311\surprise
  copying surprise\accuracy.py -> build\lib.win-amd64-cpython-311\surprise
  copy

In [2]:
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from surprise import Dataset, Reader
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split,cross_validate
from surprise import KNNWithMeans



ModuleNotFoundError: No module named 'surprise'

In [None]:
#loading the data set
df=pd.read_csv('Recipe.csv',index_col=0)
df.head()

In [None]:
#checking data shape
df.shape

In [None]:
#checking data information
df.info()

### Data Preprocessing

Data Cleaning

To ensure the development of a robust and accurate model, several data cleaning techniques will be applied to the dataset. The following techniques will be employed:

- Completeness: This technique involves addressing missing values within the dataset. Steps will be taken to identify and handle missing data appropriately, either through imputation or removal, to ensure that the dataset is complete.

- Consistency: The consistency of the data will be examined to identify any discrepancies or irregularities. Inconsistencies in variables, such as conflicting formats or conflicting information within the dataset, will be addressed and resolved to maintain data integrity.

- Validity: Validity refers to the accuracy and relevance of the data. Data validation techniques will be applied to verify that the values within each variable align with expected ranges or predefined criteria. Any invalid or erroneous data points will be rectified or removed from the dataset.

- Uniformity: Uniformity is crucial during the data cleaning process to ensure consistency and accurate analysis. Inconsistent or non-uniform data can introduce errors and bias into the modeling process, leading to unreliable results. Robust techniques will be employed to detect and handle non-uniformity effectively, ensuring that the data is standardized and aligned.

By applying these data cleaning techniques, the dataset will be refined and prepared, ensuring the reliability and accuracy of the data before proceeding with the subsequent stages of analysis and modeling.
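
The four checks described above can be summarised in a single helper. This is only a sketch: `quality_report`, its parameters, and the toy frame are hypothetical and are not part of the notebook; the column names simply mirror the dataset.

```python
import pandas as pd

def quality_report(frame, rating_col="stars", valid_range=(0, 5)):
    """Summarise the four data-cleaning checks (illustrative helper)."""
    low, high = valid_range
    return {
        # Completeness: total number of missing cells
        "missing_values": int(frame.isna().sum().sum()),
        # Consistency: rows that are exact duplicates of an earlier row
        "duplicate_rows": int(frame.duplicated().sum()),
        # Validity: ratings outside the expected range
        "out_of_range": int((~frame[rating_col].between(low, high)).sum()),
        # Uniformity: columns that are not stored as numbers
        "non_numeric_columns": [
            col for col in frame.columns
            if not pd.api.types.is_numeric_dtype(frame[col])
        ],
    }

# Hypothetical toy data with one problem of each kind
toy = pd.DataFrame({
    "stars": [5, 7, 5, 3],
    "reply_count": [0.0, None, 0.0, 2.0],
    "text": ["great", "ok", "great", None],
})
report = quality_report(toy)
print(report)
```

A report like this is a starting point only; each flagged item still needs a decision about imputation, removal, or conversion.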

### Completeness
To achieve completeness in our data, I will be checking for missing values in the data.

In [None]:
# Check for null values
print(f'The data has {df.isna().sum().sum()} missing values')

In [None]:
# Define a function to explore missing data
def missing_data(df):
    missing_data = df.isna().sum()
    missing_data = missing_data[missing_data>0]
    return missing_data.to_frame()

In [None]:
# expanding the number of visible columns
pd.set_option('display.max_columns', None)

In [None]:
# Apply missing_data function to the dataframe
missing_data(df).T

In [None]:
# Fill missing values in the 'reply_count' column with 0
# (assigning the result back avoids an inplace call on a column selection)
df['reply_count'] = df['reply_count'].fillna(0)

In [None]:
# Dropping rows where the "text" column is missing
# (.copy() gives an independent frame, so later column assignments are unambiguous)
df = df.dropna(subset=['text']).copy()

In [None]:
# converting 'created_at' to datetime
df['created_at'] = pd.to_datetime(df['created_at'], unit='s')

In [None]:
# checking to see if missing values have been replaced
print(f'The data has {df.isna().sum().sum()} missing values')

### Consistency
For the data to be consistent, I need to resolve any inconsistencies, starting by checking for duplicate rows in our data.

In [None]:
# checking for duplicates
print(f'The data has {df.duplicated().sum()} duplicates')

- The data has no duplicate values

### Validity

For our data to be valid, I have to verify that every column is accurate and appropriate for this analysis and remove those that are invalid.

In [None]:
#validity checks
df.head(2)

### Uniformity
- For our data to be uniform, I have to verify that every column is correct and convert each one to its appropriate data type.

In [None]:
#checking for data types
df.dtypes

In [None]:
#Converting data types 'reply_count' to integer
df['reply_count'] = df['reply_count'].astype(int)

In [None]:
# Convert user_id to integer values
df['user_id'] = df['user_id'].rank(method='dense').astype(int)
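
To illustrate what the dense ranking above does, here is a small sketch on made-up IDs (integers are used purely for illustration). Equal IDs share a rank, and the ranks run from 1 to the number of distinct IDs with no gaps.

```python
import pandas as pd

# Hypothetical raw ids; the real ones come from the dataset.
raw_ids = pd.Series([9001, 17, 9001, 450])

# Dense ranking assigns 1..N in sorted order; repeated ids get the same rank.
dense_ids = raw_ids.rank(method="dense").astype(int)
print(dense_ids.tolist())  # [3, 1, 3, 2]
```

Note that after this step the values in `user_id` are positions in the sorted list of IDs, not the site's original identifiers.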

## Feature engineering

In [None]:
# Rename the "stars" column to "ratings"
df.rename(columns={'stars': 'ratings'}, inplace=True)

In [None]:
# Extract the month from the 'created_at' column and create a new 'month' column
df['month'] = df['created_at'].dt.month


In [None]:
# User interaction feature: share of up-votes among all votes (1e-6 avoids division by zero)
df['thumbs_up_ratio'] = df['thumbs_up'] / (df['thumbs_up'] + df['thumbs_down'] + 1e-6)

In [None]:
df.head()

### EXPLORATORY DATA ANALYSIS

#### Visualization of rating distribution

In [None]:
# Plotting the rating distribution with matplotlib
plt.figure(figsize=(8, 6))
plt.hist(df['ratings'], bins=10, color='skyblue', edgecolor='black')
plt.title('Distribution of ratings')
plt.xlabel('ratings')
plt.ylabel('Frequency')
plt.show()

`Observation`

The bars are concentrated on the right side of the graph, suggesting that most recipes receive positive ratings. However, the ratings also span the full range, indicating some variation in user preferences.

#### Visualizing rating trends over months

In [None]:
# Grouping by month to calculate average ratings
ratings_trend = df.groupby('month')['ratings'].mean().reset_index()

# Plotting
plt.figure(figsize=(10, 6))
plt.plot(ratings_trend['month'], ratings_trend['ratings'], marker='o')
plt.title('Average Ratings Trend Over Time')
plt.xlabel('month')
plt.ylabel('Average Ratings')
plt.xticks(rotation=45)
plt.grid(False)
plt.tight_layout()
plt.show()




`Observation`

- The line shows some fluctuations in the average rating over time. There is possibly a slight downward trend in the average rating as the months progress.

### Top 5 recipes

In [None]:
# Group by recipe, count the number of ratings, and keep the five most-rated recipes
top_5_recipes = df.groupby('recipe_name')['ratings'].count().sort_values(ascending=False).head(5)
# list of custom colors for the bars
custom_colors = ['#06837f', '#02cecb', '#b4ffff', '#f8e16c', '#fed811']
# bar plot with custom colors
plt.figure(figsize=(10, 6))
ax = top_5_recipes.plot(kind='bar', color=custom_colors)
plt.title('Top 5 Recipes')
plt.xlabel('Recipes')
plt.ylabel('Number of ratings')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()

# Add the number of ratings on top of each bar
for i, v in enumerate(top_5_recipes):
    ax.text(i, v, str(v), ha='center', va='bottom', fontsize=16, color='black')
plt.show()

### Visualization of the top 10 recipes

In [None]:
# Group by recipe_name and count the number of ratings
top_10_recipes = df['recipe_name'].value_counts().head(10)

# Plotting
plt.figure(figsize=(10, 6))
top_10_recipes.plot(kind='bar', color='skyblue')
plt.title('Top 10 Recipes ')
plt.xlabel('Recipe Name')
plt.ylabel('Count of Ratings')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

`Observation`

The graph suggests that Cheeseburger Soup, Creamy White Chili, Best Ever Banana Bread, Enchilada Casserole-Ole!, Basic Homemade Bread, Favorite Chicken Potpie, Flavorful Chicken Fajitas, Amish Breakfast Casserole, Zucchini Pizza Casserole, and Cauliflower Soup are the most popular recipes among users, based on the number of ratings they have received.

#### Multivariate Analysis

Our aim here is to look for the relationship between different features

First we look at the correlation of the numeric columns using Pearson's coefficient

In [None]:
# Pearson coefficient of numeric columns
numerical_columns_df = df[[
    'user_reputation',
    'reply_count',
    'thumbs_up',
    'thumbs_down',
    'ratings',
    'best_score',
    'thumbs_up_ratio'
]]
numerical_columns_df.corr()

`Observations`

The correlation matrix provides insight into the relationships between the features.
For instance, thumbs up and best score have a strong positive correlation, suggesting that comments which receive more up-votes tend to have higher best scores. On the other hand, reply count and ratings have a weak negative relationship, implying that comments which attract more replies tend to come with slightly lower ratings.
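
For reference, Pearson's coefficient can be computed by hand as the covariance of two columns divided by the product of their standard deviations. The sketch below uses made-up numbers, not values from the dataset, and the helper name `pearson` is an assumption for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical values: a perfectly linear and a perfectly inverse relationship
r_positive = pearson([0, 1, 2, 3, 4], [10, 12, 14, 16, 18])
r_negative = pearson([1, 2, 3], [3, 2, 1])
print(r_positive, r_negative)
```

Values near +1 or -1 indicate a strong linear relationship, values near 0 a weak one; the coefficient says nothing about causation.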

### MODELLING

## Item-Based Collaborative Filtering

Item-based collaborative filtering is a recommendation technique that focuses on the similarity between items rather than between users.
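
As a rough illustration of item-to-item similarity, the sketch below computes cosine similarity between the columns of a tiny, made-up rating matrix. It is a simplification: unrated cells are treated as 0 here, whereas the Surprise model used in this notebook relies on the `pearson_baseline` measure instead.

```python
import numpy as np

# Hypothetical user x recipe rating matrix (rows = users, columns = recipes)
ratings = np.array([
    [5, 4, 0],
    [4, 5, 1],
    [1, 0, 5],
], dtype=float)

# Each column is one recipe's rating vector; cosine similarity is the
# dot product of two columns divided by the product of their lengths.
norms = np.linalg.norm(ratings, axis=0)
item_similarity = (ratings.T @ ratings) / np.outer(norms, norms)
print(np.round(item_similarity, 2))
```

In this toy matrix the first two recipes are rated alike by the same users, so they end up far more similar to each other than either is to the third.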

In [None]:
# Preparing the data by converting the DataFrame to a Surprise dataset
ratings_df = pd.DataFrame(df)

# Define the reader object; ratings lie on a 0 to 5 scale (0 means no rating was given)
reader = Reader(rating_scale=(0, 5))

# Load data from DataFrame
data = Dataset.load_from_df(ratings_df[['user_id', 'recipe_code', 'ratings']], reader)


In [None]:
from surprise import KNNWithMeans
from surprise.model_selection import train_test_split

# Split the dataset into train and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Use item-based collaborative filtering
algo_item_based = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': False})
algo_item_based.fit(trainset)

# Predict ratings for the test set
predictions_item_based = algo_item_based.test(testset)

# Print RMSE (Root Mean Square Error)
from surprise import accuracy
print("Item-Based CF RMSE:", accuracy.rmse(predictions_item_based))
print("Item-Based CF MAE:", accuracy.mae(predictions_item_based))
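
The two error metrics printed above can be written out in a few lines. This sketch uses made-up (actual, predicted) pairs; the function names are illustrative and not part of Surprise.

```python
from math import sqrt

def rmse(pairs):
    """Root mean squared error over (actual, predicted) pairs."""
    return sqrt(sum((actual - pred) ** 2 for actual, pred in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (actual, predicted) pairs."""
    return sum(abs(actual - pred) for actual, pred in pairs) / len(pairs)

# Hypothetical ratings and predictions
pairs = [(5, 4.5), (3, 4.0), (4, 4.0), (1, 3.0)]
print("RMSE:", rmse(pairs))
print("MAE:", mae(pairs))
```

RMSE squares each error before averaging, so a few large misses raise it more than they raise MAE; RMSE is never smaller than MAE on the same predictions.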

### User-Based Collaborative Filtering
User-based collaborative filtering recommends items to a user based on the ratings given by other users with similar rating patterns.
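
For reference, the prediction rule that Surprise documents for `KNNWithMeans` in the user-based case is shown below, where $\mu_u$ is the mean rating of user $u$, $\text{sim}(u, v)$ is the chosen similarity measure, and $N_i^k(u)$ is the set of at most $k$ nearest neighbours of $u$ that have rated item $i$ (with $k = 5$ in this notebook):

$$
\hat{r}_{ui} = \mu_u + \frac{\sum_{v \in N_i^k(u)} \text{sim}(u, v)\,(r_{vi} - \mu_v)}{\sum_{v \in N_i^k(u)} \text{sim}(u, v)}
$$

The item-based variant swaps the roles of users and items: it uses item means and the neighbours of item $i$ among the items rated by user $u$.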

In [None]:
# Use user-based collaborative filtering
algo_user_based = KNNWithMeans(k=5, sim_options={'name': 'pearson_baseline', 'user_based': True})
algo_user_based.fit(trainset)

# Predict ratings for the test set
predictions_user_based = algo_user_based.test(testset)

# Print RMSE (Root Mean Square Error)
print("User-Based CF RMSE:", accuracy.rmse(predictions_user_based))
print("User-Based CF MAE:",accuracy.mae(predictions_user_based))

## Singular Value Decomposition

In [None]:
from surprise import SVD
# Split the dataset into train and test sets
trainset, testset = train_test_split(data, test_size=0.25)

# Use SVD
model = SVD()

# Fit the model
model.fit(trainset)

# Predict ratings for the test set
predictions = model.test(testset)

# Print RMSE (Root Mean Square Error)
print("SVD RMSE:", accuracy.rmse(predictions))
print("SVD MAE:", accuracy.mae(predictions))

### Recommendations based on SVD

To generate recommendations using SVD, the ID of the user who should receive recommendations is taken as input. The SVD model
is used to predict a rating for each recipe the user has not yet rated, representing how much the user might like it. The predicted ratings are sorted in
descending order and the top recipes are recommended to the user.
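
The procedure can be sketched independently of any particular model. In the example below, `recommend_top_n` and the `fake_scores` lookup are hypothetical stand-ins: any function that returns a predicted rating for a (user, item) pair could be passed in as `predict`.

```python
def recommend_top_n(predict, user, all_items, seen_items, n=10):
    """Return the n unseen items with the highest predicted rating."""
    # Skip items the user has already rated
    candidates = [item for item in all_items if item not in seen_items]
    # Score every remaining item for this user
    scored = [(item, predict(user, item)) for item in candidates]
    # Highest predicted rating first
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical predicted ratings standing in for a fitted model
fake_scores = {"soup": 4.6, "bread": 4.9, "chili": 3.8, "pie": 4.2}

top = recommend_top_n(
    predict=lambda user, item: fake_scores[item],
    user=1,
    all_items=list(fake_scores),
    seen_items={"bread"},
    n=2,
)
print(top)  # [('soup', 4.6), ('pie', 4.2)]
```

Passing the seen items as a set keeps the membership test fast when the catalogue is large.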


In [None]:
# Recommend recipes for a specific user (user_id = 13114 in this example)
user_id = 13114
user_recipe = df[df['user_id'] == user_id]['recipe_code'].unique()
# Generate recommendations for the user
recommended_recipe = []
for recipe_code in df['recipe_code'].unique():
    # Only score recipes the user has not rated yet
    if recipe_code not in user_recipe:
        predicted_rating = model.predict(user_id, recipe_code).est
        recommended_recipe.append((recipe_code, predicted_rating))

# Sort recommended recipes by predicted rating
recommended_recipe.sort(key=lambda x: x[1], reverse=True)

# Print top 10 recommended recipes
for recipe_code, predicted_rating in recommended_recipe[:10]:
    recipe_name = df[df['recipe_code'] == recipe_code]['recipe_name'].iloc[0]
    print(f"Recipe Name: {recipe_name}, Recipe code: {recipe_code}, Predicted Rating: {predicted_rating}")

`Observation`

The output lists the top recommended recipes along with their predicted ratings. These recommendations help users discover recipes they are likely to enjoy, which supports user engagement and satisfaction with the recipe platform.

### Hyperparameter tuning

In [None]:
from surprise.model_selection import GridSearchCV
# Define parameter grid
param_grid = {'n_factors': [50, 100, 150], 'n_epochs': [20, 30, 40], 'lr_all': [0.001, 0.002, 0.005]}

# Perform grid search with cross-validation
# (GridSearchCV takes the algorithm class, not an instance)
grid_search = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=5, n_jobs=-1)
grid_search.fit(data)
# Get the parameters with the lowest RMSE
best_params = grid_search.best_params['rmse']
print("Best parameters:", best_params)
# Getting the cross-validation results
cv_results = grid_search.cv_results

# Extract RMSE and MAE values
rmse_values = cv_results['mean_test_rmse']
mae_values = cv_results['mean_test_mae']

# Print RMSE and MAE values
print("RMSE values:", rmse_values)
print("MAE values:", mae_values)
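
To get a feel for the cost of the search, the grid can be expanded by hand. The sketch below reuses the same `param_grid` values and counts how many model fits a 5-fold search over it implies; it does not call Surprise.

```python
from itertools import product

param_grid = {
    'n_factors': [50, 100, 150],
    'n_epochs': [20, 30, 40],
    'lr_all': [0.001, 0.002, 0.005],
}

# Every combination of one value per parameter
combinations = [
    dict(zip(param_grid, values))
    for values in product(*param_grid.values())
]

n_folds = 5
n_fits = len(combinations) * n_folds
print(len(combinations), "combinations,", n_fits, "model fits")  # 27 combinations, 135 model fits
```

Each extra value added to one parameter multiplies the number of fits, which is why grids are usually kept small.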

In [None]:
# Initialize the SVD model with the best parameters and train it on the full dataset
model = SVD(**best_params)
trainset = data.build_full_trainset()
model.fit(trainset)

# Get the list of all recipe codes
all_recipe_codes = df['recipe_code'].unique()

# Generate top-10 recipe recommendations for each user
top_n = {}
for uid in df['user_id'].unique():
    # Exclude recipes already rated by the user
    # (a set is used because `in` on a pandas Series tests the index, not the values)
    user_recipe_rated = set(df.loc[df['user_id'] == uid, 'recipe_code'])
    recipe_to_predict = [rid for rid in all_recipe_codes if rid not in user_recipe_rated]

    # Predict ratings for recipes not yet rated by the user
    predictions = [model.predict(uid, rid) for rid in recipe_to_predict]

    # Sort predictions by estimated rating
    sorted_predictions = sorted(predictions, key=lambda x: x.est, reverse=True)

    # Get top 10 recipe recommendations
    top_n[uid] = [(pred.iid, pred.est) for pred in sorted_predictions[:10]]

# Print the top 10 recommendations for a specific user
# (here, the last user processed in the loop above)
example_uid = uid
print(f"Top 10 recipe recommendations for user {example_uid}:")
for rank, (recipe_code, estimated_rating) in enumerate(top_n[example_uid], start=1):
    print(f"{rank}: Recipe Code {recipe_code} (Estimated Rating: {estimated_rating})")

- The estimated ratings are numerical values generated by the SVD model, representing how much the model predicts the user would like each recipe.
- Higher estimated ratings indicate recipes that the model believes the user is more likely to enjoy or rate highly.

#### Non-negative Matrix Factorization

This is a matrix factorization technique that factors the user-item interaction matrix into non-negative matrices.

Non-negative Matrix Factorization (NMF) offers a parts-based, interpretable representation of the data, making it particularly useful for recommendation systems. It can capture latent features or topics that represent underlying preferences or characteristics of users and items.

NMF can help improve the performance and scalability of recommendation systems.
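
To make the idea concrete, the sketch below factors a small, fully observed toy matrix with the classic multiplicative update rules. This is an illustration of the general technique only: Surprise's `NMF` works on the known ratings alone and, according to its documentation, is fitted with a stochastic gradient descent procedure rather than these updates. The function name `nmf_factorize` and all values are assumptions.

```python
import numpy as np

def nmf_factorize(V, rank=2, n_iter=200, eps=1e-9, seed=0):
    """Approximate V by W @ H with non-negative W and H (multiplicative updates)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + eps
    H = rng.random((rank, V.shape[1])) + eps
    errors = []
    for _ in range(n_iter):
        # Ratios of non-negative terms keep both factors non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

# Hypothetical 4 users x 4 recipes rating matrix
V = np.array([
    [5, 4, 1, 1],
    [4, 5, 1, 2],
    [1, 1, 5, 4],
    [2, 1, 4, 5],
], dtype=float)

W, H, errors = nmf_factorize(V)
print("first error:", round(errors[0], 3), "last error:", round(errors[-1], 3))
```

Each row of `W` describes a user and each column of `H` a recipe in terms of the same two latent factors, and because all entries are non-negative the factors can only add up, which is what makes the representation parts-based.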

In [None]:
from surprise import NMF
from surprise.model_selection import cross_validate
import matplotlib.pyplot as plt
import numpy as np
np.random.seed(42)

# NMF algorithm
nmf = NMF()

# Perform cross-validation
cv_results = cross_validate(nmf, data, measures=['RMSE', 'MAE'], cv=5, verbose=True)
# Extract RMSE scores
rmse_scores = cv_results['test_rmse']

print("Cross-validation results:", cv_results)

# Plot RMSE scores
plt.figure(figsize=(8, 6))
plt.plot(range(1, 6), rmse_scores, marker='o', linestyle='-', color='b')
plt.title('Cross-Validated RMSE Scores for NMF')
plt.xlabel('Fold')
plt.ylabel('RMSE')
plt.xticks(range(1, 6))
plt.grid(True)
plt.show()


Across the 5 folds, the NMF algorithm has an average RMSE of approximately 1.5308 and an average MAE of approximately 1.0460. The model takes an average of 1.48 seconds to fit and 0.03 seconds to test on each fold.

Below we give recipe recommendations for user_id `13695` based on predicted ratings.

In [None]:
# getting a list of all recipe codes
np.random.seed(42)
all_recipe_ids = np.unique(df['recipe_code'])

# predict ratings for all recipes in the dataset for user ID 13695
user_id = 13695

# Create a list to store predicted ratings
predicted_ratings = []
for recipe_code in all_recipe_ids:
    predicted_rating = nmf.predict(user_id, recipe_code).est
    predicted_ratings.append((recipe_code, predicted_rating))

# sorting predicted_ratings in descending order
predicted_ratings.sort(key=lambda x: x[1], reverse=True)

In [None]:
# getting the top 5 recommendations
top_5_recommendations=predicted_ratings[:5]
top_5_recommendations

In [None]:
# Look up each recommended recipe's name and print it with its predicted rating
for recipe_code, predicted_rating in top_5_recommendations:
    recipe_name = df[df['recipe_code'] == recipe_code]['recipe_name'].values[0]
    print(f"Recipe Name: {recipe_name}, Predicted Rating: {predicted_rating}")


## Deep Learning Matrix Factorization

Matrix Factorization with Embedding Layer

Matrix factorization with embeddings is a popular approach in recommendation systems to model user-item interactions. It aims to
decompose the user-item interaction matrix into lower-dimensional embeddings for users and items (recipe in this case). These
embeddings capture latent features that represent users' preferences and items' characteristics. By learning these embeddings, the
model can predict how users would rate unseen items, enabling personalized recommendations.
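
In its simplest form, the predicted rating is the dot product of a user's embedding and a recipe's embedding. The sketch below shows only that lookup-and-multiply step with random, untrained vectors; the sizes and names are assumptions, and the Keras model built later in this notebook combines the embeddings through a dense layer instead of a plain dot product.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical sizes: 4 users, 3 recipes, 2 latent dimensions
n_users, n_recipes, dim = 4, 3, 2
user_emb = rng.normal(scale=0.1, size=(n_users, dim))      # one row per user
recipe_emb = rng.normal(scale=0.1, size=(n_recipes, dim))  # one row per recipe

def predict(user_idx, recipe_idx):
    """Score one user-recipe pair as the dot product of their embeddings."""
    return float(user_emb[user_idx] @ recipe_emb[recipe_idx])

# Scoring every pair at once reconstructs the full user x recipe matrix
full_matrix = user_emb @ recipe_emb.T
print(full_matrix.shape, predict(1, 2))
```

Training adjusts the rows of both tables so that these scores move closer to the observed ratings.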

Splitting the data into training and validation sets for the recommendation model

In [None]:
from sklearn.model_selection import train_test_split
# 'ratings' becomes the target variable 'y'
y = df['ratings']
# 'user_id' and 'recipe_code' are the feature data 'X'
X = df[['user_id', 'recipe_code']]
# Split the data into training and validation sets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)


Calculating the number of users and recipes to determine the dimensions of the embedding layers

In [None]:
# the maximum user ID in the dataset plus 1, so that every ID is a valid index into the embedding table
num_users = df['user_id'].max() + 1
# the maximum recipe code in the dataset plus 1, for the same reason
num_recipe = df['recipe_code'].max() + 1
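
Sizing the embedding table by the largest recipe code allocates a row for every integer up to that code, even when only a few codes actually occur. One possible alternative, sketched here with made-up codes, is to map each code to a contiguous index first. The names below are hypothetical, and adopting this would mean feeding the mapped indices to the model instead of the raw codes.

```python
# Hypothetical recipe codes; the real ones come from the dataset.
recipe_codes = [14299, 3309, 14299, 42083, 3309]

# Map every distinct code to an index 0..N-1
code_to_index = {code: idx for idx, code in enumerate(sorted(set(recipe_codes)))}
encoded = [code_to_index[code] for code in recipe_codes]

rows_if_sized_by_max = max(recipe_codes) + 1   # one row per possible code value
rows_if_sized_by_index = len(code_to_index)    # one row per code that occurs
print(encoded, rows_if_sized_by_max, rows_if_sized_by_index)
# [1, 0, 1, 2, 0] 42084 3
```

The inverse mapping is needed afterwards to translate predicted indices back into recipe codes.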

 Neural Collaborative Filtering Model

In [None]:
from keras.layers import Input, Embedding, Flatten,Concatenate, Dense
from keras.models import Model
import tensorflow as tf
tf.random.set_seed(42)

# Define embedding dimensions
embedding_dim = 40
# Define input layers
user_input = Input(shape=(1,), name='User_Input')
recipe_input = Input(shape=(1,), name='Recipe_Input')

# Define embedding layers
user_embedding = Embedding(input_dim=num_users, output_dim=embedding_dim, name='User_Embedding')(user_input)
recipe_embedding = Embedding(input_dim=num_recipe, output_dim=embedding_dim, name='Recipe_Embedding')(recipe_input)

# Flatten the embeddings
user_flat = Flatten(name='User_Vector')(user_embedding)
recipe_flat = Flatten(name='Recipe_Vector')(recipe_embedding)

# Concatenate user and recipe embeddings
concatenated = Concatenate(name='Concatenate')([user_flat, recipe_flat])
# Define your model's architecture
dense_layer = Dense(100, activation='relu', name='dense')(concatenated)
output_layer = Dense(1, activation='linear', name='Output')(dense_layer)
# Create the model
model = Model(inputs=[user_input, recipe_input], outputs=output_layer)

In [None]:
model.summary()

`Observation`

This model takes user and recipe IDs, converts them into embeddings, concatenates these embeddings, passes them through dense layers, and finally predicts a rating. The model is relatively large due to the high number of parameters, especially in the embedding layers.
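
The parameter count of this architecture can be worked out by hand. The sketch below follows the layers defined above (two embeddings of dimension 40, a 100-unit dense layer on the concatenated vectors, and a single output unit); the user and recipe counts passed in are made-up numbers, not the dataset's.

```python
def count_parameters(num_users, num_recipe, embedding_dim=40, hidden_units=100):
    """Trainable parameters of the embedding + dense architecture."""
    # One embedding_dim-sized vector per user row and per recipe row
    embedding_params = (num_users + num_recipe) * embedding_dim
    # Dense layer: weights for the concatenated vector plus one bias per unit
    dense_params = (2 * embedding_dim) * hidden_units + hidden_units
    # Output layer: one weight per hidden unit plus a bias
    output_params = hidden_units + 1
    return embedding_params + dense_params + output_params

# Hypothetical table sizes
total = count_parameters(num_users=1000, num_recipe=500)
print(total)  # 68201
```

The dense and output layers contribute a fixed 8,201 parameters, so the total is dominated by the embedding tables as the number of rows grows.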








In [None]:
# TensorFlow model compilation with a custom MAE metric
import tensorflow as tf
# Custom MAE metric function
def mae(y_true, y_pred):
    return tf.reduce_mean(tf.abs(y_pred - y_true))
# Compile the model
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001), loss='mean_squared_error', metrics=['mean_absolute_error', mae])


In [None]:
# Training the Model on User and Recipe_code with Ratings
history = model.fit(
 [X_train['user_id'].values, X_train['recipe_code'].values], # Input data for User and recipe_code
 y_train , # Target values (ratings)
 epochs=20, # Number of training epochs
 batch_size=128, # Batch size for training
 validation_data=([X_val['user_id'].values, X_val['recipe_code'].values], y_val) # Validation data
)


In [None]:
# training and test loss histories
training_loss = history.history['loss']
test_loss = history.history['val_loss']
# count of the number of epochs
epoch_count = range(1, len(training_loss) + 1)
# Visualize loss history
plt.figure(figsize=(8, 4))
plt.plot(epoch_count, training_loss, 'r--', label='Training Loss')
plt.plot(epoch_count, test_loss, 'b-', label='Test Loss')
plt.legend()
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.title('Training and Test Loss History')
plt.grid(True)
plt.show()


The graph compares the training loss with the validation loss (labelled "Test Loss") at each epoch. A validation loss that levels off or rises while the training loss keeps falling would indicate that the model is starting to overfit.

Generating Top Recipe Recommendations for User `13695`

In [None]:
# Generating top recipe recommendations for user 13695
user_id = 13695
all_recipe_ids = np.unique(ratings_df['recipe_code'])

predicted_ratings = []

for recipe_code in all_recipe_ids:
    # Predict the rating for the user and recipe
    predicted_rating = model.predict([np.array([user_id]), np.array([recipe_code])])[0][0]


    # Clip the predicted rating to the range of 0.5 to 5.0
    predicted_rating = max(0.5, min(predicted_rating, 5.0))

    predicted_ratings.append((recipe_code, predicted_rating))

# Sort the predicted ratings to find top recommendations
predicted_ratings.sort(key=lambda x: x[1], reverse=True)
top_recommendations = predicted_ratings[:10]

# Print top recommendations
print("Top 10 Recommended Recipes for User", user_id)

for i, (recipe_code, predicted_rating) in enumerate(top_recommendations):
    recipe_name = df.loc[df['recipe_code'] == recipe_code, 'recipe_name'].iloc[0]
    print(f"Top {i+1}: {recipe_name} (Predicted Rating: {predicted_rating})")


The recommendations are based on the ratings predicted by the model for the user. The predicted ratings are sorted in descending order to identify the top 10 recipes that the user is most likely to enjoy. Recipes with higher predicted ratings are considered more aligned with the user's preferences, while those with lower ratings may be less preferred.

In [None]:
import pickle


# Save top recommendations as a pickle file
with open('top_recommendations.pkl', 'wb') as f:
    pickle.dump(top_recommendations, f)
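
Reading the file back works the same way in reverse. The sketch below writes a made-up recommendation list to a temporary directory and loads it again, so it runs without the notebook's variables; the sample values and the temporary path are assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical (recipe_code, predicted_rating) pairs
sample_recommendations = [(14299, 4.8), (3309, 4.6)]

# Write to a temporary location so the example leaves the working directory untouched
path = os.path.join(tempfile.mkdtemp(), 'top_recommendations.pkl')
with open(path, 'wb') as f:
    pickle.dump(sample_recommendations, f)

# Load the saved recommendations back into memory
with open(path, 'rb') as f:
    restored = pickle.load(f)

print(restored)  # [(14299, 4.8), (3309, 4.6)]
```

Pickle files can execute code when loaded, so only files from a trusted source should be unpickled.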
