<a href="https://colab.research.google.com/github/nicolemichaud03/Recipe-Recommender-System/blob/main/NNnotebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Neural Network Recipe Recommendation System with Embeddings for Dietary Restriction

### Pre-Processing and Data Exploration:

Loading the data and necessary packages:

In [1]:
# # TF's recommender imports
# !pip install -q tensorflow-recommenders
# !pip install -q tensorflow_ranking
# !pip install -q --upgrade tensorflow-datasets
# !pip install -q scann

In [2]:
import tensorflow as tf
print("TensorFlow version:", tf.__version__)

TensorFlow version: 2.13.1


In [3]:
# Standard library
import io
import os
import pickle
import pprint
import re
import tempfile
from collections import defaultdict
from typing import Dict, Text

# Third-party packages
import nltk
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_datasets as tfds
import tensorflow_ranking as tfr
import tensorflow_recommenders as tfrs
from nltk.corpus import stopwords
from nltk.tokenize import RegexpTokenizer, word_tokenize

nltk.download('punkt')

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!


In [4]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


Loading the user and item data and viewing it:

In [5]:
user_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/capstone_data/RAW_interactions.csv')
recipe_data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/capstone_data/RAW_recipes.csv')

In [6]:
user_data.head()

Unnamed: 0,user_id,recipe_id,date,rating,review
0,38094,40893,2003-02-17,4,Great with a salad. Cooked on top of stove for...
1,1293707,40893,2011-12-21,5,"So simple, so delicious! Great for chilly fall..."
2,8937,44394,2002-12-01,4,This worked very well and is EASY. I used not...
3,126440,85009,2010-02-27,5,I made the Mexican topping and took it to bunk...
4,57222,85009,2011-10-01,5,"Made the cheddar bacon topping, adding a sprin..."


In [7]:
user_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1132367 entries, 0 to 1132366
Data columns (total 5 columns):
 #   Column     Non-Null Count    Dtype 
---  ------     --------------    ----- 
 0   user_id    1132367 non-null  int64 
 1   recipe_id  1132367 non-null  int64 
 2   date       1132367 non-null  object
 3   rating     1132367 non-null  int64 
 4   review     1132198 non-null  object
dtypes: int64(3), object(2)
memory usage: 43.2+ MB


In [8]:

recipe_data.head()

Unnamed: 0,name,id,minutes,contributor_id,submitted,tags,nutrition,n_steps,steps,description,ingredients,n_ingredients
0,arriba baked winter squash mexican style,137739,55,47892,2005-09-16,"['60-minutes-or-less', 'time-to-make', 'course...","[51.5, 0.0, 13.0, 0.0, 2.0, 0.0, 4.0]",11,"['make a choice and proceed with recipe', 'dep...",autumn is my favorite time of year to cook! th...,"['winter squash', 'mexican seasoning', 'mixed ...",7
1,a bit different breakfast pizza,31490,30,26278,2002-06-17,"['30-minutes-or-less', 'time-to-make', 'course...","[173.4, 18.0, 0.0, 17.0, 22.0, 35.0, 1.0]",9,"['preheat oven to 425 degrees f', 'press dough...",this recipe calls for the crust to be prebaked...,"['prepared pizza crust', 'sausage patty', 'egg...",6
2,all in the kitchen chili,112140,130,196586,2005-02-25,"['time-to-make', 'course', 'preparation', 'mai...","[269.8, 22.0, 32.0, 48.0, 39.0, 27.0, 5.0]",6,"['brown ground beef in large pot', 'add choppe...",this modified version of 'mom's' chili was a h...,"['ground beef', 'yellow onions', 'diced tomato...",13
3,alouette potatoes,59389,45,68585,2003-04-14,"['60-minutes-or-less', 'time-to-make', 'course...","[368.1, 17.0, 10.0, 2.0, 14.0, 8.0, 20.0]",11,['place potatoes in a large pot of lightly sal...,"this is a super easy, great tasting, make ahea...","['spreadable cheese with garlic and herbs', 'n...",11
4,amish tomato ketchup for canning,44061,190,41706,2002-10-25,"['weeknight', 'time-to-make', 'course', 'main-...","[352.9, 1.0, 337.0, 23.0, 3.0, 0.0, 28.0]",5,['mix all ingredients& boil for 2 1 / 2 hours ...,my dh's amish mother raised him on this recipe...,"['tomato juice', 'apple cider vinegar', 'sugar...",8


In [9]:
recipe_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 231637 entries, 0 to 231636
Data columns (total 12 columns):
 #   Column          Non-Null Count   Dtype 
---  ------          --------------   ----- 
 0   name            231636 non-null  object
 1   id              231637 non-null  int64 
 2   minutes         231637 non-null  int64 
 3   contributor_id  231637 non-null  int64 
 4   submitted       231637 non-null  object
 5   tags            231637 non-null  object
 6   nutrition       231637 non-null  object
 7   n_steps         231637 non-null  int64 
 8   steps           231637 non-null  object
 9   description     226658 non-null  object
 10  ingredients     231637 non-null  object
 11  n_ingredients   231637 non-null  int64 
dtypes: int64(5), object(7)
memory usage: 21.2+ MB


In [10]:
recipe_data = recipe_data.rename(columns={"id": "recipe_id"})

In [11]:
print(len(user_data['user_id'].unique()))
print(len(recipe_data['recipe_id'].unique()))

226570
231637


With 226,570 users and 231,637 recipes, there are fewer users than recipes. Therefore, a user-based (user-user) approach is probably the best fit for our recommender system.

Dropping the columns I know I won't be working with:

In [12]:
user_recipe_ratings = user_data.drop(columns=['review', 'date'])

In [13]:
# 'id' was already renamed to 'recipe_id' above, so only the drop is needed here
recipe_data = recipe_data.drop(columns=['contributor_id', 'submitted', 'nutrition', 'steps', 'minutes', 'n_steps', 'n_ingredients'])

In [14]:
# Converting id values to strings to be compatible for
# later use with tensorflow
user_data['recipe_id'] = user_data['recipe_id'].astype(str)
recipe_data['recipe_id'] = recipe_data['recipe_id'].astype(str)
user_data['user_id'] = user_data['user_id'].astype(str)

In [15]:
# Making sure text features are strings so that they can be
# cleaned appropriately
recipe_data['name'] = recipe_data['name'].astype(str)
recipe_data['description'] = recipe_data['description'].astype(str)
recipe_data['tags'] = recipe_data['tags'].astype(str)

In [16]:
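# Creating a function to perform all cleaning steps at once
# (removes numbers and unnecessary characters,
# makes all letters lowercase, removes stopwords)
nltk.download('stopwords')
# A set gives constant-time membership checks inside clean_text
stopwords_list = set(stopwords.words('english'))

# Raw strings keep the regex escapes from being read as string escapes
no_bad_chars = re.compile(r'[!\"#$%&()*+-./:;<=>?@[\]^_`{|}~\n - ]')
# Note: this also deletes hyphens, so 'gluten-free' becomes 'glutenfree'
no_nums = re.compile(r'[\d-]')

def clean_text(text):
    text = no_nums.sub('', text)          # drop digits and hyphens
    text = no_bad_chars.sub(' ', text)    # replace punctuation with spaces
    text = text.lower()
    text = ' '.join(word for word in text.split() if word not in stopwords_list)
    return text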

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
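
To see what the cleaning step does to a description and to a tag string, here is a small standalone sketch. It reuses the two regular expressions from the cell above; the `STOPWORDS` set is a tiny hand-written stand-in for the NLTK English list (an assumption made so the snippet runs without any downloads), and `clean_text_demo` is a hypothetical name.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stopword list
STOPWORDS = {"is", "my", "of", "to", "the", "a"}

NO_BAD_CHARS = re.compile(r'[!\"#$%&()*+-./:;<=>?@[\]^_`{|}~\n - ]')
NO_NUMS = re.compile(r'[\d-]')

def clean_text_demo(text):
    text = NO_NUMS.sub('', text)         # digits and hyphens are deleted outright
    text = NO_BAD_CHARS.sub(' ', text)   # other punctuation becomes a space
    text = text.lower()
    return ' '.join(w for w in text.split() if w not in STOPWORDS)

print(clean_text_demo("Autumn is my favorite time of year to cook!"))
print(clean_text_demo("['60-minutes-or-less', 'gluten-free']"))
```

Two details are worth noticing in the output: hyphenated tags collapse into a single token, and apostrophes survive because they are not in the character class.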


In [17]:
#Applying text cleaning function to text columns
recipe_data_cleaned = recipe_data.copy()
recipe_data_cleaned['name'] = (recipe_data['name']).apply(clean_text)
recipe_data_cleaned['description'] = (recipe_data['description']).apply(clean_text)
recipe_data_cleaned['tags'] = (recipe_data['tags']).apply(clean_text)
recipe_data_cleaned.head()

Unnamed: 0,name,recipe_id,tags,description,ingredients
0,arriba baked winter squash mexican style,137739,'minutesorless' 'timetomake' 'course' 'maining...,autumn favorite time year cook recipe prepared...,"['winter squash', 'mexican seasoning', 'mixed ..."
1,bit different breakfast pizza,31490,'minutesorless' 'timetomake' 'course' 'maining...,recipe calls crust prebaked bit adding ingredi...,"['prepared pizza crust', 'sausage patty', 'egg..."
2,kitchen chili,112140,'timetomake' 'course' 'preparation' 'maindish'...,modified version 'mom's' chili hit christmas p...,"['ground beef', 'yellow onions', 'diced tomato..."
3,alouette potatoes,59389,'minutesorless' 'timetomake' 'course' 'maining...,super easy great tasting make ahead side dish ...,"['spreadable cheese with garlic and herbs', 'n..."
4,amish tomato ketchup canning,44061,'weeknight' 'timetomake' 'course' 'mainingredi...,dh's amish mother raised recipe much prefers s...,"['tomato juice', 'apple cider vinegar', 'sugar..."


Feature engineering to label each recipe by diet type (vegetarian, vegan, and/or gluten-free):

In [18]:
# Creating a new column to classify recipes as gluten-free or not.
# The tags column contains the most information on gluten-free recipes
# (see miscellaneous notebook).
GF = []
for row in recipe_data_cleaned['tags']:
    # clean_text deletes hyphens, so the 'gluten-free' tag appears as
    # 'glutenfree' in the cleaned column; the other spellings are kept as a fallback
    if "glutenfree" in row or "gluten-free" in row or "gluten free" in row:
        GF.append("Gluten-Free")
    else:
        GF.append("None")

In [19]:
recipe_data_cleaned['GF'] = GF
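
Because `clean_text` deletes hyphens, the cleaned tag text never contains the literal string `gluten-free`. One way to avoid depending on how the tags were cleaned is to parse the raw `tags` value back into a list and test for the exact tag. The sketch below assumes the raw column holds Python-style list literals, as the `recipe_data.head()` output suggests; `is_gluten_free` is a hypothetical helper, not part of the notebook.

```python
import ast

def is_gluten_free(raw_tags):
    """Return True if the raw tag list contains the exact 'gluten-free' tag."""
    try:
        tags = ast.literal_eval(raw_tags)
    except (ValueError, SyntaxError):
        return False  # not a parseable list literal
    return isinstance(tags, list) and "gluten-free" in tags

# Made-up tag strings in the same shape as the raw 'tags' column
print(is_gluten_free("['60-minutes-or-less', 'gluten-free', 'course']"))
print(is_gluten_free("['weeknight', 'time-to-make']"))
```

Applied to the uncleaned `recipe_data['tags']` column, a function like this could replace the substring checks.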

In [20]:
# Ingredient keywords for diet filtering. Each list holds the ingredients
# that DISQUALIFY a recipe from the diet the list is named after.
vegetarian = [
    'ham', 'beef', 'meat', 'chicken', 'pork', 'bacon', 'sausage', 'lamb', 'veal', 'turkey',
    'steak', 'rib', 'frankfurter', 'duck', 'poultry', 'goat', 'liver', 'hen', 'quail', 'brisket',
    'goose', 'fish', 'shrimp', 'seafood', 'crab', 'lobster', 'clam', 'oyster', 'scallop', 'mussel',
    'cod', 'salmon', 'halibut', 'shellfish', 'roe', 'tuna', 'caviar', 'pollock', 'yellowtail', 'squid',
    'calamari', 'octopus', 'crawfish', 'crayfish', 'sardine', 'trout', 'flounder', 'anchovy', 'bass',
    'haddock', 'sole',
]

# The vegan list is the vegetarian list plus eggs, honey, and dairy products
vegan = vegetarian + ['egg', 'honey', 'milk', 'cheese', 'yogurt', 'mayonnaise', 'butter', 'margarine', 'cream']

In [21]:
# creating two new columns to classify recipes as vegetarian or not
# and as vegan or not:

recipe_data_cleaned['vegetarian'] = None
recipe_data_cleaned['vegan'] = None

In [22]:
# Filtering through the 'ingredients' column for ingredients that
# aren't vegetarian or vegan
vege_pattern = '|'.join(vegetarian)
vegan_pattern = '|'.join(vegan)


recipe_data_cleaned.vegetarian = recipe_data_cleaned.ingredients.str.contains(vege_pattern)
recipe_data_cleaned.vegan = recipe_data_cleaned.ingredients.str.contains(vegan_pattern)
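
One caveat with plain substring patterns is that short keywords also match inside longer words: `ham` matches `graham cracker`, and `egg` matches `eggplant`. A hedged sketch of a stricter alternative follows, using word boundaries so that only whole ingredient words match. `build_pattern` and the shortened keyword list are illustrative assumptions, and the optional plural suffix is a simplification.

```python
import re

def build_pattern(keywords):
    # \b anchors each keyword at word boundaries; (?:e?s)? tolerates simple plurals
    alternatives = "|".join(re.escape(k) for k in keywords)
    return re.compile(r"\b(?:" + alternatives + r")(?:e?s)?\b")

keywords = ["ham", "egg", "rib", "cod"]       # shortened list for illustration
substring = re.compile("|".join(keywords))    # same style as the cell above
whole_word = build_pattern(keywords)

for ingredients in ["['graham cracker crumbs', 'eggplant']", "['ham', 'eggs']"]:
    print(ingredients,
          "substring:", bool(substring.search(ingredients)),
          "whole word:", bool(whole_word.search(ingredients)))
```

Since `Series.str.contains` interprets its argument as a regular expression by default, the pattern string built this way could in principle be used in place of the `'|'.join(...)` patterns.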

In [23]:
# Converting the Boolean match flags to labels: a recipe with no matching
# meat or fish ingredient (False) is labeled 'Vegetarian', otherwise 'None'
recipe_data_cleaned['vegetarian'] = recipe_data_cleaned['vegetarian'].astype(str)
recipe_data_cleaned['vegetarian'] = recipe_data_cleaned['vegetarian'].replace({'False': 'Vegetarian', 'True': 'None'})

In [24]:
recipe_data_cleaned['vegan'] = recipe_data_cleaned['vegan'].astype(str)
recipe_data_cleaned['vegan'] = recipe_data_cleaned['vegan'].replace({'False': 'Vegan', 'True': 'None'})

In [25]:
#Making column names match and merging dfs to classify user diets based on the recipes they've used
recipe_data_cleaned = recipe_data_cleaned.rename(columns = {'id': 'recipe_id'})

#user_diets = pd.merge(user_data, recipe_data_cleaned, on='recipe_id', how='left')

In [26]:
#user_diets.head()

In [27]:
recipe_data_cleaned['diets_combined'] = recipe_data_cleaned[['vegetarian', 'vegan', 'GF']].values.tolist()
recipe_data_cleaned = recipe_data_cleaned.drop(columns=['GF', 'vegetarian', 'vegan'])
recipe_data_cleaned.head()

Unnamed: 0,name,recipe_id,tags,description,ingredients,diets_combined
0,arriba baked winter squash mexican style,137739,'minutesorless' 'timetomake' 'course' 'maining...,autumn favorite time year cook recipe prepared...,"['winter squash', 'mexican seasoning', 'mixed ...","[Vegetarian, None, None]"
1,bit different breakfast pizza,31490,'minutesorless' 'timetomake' 'course' 'maining...,recipe calls crust prebaked bit adding ingredi...,"['prepared pizza crust', 'sausage patty', 'egg...","[None, None, None]"
2,kitchen chili,112140,'timetomake' 'course' 'preparation' 'maindish'...,modified version 'mom's' chili hit christmas p...,"['ground beef', 'yellow onions', 'diced tomato...","[None, None, None]"
3,alouette potatoes,59389,'minutesorless' 'timetomake' 'course' 'maining...,super easy great tasting make ahead side dish ...,"['spreadable cheese with garlic and herbs', 'n...","[Vegetarian, None, None]"
4,amish tomato ketchup canning,44061,'weeknight' 'timetomake' 'course' 'mainingredi...,dh's amish mother raised recipe much prefers s...,"['tomato juice', 'apple cider vinegar', 'sugar...","[Vegetarian, Vegan, None]"


In [28]:
# #combining the diet-classified recipe data with the user data based on which recipes each user has rated

# user_diets = (user_diets.groupby(['user_id']).agg({'recipe_id': lambda x: x.tolist(), 'rating': lambda x: x.tolist(), 'GF': sum, 'vegetarian':sum, 'vegan': sum}).reset_index())
# user_diets.head()

In [29]:
# # creating new columns to classify users as the different diet types
# user_diets['is_vegetarian'] = None
# user_diets['is_vegan'] = None
# user_diets['is_GF'] = None
# user_diets.head()

In [30]:
# #getting a total count of the recipes that each user has rated
# user_diets['recipe_totals'] = user_diets['recipe_id'].str.len()

In [31]:
# #classifying a user as vegetarian if at least 75% of the recipes they've rated are vegetarian
# user_diets['is_vegetarian'] = np.where(user_diets['vegetarian'] >= ((user_diets['recipe_totals'])*(.75)), 1, 0)


In [32]:
# #classifying a user as vegan if at least 75% of the recipes they've rated are vegan
# user_diets['is_vegan'] = np.where(user_diets['vegan'] >= ((user_diets['recipe_totals'])*(.75)), 1, 0)

In [33]:
# #classifying a user as gluten-free if at least 75% of the recipes they've rated are gluten-free
# user_diets['is_GF'] = np.where(user_diets['GF'] >= ((user_diets['recipe_totals'])*(.75)), 1, 0)
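
The commented-out cells above describe labeling a user as following a diet when at least 75% of the recipes they rated belong to that diet. A minimal standard-library sketch of that rule is shown here; the interaction pairs and recipe ids are made up, and `classify_users` is a hypothetical helper rather than the notebook's pandas implementation.

```python
from collections import defaultdict

def classify_users(interactions, diet_recipes, threshold=0.75):
    """Map user_id -> True when at least `threshold` of their rated recipes are in diet_recipes."""
    totals = defaultdict(int)
    matches = defaultdict(int)
    for user_id, recipe_id in interactions:
        totals[user_id] += 1
        if recipe_id in diet_recipes:
            matches[user_id] += 1
    return {user: matches[user] >= threshold * totals[user] for user in totals}

# Made-up data for illustration only
interactions = [("u1", "r1"), ("u1", "r2"), ("u1", "r3"), ("u1", "r4"),
                ("u2", "r1"), ("u2", "r5")]
vegetarian_recipes = {"r1", "r2", "r3"}
print(classify_users(interactions, vegetarian_recipes))
```

Here `u1` rated four recipes, three of them vegetarian, so the 75% threshold is met exactly; `u2` falls short with one of two.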

### Modeling

Creating a baseline model without extra features:

In [34]:
recipe_data_cleaned['diets_combined'] = recipe_data_cleaned['diets_combined'].astype(str)
recipe_data_cleaned['recipe_id'] = recipe_data_cleaned['recipe_id'].astype(str)
recipe_data_cleaned['description'] = recipe_data_cleaned['description'].astype(str)
user_recipe_ratings['user_id'] = user_recipe_ratings['user_id'].astype(int)

In [35]:
user_ds = tf.data.Dataset.from_tensor_slices(dict(user_recipe_ratings))

recipe_ds = tf.data.Dataset.from_tensor_slices(dict(recipe_data_cleaned))

In [36]:
#creating a multitask model
#preparing the data


# Select the basic features.
user_ratings = user_ds.map(lambda x: {
    #"recipe_id": x["recipe_id"],
    "user_id": x["user_id"],
    "rating": x["rating"],
})
recipes = recipe_ds.map(lambda x: {
    "recipe_id": x["recipe_id"],
    "diets_combined": x["diets_combined"],
    "description": x["description"]})

In [37]:
# recipe_ds['recipe_id'] = recipe_ds['recipe_id'].astype(str)
# recipe_ds['description'] = recipe_ds['description'].astype(str)
# recipe_ds['diets_combined'] = recipe_ds['description'].astype(str)

In [38]:
#preparations to build vocab and split data
# Randomly shuffle data and split between train and test.
tf.random.set_seed(42)
shuffled = user_ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

recipe_ids = recipes.batch(1_000).map(lambda x: x["recipe_id"])
user_ids = user_ratings.batch(1_000_000).map(lambda x: x["user_id"])


diets = np.concatenate(list(recipes.map(lambda x: x["diets_combined"]).batch(100)))
descriptions = np.concatenate(list(recipes.map(lambda x: x["description"]).batch(100)))


unique_recipe_ids = np.unique(np.concatenate(list(recipe_ids)))
unique_user_ids = np.unique(np.concatenate(list(user_ids)))

unique_diets = np.unique((list(diets)))


# unique_movie_titles = np.unique(np.concatenate(list(movies.batch(1000))))
# unique_user_ids = np.unique(np.concatenate(list(ratings.batch(1_000).map(
#     lambda x: x["user_id"]))))

In [39]:
unique_user_ids = unique_user_ids.astype(str)
unique_recipe_ids = unique_recipe_ids.astype(str)
unique_diets = unique_diets.astype(str)
# diets = diets.astype(str)
# descriptions = descriptions.astype(str)
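
The id vocabularies built here are later passed to `StringLookup` layers. With `mask_token=None` and the default single out-of-vocabulary bucket, such a layer maps unknown ids to index 0 and known ids to 1 through `len(vocabulary)`, which is why the embedding tables in the models are sized `len(vocabulary) + 1`. The pure-Python sketch below imitates that mapping; `build_lookup` is a hypothetical stand-in, not the Keras layer itself.

```python
def build_lookup(vocabulary):
    """Return (lookup function, number of embedding rows needed)."""
    index = {token: i + 1 for i, token in enumerate(vocabulary)}  # 0 is reserved for unknown ids

    def lookup(token):
        return index.get(token, 0)

    return lookup, len(index) + 1

lookup, n_rows = build_lookup(["137739", "31490", "112140"])
print(lookup("31490"), lookup("not-a-recipe"), n_rows)
```

The lookup only works when the query has the same type as the vocabulary entries, so string vocabularies need string ids at the model's input.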

In [40]:
# #ranking task (combined model without extra embeddings)
# tfrs.tasks.Ranking(
#     loss=tf.keras.losses.MeanSquaredError(),
#     metrics=[tf.keras.metrics.RootMeanSquaredError()],
# )

In [41]:
# #retrieval task (combined model without extra embeddings)
# tfrs.tasks.Retrieval(
#     metrics=tfrs.metrics.FactorizedTopK(
#         candidates=recipes.batch(128)
#     )
# )

In [42]:
##(combined model without extra embeddings)

# # "since we have two tasks and two losses - we need to decide on how important each loss is
# # We can do this by giving each of the losses a weight, and treating these weights as hyperparameters"


# class UserRecipesModel(tfrs.models.Model):

#   def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
#     # We take the loss weights in the constructor: this allows us to instantiate
#     # several model objects with different loss weights.

#     super().__init__()

#     embedding_dimension = 32

#     # User and recipe models.
#     self.recipe_model: tf.keras.layers.Layer = tf.keras.Sequential([
#       tf.keras.layers.StringLookup(
#         vocabulary=unique_recipe_ids, mask_token=None),
#       tf.keras.layers.Embedding(len(unique_recipe_ids) + 1, embedding_dimension)
#     ])
#     self.user_model: tf.keras.layers.Layer = tf.keras.Sequential([
#       tf.keras.layers.StringLookup(
#         vocabulary=unique_user_ids, mask_token=None),
#       tf.keras.layers.Embedding(len(unique_user_ids) + 1, embedding_dimension)
#     ])

#     # A small model to take in user and recipe embeddings and predict ratings.
#     # We can make this as complicated as we want as long as we output a scalar
#     # as our prediction.
#     self.rating_model = tf.keras.Sequential([
#         tf.keras.layers.Dense(256, activation="relu"),
#         tf.keras.layers.Dense(128, activation="relu"),
#         tf.keras.layers.Dense(1),
#     ])

#     # The tasks.
#     self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
#         loss=tf.keras.losses.MeanSquaredError(),
#         metrics=[tf.keras.metrics.RootMeanSquaredError()],
#     )
#     self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
#         metrics=tfrs.metrics.FactorizedTopK(
#             candidates=recipes.batch(128).map(self.recipe_model)
#         )
#     )

#     # The loss weights.
#     self.rating_weight = rating_weight
#     self.retrieval_weight = retrieval_weight

#   def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
#     # We pick out the user features and pass them into the user model.
#     user_embeddings = self.user_model(features["user_id"])
#     # And pick out the recipe features and pass them into the recipe model.
#     recipe_embeddings = self.recipe_model(features["recipe_id"])

#     return (
#         user_embeddings,
#         recipe_embeddings,
#         # We apply the multi-layered rating model to a concatenation of
#         # user and recipe embeddings.
#         self.rating_model(
#             tf.concat([user_embeddings, recipe_embeddings], axis=1)
#         ),
#     )

#   def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

#     ratings = features.pop("rating")

#     user_embeddings, recipe_embeddings, rating_predictions = self(features)

#     # We compute the loss for each task.
#     rating_loss = self.rating_task(
#         labels=ratings,
#         predictions=rating_predictions,
#     )
#     retrieval_loss = self.retrieval_task(user_embeddings, recipe_embeddings)

#     # And combine them using the loss weights.
#     return (self.rating_weight * rating_loss
#             + self.retrieval_weight * retrieval_loss)
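
Written out, the loss that `compute_loss` returns in the commented-out model is a weighted sum of the two task losses. The symbols below are notation introduced here for illustration, with $w_{\text{rating}}$ and $w_{\text{retrieval}}$ standing for the `rating_weight` and `retrieval_weight` constructor arguments:

```latex
\mathcal{L}_{\text{total}}
  = w_{\text{rating}} \, \mathcal{L}_{\text{rating}}
  + w_{\text{retrieval}} \, \mathcal{L}_{\text{retrieval}}
```

Setting the weights to (1, 0) gives a ranking-only model, (0, 1) a retrieval-only model, and (1, 1) the joint model, which are the three configurations tried in the following cells.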

In [43]:
# #Ranking specialized model
# model = UserRecipesModel(rating_weight=1.0, retrieval_weight=0.0)
# model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [44]:
# #Then shuffle, batch, and cache the training and evaluation data:
# cached_train = train.shuffle(100_000).batch(8192).cache()
# cached_test = test.batch(4096).cache()

In [45]:
# model.fit(cached_train, epochs=3)
# metrics = model.evaluate(cached_test, return_dict=True)

# print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
# print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

The RMSE of this Ranking-only model was 1.267.

Retrieval top-100 accuracy: 0.001

In [46]:
# #Retrieval specialized model
# model = UserRecipesModel(rating_weight=0.0, retrieval_weight=1.0)
# model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [47]:
# model.fit(cached_train, epochs=3)
# metrics = model.evaluate(cached_test, return_dict=True)

# print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
# print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

The RMSE of this Retrieval-only model was: 4.528

Retrieval top-100 accuracy: 0.036

In [48]:
# #joint model
# model = UserRecipesModel(rating_weight=1.0, retrieval_weight=1.0)
# model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))

In [49]:
# model.fit(cached_train, epochs=3)
# metrics = model.evaluate(cached_test, return_dict=True)

# print(f"Retrieval top-100 accuracy: {metrics['factorized_top_k/top_100_categorical_accuracy']:.3f}.")
# print(f"Ranking RMSE: {metrics['root_mean_squared_error']:.3f}.")

The RMSE of this combined (ranking and retrieval) model was: 1.342

Retrieval top-100 accuracy: 0.037
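
For reference, the two metrics reported for these models can be computed by hand on a toy example. This is a conceptual sketch with made-up ratings and scores, not the TensorFlow Recommenders implementation; `rmse` and `top_k_accuracy` are hypothetical helpers.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def top_k_accuracy(scores_per_query, true_items, k):
    """Fraction of queries whose true item is among the k highest-scoring candidates."""
    hits = 0
    for scores, true_item in zip(scores_per_query, true_items):
        top_k = sorted(scores, key=scores.get, reverse=True)[:k]
        hits += true_item in top_k
    return hits / len(true_items)

print(rmse([5, 4, 3], [4, 4, 5]))

# One score dictionary per user: candidate recipe -> affinity score
scores = [{"r1": 0.9, "r2": 0.1, "r3": 0.5}, {"r1": 0.2, "r2": 0.3, "r3": 0.8}]
print(top_k_accuracy(scores, ["r3", "r1"], k=2))
```

Lower RMSE means better rating predictions, while higher top-k accuracy means the recipe a user actually interacted with appears among the top k retrieved candidates more often.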

In [50]:
# #make predictions
# trained_recipe_embeddings, trained_user_embeddings, predicted_rating = model({
#       #insert specific user and item IDs here:
#       "user_id": np.array([" "]),
#       "recipe_id": np.array([" "])
#   })
# print("Predicted rating:")
# print(predicted_rating)

In [51]:
#recipe_ds_plus = tf.data.Dataset.from_tensor_slices(dict(recipe_data_cleaned))

Model with extra embeddings:

In [73]:
#user model/query model, extra embeddings

class UserModel(tf.keras.Model):
  def __init__(self):
    super().__init__()

    self.user_embedding = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=unique_user_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_user_ids)+1, 32)
    ])
    # self.rating_task = tfrs.tasks.Ranking(
    #     loss=tf.keras.losses.MeanSquaredError(),
    #     metrics=[tf.keras.metrics.RootMeanSquaredError()],
    # )

  # def call(self, user_ratings):
  #   return self.user_embedding(user_ratings["user_id"])
  def call(self):
    # We pick out the user features and pass them into the user model.
    self.user_embedding(user_id)


In [74]:
# recipe_id_strings = tf.strings.as_string(recipes.batch(128).map(self.recipe_embedding))
# diets_combined_strings = tf.strings.as_string(recipes.batch(128).map(self.diets_combined))
# description_strings = tf.strings.as_string(recipes.batch(128).map(self.description))


In [75]:
unique_recipe_ids = unique_recipe_ids.astype(str)

In [76]:
#recipe model/candidate model, extra embeddings

class RecipeModel(tf.keras.Model):
  def __init__(self, use_embeds):
    super().__init__()



    self._use_embeds = use_embeds

    self.recipe_embedding = tf.keras.Sequential([
        tf.keras.layers.StringLookup(
            vocabulary=unique_recipe_ids, mask_token=None),
        tf.keras.layers.Embedding(len(unique_recipe_ids)+1, 32)
    ])

    max_tokens = 10_000
    if use_embeds:
      self.diet_embedding = tf.keras.Sequential([
          tf.keras.layers.TextVectorization(max_tokens=max_tokens),
          tf.keras.layers.Embedding(len(unique_diets)+1, 32)
          ])
      # self.descr_vectorizer = tf.keras.layers.TextVectorization(
      #     max_tokens=max_tokens
      # )
      self.descr_embedding = tf.keras.Sequential([
          tf.keras.layers.TextVectorization(max_tokens=max_tokens),
          tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
          # We average the embeddings of individual words to get one
          # embedding vector per description.
          tf.keras.layers.GlobalAveragePooling1D(),
          # self.descr_vectorizer,
          # tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
          # tf.keras.layers.GlobalAveragePooling1D(),
      ])
  def call(self, recipes):
    return tf.concat([
    self.recipe_embedding(recipes["recipe_id"]),
    tf.reshape(self.diet_embedding(recipes["diets_combined"]), (-1, 1)),
    #self.descr_embedding(recipes["description"])
    ], axis=1)
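
The description tower turns a variable-length list of word ids into one fixed-size vector by averaging the word embeddings, and `mask_zero=True` is meant to keep the padding id 0 out of that average. A pure-Python sketch of the idea, with a made-up two-dimensional embedding table (`masked_average` is a hypothetical helper, not a Keras function):

```python
def masked_average(token_ids, embedding_table, dim):
    """Average the embeddings of all non-padding tokens (id 0 is padding)."""
    vectors = [embedding_table[t] for t in token_ids if t != 0]
    if not vectors:
        return [0.0] * dim  # nothing but padding
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Made-up table: token id -> 2-dimensional embedding
table = {1: [1.0, 3.0], 2: [3.0, 5.0]}
print(masked_average([1, 2, 0, 0], table, dim=2))
```

Without the mask, the two padding positions would be averaged in as well and pull the result toward whatever vector is stored for id 0.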


In [77]:
#defining task:


In [144]:
class User_recipe_Model(tfrs.models.Model):

  def __init__(self, use_embeds) -> None:
    super().__init__()
    self.query_model = tf.keras.Sequential([
      UserModel(),
      tf.keras.layers.Dense(32)
    ])
    self.candidate_model = tf.keras.Sequential([
      RecipeModel(use_embeds),
      tf.keras.layers.Dense(32)
    ])

    #the tasks
    # self.rating_task = tfrs.tasks.Ranking(
    #     loss=tf.keras.losses.MeanSquaredError(),
    #     metrics=[tf.keras.metrics.RootMeanSquaredError()],
    # )
    # self.retrieval_task = tfrs.tasks.Retrieval(
    #     metrics=tfrs.metrics.FactorizedTopK(
    #         candidates = recipes.batch(128).map(self.candidate_model)
    #         ))
    # self.task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    #       recipes.batch(128).map(self.candidate_model),
    #   ),
    # )
    # self.rating_model = tf.keras.Sequential([
    #     tf.keras.layers.Dense(256, activation="relu"),
    #     tf.keras.layers.Dense(128, activation="relu"),
    #     tf.keras.layers.Dense(1),
#    ])
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=recipes.batch(128).map(self.candidate_model),
        ))
    # #The loss weights.
    # self.rating_weight = rating_weight
    # self.retrieval_weight = retrieval_weight

    # self.recipe_id_strings = tf.strings.as_string(recipes.batch(128).map(self.recipe_embedding))
    # self.diets_combined_strings = tf.strings.as_string(recipes.batch(128).map(self.diets_combined))
    # self.description_strings = tf.strings.as_string(recipes.batch(128).map(self.description))

  # def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
  #   # We pick out the user features and pass them into the user model.
  #   user_embedding = self.query_model(features["user_id"])
  #   # And pick out the recipe features and pass them into the recipe model.
  #   recipe_embedding = self.candidate_model(features["recipe_id"])
  #   diet_embedding = self.candidate_model(features["diets_combined"])
  #   descr_embedding = self.candidate_model(features["description"])

  #   return (
  #       user_embedding,
  #       recipe_embedding,
  #       diet_embedding,
  #       descr_embedding,
  #       # We apply the multi-layered rating model to a concatenation of
  #       # user and recipe embeddings.
  #       self.rating_model(
  #           tf.concat([user_embedding, recipe_embedding, diet_embedding, descr_embedding], axis=1)
  #       )
  #   )

  def compute_loss(self, features:Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # We only pass the user id into the query model. This is to ensure that
    # the training inputs have the same keys as the query inputs. Otherwise
    # the discrepancy in input structure would cause an error when loading
    # the query model after saving it.
    query_embeddings = self.query_model({
        "user_id":features["user_id"]})

    recipe_embeddings = self.candidate_model({
        "recipe_id": features["recipe_id"],
        # The key must match the one RecipeModel.call reads
        "diets_combined": features["diets_combined"],
        # "description": features["description"],
    })
    return self.task(query_embeddings, recipe_embeddings)


  # def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

  #     ratings = features.pop("rating")

  #     query_embeddings, recipe_embeddings, rating_predictions = self(features)

  #     # We compute the loss for each task.
  #     rating_loss = self.rating_task(
  #         labels=ratings,
  #         predictions=rating_predictions,
  #     )
  #     retrieval_loss = self.retrieval_task(query_embeddings, recipe_embeddings)
  #     # And combine them using the loss weights.
  #     return (self.rating_weight * rating_loss
  #             + self.retrieval_weight * retrieval_loss)

In [145]:
tf.random.set_seed(42)
shuffled = user_ratings.shuffle(100_000, seed=42, reshuffle_each_iteration=False)

train = shuffled.take(80_000)
test = shuffled.skip(80_000).take(20_000)

cached_train = train.shuffle(100_000).batch(2048)
cached_test = test.batch(4096).cache()
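
The `take`/`skip` pattern above gives disjoint training and test sets only because the shuffle order is fixed (`seed=42`, `reshuffle_each_iteration=False`). Note also that 80,000 + 20,000 rows is a small sample of the 1,132,367 interactions. The sketch below shows the same idea with the standard library; it shuffles the whole list at once, which simplifies the buffered shuffle that `tf.data` performs, and `split_take_skip` is a hypothetical helper.

```python
import random

def split_take_skip(items, n_train, n_test, seed=42):
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)        # fixed seed: same order on every call
    train = shuffled[:n_train]                   # like .take(n_train)
    test = shuffled[n_train:n_train + n_test]    # like .skip(n_train).take(n_test)
    return train, test

train_rows, test_rows = split_take_skip(range(1000), 800, 200)
print(len(train_rows), len(test_rows), set(train_rows).isdisjoint(test_rows))
```

If the order changed between the two calls, some rows could land in both subsets and the test metrics would be optimistic.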

In [146]:
model = User_recipe_Model(use_embeds=True)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.1))


In [147]:
model.fit(cached_train, epochs=3)

train_accuracy = model.evaluate(
    cached_train, return_dict=True)["factorized_top_k/top_100_categorical_accuracy"]
test_accuracy = model.evaluate(
    cached_test, return_dict=True)["factorized_top_k/top_100_categorical_accuracy"]

print(f"Top-100 accuracy (train): {train_accuracy:.2f}.")
print(f"Top-100 accuracy (test): {test_accuracy:.2f}.")

Epoch 1/3


TypeError: in user code:

    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1338, in train_function  *
        return step_function(self, iterator)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1322, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 1303, in run_step  **
        outputs = model.train_step(data)
    File "/usr/local/lib/python3.10/dist-packages/tensorflow_recommenders/models/base.py", line 68, in train_step
        loss = self.compute_loss(inputs, training=True)
    File "<ipython-input-144-7d8f30b2ae4e>", line 69, in compute_loss
        query_embeddings = self.query_model({
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None

    TypeError: Exception encountered when calling layer 'sequential_117' (type Sequential).
    
    in user code:
    
    
        TypeError: outer_factory.<locals>.inner_factory.<locals>.tf__call() takes 1 positional argument but 2 were given
    
    
    Call arguments received by layer 'sequential_117' (type Sequential):
      • inputs={'user_id': 'tf.Tensor(shape=(None,), dtype=int64)'}
      • training=None
      • mask=None
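
The traceback says that a `call()` method received one more positional argument than its signature accepts. Keras invokes `call` with the layer inputs, so a `call` declared with only `self` fails in exactly this way; the commented-out `call(self, user_ratings)` in the `UserModel` cell looks like the intended signature. The sketch below reproduces the mechanism in plain Python. `MiniLayer` is a deliberately simplified stand-in for a Keras layer (an assumption for illustration), so it runs without TensorFlow.

```python
class MiniLayer:
    """Simplified stand-in for a Keras layer: __call__ forwards the inputs to call()."""
    def __call__(self, inputs):
        return self.call(inputs)

class BrokenUserModel(MiniLayer):
    def call(self):  # no parameter for the inputs
        return None

class FixedUserModel(MiniLayer):
    def call(self, inputs):
        # In the notebook this would be: return self.user_embedding(inputs["user_id"])
        return inputs["user_id"]

features = {"user_id": ["38094", "8937"]}
try:
    BrokenUserModel()(features)
except TypeError as err:
    print("Broken:", err)
print("Fixed:", FixedUserModel()(features))
```

The call arguments in the traceback also show `user_id` arriving as `int64`, while the lookup vocabulary was cast to strings, so the ids may additionally need to be converted to strings before they reach the `StringLookup` layer.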


In [128]:
# class User_recipe_Model(tfrs.models.Model):

#   def __init__(self, rating_weight: float, retrieval_weight: float) -> None:
#     super().__init__()
#     max_tokens = 10_000
#     # self.query_embedding = tf.keras.Sequential([
#     #   UserModel(),
#     #   tf.keras.layers.Dense(32)
#     # ])
#     # self.candidate_model = tf.keras.Sequential([
#     #   RecipeModel(use_embeds),
#     #   tf.keras.layers.Dense(32)
#     # ])
#     self.user_embedding: tf.keras.layers.Layer = tf.keras.Sequential([
#       UserModel(),
#       tf.keras.layers.Dense(32)
#     ])
#     self.candidate_model: tf.keras.layers.Layer = self.recipe_embedding = tf.keras.Sequential([
#       tf.keras.layers.StringLookup(
#           vocabulary=unique_recipe_ids, mask_token=None),
#       tf.keras.layers.Embedding(len(unique_recipe_ids)+1, 32)
#     ])
#     self.diet_embedding = tf.keras.Sequential([
#       tf.keras.layers.StringLookup(
#           vocabulary = unique_diets, mask_token=None),
#       tf.keras.layers.Embedding(len(unique_diets)+1, 32)
#       ])

#     self.descr_embedding = tf.keras.Sequential([
#       tf.keras.layers.TextVectorization(max_tokens=max_tokens),
#       tf.keras.layers.Embedding(max_tokens, 32, mask_zero=True),
#       # We average the embeddings of individual words to get one embedding vector
#       # per description.
#       tf.keras.layers.GlobalAveragePooling1D()
#     ])


#     # The tasks.
#     self.rating_task: tf.keras.layers.Layer = tfrs.tasks.Ranking(
#         loss=tf.keras.losses.MeanSquaredError(),
#         metrics=[tf.keras.metrics.RootMeanSquaredError()],
#     )
#     self.retrieval_task: tf.keras.layers.Layer = tfrs.tasks.Retrieval(
#         metrics=tfrs.metrics.FactorizedTopK(
#             candidates = recipes.batch(128).map(self.candidate_model)
#         )
#     )
#     self.rating_model = tf.keras.Sequential([
#         tf.keras.layers.Dense(256, activation="relu"),
#         tf.keras.layers.Dense(128, activation="relu"),
#         tf.keras.layers.Dense(1),
#     ])
#     # The loss weights.
#     self.rating_weight = rating_weight
#     self.retrieval_weight = retrieval_weight

#   def call(self, features: Dict[Text, tf.Tensor]) -> tf.Tensor:
#     # We pick out the user features and pass them into the user model.
#     user_embedding = self.user_embedding(features["user_id"])
#     # And pick out the recipe features and pass each one into its own embedding model.
#     recipe_embedding = self.candidate_model(features["recipe_id"])
#     diet_embedding = self.diet_embedding(features["diet"])
#     descr_embedding = self.descr_embedding(features["description"])

#     return (
#         user_embedding,
#         recipe_embedding,
#         diet_embedding,
#         descr_embedding,
#         # We apply the multi-layered rating model to a concatenation of
#         # the user, recipe, diet, and description embeddings.
#         self.rating_model(
#             tf.concat([user_embedding, recipe_embedding, diet_embedding, descr_embedding], axis=1)
#         )
#     )

#   # def compute_loss(self, features, training=False):
#   #   # We only pass the user id into the query model. This is to ensure that
#   #   # the training inputs have the same keys as the query inputs. Otherwise
#   #   # the discrepancy in input structure would cause an error when loading
#   #   # the query model after saving it.
#   #   query_embeddings = self.query_embedding(features["user_id"])

#   #   recipe_embeddings = self.candidate_model({
#   #       "recipe_id": features["recipe_id"],
#   #       "diets": features["diets_combined"],
#   #       "descriptions": features["description"]})

#   #   return self.task(query_embeddings, recipe_embeddings)

#   def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:

#       ratings = features.pop("rating")

#       # call() returns five values; the diet and description embeddings are not used here.
#       query_embeddings, recipe_embeddings, _, _, rating_predictions = self(features)

#       # We compute the loss for each task.
#       rating_loss = self.rating_task(
#           labels=ratings,
#           predictions=rating_predictions,
#       )
#       retrieval_loss = self.retrieval_task(query_embeddings, recipe_embeddings)
#       # And combine them using the loss weights.
#       return (self.rating_weight * rating_loss
#               + self.retrieval_weight * retrieval_loss)
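
The commented-out `compute_loss` above combines the two task losses as a weighted sum. A minimal sketch of that arithmetic in plain Python follows. The loss values are made up for illustration and `combine_losses` is a hypothetical helper, not part of TensorFlow Recommenders.

```python
def combine_losses(rating_loss, retrieval_loss, rating_weight, retrieval_weight):
    """Weighted sum of the two task losses, mirroring the return statement above."""
    return rating_weight * rating_loss + retrieval_weight * retrieval_loss


# Illustrative loss values only; real values come from the two tasks.
rating_loss, retrieval_loss = 0.5, 4.0

rating_only = combine_losses(rating_loss, retrieval_loss, 1.0, 0.0)
retrieval_only = combine_losses(rating_loss, retrieval_loss, 0.0, 1.0)
joint = combine_losses(rating_loss, retrieval_loss, 1.0, 1.0)

print(rating_only, retrieval_only, joint)
```

Setting one weight to zero trains the model on the other task alone, while two non-zero weights train both objectives jointly.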

In [139]:
class UserRecipe_plus_Model(tfrs.models.Model):

  def __init__(self):
    super().__init__()
    self.query_model = tf.keras.Sequential([
      UserModel(),
      tf.keras.layers.Dense(32)
    ])
    self.candidate_model = tf.keras.Sequential([
      RecipeModelPlus(),
      tf.keras.layers.Dense(32)
    ])
    self.task = tfrs.tasks.Retrieval(
        metrics=tfrs.metrics.FactorizedTopK(
            candidates=recipe_ids.batch(32).map(self.candidate_model),
        ),
    )

  def compute_loss(self, features, training=False):
    # We only pass the user id into the query model. This is to ensure that
    # the training inputs have the same keys as the query inputs. Otherwise
    # the discrepancy in input structure would cause an error when loading
    # the query model after saving it.
    query_embeddings = self.query_model(features["user_id"])
    recipe_embeddings = self.candidate_model({
        "recipe_id": features["recipe_id"],
        "diets": features["diets"]
    })
    return self.task(query_embeddings, recipe_embeddings)