# **Get Started with TensorFlow Recommenders and Matrix Factorization**
A hands-on tutorial on recommender systems with TensorFlow
**Code and idea adapted from:** https://medium.com/geekculture/get-started-with-tensorflow-recommenders-and-matrix-factorization-a90abae852e1

In [1]:
!pip install -q tensorflow-recommenders


In [2]:
from typing import Dict, Text

import numpy as np
import pandas as pd
import tensorflow as tf

import tensorflow_recommenders as tfrs

# **Preprocess the Data**
We will work with the [MovieLens dataset](https://grouplens.org/datasets/movielens/100k/), collected by the GroupLens Research Project at the University of Minnesota. Our goal is to build a model that suggests movies to users. We keep only the user-item pairs whose rating is above 3, because we want to recommend movies that the user is likely not only to watch but also to like. Note that the cells below load a product-ratings CSV in place of MovieLens while keeping the tutorial's movie-oriented variable names.

In this post, we follow the TensorFlow tutorial and try to go deeper by showing:
*   How to start with a pandas data frame instead of a TensorFlow dataset
*   How to get the Users’ and Items’ Embeddings
*   How to find the expected score for every item for each user
*   How to make recommendations for each user
*   How to find similar items
*   How to save and load the TensorFlow model

In [3]:
import pandas as pd

In [4]:
url = 'https://raw.githubusercontent.com/sulthonpriyan/CapstoneProjectTeamC23-PS081/main/MachineLearning/EkstrasiData/rating.csv'
ratings = pd.read_csv(url)
ratings

Unnamed: 0,userid,produkid,rating
0,0,1,5
1,0,2,5
2,0,3,5
3,0,4,1
4,0,5,1
...,...,...,...
2795,99,24,1
2796,99,25,1
2797,99,26,5
2798,99,27,5


In [5]:
# load the product data
movies = pd.read_csv('https://raw.githubusercontent.com/sulthonpriyan/CapstoneProjectTeamC23-PS081/main/MachineLearning/EkstrasiData/produk.csv')
movies

Unnamed: 0,produkid,produkname
0,1,pupuk kompos
1,2,pupuk npk
2,3,pupuk phonska
3,4,pupuk cair
4,5,pupuk organik
5,6,pupuk anorganik
6,7,pupuk kambing
7,8,bibit kelengkeng
8,9,bibit durian
9,10,bibit cengkih


In [6]:
# join the ratings with the movies
ratings = pd.merge(ratings, movies, on='produkid')


# keep only movies with a rating greater than 3
ratings = ratings[ratings.rating>3]


# keep only the user id and the movie title columns
ratings = ratings[['produkname', 'userid']].reset_index(drop=True)

ratings

Unnamed: 0,produkname,userid
0,pupuk kompos,0
1,pupuk kompos,7
2,pupuk kompos,11
3,pupuk kompos,12
4,pupuk kompos,15
...,...,...
1053,bubuk jahe,81
1054,bubuk jahe,82
1055,bubuk jahe,85
1056,bubuk jahe,89


# **Build the Model**
The idea is to build a retrieval model using user and item embeddings. We will work with the TensorFlow-recommenders library.
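
To make the idea concrete, here is a minimal NumPy sketch of embedding-based retrieval. The embedding values, the 4-dimensional size, and the choice of three items are made up for illustration; the trained model learns its own vectors. Each item is scored by the dot product of its embedding with the user's embedding, and the highest-scoring items are returned.

```python
import numpy as np

# Hypothetical embeddings (illustrative values, not taken from the trained model).
item_names = ["pupuk kompos", "pupuk npk", "bibit durian"]
item_embeddings = np.array([
    [1.0, 0.0, 0.5, 0.0],
    [0.8, 0.1, 0.4, 0.0],
    [0.0, 1.0, 0.0, 0.5],
])
user_embedding = np.array([1.0, 0.0, 1.0, 0.0])

# Affinity score of every item for this user: one dot product per item.
scores = item_embeddings @ user_embedding

# Indices of the k best-scoring items, highest score first.
k = 2
top_k = np.argsort(-scores)[:k]
top_items = [item_names[i] for i in top_k]
print(top_items)
```

The brute-force index used later in this notebook follows the same pattern: score every candidate, then keep the best ones.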

The original tutorial reloads `ratings.csv` and `movies.csv` at this point. Here we simply reuse the `ratings` and `movies` data frames from the previous step and rename their columns.

In [7]:
# reuse the data frames built above (the original tutorial read them from CSV files)
ratings_df = ratings  # pd.read_csv('ratings.csv')
movies_df = movies  # pd.read_csv('movies.csv')

# rename the columns to the names used in the rest of the tutorial
ratings_df = ratings_df.rename(columns={'produkname': 'movie_title', 'userid': 'user_id'})
movies_df = movies_df.rename(columns={'produkname': 'movie_title'})

Now, we will convert the pandas data frames to TensorFlow datasets.

In [8]:
# convert them to tf datasets
ratings = tf.data.Dataset.from_tensor_slices(dict(ratings_df))
movies = tf.data.Dataset.from_tensor_slices(dict(movies_df))

Let’s have a look at our data:

In [9]:
# get the first rows of the movies dataset
for m in movies.take(5):
  print(m)

{'produkid': <tf.Tensor: shape=(), dtype=int64, numpy=1>, 'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk kompos'>}
{'produkid': <tf.Tensor: shape=(), dtype=int64, numpy=2>, 'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk npk'>}
{'produkid': <tf.Tensor: shape=(), dtype=int64, numpy=3>, 'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk phonska'>}
{'produkid': <tf.Tensor: shape=(), dtype=int64, numpy=4>, 'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk cair'>}
{'produkid': <tf.Tensor: shape=(), dtype=int64, numpy=5>, 'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk organik'>}


In [10]:
# get the first rows of the ratings dataset
for r in ratings.take(4):
  print(r)

{'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk kompos'>, 'user_id': <tf.Tensor: shape=(), dtype=int64, numpy=0>}
{'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk kompos'>, 'user_id': <tf.Tensor: shape=(), dtype=int64, numpy=7>}
{'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk kompos'>, 'user_id': <tf.Tensor: shape=(), dtype=int64, numpy=11>}
{'movie_title': <tf.Tensor: shape=(), dtype=string, numpy=b'pupuk kompos'>, 'user_id': <tf.Tensor: shape=(), dtype=int64, numpy=12>}


Let’s keep only the features the model needs: the movie title and the user id.

In [11]:
# Select the basic features.
ratings = ratings.map(lambda x: {
    "movie_title": x["movie_title"],
    "user_id": x["user_id"]
})
movies = movies.map(lambda x: x["movie_title"])

# **Build vocabularies to convert user ids and movie titles into integer indices for embedding layers**
For our model, we need to assign indices for the unique users and movies. Note that we add an extra index for the unknown users and movies respectively.
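
The following pure-Python sketch illustrates what such a lookup does conceptually. It is a simplification and not the Keras implementation: the helper names `build_vocabulary` and `lookup` are hypothetical, and the real layers may order the vocabulary differently (for example by frequency). The point is only that index 0 is reserved for unknown values, so the vocabulary size is one more than the number of distinct values.

```python
def build_vocabulary(values):
    """Return the unique values in first-seen order (a simplification)."""
    return list(dict.fromkeys(values))


def lookup(vocabulary, value):
    """Map a value to an integer index, reserving 0 for unknown values."""
    try:
        return vocabulary.index(value) + 1
    except ValueError:
        return 0  # out-of-vocabulary


titles = ["pupuk kompos", "pupuk npk", "pupuk kompos", "bibit durian"]
vocabulary = build_vocabulary(titles)

known = lookup(vocabulary, "pupuk npk")
unknown = lookup(vocabulary, "never seen")
vocabulary_size = len(vocabulary) + 1  # +1 for the out-of-vocabulary slot
print(known, unknown, vocabulary_size)
```

This is why the embedding layers below are sized with `vocabulary_size()` rather than with the number of distinct users or titles.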

In [12]:
user_ids_vocabulary = tf.keras.layers.IntegerLookup(mask_token=None)
user_ids_vocabulary.adapt(ratings.map(lambda x: x["user_id"]))


movie_titles_vocabulary = tf.keras.layers.StringLookup(mask_token=None)
movie_titles_vocabulary.adapt(movies)

# **Create the Model**
We subclass `tfrs.Model` and implement its `compute_loss` method.

In [13]:
class MovieLensModel(tfrs.Model):
  # We derive from a custom base class to help reduce boilerplate. Under the hood,
  # these are still plain Keras Models.

  def __init__(
      self,
      user_model: tf.keras.Model,
      movie_model: tf.keras.Model,
      task: tfrs.tasks.Retrieval):
    super().__init__()

    # Set up user and movie representations.
    self.user_model = user_model
    self.movie_model = movie_model

    # Set up a retrieval task.
    self.task = task

  def compute_loss(self, features: Dict[Text, tf.Tensor], training=False) -> tf.Tensor:
    # Define how the loss is computed.

    user_embeddings = self.user_model(features["user_id"])
    movie_embeddings = self.movie_model(features["movie_title"])

    return self.task(user_embeddings, movie_embeddings)

We have to define the `user_model` and the `movie_model`, which are sequential models that generate the embeddings. Finally, we define the task, whose objective is retrieval.
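
For intuition about what a retrieval objective optimizes, here is a small NumPy sketch. It assumes the task keeps its default loss, which we understand to be a softmax cross-entropy in which the other items of the batch act as negatives. The function name `in_batch_softmax_loss` and the toy embeddings are hypothetical and serve only as illustration.

```python
import numpy as np


def in_batch_softmax_loss(user_embeddings, item_embeddings):
    """Summed cross-entropy where row i's positive item is column i."""
    # Score every user in the batch against every item in the batch.
    logits = user_embeddings @ item_embeddings.T
    # Numerically stable log-softmax over each row.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # The matching (user, item) pairs sit on the diagonal.
    return -np.trace(log_probs)


# Made-up batch of two (user, item) pairs with 2-dimensional embeddings.
users = np.array([[2.0, 0.0], [0.0, 2.0]])
items_aligned = np.array([[2.0, 0.0], [0.0, 2.0]])
items_swapped = items_aligned[::-1]

loss_aligned = in_batch_softmax_loss(users, items_aligned)
loss_swapped = in_batch_softmax_loss(users, items_swapped)
print(loss_aligned < loss_swapped)
```

The loss is small when each user embedding points towards its own item and away from the other items in the batch, which is what training pushes the two towers to achieve.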

In [14]:
# Define user and movie models
user_model = tf.keras.Sequential([
    user_ids_vocabulary,
    tf.keras.layers.Embedding(user_ids_vocabulary.vocabulary_size(), 128),
    tf.keras.layers.Dense(64),
    tf.keras.layers.Dense(32),
    tf.keras.layers.Dense(16)
])
movie_model = tf.keras.Sequential([
    movie_titles_vocabulary,
    tf.keras.layers.Embedding(movie_titles_vocabulary.vocabulary_size(), 128),
    tf.keras.layers.Dense(64),
    tf.keras.layers.Dense(32),
    tf.keras.layers.Dense(16)
])

# Define your objectives.
task = tfrs.tasks.Retrieval(metrics=tfrs.metrics.FactorizedTopK(
    movies.batch(128).map(movie_model)
  )
)

# **Fit the Model**
The last step is to build the model, train it and make some predictions.

In [54]:
# Create a retrieval model.
model = MovieLensModel(user_model, movie_model, task)
model.compile(optimizer=tf.keras.optimizers.Adagrad(0.2))

# Train for n epochs.
model.fit(ratings.batch(4000), epochs=5)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f8ade9b9d80>

# **Make Predictions**
Let’s make predictions for `user_id=42`.

In [66]:
# Define the product lists for each category:
# pupuk (fertilizer), bibit (seedlings), buahsayur (fruit and vegetables), bahanolah (processed goods)

pupuk_cat = ['pupuk kompos','pupuk npk','pupuk phonska','pupuk cair','pupuk organik','pupuk anorganik','pupuk kambing']
bibit_cat = ['bibit kelengkeng','bibit durian','bibit cengkih','bibit mahoni','bibit cabai','bibit alpukat','bibit kopi','bibit randu','bibit kelapa']
buahsayur_cat = ['kangkung','bayam','cabai','jeruk','apel','kelapa']
bahanolah_cat = ['madu','bubuk kopi','tepung tapioka','lada bubuk','tepung mocaf','bubuk jahe']

In [67]:
# Use brute-force search to set up retrieval using the trained representations.
index = tfrs.layers.factorized_top_k.BruteForce(model.user_model)
index.index_from_dataset(
    movies.batch(100).map(lambda title: (title, model.movie_model(title))))

# Get some recommendations.
n = 10  # number of recommendations
_, titles = index(np.array([42]))

# Decode the byte strings returned by the index into plain Python strings.
top_titles = [t.decode("utf-8") for t in titles[0, :n].numpy()]
print(f"Top {n} recommendations for user 42: {top_titles}")

# Keep only the recommendations that belong to the pupuk category.
pupuk_recom = [t for t in top_titles if t in pupuk_cat]
print(f"Some pupuk recommendations for user 42: {pupuk_recom}")

Top 10 recommendations for user 42: ['kelapa', 'pupuk kompos', 'pupuk npk', 'pupuk phonska', 'bibit randu', 'bibit kelengkeng', 'lada bubuk', 'bibit kelapa', 'pupuk anorganik', 'tepung mocaf']
Some pupuk recommendations for user 42: ['pupuk kompos', 'pupuk npk', 'pupuk phonska', 'pupuk anorganik']
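
The introduction also mentions finding similar items. One common approach, sketched here with NumPy, is to compare item embeddings by cosine similarity. The embedding values and the helper `most_similar` are hypothetical; in this notebook the real vectors could be obtained by calling `model.movie_model` on a tensor of titles.

```python
import numpy as np

# Hypothetical item embeddings (illustrative values only).
item_names = ["pupuk kompos", "pupuk organik", "bibit durian"]
item_embeddings = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
])


def most_similar(query_index, embeddings, names):
    """Return the name of the item with the highest cosine similarity."""
    # Normalise rows so that a dot product equals the cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    similarities = unit @ unit[query_index]
    similarities[query_index] = -np.inf  # exclude the query item itself
    return names[int(np.argmax(similarities))]


closest = most_similar(0, item_embeddings, item_names)
print(closest)
```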


In [68]:
np.array([42])
# _, titles = index(np.array([42]))
# titles[0]

array([42])

In [69]:
# For the other categories, just copy and paste the code above
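
As an alternative to copying the loop once per category, the recommendations can be grouped in a single pass. The sketch below is plain Python: the helper `group_by_category` is hypothetical, the category lists repeat the ones defined above, and the titles are the top-10 output for user 42 written out as ordinary strings.

```python
# Category lists as defined earlier in the notebook.
categories = {
    "pupuk": ["pupuk kompos", "pupuk npk", "pupuk phonska", "pupuk cair",
              "pupuk organik", "pupuk anorganik", "pupuk kambing"],
    "bibit": ["bibit kelengkeng", "bibit durian", "bibit cengkih", "bibit mahoni",
              "bibit cabai", "bibit alpukat", "bibit kopi", "bibit randu",
              "bibit kelapa"],
    "buahsayur": ["kangkung", "bayam", "cabai", "jeruk", "apel", "kelapa"],
    "bahanolah": ["madu", "bubuk kopi", "tepung tapioka", "lada bubuk",
                  "tepung mocaf", "bubuk jahe"],
}

# Top-10 titles for user 42, copied from the output above as plain strings.
top_titles = ["kelapa", "pupuk kompos", "pupuk npk", "pupuk phonska",
              "bibit randu", "bibit kelengkeng", "lada bubuk", "bibit kelapa",
              "pupuk anorganik", "tepung mocaf"]


def group_by_category(titles, categories):
    """Group titles by category, preserving the ranking order within each group."""
    # Reverse mapping: product title -> category name.
    title_to_category = {
        title: name for name, members in categories.items() for title in members
    }
    grouped = {name: [] for name in categories}
    for title in titles:
        name = title_to_category.get(title)
        if name is not None:
            grouped[name].append(title)
    return grouped


recommendations = group_by_category(top_titles, categories)
for name, items in recommendations.items():
    print(name, items)
```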

# **Save and Load the Model**
To deploy a model like this, we simply export the BruteForce layer we created above:

In [70]:
import tempfile
import os
# Export the query model.
with tempfile.TemporaryDirectory() as tmp:
  path = os.path.join(tmp, "model")

  # Save the index.
  tf.saved_model.save(index, path)

  # Load it back; can also be done in TensorFlow Serving.
  loaded = tf.saved_model.load(path)

  # Pass a user id in, get top predicted movie titles back.
  scores, titles = loaded([42])
  print(f"Recommendations: {titles[0][:3]}")



Recommendations: [b'kelapa' b'pupuk kompos' b'pupuk npk']


In [71]:
index.save('my_model2')



In [72]:
loaded = tf.keras.models.load_model('my_model2', compile=False)

In [73]:
scores, titles = loaded([42])
print(f"Recommendations: {titles[0]}")

Recommendations: [b'kelapa' b'pupuk kompos' b'pupuk npk' b'pupuk phonska' b'bibit randu'
 b'bibit kelengkeng' b'lada bubuk' b'bibit kelapa' b'pupuk anorganik'
 b'tepung mocaf']


As expected, we get the same recommendations for user 42.

# **Deploy the Model to the Firebase Console**

Step 1. Initialize Firebase App Instance

In [74]:
# import firebase_admin

# firebase_admin.initialize_app(options={'projectId': projectID,
#              'storageBucket': projectID + '.appspot.com' })

Step 2. Upload the model file to Cloud Storage

In [75]:
# from firebase_admin import ml

# # This uploads it to your bucket as model.tflite
# source = ml.TFLiteGCSModelSource.from_saved_model(export_dir, 'model.tflite')
# print (source.gcs_tflite_uri)

Step 3. Deploy the model to Firebase

In [76]:
# # Create a Model Format
# model_format = ml.TFLiteFormat(model_source=source)

# # Create a Model object
# sdk_model_1 = ml.Model(display_name="recommendations", model_format=model_format)

# # Make the Create API call to create the model in Firebase
# firebase_model_1 = ml.create_model(sdk_model_1)
# print(firebase_model_1.as_dict())

# # Publish the model
# model_id = firebase_model_1.model_id
# firebase_model_1 = ml.publish_model(model_id)