# **TensorFlow Implementation of Collaborative Filtering: A Detailed Explanation with Examples**
Collaborative Filtering (CF) is a popular recommendation system technique that predicts user preferences based on their past interactions and the interactions of similar users.

In this tutorial, we will implement a Matrix Factorization-based Collaborative Filtering model using TensorFlow.

## Step 1: Load Required Libraries


In [27]:
import numpy as np
import pandas as pd
import tensorflow as tf

• numpy → Used for numerical computations.

• pandas → Helps in handling datasets.

• tensorflow → Used for building and training the recommendation model.

## Step 2: Load the Data

We have a dataset of user ratings for movies in a CSV file, Movie_Ratings_Data.csv.

In [28]:
data = pd.read_csv('Movie_Ratings_Data.csv', index_col='Movie')
data

| Movie | User1 | User2 | User3 | User4 |
|---|---|---|---|---|
| Inception | 5 | 4 | 0 | 3 |
| Titanic | 3 | 0 | 5 | 2 |
| Avatar | 0 | 3 | 4 | 5 |
| Joker | 4 | 5 | 2 | 0 |


0 means the user hasn’t rated that movie yet.

Other values represent user ratings from 1-5.
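
If the CSV file is not at hand, the same table can be built from an inline string. This is a minimal sketch: the actual file layout is not shown in this tutorial, so the header row below (a Movie column followed by one column per user) is an assumption inferred from the displayed table.

```python
import io

import pandas as pd

# Assumed file contents: a 'Movie' column plus one column per user.
csv_text = """Movie,User1,User2,User3,User4
Inception,5,4,0,3
Titanic,3,0,5,2
Avatar,0,3,4,5
Joker,4,5,2,0
"""

# index_col='Movie' makes the movie titles the row labels, as in the cell above.
data = pd.read_csv(io.StringIO(csv_text), index_col="Movie")
print(data.shape)  # (4, 4): rows are movies, columns are users
```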

## Step 3: Initialize Required Variables

In [29]:
n_movies, n_users = data.shape
print('Number of movies in dataset: {}'.format(n_movies))
print('Number of users in dataset: {}'.format(n_users))

Number of movies in dataset: 4
Number of users in dataset: 4


In [30]:
n_features = 10  # Number of latent factors

**What are Latent Factors?**
Latent factors represent the hidden characteristics that influence how users rate movies. These factors are not explicitly provided in the dataset but are learned automatically during training.

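For example, when recommending movies, some possible latent factors could be:

• Action Level

• Romance Level

• Comedy Level

• Sci-Fi Influence

• Director Popularity

• Movie Length

• Star Actor Popularity

• Year of Release

• Soundtrack Quality

• Special Effects Quality

These labels are only illustrative. The model learns n_features abstract numeric dimensions, and nothing guarantees that a learned dimension lines up with a human-readable attribute like the ones listed here.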
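
To make the idea concrete, here is a small sketch of how a single rating is predicted from latent factors: the dot product of a movie's factor vector and a user's weight vector, plus a per-user bias. The numbers and the names x_movie, w_user, and b_user are hypothetical and use only two factors for readability.

```python
import numpy as np

# Hypothetical values for a model with two latent factors.
x_movie = np.array([0.9, 0.1])  # where the movie sits along each factor
w_user = np.array([4.0, 1.0])   # how much the user cares about each factor
b_user = 0.5                    # the user's bias term

# 0.9 * 4.0 + 0.1 * 1.0 + 0.5
predicted_rating = np.dot(x_movie, w_user) + b_user
print(round(predicted_rating, 2))  # 4.2
```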


In [31]:
#Convert Data to Matrix
y = data.values
y #ratings

array([[5, 4, 0, 3],
       [3, 0, 5, 2],
       [0, 3, 4, 5],
       [4, 5, 2, 0]])

In [32]:
R = np.where(y>0, 1, 0) # 1 if rating exists, else 0
R

array([[1, 1, 0, 1],
       [1, 0, 1, 1],
       [0, 1, 1, 1],
       [1, 1, 1, 0]])
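
The mask R is also handy for quick sanity checks on the data. The sketch below rebuilds y and R with NumPy and counts how many ratings each movie and each user has. The variable names ratings_per_movie, ratings_per_user, and density are introduced here for illustration.

```python
import numpy as np

y = np.array([[5, 4, 0, 3],
              [3, 0, 5, 2],
              [0, 3, 4, 5],
              [4, 5, 2, 0]])
R = np.where(y > 0, 1, 0)

ratings_per_movie = R.sum(axis=1)  # one count per row (movie)
ratings_per_user = R.sum(axis=0)   # one count per column (user)
density = R.mean()                 # fraction of the matrix that is rated

print(ratings_per_movie, ratings_per_user, density)
```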

In [33]:
#Initialize Parameters

tf.random.set_seed(42)

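# W: user weights, X: movie features, b: one bias per user.
# name= is passed to tf.Variable so that the variable itself carries the name.
W = tf.Variable(tf.random.normal((n_users, n_features), dtype=tf.float64), name='W')
X = tf.Variable(tf.random.normal((n_movies, n_features), dtype=tf.float64), name='X')
b = tf.Variable(tf.random.normal((1, n_users), dtype=tf.float64), name='b')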

learning_rate = 0.01
epochs = 100
lambda_ = 1 #Regularization parameter
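
The shapes of these three parameters determine how predictions are formed. As a rough sketch, the NumPy snippet below (NumPy stands in for TensorFlow here, and the random values are placeholders) shows that multiplying movie features by transposed user weights yields one score per movie-user pair, and that the (1, n_users) bias broadcasts down the rows.

```python
import numpy as np

n_movies, n_users, n_features = 4, 4, 10
rng = np.random.default_rng(0)

X = rng.normal(size=(n_movies, n_features))  # movie features
W = rng.normal(size=(n_users, n_features))   # user weights
b = rng.normal(size=(1, n_users))            # one bias per user

# (4, 10) @ (10, 4) -> (4, 4); b is added to every row.
scores = X @ W.T + b
print(scores.shape)  # (4, 4)
```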


## Step 4: Initialize the Optimizer

In [34]:
#Initialize the Optimizer

optimizer = tf.optimizers.Adam(learning_rate)


## Step 5: Define the Cost Function


In [35]:
#Define the Cost Function

def cofi_cost_function(X, W, b, Y, R, n_users, n_movies, lambda_):
  # Predicted rating = dot product of movie features and user weights, plus the user bias.
  # Multiplying by R keeps predictions only for rated movies.
  predictions = (tf.linalg.matmul(X, tf.transpose(W)) + b) * R

  # Squared error over rated entries only
  cost = 0.5 * tf.reduce_sum(R * tf.square(predictions - Y))

  # Regularization on W and X to prevent overfitting (b is not regularized)
  cost += (lambda_/2) * (tf.reduce_sum(tf.square(W)) + tf.reduce_sum(tf.square(X)))

  return cost
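
Written as a formula, the function above computes the following. This is a transcription of the code rather than an additional derivation. The notation is introduced here: $x_i$ is row $i$ of X (movie $i$), $w_j$ is row $j$ of W (user $j$), $b_j$ is the bias of user $j$, and $k$ indexes the latent factors.

$$
J(X, W, b) = \frac{1}{2} \sum_{(i,j):\, R_{ij}=1} \left( x_i \cdot w_j + b_j - Y_{ij} \right)^2 + \frac{\lambda}{2} \left( \sum_{j,k} W_{jk}^2 + \sum_{i,k} X_{ik}^2 \right)
$$

Because $R_{ij}$ is either 0 or 1, masking the predictions and then masking the squared error gives the same value as summing only over rated pairs.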


## Step 6: Normalize the Data

In [36]:
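# Normalize the Data

# Per-movie mean rating. Note that this averages over all entries, including the
# 0 placeholders for unrated movies, so each mean sits below the mean of the actual ratings.
Y_mean = np.mean(y, axis=1)  # Shape: (n_movies,)
Y_mean = Y_mean[:, np.newaxis]  # Reshape to (n_movies, 1)
Y_norm = y - Y_mean  # Broadcasts (n_movies, n_users) - (n_movies, 1)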
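
A common variant of mean normalization averages only over the ratings that exist, so the 0 placeholders do not drag the mean down. The sketch below shows that alternative with NumPy. It is not what the cell above does, and switching to it would change the training output shown later. The names Y_mean_rated and Y_norm_rated are introduced here for illustration.

```python
import numpy as np

y = np.array([[5, 4, 0, 3],
              [3, 0, 5, 2],
              [0, 3, 4, 5],
              [4, 5, 2, 0]], dtype=float)
R = (y > 0).astype(float)

# Mean over rated entries only: sum of ratings / number of ratings per movie.
# Assumes every movie has at least one rating (otherwise this divides by zero).
Y_mean_rated = (y * R).sum(axis=1, keepdims=True) / R.sum(axis=1, keepdims=True)

# Subtract the mean, then zero out the unrated entries again.
Y_norm_rated = (y - Y_mean_rated) * R

print(Y_mean_rated.ravel())
```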


## Step 7: Train the Model

In [37]:
# Minimize the cost with the Adam optimizer, one full-batch update per epoch.

for epoch in range(epochs):
  # GradientTape records the forward computation for automatic differentiation.
  with tf.GradientTape() as tape:
    cost = cofi_cost_function(X, W, b, Y_norm, R, n_users, n_movies, lambda_)

  # Gradients of the cost with respect to W, X, and b.
  gradients = tape.gradient(cost, [W, X, b])

  # zip pairs each gradient with its variable so the optimizer
  # knows which update belongs to which parameter.
  optimizer.apply_gradients(zip(gradients, [W, X, b]))

  if (epoch + 1) % 10 == 0:
    print(f'Epoch {epoch+1}, Cost: {cost.numpy():0.4f}')



Epoch 10, Cost: 77.9828
Epoch 20, Cost: 56.3359
Epoch 30, Cost: 42.8293
Epoch 40, Cost: 34.2461
Epoch 50, Cost: 28.6755
Epoch 60, Cost: 24.9419
Epoch 70, Cost: 22.2762
Epoch 80, Cost: 20.2261
Epoch 90, Cost: 18.5487
Epoch 100, Cost: 17.1179
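
To see what the automatic differentiation step is computing, the loop can be sketched in plain NumPy with hand-derived gradients. This is not the tutorial's method: it swaps Adam for plain gradient descent, uses a small random initialization (scale 0.1) and zero biases, and the names cost_and_grads and step_size are hypothetical. Its cost values will therefore differ from the ones printed above.

```python
import numpy as np

y = np.array([[5, 4, 0, 3],
              [3, 0, 5, 2],
              [0, 3, 4, 5],
              [4, 5, 2, 0]], dtype=float)
R = (y > 0).astype(float)
Y_norm = y - y.mean(axis=1, keepdims=True)  # same normalization as Step 6

n_movies, n_users = y.shape
n_features, lambda_, step_size = 10, 1.0, 0.01

rng = np.random.default_rng(0)
X = 0.1 * rng.normal(size=(n_movies, n_features))  # assumed small init
W = 0.1 * rng.normal(size=(n_users, n_features))
b = np.zeros((1, n_users))


def cost_and_grads(X, W, b):
    """Return the regularized cost and its gradients w.r.t. X, W, and b."""
    err = (X @ W.T + b - Y_norm) * R  # errors on rated entries only
    cost = 0.5 * np.sum(err ** 2) + (lambda_ / 2) * (np.sum(W ** 2) + np.sum(X ** 2))
    grad_X = err @ W + lambda_ * X
    grad_W = err.T @ X + lambda_ * W
    grad_b = err.sum(axis=0, keepdims=True)
    return cost, grad_X, grad_W, grad_b


costs = []
for _ in range(100):
    cost, grad_X, grad_W, grad_b = cost_and_grads(X, W, b)
    costs.append(cost)
    # Plain gradient descent update (the tutorial uses Adam instead).
    X -= step_size * grad_X
    W -= step_size * grad_W
    b -= step_size * grad_b

print(f"first cost: {costs[0]:.4f}, last cost: {costs[-1]:.4f}")
```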


## Step 8: Make Predictions


In [38]:
predictions = tf.linalg.matmul(X, tf.transpose(W)) + b
predictions += Y_mean  # Y_mean already has shape (n_movies, 1), so each movie's mean is added back to its row

predictions

<tf.Tensor: shape=(4, 4), dtype=float64, numpy=
array([[4.52681609, 3.86264334, 2.66774073, 2.57430986],
       [3.30706198, 4.37686772, 4.1191273 , 1.88475154],
       [5.77457316, 3.11006093, 3.6939972 , 4.44707311],
       [3.94552543, 4.7784681 , 2.08941438, 3.06152612]])>

Rows correspond to movies and columns to users. Indexing Y_mean with an extra np.newaxis here would give it shape (n_movies, 1, 1) and broadcast the result to a 3-D tensor of shape (4, 4, 4), which is not what we want.
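
Once a 2-D matrix of predicted ratings is available, a recommendation for each user can be read off by looking only at the movies that user has not rated. The sketch below assumes a (n_movies, n_users) prediction matrix and uses illustrative values rounded to two decimals. The names clipped, unrated_only, and recommendations are introduced here for illustration.

```python
import numpy as np

movies = ["Inception", "Titanic", "Avatar", "Joker"]
users = ["User1", "User2", "User3", "User4"]

# Rating mask from Step 3 (1 = already rated, 0 = not rated yet).
R = np.array([[1, 1, 0, 1],
              [1, 0, 1, 1],
              [0, 1, 1, 1],
              [1, 1, 1, 0]])

# Illustrative predicted ratings, shape (n_movies, n_users), rounded.
predictions = np.array([[4.53, 3.86, 2.67, 2.57],
                        [3.31, 4.38, 4.12, 1.88],
                        [5.77, 3.11, 3.69, 4.45],
                        [3.95, 4.78, 2.09, 3.06]])

# Keep predictions inside the 1-5 rating scale.
clipped = np.clip(predictions, 1, 5)

# Hide movies the user has already rated, then pick the best remaining one.
# A user who has rated every movie would get index 0 here, so handle that case separately.
unrated_only = np.where(R == 0, clipped, -np.inf)
best_movie_idx = unrated_only.argmax(axis=0)

recommendations = {user: movies[i] for user, i in zip(users, best_movie_idx)}
print(recommendations)
```

In this small dataset every user has exactly one unrated movie, so the choice is trivial. With more missing entries the same argmax picks the highest predicted rating among them.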