<a href="https://www.kaggle.com/code/lorenzoarcioni/collaborative-latent-factors-recommending-system?scriptVersionId=218933887" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

# Collaborative Latent-Factors-Based Filtering for Movie Recommendations (Incomplete)

## Introduction

The latent factor approach in recommendation systems utilizes matrix factorization techniques to uncover hidden patterns in user-item interactions. These methods predict user preferences by mapping both users and items to a shared latent space where their interactions can be represented by their proximity or alignment. Latent factor models, such as Singular Value Decomposition (SVD), are widely used in this context.

Key features of the latent factor approach:
- Captures underlying relationships between users and items.
- Handles sparse datasets effectively by reducing dimensionality.
- Improves scalability compared to neighborhood-based methods.

In this notebook, we will explore the latent factor approach to build a movie recommendation system using matrix factorization.

### Mathematical Background

The latent factor approach works as follows:
- Users and items are represented in a shared lower-dimensional latent space (i.e., each as a vector of latent factors).
- These vectors are inferred (i.e., learned) from the observed ratings.
- Strong alignment (a large inner product) between a user's and an item's latent factors indicates a possible recommendation.
- Ratings are predicted as the inner product of the user and item vectors in the latent space.

So formally we have:
- $R = \{0, 1, \dots, 5\} \lor R = [0, 1]$ is the set of ratings.
- $\vec x_u \in \mathbb R^d$ is the latent factor vector for user $u$. Each $\vec x_u[k] \in \mathbb R$ measures the extent of interest user $u$ has in items exhibiting latent factor $k$.
- $\vec w_i \in \mathbb R^d$ is the latent factor vector for item $i$. Each $\vec w_i[k] \in \mathbb R$ measures the extent to which item $i$ exhibits latent factor $k$.

Essentially, $d$ hidden features describe both users and items.

Thus, $r_{u,i}$ is the rating given by user $u$ to item $i$ and $\hat{r}_{u,i} = \vec x_u \cdot \vec w_i = \sum_{k=1}^d \vec x_u[k] \cdot \vec w_i[k]$ is the predicted rating for user $u$ and item $i$.

The problem is to approximate the user-item matrix $M \in \mathbb R^{n \times m}$ ($n$ users, $m$ items) with the product of a user latent factor matrix $X \in \mathbb R^{n \times d}$ and the transpose $W^T \in \mathbb R^{d \times m}$ of an item latent factor matrix $W \in \mathbb R^{m \times d}$. So

$$
M \approx X \cdot W^T.
$$
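
As a minimal sketch of the factorization above (toy sizes and random factors chosen only for illustration, not learned values), the full prediction matrix and a single inner-product prediction agree entry by entry:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, d = 4, 5, 2          # toy sizes (illustrative assumption)

X = rng.normal(size=(n_users, d))      # one row of latent factors per user
W = rng.normal(size=(n_items, d))      # one row of latent factors per item

M_hat = X @ W.T                        # all predicted ratings at once

u, i = 1, 3
r_hat_ui = X[u] @ W[i]                 # inner product for a single (user, item) pair

print(M_hat.shape)
print(np.isclose(M_hat[u, i], r_hat_ui))
```

Each entry of `M_hat` is exactly the dot product of the corresponding user row and item row, which is why the matrix form and the per-pair formula can be used interchangeably.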

## Dataset Description

We use two datasets for this analysis:

1. **Movies Dataset**:
   - `Movie_ID`: Unique identifier for each movie.
   - `Name`: Title of the movie.
   - `Year`: Year the movie was released.

2. **Ratings Dataset**:
   - `User_ID`: Unique identifier for each user.
   - `Movie_ID`: Identifier for the movie rated by the user.
   - `Rating`: Numeric rating provided by the user (e.g., on a scale of 1-5).

In [None]:
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix

# Load the datasets
movies_file = "/kaggle/input/netflix-movie-rating-dataset/Netflix_Dataset_Movie.csv"
ratings_file = "/kaggle/input/netflix-movie-rating-dataset/Netflix_Dataset_Rating.csv"

ratings = pd.read_csv(ratings_file)  # Columns: User_ID, Rating, Movie_ID
movies  = pd.read_csv(movies_file)    # Columns: Movie_ID, Year, Name

### Formal Definition

- $U = \{u_1, u_2, \dots, u_n\}$ is the set of users.
- $U_i = \{u \in U \mid r_{u,i} \neq 0\}$ is the set of users who have rated item $i$
- $I = \{i_1, i_2, \dots, i_m\}$ is the set of items.
- $I_u = \{i \in I \mid r_{u,i} \neq 0\}$ is the set of items rated by user $u$
- $R = \{0, 1, \dots, 5\} \lor R = [0, 1]$ is the set of ratings.
- $r_{u,i}$ is the rating given by user $u$ for item $i$ (equal to 0 if not rated).
- $D = \{(u_j, i_j)\}_{j=1}^{N}$ is the set of user-item pairs (our dataset).
- $I_D = \{i \in I \mid \exists (u, i) \in D\}$ is the set of items in the dataset.
- $U_D = \{u \in U \mid \exists (u, i) \in D\}$ is the set of users in the dataset.
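
The set definitions above can be sketched on a toy dataset. The user and item IDs below are made up for illustration and do not come from the Netflix data:

```python
from collections import defaultdict

# Toy dataset D of observed (user, item) pairs; the IDs are hypothetical.
D = [(1, "a"), (1, "b"), (2, "a"), (3, "c")]

U_D = {u for u, _ in D}          # users that appear in D
I_D = {i for _, i in D}          # items that appear in D

I_u = defaultdict(set)           # I_u[u]: items rated by user u
U_i = defaultdict(set)           # U_i[i]: users who rated item i
for u, i in D:
    I_u[u].add(i)
    U_i[i].add(u)

print(sorted(I_u[1]))            # items rated by user 1
print(sorted(U_i["a"]))          # users who rated item "a"
```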

## Latent-Factor Collaborative Filtering

### 1. Data Preprocessing
- **User-Item Matrix Creation**: Convert the ratings dataset into a user-item matrix, where rows represent users and columns represent movies. Missing ratings are filled with zeros. Each rating is represented by a number from 1 to 5.
  $$ M[u, i] = r_{u,i} \in R$$
  Where:
  - $u \in U$
  - $i \in I$
  - $r_{u,i}$ is the rating given by user $u$ for movie $i$.

- **Sparse Matrix Conversion**: The dense matrix is converted to a sparse format for memory optimization:
  $$M_{\text{sparse}} = \text{sparse}(M)$$
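
To make the two preprocessing steps concrete, here is a small sketch on a hand-written ratings table. The rows are invented for illustration; only the column names follow the dataset description above:

```python
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical ratings: three users, three movies, four observed ratings.
toy_ratings = pd.DataFrame({
    "User_ID":  [10, 10, 20, 30],
    "Movie_ID": [1, 3, 1, 2],
    "Rating":   [5, 3, 4, 2],
})

# Rows are users, columns are movies; unrated pairs become 0.
toy_matrix = (
    toy_ratings.pivot(index="User_ID", columns="Movie_ID", values="Rating")
    .fillna(0)
)

# The sparse format stores only the nonzero (observed) entries.
toy_sparse = csr_matrix(toy_matrix.values)
density = toy_sparse.nnz / (toy_sparse.shape[0] * toy_sparse.shape[1])

print(toy_matrix)
print(f"stored entries: {toy_sparse.nnz}, density: {density:.2f}")
```

On real rating data the density is typically far lower than in this toy case, which is where the sparse representation pays off.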

In [None]:
# Step 1: Create a user-item matrix
user_item_matrix = ratings.pivot(index='User_ID', columns='Movie_ID', values='Rating')

# Fill missing values with 0, which marks a movie as "not rated" (some algorithms keep NaN instead)
user_item_matrix.fillna(0, inplace=True)

# Convert the DataFrame to a sparse matrix
sparse_user_item = csr_matrix(user_item_matrix.values)

### 2. Determining the Loss Function and Computing Its Gradient

Assuming we have access to the dataset $D$ of observed ratings, the matrix $M$ is partially known and filled with those observations. To actually learn the latent factors, we need to choose a loss function to optimize. In our case, we choose squared error (SE):

$$
L(X, W) = \frac{1}{2} \left[ \sum_{(u, i) \in D} (r_{u,i} - \hat{r}_{u,i})^2 + \lambda (\sum_{u \in U_D} \|\vec x_u\|^2 + \sum_{i \in I_D} \|\vec w_i\|^2)\right]
$$

Thus,

$$X^*, W^* = \operatorname*{arg\,min}_{X, W} \ L(X, W).$$

#### Loss Function
In matrix notation, the loss function is:
$$
L(X, W) = \frac{1}{2} \left[ \| M - X W^T \|_F^2 + \lambda \left( \|X\|_F^2 + \|W\|_F^2 \right) \right],
$$
where:
- $M \in \mathbb{R}^{n \times m}$ is the observed rating matrix, with $M_{u,i} = r_{u,i}$ if user $u$ has rated item $i$, and 0 otherwise.
- $X \in \mathbb{R}^{n \times d}$ represents the user latent factors (each row corresponds to a user vector $X_u$).
- $W \in \mathbb{R}^{m \times d}$ represents the item latent factors (each row corresponds to an item vector $W_i$).
- $\| \cdot \|_F$ is the Frobenius norm.

The prediction matrix is:
$$
\hat{M} = X W^T.
$$

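The loss consists of:
1. The reconstruction error, where the squared norm runs over the observed entries only (the residual $M - X W^T$ is treated as zero wherever no rating is observed):
$$
\| M - X W^T \|_F^2 = \sum_{(u, i) \in D} (r_{u,i} - X_u W_i^T)^2.
$$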
2. The regularization terms:
$$
\lambda \left( \|X\|_F^2 + \|W\|_F^2 \right).
$$

---

#### Computing the Gradients

##### Gradient with respect to $X$

1. Differentiate the reconstruction error term:
$$
\frac{\partial}{\partial X} \frac{1}{2} \| M - X W^T \|_F^2 = -(M - X W^T) W.
$$

2. Differentiate the regularization term:
$$
\frac{\partial}{\partial X} \frac{\lambda}{2} \|X\|_F^2 = \lambda X.
$$

3. Combine the two terms:
$$
\frac{\partial L}{\partial X} = -(M - X W^T) W + \lambda X.
$$

---

##### Gradient with respect to $W$

1. Differentiate the reconstruction error term:
$$
\frac{\partial}{\partial W} \frac{1}{2} \| M - X W^T \|_F^2 = -(M - X W^T)^T X.
$$

2. Differentiate the regularization term:
$$
\frac{\partial}{\partial W} \frac{\lambda}{2} \|W\|_F^2 = \lambda W.
$$

3. Combine the two terms:
$$
\frac{\partial L}{\partial W} = -(M - X W^T)^T X + \lambda W.
$$
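
One way to sanity-check the two gradient formulas is to compare them with central finite differences on a small random problem. The sketch below assumes the residual is zeroed on unobserved entries (here marked by a rating of 0); the sizes, seed, and helper names (`loss`, `grads`, `numeric_grad`) are illustrative choices, not part of the notebook's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, d, lam = 4, 5, 3, 0.1
M = rng.integers(0, 6, size=(n, m)).astype(float)  # 0 marks "not rated"
B = (M > 0).astype(float)                          # 1 where a rating is observed
X = rng.normal(size=(n, d))
W = rng.normal(size=(m, d))

def loss(X, W):
    """Regularized squared error over observed entries (with the 1/2 factor)."""
    E = B * (M - X @ W.T)
    return 0.5 * (np.sum(E ** 2) + lam * (np.sum(X ** 2) + np.sum(W ** 2)))

def grads(X, W):
    """Analytic gradients: -E W + lam X and -E^T X + lam W."""
    E = B * (M - X @ W.T)
    return -E @ W + lam * X, -E.T @ X + lam * W

def numeric_grad(f, A, eps=1e-6):
    """Central finite differences of f() with respect to the entries of A."""
    G = np.zeros_like(A)
    for idx in np.ndindex(A.shape):
        old = A[idx]
        A[idx] = old + eps
        f_plus = f()
        A[idx] = old - eps
        f_minus = f()
        A[idx] = old                               # restore the entry
        G[idx] = (f_plus - f_minus) / (2 * eps)
    return G

gX, gW = grads(X, W)
err_X = np.max(np.abs(gX - numeric_grad(lambda: loss(X, W), X)))
err_W = np.max(np.abs(gW - numeric_grad(lambda: loss(X, W), W)))
print(f"max abs difference: X {err_X:.2e}, W {err_W:.2e}")
```

Both differences should be close to floating-point noise if the analytic expressions match the loss.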

In [None]:
# Step 2: Creating the matrix M

# Dimensions of the user-item matrix
num_users, num_items = user_item_matrix.shape

# M holds the ratings; the mask flags the observed entries of M
M = user_item_matrix.values
mask = M > 0  # Boolean mask for observed entries

### 3. Optimize the Loss Function with Stochastic Gradient Descent

In order to optimize the loss function, we use Stochastic Gradient Descent (SGD).

#### Explanation of the SGD Algorithm (Matrix Form)

##### 1. Initialization
- Matrices $X$ (users' latent factors) and $W$ (items' latent factors) are initialized randomly with small values.
- $X \in \mathbb{R}^{n \times d}$, where $n$ is the number of users and $d$ is the number of latent factors.
- $W \in \mathbb{R}^{m \times d}$, where $m$ is the number of items.

##### 2. Gradient Computation
- Define the prediction matrix:
  $$
  \hat{M} = X W^T
  $$
- Compute the error matrix $E$ entrywise (only for observed entries in $M$):
  $$
  E_{ui} = \begin{cases}
  M_{ui} - \hat{M}_{ui} \quad &\text{if} \ M_{ui} > 0\\
  0 \quad &\text{otherwise}
  \end{cases}
  $$
  so that $E_{ui} = 0$ for unobserved entries of $M$.

- Gradients for $X$ and $W$:
  $$
  \nabla_X = - E W + \lambda X
  $$
  $$
  \nabla_W = - E^T X + \lambda W
  $$

##### 3. Updates
- Update the latent factor matrices $X$ and $W$ simultaneously:
  $$
  X \leftarrow X - \eta \nabla_X
  $$
  $$
  W \leftarrow W - \eta \nabla_W
  $$
- Here, $\eta$ is the learning rate.

##### 4. Loss Tracking
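- The total loss tracked at each epoch combines the squared error over observed entries and the regularization terms. The tracked value drops the constant factor $\frac{1}{2}$ from the definition of $L(X, W)$, which does not change the minimizer:
  $$
  2L = \| E \|_F^2 + \lambda (\|X\|_F^2 + \|W\|_F^2)
  $$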
- This tracks the reconstruction error and ensures that the latent factor matrices do not grow too large (controlled by the regularization term).

##### 5. Optimization Loop
- Repeat the following steps for a fixed number of epochs or until the loss converges:
  1. Compute the error matrix $E$.
  2. Compute the gradients $\nabla_X$ and $\nabla_W$ using matrix operations.
  3. Update $X$ and $W$ using the gradients.
  4. Track and print the loss for each epoch.

---

##### Notes
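- The gradients are computed from the observed entries of $M$ only, using matrix masking.
- Each epoch uses every observed entry at once, so the implementation below performs one full-batch gradient step per epoch rather than a per-rating stochastic update.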
- The hyperparameters ($\eta$, $d$, and $\lambda$) should be tuned based on the dataset for optimal performance.
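
The loop described above can be sketched in fully vectorized form on a toy matrix. This is an illustration under assumed sizes and hyperparameters (`eta`, `lam`, `epochs` are arbitrary toy values), without the parallelism or gradient clipping used in the actual cell:

```python
import numpy as np

rng = np.random.default_rng(42)
n, m, d = 6, 8, 2                      # toy sizes (illustrative assumption)
eta, lam, epochs = 0.01, 0.01, 200     # toy hyperparameters

M = rng.integers(0, 6, size=(n, m)).astype(float)  # 0 marks "not rated"
mask = M > 0
X = 0.1 * rng.standard_normal((n, d))
W = 0.1 * rng.standard_normal((m, d))

def total_loss(X, W):
    """Regularized squared error over observed entries (with the 1/2 factor)."""
    E = mask * (M - X @ W.T)
    return 0.5 * (np.sum(E ** 2) + lam * (np.sum(X ** 2) + np.sum(W ** 2)))

initial_loss = total_loss(X, W)
history = []
for _ in range(epochs):
    E = mask * (M - X @ W.T)           # error on observed entries only
    grad_X = -E @ W + lam * X          # both gradients use the same X and W,
    grad_W = -E.T @ X + lam * W        # so the update is simultaneous
    X -= eta * grad_X
    W -= eta * grad_W
    history.append(total_loss(X, W))

print(f"initial loss: {initial_loss:.3f}, final loss: {history[-1]:.3f}")
```

Computing both gradients before applying either update mirrors the simultaneous update in step 3; updating $X$ first and then using the new $X$ for $\nabla_W$ would be a different (alternating) scheme.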

In [None]:
!pip install tqdm_joblib

In [None]:
from joblib import Parallel, delayed
import numpy as np
from tqdm import tqdm
from tqdm_joblib import tqdm_joblib  # Handles progress bars with joblib

# Set the numpy seed
np.random.seed(42)

# Hyperparameters
num_factors = 300  # Number of latent factors (d)
reg_lambda = 0.0001  # Regularization term (lambda)
gradient_clip = 10.0  # Gradient clipping threshold
num_epochs = 30 # Number of epochs
learning_rate = 0.001 # Learning rate (eta)

# Initialize X and W with small random values
X = 0.1 * np.random.randn(num_users, num_factors)
W = 0.1 * np.random.randn(num_items, num_factors)

# Loss history
loss_history = []

# Function to compute gradients for a single user (for X) or item (for W)
def compute_gradient_X(u):
    rated_items = mask[u, :]  # Mask for items rated by user u
    W_rated = W[rated_items]  # Subset of W for rated items
    M_rated = M[u, rated_items]  # Subset of M for rated items

    # Compute error and gradient for user u
    E_u = M_rated - X[u, :] @ W_rated.T
    grad = -E_u @ W_rated + reg_lambda * X[u, :]
    return np.clip(grad, -gradient_clip, gradient_clip)

def compute_gradient_W(i):
    rated_users = mask[:, i]  # Mask for users who rated item i
    X_rated = X[rated_users]  # Subset of X for rated users
    M_rated = M[rated_users, i]  # Subset of M for rated users

    # Compute error and gradient for item i
    E_i = M_rated - X_rated @ W[i, :].T
    grad = -E_i.T @ X_rated + reg_lambda * W[i, :]
    return np.clip(grad, -gradient_clip, gradient_clip)

# SGD Loop
for epoch in range(num_epochs):
    print(f"Epoch {epoch + 1}/{num_epochs}")
    
    # Compute the predicted matrix
    M_hat = X @ W.T

    # Compute the error matrix for observed entries only
    E = np.multiply(mask, M - M_hat)

    # Parallel computation of gradients with progress bars
    grad_X = np.zeros_like(X)
    grad_W = np.zeros_like(W)

    # Gradient wrt X, one user per parallel task (progress bar currently disabled)
    print("  Gradient wrt X...")
    with tqdm_joblib(tqdm(desc="    Users", total=num_users, leave=True, disable=True)) as _:
        grad_X = np.array(
            Parallel(n_jobs=-1)(delayed(compute_gradient_X)(u) for u in range(num_users))
        )

    # Gradient wrt W, one item per parallel task (progress bar currently disabled)
    print("  Gradient wrt W...")
    with tqdm_joblib(tqdm(desc="    Items", total=num_items, leave=True, disable=True)) as _:
        grad_W = np.array(
            Parallel(n_jobs=-1)(delayed(compute_gradient_W)(i) for i in range(num_items))
        )

    # Apply updates
    X -= learning_rate * grad_X
    W -= learning_rate * grad_W

    # Compute the total loss. E was computed before the update and is already
    # masked, so the reconstruction term reflects the factors at the start of this epoch.
    reconstruction_loss = np.sum(E ** 2)
    regularization_loss = reg_lambda * (np.linalg.norm(X, 'fro') ** 2 + np.linalg.norm(W, 'fro') ** 2)
    total_loss = reconstruction_loss + regularization_loss

    # Append the total loss to the history
    loss_history.append(total_loss)

    # Print the loss for the current epoch
    print(f"  Reconstruction Loss: {reconstruction_loss:.4f}")
    print(f"  Regularization Loss: {regularization_loss:.4f}")
    print(f"  Total Loss: {total_loss:.4f}")

    # Debugging: Check mean and std of gradients
    print(f"  Gradient X: mean={np.mean(grad_X):.4f}, std={np.std(grad_X):.4f}")
    print(f"  Gradient W: mean={np.mean(grad_W):.4f}, std={np.std(grad_W):.4f}")

# Save the X and W matrices in .npy format
np.save("X_matrix_sgd.npy", X)
np.save("W_matrix_sgd.npy", W)

In [None]:
from matplotlib import pyplot as plt

# Plot the loss history
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(loss_history) + 1), loss_history, marker='o', linestyle='-')
plt.xlabel('Epoch')
plt.ylabel('Total Loss')
plt.title('Loss over Epochs for SGD')
plt.grid()
plt.show()

### Optimize the Loss Function with Alternating Least Squares (ALS)

Alternating Least Squares (ALS) is an optimization method for matrix factorization that alternates between updating the user latent factors ($X$) and the item latent factors ($W$).

#### Objective
The goal is to minimize the following loss function:
$$
L(X, W) = \|M - XW^T\|_F^2 + \lambda (\|X\|_F^2 + \|W\|_F^2)
$$
Where:
- $M$: User-item interaction matrix ($n \times m$).
- $X$: User latent factor matrix ($n \times d$).
- $W$: Item latent factor matrix ($m \times d$).
- $\lambda$: Regularization parameter.

#### ALS Algorithm
1. **Initialization**:
   - Start with random values for $X$ and $W$.
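2. **Alternating Updates** (here $\mathbb{I}$ denotes the $d \times d$ identity matrix):
   - Fix $W$, solve for each user vector $X_u$ using only the items in $I_u$, i.e., the rows $W_{I_u}$ of $W$ and the ratings $M_{u, I_u}$:
     $$
     X_u = (W_{I_u}^T W_{I_u} + \lambda \mathbb{I})^{-1} W_{I_u}^T M_{u, I_u}
     $$
   - Fix $X$, solve for each item vector $W_i$ using only the users in $U_i$, i.e., the rows $X_{U_i}$ of $X$ and the ratings $M_{U_i, i}$:
     $$
     W_i = (X_{U_i}^T X_{U_i} + \lambda \mathbb{I})^{-1} X_{U_i}^T M_{U_i, i}
     $$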
3. **Convergence**:
   - Iterate until the loss stabilizes or a set number of epochs is reached.
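
As a small check of the closed-form user update, the sketch below solves the regularized least-squares problem for one hypothetical user, restricted to the items that user rated, and verifies that the gradient of the loss with respect to that user's vector vanishes at the solution. The ratings and sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
m, d, lam = 7, 3, 0.1
W = rng.normal(size=(m, d))                          # item factors, held fixed

# Ratings of one hypothetical user; 0 marks "not rated".
m_u = np.array([5.0, 0.0, 3.0, 0.0, 4.0, 1.0, 0.0])
rated = m_u > 0

W_r = W[rated]                                       # factors of the rated items only
A = W_r.T @ W_r + lam * np.eye(d)
b = W_r.T @ m_u[rated]
x_u = np.linalg.solve(A, b)                          # closed-form update for this user

# Gradient of the loss wrt x_u at the solution; it should be numerically zero.
grad = -(m_u[rated] - W_r @ x_u) @ W_r + lam * x_u
print(np.max(np.abs(grad)))
```

Using `np.linalg.solve` instead of forming the inverse explicitly is the usual choice for numerical stability; the item update is symmetric, with the roles of $X$ and $W$ swapped.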

In [None]:
from joblib import Parallel, delayed
import numpy as np
from tqdm import tqdm
from tqdm_joblib import tqdm_joblib  # Progress bars synchronized with joblib

# Set the numpy seed
np.random.seed(42)

# Hyperparameters
num_factors = 500  # Number of latent factors (d)
reg_lambda = 0.0001  # Regularization term (lambda)
num_epochs = 60 # Number of epochs

# Initialize X and W with small random values
X = np.random.normal(scale=0.01, size=(num_users, num_factors))
W = np.random.normal(scale=0.01, size=(num_items, num_factors))

# List to store the loss values for each iteration
loss_history = []

def update_user(u, W, M, mask, reg_lambda, num_factors):
    """Update a single user's latent factors."""
    rated_items = mask[u, :]  # Mask for items rated by user u
    W_rated = W[rated_items, :]
    M_rated = M[u, rated_items]
    
    A = W_rated.T @ W_rated + reg_lambda * np.eye(num_factors)
    b = W_rated.T @ M_rated
    return np.linalg.solve(A, b)

def update_item(i, X, M, mask, reg_lambda, num_factors):
    """Update a single item's latent factors."""
    rated_users = mask[:, i]  # Mask for users who rated item i
    X_rated = X[rated_users, :]
    M_rated = M[rated_users, i]
    
    A = X_rated.T @ X_rated + reg_lambda * np.eye(num_factors)
    b = X_rated.T @ M_rated
    return np.linalg.solve(A, b)

# ALS iterations
for iteration in range(num_epochs):
    print(f"Iteration {iteration + 1}/{num_epochs}")
    
    # Update X by fixing W (Parallelized) with synchronized progress bar
    print("  Updating X...")
    with tqdm_joblib(tqdm(desc="    Users", total=num_users, leave=True, disable=True)) as _:
        X = np.array(Parallel(n_jobs=-1)(
            delayed(update_user)(u, W, M, mask, reg_lambda, num_factors) for u in range(num_users)
        ))
    
    # Update W by fixing X (Parallelized) with synchronized progress bar
    print("  Updating W...")
    with tqdm_joblib(tqdm(desc="    Items", total=num_items, leave=True, disable=True)) as _:
        W = np.array(Parallel(n_jobs=-1)(
            delayed(update_item)(i, X, M, mask, reg_lambda, num_factors) for i in range(num_items)
        ))
    
    # Compute the loss
    M_hat = X @ W.T
    reconstruction_loss = np.sum(np.multiply(mask, (M - M_hat) ** 2))
    regularization_loss = reg_lambda * (np.linalg.norm(X, 'fro') ** 2 + np.linalg.norm(W, 'fro') ** 2)
    total_loss = reconstruction_loss + regularization_loss
    
    # Save the loss in history
    loss_history.append(total_loss)
    
    # Print loss for the current iteration
    print(f"  Reconstruction Loss: {reconstruction_loss:.4f}")
    print(f"  Regularization Loss: {regularization_loss:.4f}")
    print(f"  Total Loss: {total_loss:.4f}")

# Save the X and W matrices in .npy format
np.save("X_matrix_als.npy", X)
np.save("W_matrix_als.npy", W)

In [None]:
from matplotlib import pyplot as plt

# Plot the loss history
plt.figure(figsize=(10, 6))
plt.plot(range(1, len(loss_history) + 1), loss_history, marker='o', linestyle='-')
plt.xlabel('Epoch')
plt.ylabel('Total Loss')
plt.title('Loss over Epochs for ALS')
plt.grid()
plt.show()

In [None]:
def recommend_movies(user_id, M_hat, user_item_matrix, movies, top_n=10):
    """
    Recommends movies for a given user based on the predicted matrix M_hat.

    Parameters:
    - user_id: ID of the user to whom the recommendations will be made.
    - M_hat: Predicted user-item matrix (num_users x num_items).
    - user_item_matrix: Original user-item matrix (Pandas DataFrame) with ratings.
    - movies: DataFrame containing movie details (Movie_ID, Name, Year).
    - top_n: Number of recommendations to return (default is 10).

    Returns:
    - recommendations: DataFrame containing the top_n recommended movies.
    """
    # Map user_id to the corresponding index in M_hat
    user_index = user_item_matrix.index.get_loc(user_id)
    
    # Get the predicted ratings for the user
    predicted_ratings = M_hat[user_index]

    # Get the user's original ratings
    original_ratings = user_item_matrix.loc[user_id]

    # Find movies that the user has not rated (those with a rating of 0)
    unrated_movies = original_ratings[original_ratings == 0].index

    # Map the unrated movies to the correct columns in M_hat
    unrated_predictions = {
        movie_id: predicted_ratings[user_item_matrix.columns.get_loc(movie_id)]
        for movie_id in unrated_movies
    }

    # Sort the predicted ratings for unrated movies in descending order
    sorted_predictions = sorted(unrated_predictions.items(), key=lambda x: x[1], reverse=True)

    # Get the top_n movie IDs
    top_movie_ids = [movie_id for movie_id, _ in sorted_predictions[:top_n]]

    # Retrieve movie details for the top_n recommendations
    recommendations = movies[movies['Movie_ID'].isin(top_movie_ids)]

    # Create a copy of the DataFrame to avoid the SettingWithCopyWarning
    recommendations = recommendations.copy()
    
    # Add the Predicted_Rating column
    recommendations['Predicted_Rating'] = [unrated_predictions[movie_id] for movie_id in recommendations['Movie_ID']]
    
    # Sort recommendations by predicted rating (optional, for clarity)
    recommendations = recommendations.sort_values(by='Predicted_Rating', ascending=False)

    return recommendations[['Movie_ID', 'Name', 'Year', 'Predicted_Rating']]

### 4. Model Testing
The function is tested with a sample user to generate personalized recommendations.
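
The core selection logic of `recommend_movies` (ignore already-rated movies, then take the highest predictions) can be sketched with plain NumPy on one user's row. The numbers below are invented for illustration:

```python
import numpy as np

# Hypothetical predicted and observed ratings for one user over five movies.
predicted = np.array([4.2, 3.1, 4.8, 2.0, 4.5])
observed = np.array([0, 5, 0, 0, 4])                 # 0 marks "not rated"

# Exclude already-rated movies by giving them the lowest possible score.
candidates = np.where(observed == 0, predicted, -np.inf)

top_n = 2
top_idx = np.argsort(candidates)[::-1][:top_n]       # column positions, best first
print(top_idx, predicted[top_idx])
```

The resulting positions index columns of the prediction matrix, so in the notebook they would still need to be mapped back to `Movie_ID` values through `user_item_matrix.columns`.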

In [None]:
# Step 4: Test the recommendation function
user_id_to_test = 774868  # Select a test user
num_recommendations = 5   # Number of recommendations

try:
    # Run the recommendation function
    recommendations = recommend_movies(user_id_to_test, M_hat, user_item_matrix, movies, top_n=num_recommendations)
    
    # Show the results
    print(f"Top {num_recommendations} movie recommendations for User {user_id_to_test}:")
    print(recommendations[['Name', 'Year', 'Predicted_Rating', 'Movie_ID']])
except KeyError as e:
    print(f"Error: User ID {user_id_to_test} not found in the dataset.")
except ValueError as e:
    print(f"Error: {e}")