# A basic matrix factorization-based recommender system

This notebook walks through four steps:

1. Perform Singular Value Decomposition (SVD) on the user-item matrix.
2. Select the top k singular values and vectors.
3. Approximate the user-item matrix using the reduced SVD matrices.
4. Recommend items for a specific user (here, John) by ranking the items he has not rated by their predicted ratings from the approximation.

## 0. Prerequisites

In [18]:
import numpy as np
import pandas as pd

pd.set_option('display.float_format', lambda x: '%.2f' % x)  # Display floats with two decimals instead of scientific notation

## 1. Load the data

In [4]:
# Define the user-item matrix (ratings); 0 means the user has not rated the item
user_item_matrix = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4]   # Emma
])


# List of users
users = ["John", "Alice", "Sarah", "Tom", "Emma"]
# List of items
items = ["Terminator", "Alien", "Titanic", "The Notebook", "The Avengers", "The Godfather", "Jurassic Park"]

Visualize the data

In [20]:
def user_item_matrix_as_df(user_item_matrix, users, items):
    """ Create a DataFrame with the user-item matrix """
    df = pd.DataFrame(user_item_matrix, columns=items, index=users)
    return df

user_item_matrix_as_df(user_item_matrix, users, items)

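| | Terminator | Alien | Titanic | The Notebook | The Avengers | The Godfather | Jurassic Park |
|---|---|---|---|---|---|---|---|
| John | 5 | 0 | 0 | 0 | 4 | 0 | 0 |
| Alice | 0 | 4 | 0 | 0 | 0 | 5 | 0 |
| Sarah | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
| Tom | 0 | 0 | 0 | 4 | 0 | 0 | 3 |
| Emma | 3 | 0 | 0 | 3 | 5 | 4 | 4 |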


## 2. Singular value decomposition


Performing Singular Value Decomposition (SVD) on the user-item matrix decomposes it into three separate matrices: U, Sigma, and Vt.

The matrix U represents the relationship between users and latent features: each row corresponds to a user and each column to a latent feature. Sigma holds the singular values, which measure the importance of each latent feature; mathematically it is a diagonal matrix, although `np.linalg.svd` returns it as a 1-D array sorted in descending order. The matrix Vt represents the relationship between items and latent features: each row corresponds to a latent feature and each column to an item.

Keeping only the top k singular values and their corresponding vectors gives a rank-k approximation of the original user-item matrix. The intuition is that the largest singular values capture the most significant patterns in the data, that is, the strongest relationships between users, items, and latent features.

This reduces the dimensionality of the user-item matrix while retaining its most important structure. The lower-dimensional representation is more manageable and efficient for further analysis and recommendation tasks.
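
As a quick check of the shapes described above, the following minimal sketch decomposes the same ratings matrix and confirms that multiplying the three factors back together recovers it. The names `R` and `s` are chosen here for illustration, and `full_matrices=False` (the economy-size SVD) is an assumption made so that the factors can be multiplied directly; the notebook itself calls `np.linalg.svd` with its defaults.

```python
import numpy as np

# Same ratings matrix as in section 1 (rows: users, columns: items)
R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
], dtype=float)

# Economy-size SVD: U is users x features, Vt is features x items
U, s, Vt = np.linalg.svd(R, full_matrices=False)
print(U.shape, s.shape, Vt.shape)   # (5, 5) (5,) (5, 7)

# s is a 1-D array of singular values in descending order
is_sorted = bool(np.all(np.diff(s) <= 0))
print(is_sorted)                    # True

# Using all singular values reproduces the original matrix (up to rounding)
R_full = U @ np.diag(s) @ Vt
print(np.allclose(R, R_full))       # True
```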

In [19]:
# Perform Singular Value Decomposition (SVD) to get U, Sigma, and Vt.
# Note: NumPy returns Sigma as a 1-D array of singular values, sorted in descending order.
U, Sigma, Vt = np.linalg.svd(user_item_matrix)

# Select the top k singular values and vectors to approximate the matrix
k = 2                            # Number of singular values and vectors to keep
U_k = U[:, :k]                   # First k columns of U
Sigma_k = np.diag(Sigma[:k])     # k x k diagonal matrix built from the top k singular values
Vt_k = Vt[:k, :]                 # First k rows of Vt
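
The notebook fixes k = 2 without discussing how to choose it. One common heuristic, shown here only as a sketch and not as the author's method, is to look at the share of the total squared singular values (sometimes called the "energy") that the first k components keep. The names `R`, `s`, and `energy` are illustrative.

```python
import numpy as np

R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
], dtype=float)

# compute_uv=False returns only the singular values
s = np.linalg.svd(R, compute_uv=False)

# Cumulative share of the squared singular values kept by the first k components
energy = np.cumsum(s**2) / np.sum(s**2)

for k, share in enumerate(energy, start=1):
    print(f"k={k}: {share:.1%} of the total squared singular values")
```

For this matrix, k = 2 keeps roughly 71% of the total, so a sizeable part of the rating structure is discarded by the approximation.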




## 3. Approximate the user-item matrix using the reduced SVD matrices

In [34]:
# Approximate the user-item matrix using the reduced SVD matrices
user_item_matrix_approx = U_k @ Sigma_k @ Vt_k

# Make it a pandas dataframe for easy visualization
user_item_matrix_as_df(user_item_matrix_approx, users, items)

| | Terminator | Alien | Titanic | The Notebook | The Avengers | The Godfather | Jurassic Park |
|---|---|---|---|---|---|---|---|
| John | 3.29 | -1.28 | -0.00 | 1.52 | 3.58 | -0.14 | 1.78 |
| Alice | -0.98 | 3.27 | 0.00 | 0.87 | -0.24 | 4.92 | 1.02 |
| Sarah | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 |
| Tom | 1.14 | 0.24 | 0.00 | 0.84 | 1.43 | 1.11 | 0.98 |
| Emma | 3.87 | 0.83 | 0.00 | 2.86 | 4.88 | 3.78 | 3.34 |
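
To quantify how far the approximation is from the original matrix, one option is the Frobenius norm of the difference. The sketch below, with illustrative names `R`, `s`, and `R_k`, also checks a standard property of the truncated SVD: the error equals the square root of the sum of the squared singular values that were dropped.

```python
import numpy as np

R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Rank-k approximation from the top k singular values and vectors
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius norm of the approximation error
error = np.linalg.norm(R - R_k, ord="fro")

# The same quantity computed from the discarded singular values
error_from_s = np.sqrt(np.sum(s[k:] ** 2))

print(f"Frobenius error: {error:.3f}")
print(np.isclose(error, error_from_s))   # True
```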


## 4. What are the predicted ratings for John?

In [29]:
# Recommend items for a specific user (e.g., John)
user = "John"
user_index = users.index(user)    # Get the row index of the user

john_ratings = user_item_matrix[user_index]
recommended_ratings = user_item_matrix_approx[user_index]

# Print John's original and predicted ratings
# (float() keeps the printed list as plain numbers on newer NumPy versions)
print("John's original ratings:     ", john_ratings)
print("Recommended ratings for John:", [round(float(rating), 2) for rating in recommended_ratings])


John's original ratings:      [5 0 0 0 4 0 0]
Recommended ratings for John: [3.29, -1.28, -0.0, 1.52, 3.58, -0.14, 1.78]
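
Each predicted rating can also be read as a dot product between a user's latent vector and an item's latent vector. The sketch below illustrates this for John; `user_factors` and `item_factors` are names introduced here, and folding the singular values into the user side is just one possible convention.

```python
import numpy as np

users = ["John", "Alice", "Sarah", "Tom", "Emma"]
R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# One k-dimensional vector per user (columns scaled by the singular values)
user_factors = U[:, :k] * s[:k]      # shape (5, k)
# One k-dimensional vector per item
item_factors = Vt[:k, :].T           # shape (7, k)

# John's predicted ratings: dot product of his vector with every item vector
john = users.index("John")
john_pred = item_factors @ user_factors[john]

# This matches John's row of the full rank-k approximation
full_approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
print(np.allclose(john_pred, full_approx[john]))   # True
print(np.round(john_pred, 2))
```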


## 5. Let's make recommendations

In [31]:
# Get the (item, predicted rating) pairs for the items that John has not rated (rating == 0)
recommended_items = [(items[i], round(float(recommended_ratings[i]), 2)) for i in range(len(items)) if john_ratings[i] == 0]
# Sort the list by predicted rating in descending order
recommended_items = sorted(recommended_items, key=lambda x: x[1], reverse=True)
recommended_items

[('Jurassic Park', 1.78),
 ('The Notebook', 1.52),
 ('Titanic', -0.0),
 ('The Godfather', -0.14),
 ('Alien', -1.28)]
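
The steps from sections 2 to 5 can be wrapped into a single helper so that recommendations can be produced for any user. This is a minimal sketch: the function name `recommend` and its parameters `k` and `top_n` are hypothetical, and it keeps the notebook's convention that a rating of 0 means "not rated".

```python
import numpy as np

users = ["John", "Alice", "Sarah", "Tom", "Emma"]
items = ["Terminator", "Alien", "Titanic", "The Notebook",
         "The Avengers", "The Godfather", "Jurassic Park"]
R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
], dtype=float)


def recommend(ratings, users, items, user, k=2, top_n=3):
    """Return up to top_n (item, predicted rating) pairs for items the user has not rated."""
    # Rank-k approximation of the ratings matrix
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    row = users.index(user)
    # Column indices of items the user has not rated (0 = not rated)
    unrated = np.flatnonzero(ratings[row] == 0)
    # Sort those items by predicted rating, highest first
    order = unrated[np.argsort(-approx[row, unrated])]
    return [(items[i], round(float(approx[row, i]), 2)) for i in order[:top_n]]


john_recs = recommend(R, users, items, "John", k=2, top_n=2)
print(john_recs)
```

With `k=2` and `top_n=2`, the call for John returns Jurassic Park followed by The Notebook, in line with the sorted list above.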

### Does it make sense?

In [33]:
# print the input matrix once again
user_item_matrix_as_df(user_item_matrix, users, items)

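| | Terminator | Alien | Titanic | The Notebook | The Avengers | The Godfather | Jurassic Park |
|---|---|---|---|---|---|---|---|
| John | 5 | 0 | 0 | 0 | 4 | 0 | 0 |
| Alice | 0 | 4 | 0 | 0 | 0 | 5 | 0 |
| Sarah | 0 | 0 | 5 | 0 | 0 | 0 | 0 |
| Tom | 0 | 0 | 0 | 4 | 0 | 0 | 3 |
| Emma | 3 | 0 | 0 | 3 | 5 | 4 | 4 |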
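
One informal way to sanity-check the recommendations against this matrix is to list, for every other user, which films they rated in common with John and which films they rated that John has not. The sketch below does only that; the names `overlap` and `new_for_john` are illustrative, and this is an inspection aid rather than part of the SVD method.

```python
import numpy as np

users = ["John", "Alice", "Sarah", "Tom", "Emma"]
items = ["Terminator", "Alien", "Titanic", "The Notebook",
         "The Avengers", "The Godfather", "Jurassic Park"]
R = np.array([
    [5, 0, 0, 0, 4, 0, 0],  # John
    [0, 4, 0, 0, 0, 5, 0],  # Alice
    [0, 0, 5, 0, 0, 0, 0],  # Sarah
    [0, 0, 0, 4, 0, 0, 3],  # Tom
    [3, 0, 0, 3, 5, 4, 4],  # Emma
])

target = users.index("John")
rated_by_john = R[target] > 0

overlap = {}        # films rated by both John and the other user
new_for_john = {}   # films rated only by the other user, with that user's rating

for u, name in enumerate(users):
    if u == target:
        continue
    rated_by_other = R[u] > 0
    shared = np.flatnonzero(rated_by_john & rated_by_other)
    new = np.flatnonzero(~rated_by_john & rated_by_other)
    overlap[name] = [items[i] for i in shared]
    new_for_john[name] = [(items[i], int(R[u, i])) for i in new]
    print(f"{name}: shared={overlap[name]}, new for John={new_for_john[name]}")
```

Only Emma shares rated films with John (Terminator and The Avengers), and Jurassic Park and The Notebook are among the films she rated that John has not, which is consistent with their position at the top of the list. Titanic is rated only by Sarah, who rated nothing else, so no other user's ratings carry information about it.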
