# Matrix Factorisation

We look at two matrix factorisation methods: Non-negative Matrix Factorisation (NMF) and Singular Value Decomposition (SVD). We will use these methods to make recommendations for users in the Amazon Reviews dataset, and evaluate the recommendations using the root mean squared error (RMSE), the mean absolute error (MAE) and possibly the F1-score.
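For reference, the two error metrics can be written as follows. Here $\mathcal{T}$ is the set of held-out (user, item) pairs, $v_{ui}$ is the actual rating and $\hat{v}_{ui}$ is the predicted rating. This notation is introduced here for illustration. MSE is simply the square of RMSE.

$$\text{MAE} = \frac{1}{|\mathcal{T}|}\sum_{(u,i)\in\mathcal{T}} \left|v_{ui} - \hat{v}_{ui}\right|, \qquad \text{RMSE} = \sqrt{\frac{1}{|\mathcal{T}|}\sum_{(u,i)\in\mathcal{T}} \left(v_{ui} - \hat{v}_{ui}\right)^2}$$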

## Non-negative Matrix Factorisation (NMF)

NMF is a matrix factorisation method that factorises a non-negative matrix $V$ into two non-negative matrices $W$ and $H$ such that $V \approx WH$. The columns of $W$ are called the basis vectors. The columns of $H$ hold the coefficients that express each column of $V$ as a non-negative combination of those basis vectors. NMF is a dimensionality reduction technique that can be applied to a user-item ratings matrix to find lower-dimensional representations of users and items. In the context of recommendation systems, it can uncover latent factors that capture underlying patterns in user preferences and item characteristics. These latent factors can then be used to predict ratings and make recommendations for users.
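Concretely, take an $m \times n$ ratings matrix $V$ and a chosen rank $k$. In its most common form, NMF looks for $W \in \mathbb{R}_{\ge 0}^{m\times k}$ and $H \in \mathbb{R}_{\ge 0}^{k\times n}$ that minimise the squared (Frobenius-norm) reconstruction error. Other losses, such as the KL divergence, are also used.

$$\min_{W \ge 0,\; H \ge 0} \; \lVert V - WH \rVert_F^2 = \sum_{u,i} \big(v_{ui} - (WH)_{ui}\big)^2$$

Row $u$ of $W$ is then a $k$-dimensional user profile and column $i$ of $H$ is a $k$-dimensional item profile. The predicted rating is their dot product, $\hat{v}_{ui} = \sum_{f=1}^{k} w_{uf}\, h_{fi}$.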

Steps:

1. Create a matrix $V$ of user-item ratings
2. Initialise $W$ and $H$ with random values
3. Update $W$ and $H$ iteratively until $V \approx WH$
4. Use $W$ and $H$ to make recommendations by reconstructing $V$ from $W$ and $H$ by multiplying them together
5. Recommend the items with the highest predicted ratings
6. Evaluate the performance of the recommendations by calculating the error between the predicted ratings and the actual ratings 
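Steps 2 to 4 can be sketched in NumPy with the multiplicative update rules of Lee and Seung for the squared-error objective. This is a minimal illustration, not the solver scikit-learn uses. The function name `nmf_multiplicative`, the iteration count and the toy matrix `V` are all assumptions made for the example.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-10, seed=0):
    """Factorise non-negative V (m x n) into W (m x rank) and H (rank x n)
    with Lee & Seung multiplicative updates for the Frobenius loss."""
    rng = np.random.default_rng(seed)
    m, n = V.shape
    # Step 2: random non-negative initialisation
    W = rng.random((m, rank))
    H = rng.random((rank, n))
    # Step 3: alternate the two updates; every factor is non-negative,
    # so W and H stay non-negative. eps avoids division by zero.
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# toy ratings matrix (0 = unrated, treated here as a rating of 0)
V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
W, H = nmf_multiplicative(V, rank=2)
V_hat = W @ H  # Step 4: reconstruct the ratings matrix
print(np.round(V_hat, 2))
```

scikit-learn's `NMF` minimises the same Frobenius-norm objective by default (`beta_loss='frobenius'`). However, it uses a coordinate-descent solver (`solver='cd'`) unless `solver='mu'` is requested.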


## Singular Value Decomposition (SVD)

SVD is a matrix factorisation method that factorises a matrix $A$ into three matrices $U$, $S$, and $V$ such that $A = USV^T$. We write the ratings matrix as $A$ here because $V$ conventionally denotes the right singular vectors. The columns of $U$ are called the left singular vectors, the columns of $V$ are called the right singular vectors, and $S$ is a diagonal matrix of non-negative singular values in descending order. Keeping only the $k$ largest singular values gives $A \approx U_k S_k V_k^T$. This is the best rank-$k$ approximation of $A$ in the least-squares (Frobenius-norm) sense.

Like NMF, SVD is a dimensionality reduction technique that can be applied to a user-item ratings matrix to find lower-dimensional representations of users and items. It can uncover latent factors that capture underlying patterns in user preferences and item characteristics, and these latent factors can then be used to make recommendations for users. Unlike NMF, the factors, and therefore the predicted ratings, can be negative.

Steps (very similar to NMF):

1. Create a matrix $A$ of user-item ratings
2. Factorise $A$ into $U$, $S$, and $V$ using SVD
3. Keep the $k$ largest singular values and reconstruct $A \approx U_k S_k V_k^T$ by multiplying the truncated matrices together (the full product $USV^T$ reproduces $A$ exactly, so it predicts nothing new)
4. Recommend the items with the highest predicted ratings
5. Evaluate the performance of the recommendations by calculating the error between the predicted ratings and the actual ratings
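Steps 2 and 3 can be sketched with `np.linalg.svd` on a small made-up ratings matrix `A`. The name `A` avoids confusion with the right-singular-vector matrix. Plain SVD needs a complete matrix, so the zeros for unrated items are treated as ratings of 0. The function name `truncated_svd_reconstruct` and the choice `k=2` are illustrative assumptions.

```python
import numpy as np

def truncated_svd_reconstruct(A, k):
    """Return the rank-k reconstruction U_k S_k V_k^T of A."""
    # full_matrices=False gives the compact ("economy") SVD;
    # s holds the singular values in descending order
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # keep only the k largest singular values and their vectors
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
A_hat = truncated_svd_reconstruct(A, k=2)
print(np.round(A_hat, 2))  # entries may be negative, unlike NMF
```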


***
# Testing Area (Ignore)

Here we test how matrix factorisation works for collaborative filtering. Specifically, we look at non-negative matrix factorisation (NMF) and singular value decomposition (SVD), using the sample ratings data (`temp_data.csv`). The steps are as follows:

1. Build the user-item ratings matrix
2. Hide some ratings to simulate a test set
3. Factorise the matrix
4. Predict the hidden ratings by filling in the missing values with predicted ratings
5. Compare the predicted ratings with the hidden (actual) ratings
6. Calculate MAE, RMSE and MSE
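Steps 1 and 2 can be sketched as a function that also returns a boolean mask of the hidden cells. Keeping a mask, rather than a flattened list of column indices, preserves which user each hidden rating belongs to, and the comparison in steps 5 and 6 needs that information. The function name `hide_ratings` is hypothetical. It uses NumPy's `default_rng`, so it will not pick the same cells as a loop seeded with `np.random.seed(10)`.

```python
import numpy as np

def hide_ratings(R, n_hide=2, seed=10):
    """Copy R and zero out up to n_hide observed (non-zero) ratings per user.
    Returns the matrix with hidden ratings and a boolean mask of hidden cells."""
    rng = np.random.default_rng(seed)
    R_hidden = R.copy()
    mask = np.zeros(R.shape, dtype=bool)
    for u in range(R.shape[0]):
        rated = np.flatnonzero(R[u] > 0)  # column indices this user has rated
        k = min(n_hide, len(rated))
        if k == 0:
            continue  # nothing to hide for a user with no ratings
        hide = rng.choice(rated, size=k, replace=False)
        R_hidden[u, hide] = 0
        mask[u, hide] = True
    return R_hidden, mask

R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
R_hidden, mask = hide_ratings(R, n_hide=2)
```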


In [1]:
%reset -f

# load libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
x = pd.read_csv(r"C:\Users\e1002902\Documents\GitHub Repository\Masters-Dissertation\Code\temp_data.csv", index_col=0)
x

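| Unnamed: 0 | book1 | book2 | book3 | book4 | book5 | book6 | book7 | book8 | book9 | book10 |
|---|---|---|---|---|---|---|---|---|---|---|
| user1 | 0 | 0 | 2 | 5 | 4 | 3 | 4 | 4 | 4 | 4 |
| user2 | 4 | 0 | 3 | 5 | 0 | 0 | 0 | 0 | 0 | 4 |
| user3 | 0 | 3 | 4 | 4 | 0 | 2 | 0 | 0 | 0 | 0 |
| user4 | 0 | 0 | 3 | 5 | 4 | 0 | 0 | 0 | 0 | 0 |
| user5 | 3 | 4 | 0 | 4 | 4 | 0 | 5 | 5 | 5 | 5 |
| user6 | 4 | 5 | 0 | 0 | 0 | 0 | 4 | 2 | 2 | 0 |
| user7 | 2 | 2 | 0 | 0 | 0 | 0 | 5 | 3 | 3 | 3 |
| user8 | 0 | 5 | 4 | 0 | 4 | 3 | 0 | 0 | 0 | 0 |
| user9 | 0 | 5 | 4 | 0 | 5 | 2 | 0 | 2 | 2 | 0 |
| user10 | 0 | 0 | 0 | 0 | 5 | 0 | 4 | 4 | 4 | 4 |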


In [3]:
# create a copy of the original matrix to store hidden ratings
x_hidden = x.copy()
indices_tracker = []

# identifies rated books and randomly selects 2 books to hide ratings for each user
np.random.seed(10)  # You can use any integer value as the seed
for user_id in range(x_hidden.shape[0]):
    rated_books = np.where(x_hidden.iloc[user_id, :] > 0)[0]
    print(user_id)
    print(rated_books)
    hidden_indices = np.random.choice(rated_books, min(2, len(rated_books)), replace=False)
    indices_tracker.append(hidden_indices)
    print(hidden_indices)
    x_hidden.iloc[user_id, hidden_indices] = 0


0
[2 3 4 5 6 7 8 9]
[4 5]
1
[0 2 3 9]
[3 9]
2
[1 2 3 5]
[5 1]
3
[2 3 4]
[4 2]
4
[0 1 3 4 6 7 8 9]
[9 1]
5
[0 1 6 7 8]
[8 1]
6
[0 1 6 7 8 9]
[8 9]
7
[1 2 4 5]
[1 5]
8
[1 2 4 5 7 8]
[1 7]
9
[4 6 7 8 9]
[6 4]
10
[0 1 2 4 6 7 8]
[2 0]
11
[0 1 2 4 5 6 7 8]
[1 7]


In [4]:
# check tracker: column indices of the hidden ratings, one row per user
indices_tracker = pd.DataFrame(indices_tracker).to_numpy()
indices_tracker

# flattened
indices_tracker_flat = indices_tracker.flatten()
indices_tracker_flat


array([4, 5, 3, 9, 5, 1, 4, 2, 9, 1, 8, 1, 8, 9, 1, 5, 1, 7, 6, 4, 2, 0,
       1, 7], dtype=int64)

In [5]:
# see updated matrix with hidden ratings
display(x_hidden)

# see original matrix
display(x)

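| Unnamed: 0 | book1 | book2 | book3 | book4 | book5 | book6 | book7 | book8 | book9 | book10 |
|---|---|---|---|---|---|---|---|---|---|---|
| user1 | 0 | 0 | 2 | 5 | 0 | 0 | 4 | 4 | 4 | 4 |
| user2 | 4 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| user3 | 0 | 0 | 4 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| user4 | 0 | 0 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| user5 | 3 | 0 | 0 | 4 | 4 | 0 | 5 | 5 | 5 | 0 |
| user6 | 4 | 0 | 0 | 0 | 0 | 0 | 4 | 2 | 0 | 0 |
| user7 | 2 | 2 | 0 | 0 | 0 | 0 | 5 | 3 | 0 | 0 |
| user8 | 0 | 0 | 4 | 0 | 4 | 0 | 0 | 0 | 0 | 0 |
| user9 | 0 | 0 | 4 | 0 | 5 | 2 | 0 | 0 | 2 | 0 |
| user10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 4 | 4 |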


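| Unnamed: 0 | book1 | book2 | book3 | book4 | book5 | book6 | book7 | book8 | book9 | book10 |
|---|---|---|---|---|---|---|---|---|---|---|
| user1 | 0 | 0 | 2 | 5 | 4 | 3 | 4 | 4 | 4 | 4 |
| user2 | 4 | 0 | 3 | 5 | 0 | 0 | 0 | 0 | 0 | 4 |
| user3 | 0 | 3 | 4 | 4 | 0 | 2 | 0 | 0 | 0 | 0 |
| user4 | 0 | 0 | 3 | 5 | 4 | 0 | 0 | 0 | 0 | 0 |
| user5 | 3 | 4 | 0 | 4 | 4 | 0 | 5 | 5 | 5 | 5 |
| user6 | 4 | 5 | 0 | 0 | 0 | 0 | 4 | 2 | 2 | 0 |
| user7 | 2 | 2 | 0 | 0 | 0 | 0 | 5 | 3 | 3 | 3 |
| user8 | 0 | 5 | 4 | 0 | 4 | 3 | 0 | 0 | 0 | 0 |
| user9 | 0 | 5 | 4 | 0 | 5 | 2 | 0 | 2 | 2 | 0 |
| user10 | 0 | 0 | 0 | 0 | 5 | 0 | 4 | 4 | 4 | 4 |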


In [10]:
# Factorise the matrix (x_hidden) using NMF (non-negative matrix factorisation)
from sklearn.decomposition import NMF

# Specify the number of latent factors (rank)
rank = 5  # Adjust as needed

model = NMF(n_components=rank, init='random', random_state=0)
W = model.fit_transform(x_hidden) # decompose your original matrix into two lower-dimensional matrices
H = model.components_

# check shapes
print("Shape of W:", W.shape)
print("Shape of H:", H.shape)

# see W and H
print("\nW:\n", W)  

print("\nH:\n", H)

# save W and H as dataframes and csv files
W_df = pd.DataFrame(W)
W_df.to_csv(r"C:\Users\e1002902\Documents\GitHub Repository\Masters-Dissertation\Code\W.csv")

H_df = pd.DataFrame(H)
H_df.to_csv(r"C:\Users\e1002902\Documents\GitHub Repository\Masters-Dissertation\Code\H.csv")

Shape of W: (12, 2)
Shape of H: (2, 10)

W:
 [[0.02621048 2.57028713]
 [0.83923658 0.        ]
 [0.70929386 0.07454354]
 [0.         0.43807068]
 [0.43782946 2.99276301]
 [0.10305699 1.18898409]
 [0.         1.51821899]
 [1.23907235 0.        ]
 [1.59759402 0.        ]
 [0.         1.4585747 ]
 [0.37850361 2.62390924]
 [1.38657223 0.78657517]]

H:
 [[1.32621959 0.         2.76494453 0.1839249  2.25470865 0.97237289
  0.29930431 0.         0.80520162 0.        ]
 [0.54552099 0.28458278 0.         0.91521584 0.63426624 0.
  1.78572324 1.75647939 1.49879171 0.55360175]]
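scikit-learn's `NMF` fits every cell of `x_hidden`. As a result, the zeros standing in for unrated and hidden ratings are treated as genuine ratings of 0, which tends to pull predictions for those cells toward 0. One common alternative is weighted (masked) NMF, where a binary mask restricts the loss to observed cells. The sketch below uses a hypothetical helper, `masked_nmf`, built on masked multiplicative updates. It is not part of scikit-learn, and the toy matrix is an assumption.

```python
import numpy as np

def masked_nmf(V, M, rank, n_iter=1000, eps=1e-10, seed=0):
    """NMF fitted only on cells where M is True (observed ratings).
    Unobserved cells contribute nothing to the squared-error loss."""
    rng = np.random.default_rng(seed)
    Mf = M.astype(float)   # 1.0 for observed cells, 0.0 for missing
    VM = Mf * V            # observed ratings, zeros elsewhere
    W = rng.random((V.shape[0], rank))
    H = rng.random((rank, V.shape[1]))
    for _ in range(n_iter):
        # same form as the plain multiplicative updates, but the
        # current reconstruction W @ H is masked before use
        H *= (W.T @ VM) / (W.T @ (Mf * (W @ H)) + eps)
        W *= (VM @ H.T) / ((Mf * (W @ H)) @ H.T + eps)
    return W, H

V = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 1, 5, 4]], dtype=float)
M = V > 0  # treat zeros as missing, mirroring x_hidden
W, H = masked_nmf(V, M, rank=2)
V_hat = W @ H  # predictions for every cell, including the missing ones
```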


In [11]:
# reconstruct the ratings matrix as W @ H, round to 2 decimal places, and save as a DataFrame and CSV
x_reconstructed = np.dot(W,H)
x_reconstructed = pd.DataFrame(x_reconstructed)
x_reconstructed = x_reconstructed.round(2)
x_reconstructed.to_csv(r"C:\Users\e1002902\Documents\GitHub Repository\Masters-Dissertation\Code\x_reconstructed.csv")
x_reconstructed


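| Unnamed: 0 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.44 | 0.73 | 0.07 | 2.36 | 1.69 | 0.03 | 4.6 | 4.51 | 3.87 | 1.42 |
| 1 | 1.11 | 0.0 | 2.32 | 0.15 | 1.89 | 0.82 | 0.25 | 0.0 | 0.68 | 0.0 |
| 2 | 0.98 | 0.02 | 1.96 | 0.2 | 1.65 | 0.69 | 0.35 | 0.13 | 0.68 | 0.04 |
| 3 | 0.24 | 0.12 | 0.0 | 0.4 | 0.28 | 0.0 | 0.78 | 0.77 | 0.66 | 0.24 |
| 4 | 2.21 | 0.85 | 1.21 | 2.82 | 2.89 | 0.43 | 5.48 | 5.26 | 4.84 | 1.66 |
| 5 | 0.79 | 0.34 | 0.28 | 1.11 | 0.99 | 0.1 | 2.15 | 2.09 | 1.87 | 0.66 |
| 6 | 0.83 | 0.43 | 0.0 | 1.39 | 0.96 | 0.0 | 2.71 | 2.67 | 2.28 | 0.84 |
| 7 | 1.64 | 0.0 | 3.43 | 0.23 | 2.79 | 1.2 | 0.37 | 0.0 | 1.0 | 0.0 |
| 8 | 2.12 | 0.0 | 4.42 | 0.29 | 3.6 | 1.55 | 0.48 | 0.0 | 1.29 | 0.0 |
| 9 | 0.8 | 0.42 | 0.0 | 1.33 | 0.93 | 0.0 | 2.6 | 2.56 | 2.19 | 0.81 |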
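Step 5 of the NMF procedure (recommend the items with the highest predicted ratings) could then be applied to a reconstruction like the one above. The following is a minimal sketch using a hypothetical helper, `top_n_unrated`. For each user, it ranks only the items that user has not rated in the training matrix (in this notebook, `x_hidden`), so hidden items remain candidates. The small `pred` and `observed` arrays are made up for the example.

```python
import numpy as np

def top_n_unrated(pred, observed, n=3):
    """For each user, return up to n item indices with the highest predicted
    rating, considering only items the user has not rated (observed == 0)."""
    pred, observed = np.asarray(pred, dtype=float), np.asarray(observed)
    recs = []
    for u in range(pred.shape[0]):
        unrated = np.flatnonzero(observed[u] == 0)
        # sort the unrated items by predicted rating, highest first
        ranked = unrated[np.argsort(-pred[u, unrated], kind="stable")]
        recs.append(ranked[:n])
    return recs

pred = np.array([[4.2, 1.0, 3.5, 0.2],
                 [0.5, 2.5, 4.9, 1.1]])
observed = np.array([[5, 0, 0, 0],
                     [0, 3, 0, 0]])
recs = top_n_unrated(pred, observed, n=2)
# user 0 -> items [2, 1]; user 1 -> items [2, 3]
```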


In [None]:
# reconstruction error against the original matrix; note this averages over every cell,
# including unrated zeros, not just the hidden test ratings
from sklearn.metrics import mean_squared_error
print(mean_squared_error(x, np.dot(W,H)))
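The cell above averages the squared error over every cell of `x`, including the zeros for unrated books. Steps 5 and 6 of the plan instead compare predictions with the hidden ratings only. Here is a minimal sketch of that comparison. It assumes, as in `temp_data.csv`, that 0 means unrated, and the helper name `hidden_errors` is hypothetical.

```python
import numpy as np

def hidden_errors(original, hidden, predicted):
    """MAE, MSE and RMSE over cells that were rated in `original` but
    zeroed out in `hidden`, i.e. the simulated test set."""
    original, hidden, predicted = (np.asarray(a, dtype=float)
                                   for a in (original, hidden, predicted))
    test = (original > 0) & (hidden == 0)  # boolean mask of hidden cells
    err = original[test] - predicted[test]
    mse = np.mean(err ** 2)
    return {"MAE": np.mean(np.abs(err)), "MSE": mse, "RMSE": np.sqrt(mse)}

original = [[5, 3], [4, 1]]
hidden = [[5, 0], [0, 1]]          # ratings 3 and 4 were hidden
predicted = [[4.8, 2.0], [2.0, 1.1]]
m = hidden_errors(original, hidden, predicted)
# errors on hidden cells are 1 and 2: MAE 1.5, MSE 2.5, RMSE ≈ 1.581
```

In the notebook, this would be called as `hidden_errors(x, x_hidden, np.dot(W, H))`. `np.asarray` accepts the DataFrames directly.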


***

# Alternating Least Squares (ALS)

1. Create a matrix $V$ of user-item ratings
2. Initialise $W$ and $H$ with random values
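3. Alternate between the two factors: holding $H$ fixed, solve a least-squares problem for each row of $W$ (one per user); then, holding $W$ fixed, solve one for each column of $H$ (one per item). Repeat until $V \approx WH$ on the observed ratings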
4. Use $W$ and $H$ to make recommendations by reconstructing $V$ from $W$ and $H$ by multiplying them together
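One common way to make step 3 concrete adds an L2 penalty $\lambda$, which is not part of the steps above, and minimises the loss over observed ratings only. Write $\Omega_u$ for the items user $u$ has rated, $\Omega_i$ for the users who have rated item $i$, $w_u^T$ for row $u$ of $W$ and $h_i$ for column $i$ of $H$:

$$\min_{W,H} \sum_{(u,i)\ \text{observed}} \big(v_{ui} - w_u^{T} h_i\big)^2 + \lambda\Big(\sum_u \lVert w_u\rVert^2 + \sum_i \lVert h_i\rVert^2\Big)$$

With $H$ fixed, each user's problem is an independent ridge regression with a closed-form solution. The same holds for each item when $W$ is fixed:

$$w_u = \big(H_{\Omega_u} H_{\Omega_u}^{T} + \lambda I\big)^{-1} H_{\Omega_u}\, v_{u,\Omega_u}, \qquad h_i = \big(W_{\Omega_i}^{T} W_{\Omega_i} + \lambda I\big)^{-1} W_{\Omega_i}^{T}\, v_{\Omega_i,i}$$

Here $H_{\Omega_u}$ keeps only the columns of $H$ for items in $\Omega_u$, and $W_{\Omega_i}$ keeps only the rows of $W$ for users in $\Omega_i$. The vectors $v_{u,\Omega_u}$ and $v_{\Omega_i,i}$ hold the corresponding observed ratings. Unlike NMF, nothing here forces $W$ or $H$ to be non-negative.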

In [12]:
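# Specify the number of latent factors (rank)
rank = 5  # Adjust as needed
reg = 0.1  # L2 regularisation strength (assumed value; adjust as needed)

# dimensions
num_users = x.shape[0]
num_items = x.shape[1]

# observed ratings: the matrix with test ratings hidden; zeros are treated as missing
observed_ratings_matrix = x_hidden.to_numpy(dtype=float)

# Initialise user and item matrices (seeded for reproducibility)
np.random.seed(0)
user_matrix = np.random.rand(num_users, rank)
item_matrix = np.random.rand(rank, num_items)

def update_user(i, user_matrix, item_matrix, R):
    # regularised least squares for user i, fitted only on the items user i has rated
    rated = R[i, :] > 0
    H_i = item_matrix[:, rated]
    return np.linalg.solve(H_i @ H_i.T + reg * np.eye(item_matrix.shape[0]), H_i @ R[i, rated])

def update_item(j, user_matrix, item_matrix, R):
    # regularised least squares for item j, fitted only on the users who rated item j
    rated = R[:, j] > 0
    W_j = user_matrix[rated, :]
    return np.linalg.solve(W_j.T @ W_j + reg * np.eye(user_matrix.shape[1]), W_j.T @ R[rated, j])


In [13]:
num_iterations = 10  # Adjust as needed

for iteration in range(num_iterations):
    # Update user matrix while keeping item matrix fixed
    for i in range(num_users):
        user_matrix[i, :] = update_user(i, user_matrix, item_matrix, observed_ratings_matrix)

    # Update item matrix while keeping user matrix fixed
    for j in range(num_items):
        item_matrix[:, j] = update_item(j, user_matrix, item_matrix, observed_ratings_matrix)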