# Attempting Matrix Factorisation



In this notebook, I attempt to implement Matrix Factorisation to identify any latent factors to use in the collaborative layer. 

In [1]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.metrics import mean_squared_error, mean_absolute_error
from surprise import Dataset, Reader, SVD, SVDpp, NMF, SlopeOne
from surprise.model_selection import cross_validate

In [2]:
num_users = 150 # number of dummy users 
num_items_a = 162 #total number of activities
num_items_r = 326 # total number of restaurants 

## Creating dummy interaction matrix

Because there is no real interaction data, generate a dummy user-restaurant interaction matrix.

In [3]:
interaction_matrix_restaurant = np.random.randint(0,5,size = (num_users, num_items_r)) # random integer ratings from 0 to 4 (randint excludes the upper bound)

print('Dummy User-Item Interaction Matrix: ')
print(interaction_matrix_restaurant)

Dummy User-Item Interaction Matrix: 
[[2 1 2 ... 3 1 0]
 [1 0 1 ... 2 4 2]
 [2 2 2 ... 2 4 3]
 ...
 [0 1 1 ... 2 3 0]
 [4 1 0 ... 3 0 4]
 [3 3 1 ... 0 2 1]]


In [4]:
imdf_r = pd.DataFrame(interaction_matrix_restaurant)

In the matrix below, there are 326 items (columns), one per restaurant in the dataset, and 150 dummy users (rows).
The values range from 0 to 4 and indicate the strength of the interaction between a user and an item, with 4 being the strongest and 0 the weakest.
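`np.random.randint(low, high)` excludes `high`, so `randint(0, 5, ...)` only ever produces the ratings 0 to 4. The matrix is also regenerated differently on every run because no seed is set. A minimal sketch of a reproducible alternative using NumPy's `Generator` API (the seed 42 is an arbitrary choice, not from the original notebook):

```python
import numpy as np

# Seeded generator so the dummy matrix is identical on every run (seed is arbitrary)
rng = np.random.default_rng(42)

num_users, num_items_r = 150, 326

# integers(low, high) also excludes `high`: ratings are drawn from {0, 1, 2, 3, 4}
interaction_matrix_restaurant = rng.integers(0, 5, size=(num_users, num_items_r))

print(interaction_matrix_restaurant.shape)  # (150, 326)
print(interaction_matrix_restaurant.min(), interaction_matrix_restaurant.max())
```

To include 5 in the rating scale, the upper bound would have to be 6.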

In [5]:
imdf_r.head()

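|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 3 | 4 | 2 | 1 | 3 | 2 | 0 | ... | 2 | 1 | 0 | 4 | 2 | 3 | 3 | 3 | 1 | 0 |
| 1 | 1 | 0 | 1 | 1 | 2 | 1 | 3 | 0 | 2 | 1 | ... | 1 | 3 | 0 | 2 | 1 | 3 | 0 | 2 | 4 | 2 |
| 2 | 2 | 2 | 2 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | ... | 4 | 2 | 3 | 0 | 4 | 2 | 0 | 2 | 4 | 3 |
| 3 | 2 | 4 | 0 | 1 | 1 | 2 | 0 | 1 | 1 | 4 | ... | 0 | 2 | 0 | 0 | 2 | 3 | 1 | 2 | 2 | 2 |
| 4 | 4 | 2 | 3 | 3 | 2 | 1 | 2 | 4 | 2 | 0 | ... | 3 | 2 | 4 | 1 | 1 | 3 | 2 | 2 | 3 | 4 |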


With only 9,967 zero entries out of 150 × 326 = 48,900 (about 20%), the matrix is dense rather than sparse, which is something to take into consideration when deciding which matrix factorisation technique to use.

In [6]:
np.count_nonzero(imdf_r == 0)

9967
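The proportion of zeros can also be computed directly. Because the dummy ratings are drawn uniformly from 0 to 4, about one entry in five is expected to be 0, which is in line with the count above. A sketch on freshly generated data (the seed is arbitrary, so the exact count differs from the notebook's):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the dummy restaurant matrix: 150 users x 326 restaurants, ratings 0-4
matrix = rng.integers(0, 5, size=(150, 326))

total = matrix.size                      # 150 * 326 = 48,900 entries
zeros = np.count_nonzero(matrix == 0)    # entries rated 0
zero_fraction = zeros / total

# Roughly 20% zeros means roughly 80% non-zero entries: a dense matrix
print(total, zeros, round(zero_fraction, 3))
```

Note that 0 is a genuine rating in this dummy data, so no entries are actually missing; in real interaction data, unobserved pairs would usually be the majority and the matrix would be sparse.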

In [8]:
imdf_r.to_csv('../datasets/imdf_r.csv')

Because there is no real interaction data, generate a dummy user-activity interaction matrix.

In [9]:
interaction_matrix_activity = np.random.randint(0,5,size = (num_users, num_items_a)) # random integer ratings from 0 to 4 (randint excludes the upper bound)

print('Dummy User-Item Interaction Matrix: ')
print(interaction_matrix_activity)

Dummy User-Item Interaction Matrix: 
[[1 2 1 ... 1 0 0]
 [0 3 1 ... 4 0 3]
 [0 3 2 ... 1 1 3]
 ...
 [1 4 1 ... 0 3 3]
 [2 2 3 ... 2 1 2]
 [2 2 1 ... 3 1 0]]


In [10]:
imdf_a = pd.DataFrame(interaction_matrix_activity)

In the matrix below, there are 162 items (columns), one per activity in the dataset, and 150 dummy users (rows).
The values range from 0 to 4 and indicate the strength of the interaction between a user and an item, with 4 being the strongest and 0 the weakest.

In [11]:
imdf_a.head()

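|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2 | 1 | 1 | 0 | 4 | 4 | 0 | 4 | 1 | ... | 0 | 3 | 3 | 3 | 4 | 2 | 3 | 1 | 0 | 0 |
| 1 | 0 | 3 | 1 | 1 | 4 | 0 | 1 | 2 | 2 | 2 | ... | 3 | 4 | 3 | 0 | 4 | 1 | 1 | 4 | 0 | 3 |
| 2 | 0 | 3 | 2 | 2 | 1 | 1 | 2 | 1 | 3 | 3 | ... | 2 | 4 | 3 | 0 | 2 | 4 | 2 | 1 | 1 | 3 |
| 3 | 0 | 0 | 4 | 4 | 0 | 1 | 4 | 1 | 3 | 4 | ... | 2 | 0 | 0 | 3 | 2 | 1 | 3 | 0 | 0 | 0 |
| 4 | 4 | 2 | 4 | 1 | 3 | 3 | 3 | 2 | 1 | 0 | ... | 2 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 4 | 1 |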


With only 4,812 zero entries out of 150 × 162 = 24,300 (about 20%), this matrix is also dense rather than sparse, which is something to take into consideration when deciding which matrix factorisation technique to use.

In [12]:
np.count_nonzero(imdf_a == 0)

4812

In [13]:
imdf_r.to_csv('../datasets/imdf_r.csv')
imdf_a.to_csv('../datasets/imdf_a.csv')

---
## Split into training and test sets

In [14]:
interaction_matrix_restaurant = interaction_matrix_restaurant.astype(float)
interaction_matrix_activity = interaction_matrix_activity.astype(float)

In [15]:
train_restaurant, test_restaurant = train_test_split(interaction_matrix_restaurant, test_size=0.2, random_state=42)
train_actviity, test_activity = train_test_split(interaction_matrix_activity, test_size=0.2, random_state=42)
# Print the shapes of the resulting matrices (train_test_split splits rows, i.e. whole users)
print("Restaurant train matrix shape:", train_restaurant.shape)
print("Restaurant test matrix shape:", test_restaurant.shape)
print("Activity train matrix shape:", train_actviity.shape)
print("Activity test matrix shape:", test_activity.shape)

Restaurant train matrix shape: (120, 326)
Restaurant test matrix shape: (30, 326)
Activity train matrix shape: (120, 162)
Activity test matrix shape: (30, 162)


---
## Alternating Least Squares (ALS)

Aim: to factorise the user-item interaction matrix into two lower-dimensional matrices, one representing users and one representing items. These matrices are optimised iteratively to minimise the error between the reconstructed matrix and the original one.

Steps involved:
1. Prepare the data: organise the user-item interactions into a matrix where rows represent users, columns represent items, and the values represent user-item interactions.

2. Initialise the user and item matrices: start with random or predefined values.

3. Alternating optimisation: alternately hold one matrix fixed and optimise the other to minimise the error between the reconstructed matrix and the original one. In ALS, each of these sub-problems is a regularised least-squares problem with a closed-form solution; gradient-based methods such as stochastic gradient descent are the usual alternative.

4. Convergence: repeat the optimisation until the algorithm converges, i.e. the error between the reconstructed and original matrices stops decreasing meaningfully.

5. Generate recommendations: multiply the optimised user and item matrices to reconstruct the full matrix, then recommend the items with the highest predicted ratings for a given user.

6. Evaluation: evaluate the recommender with appropriate metrics, such as precision, recall, or mean squared error.
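The steps above can be sketched as one self-contained function. This is a minimal sketch, assuming a fully observed matrix (as with the dummy data, where every entry is a rating) and a stopping rule based on the relative drop in the regularised loss; the name `als_with_convergence`, the tolerance `tol` and the seeds are illustrative choices, not part of the original notebook.

```python
import numpy as np

def als_with_convergence(R, k=5, lambda_reg=0.1, max_iter=100, tol=1e-4, seed=0):
    """ALS on a fully observed matrix R; stops when the regularised loss plateaus."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))   # step 2: random user factors
    Q = rng.random((n_items, k))   # step 2: random item factors
    reg = lambda_reg * np.eye(k)
    losses = []
    for _ in range(max_iter):
        # step 3a: fix Q, solve the ridge-regression problem for all users at once
        P = np.linalg.solve(Q.T @ Q + reg, Q.T @ R.T).T
        # step 3b: fix P, solve the ridge-regression problem for all items at once
        Q = np.linalg.solve(P.T @ P + reg, P.T @ R).T
        # regularised squared-error objective
        loss = np.sum((R - P @ Q.T) ** 2) + lambda_reg * (np.sum(P ** 2) + np.sum(Q ** 2))
        losses.append(loss)
        # step 4: stop once the relative improvement falls below tol
        if len(losses) > 1 and (losses[-2] - loss) / losses[-2] < tol:
            break
    return P, Q, losses

# step 1: a dummy 0-4 rating matrix shaped like the activity training set
rng = np.random.default_rng(1)
R = rng.integers(0, 5, size=(120, 162)).astype(float)
P, Q, losses = als_with_convergence(R, k=5)
# step 5: reconstructed ratings used for recommendations
predicted = P @ Q.T
print(len(losses), losses[0], losses[-1])
```

Because each half-step solves its sub-problem exactly, the loss cannot increase from one iteration to the next, which is what makes the plateau test a reasonable convergence check.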

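Initialise the user and item matrices. In the user matrix each row represents a user, in the item matrix each row represents an item, and in both each column represents a latent feature. The values represent how strongly each user or item exhibits each latent feature. (The `ALS` function below draws its own random initial matrices, so these are for illustration only.)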

In [16]:
def initialize_matrices(num_users, num_items, num_latent_factors):
    # Initialize the user matrix with random values
    user_matrix = np.random.rand(num_users, num_latent_factors)
    
    # Initialize the item matrix with random values
    item_matrix = np.random.rand(num_items, num_latent_factors)
    
    return user_matrix, item_matrix




In [17]:
# restaurants: 120 users in the training split
num_users = 120
num_items = 326
num_latent_factors = 10

user_matrix_r, item_matrix_r = initialize_matrices(num_users, num_items, num_latent_factors)

In [18]:
# activities: 120 users in the training split
num_users = 120
num_items = 162
num_latent_factors = 10

user_matrix_a, item_matrix_a = initialize_matrices(num_users, num_items, num_latent_factors)

In [19]:
print(user_matrix_a,item_matrix_a)

[[0.23223857 0.04794252 0.84701604 ... 0.44410231 0.1780637  0.84541164]
 [0.38879236 0.39366708 0.47263839 ... 0.76839392 0.969153   0.16384953]
 [0.87412123 0.94690011 0.4872595  ... 0.70944278 0.97937195 0.11639709]
 ...
 [0.06322425 0.64555758 0.94346293 ... 0.07554124 0.39082618 0.80406111]
 [0.40388519 0.68745364 0.26557076 ... 0.79192197 0.24902472 0.73351865]
 [0.57730821 0.29264963 0.21143486 ... 0.98147428 0.09034614 0.54287743]] [[0.93902624 0.25008497 0.0778291  ... 0.62857695 0.90773712 0.30265775]
 [0.5991143  0.02997865 0.26000696 ... 0.37051572 0.14804131 0.21333704]
 [0.96084034 0.32498541 0.41610364 ... 0.01197696 0.22436816 0.580301  ]
 ...
 [0.39795181 0.8418086  0.99796522 ... 0.2770021  0.80253326 0.70295762]
 [0.76107205 0.12793826 0.44190426 ... 0.76294277 0.57848331 0.30315952]
 [0.21683109 0.43421446 0.98124882 ... 0.16233436 0.03176367 0.97857845]]


The function `ALS` returns the optimised user matrix P (one row of latent features per user) and item matrix Q (one row of latent features per item).
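In this notation (R is the training matrix with rows $r_u$ and columns $r_{:,i}$, $p_u$ and $q_i$ are the rows of P and Q, and $\lambda$ is `lambda_reg`), ALS minimises the regularised squared error below. Holding one matrix fixed turns the problem into a ridge regression with a closed-form solution, which is what each `np.linalg.solve` call computes. Because the dummy matrix has no missing entries, the sum runs over every user-item pair.

$$
\min_{P,\,Q} \sum_{u,i} \left(r_{ui} - p_u^\top q_i\right)^2 + \lambda\left(\lVert P\rVert_F^2 + \lVert Q\rVert_F^2\right)
$$

$$
p_u = \left(Q^\top Q + \lambda I\right)^{-1} Q^\top r_u, \qquad q_i = \left(P^\top P + \lambda I\right)^{-1} P^\top r_{:,i}
$$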




In [20]:
def ALS(train_matrix, latent_features, iterations=10, lambda_reg=0.1):
    num_users, num_items = train_matrix.shape
    # Random initialisation of the user matrix (P) and item matrix (Q)
    user_matrix = np.random.rand(num_users, latent_features)
    item_matrix = np.random.rand(num_items, latent_features)
    reg = lambda_reg * np.eye(latent_features)

    for _ in range(iterations):
        # Fix the item matrix and solve a ridge-regression problem for each user
        A = item_matrix.T @ item_matrix + reg
        for u in range(num_users):
            user_matrix[u] = np.linalg.solve(A, item_matrix.T @ train_matrix[u])

        # Fix the user matrix and solve a ridge-regression problem for each item
        B = user_matrix.T @ user_matrix + reg
        for j in range(num_items):
            item_matrix[j] = np.linalg.solve(B, user_matrix.T @ train_matrix[:, j])

    return user_matrix, item_matrix





Setting values for parameters:  

train_matrix: the training user-item interaction matrix  
latent_features: number of latent features  
iterations: number of ALS iterations (default = 10)  
lambda_reg: regularisation parameter (default = 0.1), passed in below through the variable `lambda_`
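Because the dummy matrix is fully observed, the matrix $Q^\top Q + \lambda I$ in the user update does not depend on the user, so the per-user loop in `ALS` can be replaced by a single `np.linalg.solve` call with one right-hand-side column per user (the item update is symmetric). A sketch comparing the two on random data; the helper names are hypothetical:

```python
import numpy as np

def update_users_loop(R, Q, lambda_reg):
    # One ridge-regression solve per user, as in the ALS cell
    k = Q.shape[1]
    A = Q.T @ Q + lambda_reg * np.eye(k)
    return np.array([np.linalg.solve(A, Q.T @ R[u]) for u in range(R.shape[0])])

def update_users_batched(R, Q, lambda_reg):
    # Same left-hand side for every user: solve A X = Q^T R^T for all users at once
    k = Q.shape[1]
    A = Q.T @ Q + lambda_reg * np.eye(k)
    return np.linalg.solve(A, Q.T @ R.T).T

rng = np.random.default_rng(0)
R = rng.integers(0, 5, size=(120, 326)).astype(float)
Q = rng.random((326, 5))

P_loop = update_users_loop(R, Q, 0.1)
P_batched = update_users_batched(R, Q, 0.1)
print(np.allclose(P_loop, P_batched))  # True
```

This shortcut only holds when every entry is observed; with missing ratings each user's system uses only the items that user rated, so the left-hand side differs per user.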

In [21]:
train_matrix = train_restaurant
latent_features = 5
iterations = 100
lambda_ = 0.1

user_restaurant, item_restaurant = ALS(train_matrix, latent_features, iterations, lambda_)

In [22]:
user_restaurant

array([[ 2.43390578e-01,  2.21655172e-01,  1.28622621e+00,
         1.04736393e+00,  8.26745810e-01],
       [ 2.07024661e-01,  7.49672486e-01,  7.94508366e-01,
         7.58719898e-01,  1.19537111e+00],
       [ 1.08087975e+00,  7.01937791e-01,  5.40926431e-01,
         5.78212220e-01,  8.66031794e-01],
       [ 8.60767583e-01,  5.13134828e-01,  3.03197547e-01,
         1.06930394e+00,  1.33714881e+00],
       [ 6.86630581e-01,  6.15803651e-01,  2.45420605e-01,
         1.14426889e+00,  1.11417233e+00],
       [ 6.09670674e-01,  1.02299984e+00,  6.34380352e-01,
         9.57037061e-01,  5.70891422e-01],
       [ 7.48655727e-01,  9.38678462e-01,  9.73618217e-01,
         8.24514820e-01,  3.76067848e-01],
       [ 1.06830800e+00, -2.31252677e-02,  1.15284207e+00,
         8.28534953e-01,  9.60632983e-01],
       [ 1.03502247e+00,  1.09024835e+00,  9.05485086e-01,
         6.49497637e-02,  6.83574502e-01],
       [ 3.59289696e-01,  6.90138804e-01,  7.42367917e-01,
         1.12984359e+00

In [23]:
train_matrix = train_actviity
latent_features = 5
iterations = 100  # same as the restaurant run
lambda_ = 0.1

user_activity, item_activity = ALS(train_matrix, latent_features, iterations, lambda_)

In [24]:
item_activity.shape

(162, 5)

Step 1: Compute the predicted ratings matrix by multiplying the user matrix by the transposed item matrix.

In [25]:
predicted_ratings_restaurant = np.dot(user_restaurant, item_restaurant.T)
predicted_ratings_activity = np.dot(user_activity, item_activity.T)

In [26]:
predicted_ratings_restaurant.shape


(120, 326)

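Step 2: Generate predictions on the test set. Note that `train_test_split` held out whole users (rows), so the 30 test users have no rows of their own in the user matrix; `predict_ratings` therefore scores test row `j` with the latent factors of training user `j`. The errors reported in Step 3 should be read with this in mind.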

In [27]:
def predict_ratings(user_matrix, item_matrix, test_set):
    # Predicted rating = dot product of a user's and an item's latent vectors.
    # Test row j is scored with user_matrix[j], i.e. the j-th *training* user's
    # factors, because the row split leaves the test users without factors.
    num_users, num_items = test_set.shape
    return user_matrix[:num_users] @ item_matrix[:num_items].T



In [28]:
pred_activity = predict_ratings(user_activity, item_activity, test_activity)
pred_restaurant = predict_ratings(user_restaurant, item_restaurant, test_restaurant)

In [29]:
pred_activity.shape

(30, 162)

In [30]:
test_activity.shape

(30, 162)
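A common alternative for evaluating matrix factorisation is to hide a random subset of individual entries, fit only on the remaining ones, and score the hidden ones, so that every prediction is compared with the same user's own held-out rating. A minimal sketch, assuming a weighted variant of ALS in which each user or item solve uses only its observed training entries; the function name `masked_als`, the 20% hold-out fraction and the seeds are illustrative choices, not from the original notebook:

```python
import numpy as np

def masked_als(R, mask, k=5, lambda_reg=0.1, iterations=20, seed=0):
    """ALS that fits only the entries of R where mask is True."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    P = rng.random((n_users, k))
    Q = rng.random((n_items, k))
    reg = lambda_reg * np.eye(k)
    for _ in range(iterations):
        for u in range(n_users):
            obs = mask[u]                  # items this user rated in the training part
            Qo = Q[obs]
            P[u] = np.linalg.solve(Qo.T @ Qo + reg, Qo.T @ R[u, obs])
        for i in range(n_items):
            obs = mask[:, i]               # users who rated item i in the training part
            Po = P[obs]
            Q[i] = np.linalg.solve(Po.T @ Po + reg, Po.T @ R[obs, i])
    return P, Q

rng = np.random.default_rng(42)
R = rng.integers(0, 5, size=(150, 162)).astype(float)   # dummy activity-sized matrix

# Hide about 20% of the entries; every user keeps most of their ratings for training
train_mask = rng.random(R.shape) >= 0.2
test_mask = ~train_mask

P, Q = masked_als(R, train_mask)
pred = P @ Q.T

test_rmse = np.sqrt(np.mean((R[test_mask] - pred[test_mask]) ** 2))
print(f"Held-out RMSE: {test_rmse:.3f}")
```

On random ratings this will not produce good scores either, since there is no structure to learn, but it removes the mismatch between test users and training-user factors.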

Step 3: Calculate the MSE, RMSE and MAE for both test sets (suffix `a` = activities, `r` = restaurants).

In [31]:
msea = mean_squared_error(test_activity, pred_activity)
mser = mean_squared_error(test_restaurant, pred_restaurant)

rmsea = np.sqrt(msea)
rmser = np.sqrt(mser)

maea = mean_absolute_error(test_activity, pred_activity)
maer = mean_absolute_error(test_restaurant, pred_restaurant)

print(msea,     mser)
print(rmsea,   rmser)
print(maea,    maer)

2.235298758522095 2.1845225002200976
1.4950915552306805 1.478013024374311
1.2746197340419154 1.2719422524181443


These scores measure the prediction error: the RMSE is about 1.48 for restaurants and 1.50 for activities, and the MAE is about 1.27 for both. Since the ratings only range from 0 to 4, an average error of more than 1 point is too large to be acceptable. Hence, cross-validation is used next to check whether this result holds across different train/test splits.
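A useful reference point for these numbers is the trivial model that predicts the mean rating for every entry. For ratings drawn uniformly from {0, 1, 2, 3, 4}, the mean is 2, the expected squared deviation is (4 + 1 + 0 + 1 + 4) / 5 = 2 and the expected absolute deviation is (2 + 1 + 0 + 1 + 2) / 5 = 1.2, so this baseline should score an RMSE of about √2 ≈ 1.41 and an MAE of about 1.2. A quick check on freshly generated data (the seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
# Fresh dummy ratings on the same 0-4 scale, shaped like the activity split
train = rng.integers(0, 5, size=(120, 162)).astype(float)
test = rng.integers(0, 5, size=(30, 162)).astype(float)

# Baseline: predict the global training mean for every test entry
global_mean = train.mean()
baseline = np.full_like(test, global_mean)

baseline_rmse = np.sqrt(np.mean((test - baseline) ** 2))
baseline_mae = np.mean(np.abs(test - baseline))
print(f"mean={global_mean:.3f}  RMSE={baseline_rmse:.3f}  MAE={baseline_mae:.3f}")
```

All of the models in this notebook score an RMSE between about 1.43 and 1.54, i.e. at or slightly above this baseline, which is consistent with random ratings containing no latent structure to learn.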

## Cross validation

In [32]:
k_folds = 5
kf = KFold(n_splits=k_folds, shuffle=True, random_state=42)
mse_scores = []
rmse_scores = []

# ALS hyperparameters, identical for every fold
latent_features = 4
iterations = 100
lambda_ = 0.1

interaction_matrix = interaction_matrix_activity
for fold, (train_index, test_index) in enumerate(kf.split(interaction_matrix)):
    print(f"Fold {fold + 1}/{k_folds}")
    # KFold splits rows, so each fold holds out whole users
    X_train, X_test = interaction_matrix[train_index], interaction_matrix[test_index]
    user_m, item_m = ALS(X_train, latent_features, iterations, lambda_)

    predicted = predict_ratings(user_m, item_m, X_test)

    mse = mean_squared_error(X_test, predicted)
    mse_scores.append(mse)
    rmse = np.sqrt(mse)
    rmse_scores.append(rmse)

    print(f"MSE: {mse}")
    print(f"RMSE: {rmse}")

average_mse = np.mean(mse_scores)
print(f"Average MSE across all folds: {average_mse}")
average_rmse = np.mean(rmse_scores)
print(f"Average RMSE across all folds: {average_rmse}")


Fold 1/5
MSE: 2.1237376316980154
RMSE: 1.457304920631923
Fold 2/5
MSE: 2.2354796323730004
RMSE: 1.4951520432293834
Fold 3/5
MSE: 2.153696076994575
RMSE: 1.4675476404514354
Fold 4/5
MSE: 2.1641902972253018
RMSE: 1.4711187230218035
Fold 5/5
MSE: 2.1443150354226743
RMSE: 1.4643479898653442
Average MSE across all folds: 2.164283734742713
Average RMSE across all folds: 1.471094263439978


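On the activity data, the average RMSE across the five folds (1.47) is almost identical to the single-split result (1.50), so cross-validation confirms that the error is not an artefact of one particular split and the scores did not improve significantly.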

--- 

## Using the Surprise package to test whether other matrix factorisation algorithms perform better than ALS

### SVD - Singular Value Decomposition

Preparing the dataset in a format compatible with Surprise

In [43]:
imdf_r = pd.read_csv('../datasets/imdf_r.csv', index_col = False)

In [45]:
imdf_r.drop(columns = ['Unnamed: 0'], inplace = True)

In [46]:
imdf_r.head()

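|   | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | ... | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 1 | 2 | 3 | 4 | 2 | 1 | 3 | 2 | 0 | ... | 2 | 1 | 0 | 4 | 2 | 3 | 3 | 3 | 1 | 0 |
| 1 | 1 | 0 | 1 | 1 | 2 | 1 | 3 | 0 | 2 | 1 | ... | 1 | 3 | 0 | 2 | 1 | 3 | 0 | 2 | 4 | 2 |
| 2 | 2 | 2 | 2 | 2 | 0 | 2 | 1 | 0 | 0 | 0 | ... | 4 | 2 | 3 | 0 | 4 | 2 | 0 | 2 | 4 | 3 |
| 3 | 2 | 4 | 0 | 1 | 1 | 2 | 0 | 1 | 1 | 4 | ... | 0 | 2 | 0 | 0 | 2 | 3 | 1 | 2 | 2 | 2 |
| 4 | 4 | 2 | 3 | 3 | 2 | 1 | 2 | 4 | 2 | 0 | ... | 3 | 2 | 4 | 1 | 1 | 3 | 2 | 2 | 3 | 4 |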


Surprise needs a `Reader` that specifies the rating scale, and the ratings must be in long format: one `(userID, itemID, rating)` row per user-item pair.

In [52]:
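# Note: the dummy ratings only go up to 4, so rating_scale=(0, 4) would match the data exactly
reader = Reader(rating_scale=(0, 5))
# Convert the wide user x item matrix into long format: one (userID, itemID, rating) row per cell
ratings = []
for user_id, row in imdf_r.iterrows():
    for item_id, rating in row.items():
        ratings.append((str(user_id), str(item_id), rating))
ratings_df = pd.DataFrame(ratings, columns=['userID', 'itemID', 'rating'])
data = Dataset.load_from_df(ratings_df[['userID', 'itemID', 'rating']], reader)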
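The nested `iterrows` loop visits every cell in Python. The same long-format table can be built with `DataFrame.melt` after moving the user index into a column. A sketch on a small stand-in frame (the 3 x 4 random frame is illustrative only); the resulting `long_df` could then be passed to `Dataset.load_from_df` exactly as `ratings_df` is above:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
# Small stand-in for imdf_r: rows are users, column labels are item IDs (strings, as after read_csv)
wide = pd.DataFrame(rng.integers(0, 5, size=(3, 4)),
                    columns=[str(c) for c in range(4)])

# Name the index, turn it into a column, then unpivot to one row per (user, item) cell
long_df = (wide.rename_axis('userID')
               .reset_index()
               .melt(id_vars='userID', var_name='itemID', value_name='rating'))
long_df['userID'] = long_df['userID'].astype(str)

print(long_df.head())
```

The rows come out ordered item by item rather than user by user, which does not matter to Surprise because `cross_validate` shuffles the ratings when building folds.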



In [53]:
algo = SVD()
cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Evaluating RMSE, MAE of algorithm SVD on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.5191  1.5183  1.5361  1.5489  1.5325  1.5310  0.0114  
MAE (testset)     1.2949  1.2906  1.3107  1.3243  1.3038  1.3049  0.0120  
Fit time          0.17    0.15    0.15    0.15    0.15    0.15    0.01    
Test time         0.02    0.02    0.02    0.12    0.02    0.04    0.04    


{'test_rmse': array([1.51906464, 1.51828097, 1.53611664, 1.54892947, 1.53247354]),
 'test_mae': array([1.2949248 , 1.2906008 , 1.31073996, 1.32425648, 1.30376204]),
 'fit_time': (0.1650559902191162,
  0.15061688423156738,
  0.14901185035705566,
  0.1491858959197998,
  0.1505904197692871),
 'test_time': (0.01655125617980957,
  0.01557612419128418,
  0.015522003173828125,
  0.11651301383972168,
  0.015401124954223633)}
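Surprise's `SVD` is not a classical singular value decomposition: its documentation describes it as the biased matrix factorisation popularised by Simon Funk during the Netflix Prize, predicting $\hat r_{ui} = \mu + b_u + b_i + q_i^\top p_u$ and fitting it by stochastic gradient descent over the observed ratings. A minimal NumPy sketch of that update rule, under stated assumptions: the function name `funk_svd_sgd` is hypothetical, and the learning rate, regularisation, epoch count and initialisation spread mirror Surprise's documented defaults (`lr_all=0.005`, `reg_all=0.02`, `n_epochs=20`, `init_std_dev=0.1`) but with only 10 factors instead of 100:

```python
import numpy as np

def funk_svd_sgd(triples, n_users, n_items, k=10, lr=0.005, reg=0.02, epochs=20, seed=0):
    """Biased MF fitted by SGD: r_hat(u, i) = mu + b_u + b_i + q_i . p_u."""
    rng = np.random.default_rng(seed)
    mu = float(np.mean([r for _, _, r in triples]))   # global mean rating
    bu = np.zeros(n_users)                            # user biases
    bi = np.zeros(n_items)                            # item biases
    P = rng.normal(0.0, 0.1, (n_users, k))            # user factors
    Q = rng.normal(0.0, 0.1, (n_items, k))            # item factors
    order = np.arange(len(triples))
    for _ in range(epochs):
        rng.shuffle(order)                            # new random visiting order each epoch
        for idx in order:
            u, i, r = triples[idx]
            err = r - (mu + bu[u] + bi[i] + Q[i] @ P[u])
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            p_old = P[u].copy()                       # use the pre-update p_u in the q_i step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, bu, bi, P, Q

# Usage on a small random 0-4 matrix converted to (user, item, rating) triples
rng = np.random.default_rng(1)
R = rng.integers(0, 5, size=(30, 40))
triples = [(u, i, float(R[u, i])) for u in range(30) for i in range(40)]
mu, bu, bi, P, Q = funk_svd_sgd(triples, n_users=30, n_items=40)
pred = mu + bu[:, None] + bi[None, :] + P @ Q.T
print(pred.shape)  # (30, 40)
```

Unlike ALS, which solves each sub-problem exactly, this visits one rating at a time and takes a small gradient step, so it only ever touches observed ratings.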

---
### NMF - Non-negative Matrix Factorisation

In [55]:
algo = NMF()

In [58]:
cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Evaluating RMSE, MAE of algorithm NMF on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.4581  1.4585  1.4623  1.4537  1.4643  1.4594  0.0037  
MAE (testset)     1.2568  1.2565  1.2632  1.2551  1.2677  1.2598  0.0048  
Fit time          0.17    0.14    0.14    0.14    0.14    0.15    0.01    
Test time         0.01    0.01    0.07    0.01    0.01    0.02    0.02    


{'test_rmse': array([1.45810061, 1.45854903, 1.46229557, 1.45370183, 1.4642697 ]),
 'test_mae': array([1.25679142, 1.2564577 , 1.26322188, 1.25508967, 1.26766373]),
 'fit_time': (0.16717314720153809,
  0.14203214645385742,
  0.1410510540008545,
  0.1439528465270996,
  0.14159321784973145),
 'test_time': (0.013751029968261719,
  0.013382911682128906,
  0.07115483283996582,
  0.01334524154663086,
  0.012676715850830078)}
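NMF constrains both factor matrices to be non-negative, so every prediction is a sum of non-negative contributions. Surprise's `NMF` fits only the observed ratings with a regularised stochastic gradient descent whose step size is chosen to keep the factors non-negative. The sketch below instead shows the classic Lee and Seung multiplicative updates for the Frobenius-norm objective on a fully observed matrix, a different but closely related NMF algorithm; the function name and the small `eps` guarding against division by zero are illustrative choices:

```python
import numpy as np

def nmf_multiplicative(V, k=5, iterations=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates minimising ||V - W H||_F with W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k))      # non-negative initial user factors
    H = rng.random((k, m))      # non-negative initial item factors
    errors = []
    for _ in range(iterations):
        # Each update multiplies by a non-negative ratio, so W and H stay non-negative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(V - W @ H))
    return W, H, errors

rng = np.random.default_rng(3)
V = rng.integers(0, 5, size=(150, 162)).astype(float)   # dummy activity-sized matrix
W, H, errors = nmf_multiplicative(V)
print(W.min() >= 0, H.min() >= 0, errors[0], errors[-1])
```

The reconstruction error is non-increasing under these updates, but on random ratings it plateaus well above zero for the same reason the other models do.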

---
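### SlopeOne

Slope One is an item-based collaborative filtering method rather than a matrix factorisation technique; it is included here as a simple point of comparison.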

In [63]:
algo = SlopeOne()

In [64]:
cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Evaluating RMSE, MAE of algorithm SlopeOne on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.4279  1.4171  1.4389  1.4218  1.4213  1.4254  0.0076  
MAE (testset)     1.2300  1.2177  1.2449  1.2244  1.2239  1.2282  0.0092  
Fit time          0.13    0.12    0.11    0.12    0.11    0.12    0.01    
Test time         0.47    0.54    0.46    0.47    0.47    0.48    0.03    


{'test_rmse': array([1.42791307, 1.41714347, 1.43886066, 1.4217689 , 1.42130731]),
 'test_mae': array([1.22998866, 1.21772557, 1.24488126, 1.22443019, 1.22386899]),
 'fit_time': (0.12915515899658203,
  0.11556005477905273,
  0.11489200592041016,
  0.11579108238220215,
  0.11313700675964355),
 'test_time': (0.47347283363342285,
  0.5378050804138184,
  0.46018099784851074,
  0.4691200256347656,
  0.4650609493255615)}
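Slope One predicts a user's rating for item i from the user's mean rating plus the average difference between item i and each other item the user has rated. As we understand Surprise's documentation, it uses $\hat r_{ui} = \mu_u + \frac{1}{|R_i(u)|}\sum_{j \in R_i(u)} \text{dev}(i, j)$, where $\text{dev}(i, j)$ is the mean of $r_{vi} - r_{vj}$ over users v who rated both items and $R_i(u)$ is the set of items rated by u that share at least one such user with i. A small sketch of that formula on a toy matrix where `np.nan` marks a missing rating (the function name and toy values are made up for illustration):

```python
import numpy as np

def slope_one_predict(R, u, i):
    """Predict R[u, i] as mu_u + mean of dev(i, j) over j in R_i(u); NaN = missing."""
    rated = ~np.isnan(R)
    mu_u = np.nanmean(R[u])                      # user's mean rating
    devs = []
    for j in np.flatnonzero(rated[u]):           # items the user has rated
        if j == i:
            continue
        common = rated[:, i] & rated[:, j]       # users who rated both i and j
        if common.any():
            devs.append(np.mean(R[common, i] - R[common, j]))
    return mu_u + np.mean(devs) if devs else mu_u

R = np.array([
    [5.0, 3.0, 2.0],
    [3.0, 4.0, np.nan],
    [np.nan, 2.0, 5.0],
])

# User 1 rated items 0 and 1 (mean 3.5); dev(2, 0) = -3 and dev(2, 1) = 1, so 3.5 - 1 = 2.5
print(slope_one_predict(R, 1, 2))  # 2.5
```

There are no latent factors here at all: the prediction is built purely from average item-to-item rating differences.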

---
### SVD++

In [67]:
algo = SVDpp()
cross_validate(algo, data, measures = ['RMSE', 'MAE'], cv = 5, verbose = True)

Evaluating RMSE, MAE of algorithm SVDpp on 5 split(s).

                  Fold 1  Fold 2  Fold 3  Fold 4  Fold 5  Mean    Std     
RMSE (testset)    1.5287  1.5425  1.5448  1.5428  1.5174  1.5353  0.0106  
MAE (testset)     1.2975  1.3117  1.3095  1.3099  1.2884  1.3034  0.0090  
Fit time          3.80    3.71    3.70    3.71    3.74    3.73    0.04    
Test time         0.79    0.81    0.78    0.86    0.79    0.81    0.03    


{'test_rmse': array([1.52872898, 1.54253428, 1.54475821, 1.54283102, 1.51741392]),
 'test_mae': array([1.29753021, 1.31170461, 1.30949013, 1.30992129, 1.28841578]),
 'fit_time': (3.8010001182556152,
  3.713340997695923,
  3.695087194442749,
  3.710900068283081,
  3.7365708351135254),
 'test_time': (0.7863340377807617,
  0.8069970607757568,
  0.7829139232635498,
  0.8641388416290283,
  0.7859749794006348)}

---
## Conclusion

Because the dummy data is random, all of the algorithms performed poorly, with RMSE scores between about 1.43 and 1.54, which is not acceptable given that the ratings only range from 0 to 4. Hence, matrix factorisation is not incorporated into the final model. However, once more user interaction data has been collected as people use the recommender over time, these algorithms can be incorporated into the collaborative layer to identify latent factors, generate recommendations, and predict missing entries in the existing interaction matrices.