# Alternating Least Squares Implementation

Matrix factorization by alternating least squares.

In [1]:
import numpy as np
import pandas as pd

## Intuition

First, build some intuition by manually executing two iterations.

In [2]:
df = pd.read_csv("../data/critics/critics.csv")
df.head()

| Unnamed: 0 | User | Movie | Rating |
|---|---|---|---|
| 0 | Lisa Rose | Lady in the Water | 2.5 |
| 1 | Lisa Rose | Snakes on a Plane | 3.5 |
| 2 | Lisa Rose | Just My Luck | 3.0 |
| 3 | Lisa Rose | Superman Returns | 3.5 |
| 4 | Lisa Rose | You, Me and Dupree | 2.5 |


In [3]:
user_product_matrix = df.pivot(index="User", columns="Movie", values="Rating").to_numpy()
user_product_matrix

array([[3. , nan, 3.5, 4. , 4.5, 2.5],
       [1.5, 3. , 3.5, 5. , 3. , 3.5],
       [nan, 3. , 4. , 5. , 3. , 3.5],
       [3. , 2.5, 3.5, 3.5, 3. , 2.5],
       [nan, 2.5, 3. , 3.5, 4. , nan],
       [2. , 3. , 4. , 3. , 3. , 2. ],
       [nan, nan, 4.5, 4. , nan, 1. ]])

We will try to factorize the matrix above with 3 latent factors.
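Concretely, with $\Omega$ the set of observed (user, movie) pairs, the factorization looks for a user matrix $U \in \mathbb{R}^{7 \times 3}$ and a product matrix $V \in \mathbb{R}^{6 \times 3}$ that minimize the regularized squared error below. The $\lambda$ terms correspond to the ridge penalty `l` that the cells below pass to the least-squares helper (`0.1`).

$$
\min_{U, V} \sum_{(i, j) \in \Omega} \left( r_{ij} - \mathbf{u}_i^\top \mathbf{v}_j \right)^2 + \lambda \left( \sum_i \lVert \mathbf{u}_i \rVert^2 + \sum_j \lVert \mathbf{v}_j \rVert^2 \right)
$$

Holding $V$ fixed, this splits into one independent ridge regression per user; holding $U$ fixed, it splits into one per movie. ALS alternates between these two easy subproblems.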

### 1. Initialize User Matrix

In [4]:
# initialize the user matrix (n_users x 3 latent factors) from a normal distribution with std 1/sqrt(3)

u_init = np.random.normal(0, 1/np.sqrt(3), size = (user_product_matrix.shape[0], 3))
u_init

array([[-0.030671  ,  0.06342017, -0.13078589],
       [-1.33277787,  0.34912523, -0.26002749],
       [-0.18881152, -0.61877083, -1.18733841],
       [ 0.7716916 , -0.12645649,  1.27510558],
       [ 0.03103979,  0.30318645, -0.08314291],
       [-0.53745464,  0.51625971,  0.93776223],
       [-0.44025033, -0.51312388, -0.46529026]])

In [5]:
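def calculate_ols_coefficients(X, y, l=0):
    """Least-squares coefficients of y on X; with l > 0 this is ridge regression."""
    X_2 = X.T @ X
    X_y = X.T @ y
    l_i = l * np.eye(X_2.shape[0])

    # solve (X^T X + l*I) coeff = X^T y without forming an explicit inverse
    coeff = np.linalg.solve(X_2 + l_i, X_y)
    return coeff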
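Despite its name, `calculate_ols_coefficients` computes ridge-regression coefficients $(X^\top X + \lambda I)^{-1} X^\top y$ whenever `l > 0`. One way to check this, sketched below on made-up random data, is that ridge regression equals ordinary least squares on an augmented system with $\sqrt{\lambda} I$ appended below $X$ and zeros appended below $y$. Both helper names are hypothetical.

```python
import numpy as np

def ridge_closed_form(X, y, l):
    # same formula as calculate_ols_coefficients: (X^T X + l*I)^-1 X^T y
    return np.linalg.solve(X.T @ X + l * np.eye(X.shape[1]), X.T @ y)

def ridge_via_augmented_lstsq(X, y, l):
    # hypothetical helper: stack sqrt(l) * I under X and zeros under y,
    # so ||X_aug c - y_aug||^2 = ||X c - y||^2 + l * ||c||^2
    k = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(l) * np.eye(k)])
    y_aug = np.concatenate([y, np.zeros(k)])
    coeff, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return coeff

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # e.g. 5 observed ratings, 3 latent factors
y = rng.normal(size=5)

print(np.allclose(ridge_closed_form(X, y, 0.1), ridge_via_augmented_lstsq(X, y, 0.1)))  # True
```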

### 2. Calculate the product matrix v from the initialized user matrix

In [6]:
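v = []

# for each movie (column) j, regress its observed ratings on the user factors
for j in range(user_product_matrix.shape[1]):
    # pair each user's factor vector with that user's rating of movie j
    dataset = np.hstack((u_init, np.expand_dims(user_product_matrix[:, j], axis = 1)))
    # drop users who did not rate movie j
    dataset = dataset[~np.isnan(dataset).any(axis = 1)]
    X = dataset[:, :-1]
    y = dataset[:, -1]
    coefficients = calculate_ols_coefficients(X, y, 0.1)
    v.append(coefficients)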

v = np.array(v)
v


array([[-0.72561429,  0.60546413,  2.00207213],
       [-2.61402277, -2.16414477,  1.52306013],
       [-3.95924838, -4.34043851,  2.27087587],
       [-4.12350373, -3.97454506,  1.66673098],
       [-1.71926379, -0.12470231,  0.80373967],
       [-2.93567944, -3.46958596,  1.63495846]])
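Each row of `v` above is the solution of one ridge regression. Writing $\Omega_j$ for the users who rated movie $j$, $U_{\Omega_j}$ for the corresponding rows of the current user matrix (the rows kept after the NaN filter), and $\mathbf{r}_{\Omega_j}$ for their ratings of movie $j$, the loop computes

$$
\mathbf{v}_j = \left( U_{\Omega_j}^\top U_{\Omega_j} + \lambda I \right)^{-1} U_{\Omega_j}^\top \mathbf{r}_{\Omega_j}, \qquad \lambda = 0.1
$$

The user update is symmetric: swap the roles of $U$ and $V$ and work on the transposed rating matrix.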

### 3. (2nd iteration) Calculate the user matrix u from the v estimated in the previous step

In [7]:
user_product_matrix_T = user_product_matrix.T
u = []

for j in range(user_product_matrix_T.shape[1]):  # column j of the transpose holds user j's ratings
    dataset = np.hstack((v, np.expand_dims(user_product_matrix_T[:, j], axis = 1)))
    dataset = dataset[~np.isnan(dataset).any(axis = 1)]
    X = dataset[:, :-1]
    y = dataset[:, -1]
    coefficients = calculate_ols_coefficients(X, y, 0.1)
    u.append(coefficients)

u = np.array(u)
u

array([[-2.33057517,  1.45832795,  0.27647832],
       [-1.76604783,  0.59646196, -0.11455789],
       [-1.74584558,  0.4769962 , -0.22899681],
       [-1.29034272,  0.78224461,  0.77336061],
       [-2.04994469,  1.40212074,  0.36592875],
       [-1.38118039,  0.6993728 ,  0.41881665],
       [-2.00357272,  1.53034833,  1.02942299]])

### 4. (2nd iteration) Calculate the product matrix v from the u estimated in the first half of this iteration

In [8]:
v = []

for j in range(0, user_product_matrix.shape[1]):
    dataset = np.hstack((u, np.expand_dims(user_product_matrix[:, j], axis = 1)))
    dataset = dataset[~np.isnan(dataset).any(axis = 1)]
    X = dataset[:, :-1]
    y = dataset[:, -1]
    coefficients = calculate_ols_coefficients(X, y, 0.1)
    v.append(coefficients)

v = np.array(v)
v


array([[-0.75399798,  0.58568827,  1.63172586],
       [-2.09919028, -1.17161865,  0.83809726],
       [-2.8791341 , -2.23263774,  1.89029601],
       [-3.38918636, -2.41195043,  0.66926709],
       [-1.67858168,  0.28350189,  0.70787885],
       [-2.61054153, -2.35131911,  0.01223478]])

The alternating least squares algorithm repeats these two half-steps until the values of the U and V matrices converge.

Multiplying `u` by the transpose of `v` reconstructs the user-product rating matrix, including predicted values for the entries that were missing.

In [9]:
pred = u @ v.T
print(pred)
pred.shape

[[3.06251138 3.41543223 3.97674629 4.56637668 4.52121266 2.65845155]
 [1.4940102  2.91243381 3.53645671 4.47015868 3.05246051 3.2064672 ]
 [1.22207512 2.91408283 3.52869208 4.61324482 2.90367174 3.4332304 ]
 [2.69297982 2.44033393 3.43048137 3.00406152 2.9351591  1.53864846]
 [2.96395525 2.96715704 3.46335164 3.81070292 4.0975367  2.05910953]
 [2.13441564 2.43097132 3.20684489 3.2745254  2.81316906 1.96630427]
 [4.08673297 3.27565231 4.29775527 3.78831593 4.52572385 1.64466729]]


(7, 6)

In [10]:
print(user_product_matrix)
print(user_product_matrix.shape)

[[3.  nan 3.5 4.  4.5 2.5]
 [1.5 3.  3.5 5.  3.  3.5]
 [nan 3.  4.  5.  3.  3.5]
 [3.  2.5 3.5 3.5 3.  2.5]
 [nan 2.5 3.  3.5 4.  nan]
 [2.  3.  4.  3.  3.  2. ]
 [nan nan 4.5 4.  nan 1. ]]
(7, 6)
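Comparing the two matrices, the observed entries are approximated and the `nan` entries now have predicted values. To build a completed rating matrix that keeps the known ratings and uses the model only for the gaps, one option is `np.where` with a NaN mask. The small arrays below are made-up stand-ins for `user_product_matrix` and `pred`.

```python
import numpy as np

R = np.array([[3.0, np.nan], [np.nan, 4.0]])  # toy ratings with gaps
pred = np.array([[2.9, 3.6], [3.1, 4.2]])      # toy reconstruction u @ v.T

# keep observed ratings; fall back to the model's prediction where R is missing
completed = np.where(np.isnan(R), pred, R)
print(completed.tolist())  # [[3.0, 3.6], [3.1, 4.0]]
```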


With this intuition in place, the full algorithm repeats these updates until the training error converges. Many loss functions can be used to monitor training; for starters and small use cases, the root mean squared error (RMSE), calculated only over entries that exist in both the predicted and actual matrices, is sufficient.
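Written out, with $\Omega$ the set of observed entries and $\hat{R} = UV^\top$ (`pred` in the code), the metric computed with `np.nanmean` in the next cell is

$$
\mathrm{RMSE} = \sqrt{ \frac{1}{|\Omega|} \sum_{(i, j) \in \Omega} \left( r_{ij} - \hat{r}_{ij} \right)^2 }
$$

`np.nanmean` implements the restriction to $\Omega$: the difference is NaN wherever $r_{ij}$ is missing, and those entries are skipped in both the sum and the count.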

In [11]:
# RMSE
float(np.sqrt(np.nanmean((user_product_matrix - pred) ** 2)))

0.36834391153076623
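Putting the pieces together, the repeat-until-convergence loop can be sketched as below. This is not the `ALS` class used in the next section; it is a minimal stand-alone illustration, assuming a small dense NumPy rating matrix with `np.nan` for missing entries, a fixed ridge penalty, and a stopping rule on the change in training RMSE. The names `als_half_step`, `masked_rmse`, `als`, and `tol` are hypothetical.

```python
import numpy as np

def als_half_step(R, fixed, l):
    """Solve one ridge regression per column of R, using the rows of `fixed` as features."""
    k = fixed.shape[1]
    solved = np.zeros((R.shape[1], k))
    for j in range(R.shape[1]):
        observed = ~np.isnan(R[:, j])          # rows that have a rating in column j
        X, y = fixed[observed], R[observed, j]
        solved[j] = np.linalg.solve(X.T @ X + l * np.eye(k), X.T @ y)
    return solved

def masked_rmse(R, pred):
    """RMSE over the observed entries of R only."""
    return float(np.sqrt(np.nanmean((R - pred) ** 2)))

def als(R, k=3, l=0.1, max_iter=50, tol=1e-6, seed=0):
    """Alternate the two half-steps until the training RMSE stops changing."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 1 / np.sqrt(k), size=(R.shape[0], k))
    history = []
    for _ in range(max_iter):
        V = als_half_step(R, U, l)     # fix U, solve for every movie vector
        U = als_half_step(R.T, V, l)   # fix V, solve for every user vector
        history.append(masked_rmse(R, U @ V.T))
        if len(history) > 1 and abs(history[-2] - history[-1]) < tol:
            break
    return U, V, history

nan = np.nan
R = np.array([[3.0, nan, 3.5, 4.0, 4.5, 2.5],
              [1.5, 3.0, 3.5, 5.0, 3.0, 3.5],
              [nan, 3.0, 4.0, 5.0, 3.0, 3.5],
              [3.0, 2.5, 3.5, 3.5, 3.0, 2.5],
              [nan, 2.5, 3.0, 3.5, 4.0, nan],
              [2.0, 3.0, 4.0, 3.0, 3.0, 2.0],
              [nan, nan, 4.5, 4.0, nan, 1.0]])

U, V, history = als(R)
print(U.shape, V.shape)  # (7, 3) (6, 3)
```

Stopping on the change in RMSE is one simple choice. Each half-step exactly minimizes the regularized objective over one factor matrix, so that objective never increases, while the unregularized RMSE usually, but not necessarily, decreases.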

# Alternating Least Squares Algorithm

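Now let's run a full implementation on a larger dataset (the first 900,000 MovieLens ratings) and look at how one hyperparameter, the number of latent factors, affects the model.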

In [12]:
import sys
sys.path.append("../src")

from matrix_factorization.alternating_least_sqaures import ALS

In [13]:
ratings = pd.read_csv("../data/movie_lens/rating.csv", nrows=900000)

In [14]:
%%time

als = ALS(
    n_features = 10,
    user_column_header = "userId",
    item_column_header = "movieId",
    rating_column_header = "rating",
    max_iter = 20
)

als.fit(rating_matrix = ratings)

INFO: Initializing user matrix
INFO: Start training
INFO: iteration 1: RMSE = 0.7914422845489588
INFO: iteration 2: RMSE = 0.7470814988463755
INFO: iteration 3: RMSE = 0.7213361787157171
INFO: iteration 4: RMSE = 0.7055223768419965
INFO: iteration 5: RMSE = 0.6960140082448351
INFO: iteration 6: RMSE = 0.6897512150493115
INFO: iteration 7: RMSE = 0.6854062496083185
INFO: iteration 8: RMSE = 0.6823041863302646
INFO: iteration 9: RMSE = 0.6800200253465999
INFO: iteration 10: RMSE = 0.6782983571124401
INFO: iteration 11: RMSE = 0.6769691805600915
INFO: iteration 12: RMSE = 0.6759085783936304
INFO: iteration 13: RMSE = 0.675037536906905
INFO: iteration 14: RMSE = 0.6743090542867793
INFO: iteration 15: RMSE = 0.6736923899440486
INFO: iteration 16: RMSE = 0.6731659614502936
INFO: iteration 17: RMSE = 0.672713303550274
INFO: iteration 18: RMSE = 0.6723197808949207
INFO: iteration 19: RMSE = 0.6719746692630768
INFO: iteration 20: RMSE = 0.6716701103429061


CPU times: user 1min 30s, sys: 1.67 s, total: 1min 32s
Wall time: 1min 24s


In [15]:
als.U.shape

(6034, 10)

In [16]:
als.V.shape

(13771, 10)

In [17]:
als.predict_rating(10, 145)

4.4832490863467305

In [18]:
als.R[10, 145]

np.float64(5.0)

Let's try increasing the number of latent factors.

In [19]:
%%time

als = ALS(
    n_features = 100,
    user_column_header = "userId",
    item_column_header = "movieId",
    rating_column_header = "rating",
    max_iter = 20
)

als.fit(rating_matrix = ratings)

INFO: Initializing user matrix
INFO: Start training
INFO: iteration 1: RMSE = 0.4657379873640672
INFO: iteration 2: RMSE = 0.3691381896381516
INFO: iteration 3: RMSE = 0.33044264919327543
INFO: iteration 4: RMSE = 0.3075582068206599
INFO: iteration 5: RMSE = 0.29177864469481046
INFO: iteration 6: RMSE = 0.2799764446680789
INFO: iteration 7: RMSE = 0.2707180068893774
INFO: iteration 8: RMSE = 0.26321494362152126
INFO: iteration 9: RMSE = 0.25699035434038026
INFO: iteration 10: RMSE = 0.25173804858930665
INFO: iteration 11: RMSE = 0.2472437239176054
INFO: iteration 12: RMSE = 0.2433480981608739
INFO: iteration 13: RMSE = 0.23993006662776525
INFO: iteration 14: RMSE = 0.23689801137584884
INFO: iteration 15: RMSE = 0.23418332819873877
INFO: iteration 16: RMSE = 0.23173405353633683
INFO: iteration 17: RMSE = 0.22951002118565395
INFO: iteration 18: RMSE = 0.22747938875295645
INFO: iteration 19: RMSE = 0.22561632100793894
INFO: iteration 20: RMSE = 0.22389963122138265


CPU times: user 17min 9s, sys: 2.16 s, total: 17min 11s
Wall time: 4min 24s
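Both runs above stopped after exactly `max_iter = 20` iterations. One way to judge whether more iterations would help is the relative improvement between the last two logged RMSE values. The helper `relative_improvement` and the 0.1% threshold below are assumptions for illustration; the RMSE values are copied from the two training logs.

```python
def relative_improvement(prev_rmse, curr_rmse):
    # fractional decrease of the training RMSE between two consecutive iterations
    return (prev_rmse - curr_rmse) / prev_rmse

# iterations 19 and 20 from the logs above
rel_10 = relative_improvement(0.6719746692630768, 0.6716701103429061)     # n_features = 10
rel_100 = relative_improvement(0.22561632100793894, 0.22389963122138265)  # n_features = 100

tol = 1e-3  # assumed threshold: treat < 0.1% improvement per iteration as converged
print(rel_10 < tol, rel_100 < tol)  # True False
```

Under this assumed threshold, the 10-factor model has effectively converged by iteration 20 (about 0.045% improvement), while the 100-factor model is still improving by roughly 0.76% per iteration.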


In [20]:
als.U.shape

(6034, 100)

In [21]:
als.V.shape

(13771, 100)

In [22]:
als.predict_rating(10, 145)

5.133698158054369

In [23]:
als.R[10, 145]

np.float64(5.0)

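We see here that more latent factors lead to a lower training error (a final RMSE of 0.224 with 100 factors versus 0.672 with 10), at the expense of training and inference time (wall time grew from 1 min 24 s to 4 min 24 s). Both errors are measured on the training ratings, so they do not by themselves show that predictions for unseen ratings improve; a held-out set is needed for that comparison. When a large number of latent factors is needed, stochastic gradient descent may be a better way to estimate the entries of the factor matrices, since every ALS row update solves a $k \times k$ linear system whose cost grows cubically with the number of factors $k$.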
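For comparison, a stochastic gradient descent update touches one observed rating at a time and costs $O(k)$ per rating. Below is a minimal sketch of SGD matrix factorization, assuming ratings given as `(user, item, rating)` triples. The function name `sgd_mf`, the learning rate, and the epoch count are illustrative assumptions, not part of the `ALS` class. Note that it applies the ridge penalty per rating, a common SGD variant that weights the penalty by how many ratings each user or item has, rather than the exact summed penalty of the ALS objective.

```python
import numpy as np

def sgd_mf(triples, n_users, n_items, k=3, lr=0.01, l=0.1, n_epochs=200, seed=0):
    """Factorize ratings given as (user, item, rating) triples with plain SGD."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 1 / np.sqrt(k), size=(n_users, k))
    V = rng.normal(0, 1 / np.sqrt(k), size=(n_items, k))
    triples = np.asarray(triples, dtype=float)
    for _ in range(n_epochs):
        for idx in rng.permutation(len(triples)):   # visit ratings in random order
            i, j, r = int(triples[idx, 0]), int(triples[idx, 1]), triples[idx, 2]
            err = r - U[i] @ V[j]                    # residual for this one rating
            u_i = U[i].copy()                        # keep the pre-update user vector
            # one O(k) gradient step per rating, with a per-rating ridge penalty
            U[i] += lr * (err * V[j] - l * U[i])
            V[j] += lr * (err * u_i - l * V[j])
    return U, V

nan = np.nan
R = np.array([[3.0, nan, 3.5], [1.5, 3.0, nan], [nan, 2.5, 4.0]])  # small toy matrix
rows, cols = np.nonzero(~np.isnan(R))
triples = list(zip(rows, cols, R[rows, cols]))

U, V = sgd_mf(triples, *R.shape)
print(np.sqrt(np.nanmean((R - U @ V.T) ** 2)))  # training RMSE on observed entries
```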