# Matrix Factorization

In this task, you will manually implement the matrix factorization variant from the Data Cleaning chapter using the `numpy` library.

In [1]:
import numpy as np

We continue the scenario from the tutorials.

Assume that you have a ginormous database $D$ of three users and three movies, with ratings that some users gave to some movies. We represent it as a matrix in which the entry $D_{ij}$ is the rating user $i$ gave to movie $j$.
Since not every user has rated every movie and ratings range from 1 to 5, we encode missing ratings as 0.

In [6]:
# missing values encoded as 0
D = [
     [3,1,0],
     [1,0,3],
     [0,3,5],
    ]
D = np.array(D)

N = len(D)
M = len(D[0])
print(M)

3
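Because 0 marks a missing rating, the observed entries can be picked out with a boolean mask. The following is a minimal sketch; the names `observed` and `n_observed` are illustrative and not part of the exercise template.

```python
import numpy as np

# Same rating matrix as above; 0 encodes a missing rating.
D = np.array([
    [3, 1, 0],
    [1, 0, 3],
    [0, 3, 5],
])

# True where a rating exists, False where it is missing.
observed = D > 0
n_observed = int(observed.sum())
print(observed)
print(n_observed)
```

Only the entries where `observed` is `True` should contribute to the training error.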


First, randomly initialize the two factors $E$ and $A$ for $f=2$ latent features. To check your results against those from the tutorial, you may *additionally* provide the hard-coded initial factors used in the tutorial.

In [13]:
# number of latent features
f = 2

# TODO your code goes here
np.random.seed()  # no argument: reseeds from fresh OS entropy, so every run differs
E = np.random.rand(N, f)
A = np.random.rand(M, f)
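If reproducible runs are wanted, one option is a seeded generator. This is a sketch under assumptions, not part of the task: the seed value `0` is arbitrary, and `rng`, `E_init`, and `A_init` are illustrative names. It does not reproduce the initial factors from the tutorial.

```python
import numpy as np

N, M, f = 3, 3, 2  # users, movies, latent features (as in this notebook)

# A fixed seed makes the "random" initialization repeatable.
rng = np.random.default_rng(0)  # seed 0 is an arbitrary choice
E_init = rng.random((N, f))     # uniform values in [0, 1)
A_init = rng.random((M, f))

# A second generator with the same seed yields the same factors.
rng_again = np.random.default_rng(0)
same = np.array_equal(E_init, rng_again.random((N, f)))
print(same)
```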


Implement a function that takes the data matrix $D$, the initial factors $E$ and $A$, the number of epochs (iterations), and the learning rate $\eta$, and performs the factorization of $D$. Use defaults of 5000 epochs and $\eta = 0.001$.

Updates to $E$ and $A$ are applied immediately. $\tilde{D}$ is updated only after an entry of $D$ has been processed completely. Within an entry, iterate over the latent features in order, and for each feature update $E$ before $A$.
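Written out, one update step for an observed entry $D_{ij}$ can be sketched as follows. This is a reading of the description above and of the implementation in this notebook; the symbol $e_{ij}$ for the error is notation introduced here for illustration.

$$
\tilde{D}_{ij} = \sum_{k=1}^{f} E_{ik} A_{jk}, \qquad e_{ij} = D_{ij} - \tilde{D}_{ij}
$$

Then, for $k = 1, \dots, f$ in order:

$$
E_{ik} \leftarrow E_{ik} + 2 \eta \, e_{ij} A_{jk}, \qquad A_{jk} \leftarrow A_{jk} + 2 \eta \, e_{ij} E_{ik}
$$

Because updates are applied immediately, the $E_{ik}$ on the right-hand side of the second update is the value just updated, while $e_{ij}$ stays fixed until the entry is finished.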

In [14]:
# TODO your code goes here
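def matrix_factorization(D, E, A, epochs=5000, learning_rate=0.001):
    # Derive the dimensions from the arguments instead of relying on globals.
    n_users, n_movies = D.shape
    n_features = E.shape[1]
    for epoch in range(epochs):
        for i in range(n_users):
            for j in range(n_movies):
                # Skip missing ratings (encoded as 0).
                if D[i, j] > 0:
                    # The error is computed once per entry, before any update.
                    error_ij = D[i, j] - np.dot(E[i, :], A[j, :])
                    for k in range(n_features):
                        # E is updated first; A then uses the updated E[i, k].
                        E[i, k] += learning_rate * (2 * error_ij * A[j, k])
                        A[j, k] += learning_rate * (2 * error_ij * E[i, k])

    # E and A are modified in place and also returned.
    return E, A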

Now test your matrix factorization with the parameters specified above.

In [15]:
# TODO your code goes here
E, A = matrix_factorization(D, E, A)
print("Original Matrix D:")
print(D)
print("\nFactorized Matrix E:")
print(E)
print("\nFactorized Matrix A:")
print(A)
print("\nReconstructed Matrix D_hat:")
D_hat = np.dot(E, A.T)
print(D_hat) 

Original Matrix D:
[[3 1 0]
 [1 0 3]
 [0 3 5]]

Factorized Matrix E:
[[2.06426282 0.51723042]
 [0.4818558  1.11303798]
 [0.84938907 1.80788062]]

Factorized Matrix A:
[[1.37759699 0.30210685]
 [0.07781237 1.62283804]
 [1.90686582 1.86978015]]

Reconstructed Matrix D_hat:
[[2.99998109 1.00000637 4.90337938]
 [1.00005949 1.84377472 2.99997066]
 [1.71628894 2.99999043 5.00001028]]
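To judge the fit, it can help to compare $D$ and the reconstruction only on the observed entries, since the zeros are placeholders rather than ratings. The sketch below does this with the factors printed above, copied at the displayed precision; the names `observed` and `rmse` are illustrative.

```python
import numpy as np

D = np.array([[3, 1, 0], [1, 0, 3], [0, 3, 5]])

# Factors copied from the printed output above (rounded as displayed).
E = np.array([[2.06426282, 0.51723042],
              [0.4818558, 1.11303798],
              [0.84938907, 1.80788062]])
A = np.array([[1.37759699, 0.30210685],
              [0.07781237, 1.62283804],
              [1.90686582, 1.86978015]])

D_hat = E @ A.T

# Root-mean-square error over the observed ratings only.
observed = D > 0
rmse = np.sqrt(np.mean((D[observed] - D_hat[observed]) ** 2))
print(rmse)
```

The unobserved positions of `D_hat`, for example `D_hat[0, 2]` at about 4.90, are then the model's predictions for the missing ratings.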
