A ≈ W · H

The solution consists of two steps. First, we fix W and learn H, given A. Next, we fix H and learn W, given A. Here A is of size M by N. We repeat this procedure iteratively. Fixing one variable and learning the other is popularly known as alternating least squares (ALS), because each step reduces to a least squares problem. However, since we want to constrain W and H to be non-negative, we use non-negative least squares (NNLS) instead of ordinary least squares.
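
Written as an optimization problem, one way to state this is sketched below. The symbol $\Omega$ for the set of observed (non-missing) entries of A, and $\Omega_j$ for the observed rows of column $j$, are notation introduced here for illustration and do not come from the linked answer.

$$
\min_{W \ge 0,\, H \ge 0} \sum_{(i,j) \in \Omega} \left( A_{ij} - (WH)_{ij} \right)^2
$$

With W fixed, the sum splits into one independent non-negative least squares problem per column $h_j$ of H:

$$
h_j = \arg\min_{h \ge 0} \left\| A_{\Omega_j, j} - W_{\Omega_j, :}\, h \right\|_2^2
$$

and, symmetrically, one problem per row of W when H is fixed.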

From an answer on Stack Overflow: https://stackoverflow.com/questions/22767695/python-non-negative-matrix-factorization-that-handles-both-zeros-and-missing-dat

Read: https://towardsdatascience.com/prototyping-a-recommender-system-step-by-step-part-2-alternating-least-square-als-matrix-4a76c58714a1


Difference between ALS and Funk SVD: 

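1. Both objective functions are usually written with L2 norm regularization; the main difference is the optimizer. Funk SVD updates the factors by stochastic gradient descent, while ALS alternately solves a least squares problem for one factor with the other held fixed.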

Recall that the $L_p$ norm is defined as:

$$
|| x ||_p = \left(\sum_i |x_i|^p\right)^{1/p}
$$
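
As a quick check of the definition, the sketch below computes the norm directly from the formula and compares it with `np.linalg.norm`. The helper name `lp_norm` and the example vector are made up for illustration.

```python
import numpy as np

def lp_norm(x, p):
    """Compute the L_p norm of a 1-D array directly from the definition."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([3.0, -4.0])
l1 = lp_norm(x, 1)  # |3| + |-4| = 7
l2 = lp_norm(x, 2)  # sqrt(9 + 16) = 5

# The hand-written version should agree with NumPy's built-in vector norm
print(l1, np.linalg.norm(x, 1))
print(l2, np.linalg.norm(x, 2))
```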

ALS minimizes two loss functions alternately: it first holds the user matrix fixed and solves a least squares problem for the item matrix; then it holds the item matrix fixed and solves a least squares problem for the user matrix. ALS also scales well: with one factor fixed, the rows of the other factor can be solved independently, so the work can run in parallel across multiple partitions of the underlying training data.
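
A minimal sketch of this alternation follows, assuming a dense matrix with no missing entries and an L2 (ridge) penalty `lam`. Each half-step here is solved in closed form via the normal equations, and there is no non-negativity constraint, so this is plain regularized ALS rather than the NNLS variant used in the cells below. The names `als_step`, `reg_loss`, `A_demo`, `U`, and `V` are hypothetical.

```python
import numpy as np

def als_step(A, fixed, lam):
    """Solve min_X ||A - fixed @ X||_F^2 + lam * ||X||_F^2 in closed form."""
    k = fixed.shape[1]
    gram = fixed.T @ fixed + lam * np.eye(k)
    return np.linalg.solve(gram, fixed.T @ A)

def reg_loss(A, U, V, lam):
    """Squared reconstruction error plus the L2 penalty on both factors."""
    return np.sum((A - U @ V) ** 2) + lam * (np.sum(U ** 2) + np.sum(V ** 2))

rng = np.random.default_rng(0)
A_demo = rng.uniform(size=(6, 4))   # rows play the role of users, columns of items
k, lam = 2, 0.1
U = rng.uniform(size=(6, k))        # user factors
V = rng.uniform(size=(k, 4))        # item factors

losses = [reg_loss(A_demo, U, V, lam)]
for _ in range(20):
    V = als_step(A_demo, U, lam)          # hold users fixed, solve for items
    U = als_step(A_demo.T, V.T, lam).T    # hold items fixed, solve for users
    losses.append(reg_loss(A_demo, U, V, lam))

print(losses[0], losses[-1])
```

Because each half-step exactly minimizes the regularized loss over one factor, the recorded loss should never increase from one iteration to the next.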

In [55]:
import numpy as np
import pandas as pd

M, N = 5, 5
np.random.seed(42)
# Uniform samples in [0, 1) are already non-negative
A_orig = np.random.uniform(low=0.0, high=1.0, size=(M, N))
print(pd.DataFrame(A_orig).head())
#A_orig = np.empty((M,N,))
#A_orig[:] = np.nan
#A_orig[0][1] = 4.5
#A_orig[0][2] = 2.
#A_orig[1][0] = 2.
#A_orig[1][2] = 3.5
#A_orig[2][1] = 5.
#A_orig[2][3] = 2.
#A_orig[3][1] = 3.5
#A_orig[3][2] = 4.
#A_orig[3][3] = 1.

          0         1         2         3         4
0  0.374540  0.950714  0.731994  0.598658  0.156019
1  0.155995  0.058084  0.866176  0.601115  0.708073
2  0.020584  0.969910  0.832443  0.212339  0.181825
3  0.183405  0.304242  0.524756  0.431945  0.291229
4  0.611853  0.139494  0.292145  0.366362  0.456070


In [43]:
A = A_orig.copy()
# Hide three entries to simulate missing data (np.NAN was removed in NumPy 2.0)
A[0, 0] = np.nan
A[3, 1] = np.nan
A[2, 3] = np.nan

A_df = pd.DataFrame(A)
print(A_df.head())

          0         1         2         3         4
0       NaN  0.950714  0.731994  0.598658  0.156019
1  0.155995  0.058084  0.866176  0.601115  0.708073
2  0.020584  0.969910  0.832443       NaN  0.181825
3  0.183405       NaN  0.524756  0.431945  0.291229
4  0.611853  0.139494  0.292145  0.366362  0.456070


In [44]:
K = 3  # number of latent factors
# Random non-negative initialization, scaled so that every entry is at most 1/K
W = np.random.uniform(low=0, high=1, size=(M, K))
H = np.random.uniform(low=0, high=1, size=(K, N))
W = W / (K * W.max())
H = H / (K * H.max())

pd.DataFrame(W).head()

          0         1         2
0   0.27104  0.068927  0.177512
1    0.2045  0.016035  0.209723
2  0.058864  0.022456  0.327552
3  0.333333  0.279056  0.105152
4  0.033716  0.236195  0.151939


In [45]:
#def cost(A, W, H):
#    from numpy import linalg
#    WH = np.dot(W, H)
#    A_WH = A-WH
#    return linalg.norm(A_WH, 'fro')

The Frobenius-norm cost above would return NaN whenever A has missing entries, so we use a version that skips the NaN values in A instead.

In [46]:
def cost1(A, W, H):
    # Boolean mask of the observed (non-NaN) entries of A
    mask = ~np.isnan(A)
    WH = np.dot(W, H)
    # Boolean indexing flattens to a vector of length M*N minus the number of NaN values
    WH_mask = WH[mask]
    A_mask = A[mask]
    A_WH_mask = A_mask - WH_mask
    # A_WH_mask is a vector, so we use the L2 norm instead of the matrix Frobenius norm
    return np.linalg.norm(A_WH_mask, 2)

In [47]:
cost1(A, W, H)

2.1857796631331246
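
The masked L2 norm should equal the Frobenius norm of the residual matrix once the missing positions are set to zero, since both sum the same squared terms. The sketch below checks this on a small made-up example; the function names and the `*_demo` arrays are hypothetical and separate from the notebook's `A`, `W`, and `H`.

```python
import numpy as np

def cost_masked_l2(A, W, H):
    """L2 norm of the residual vector over the observed entries only."""
    mask = ~np.isnan(A)
    return np.linalg.norm((A - W @ H)[mask], 2)

def cost_zeroed_fro(A, W, H):
    """Frobenius norm of the residual matrix with missing entries zeroed."""
    residual = np.where(np.isnan(A), 0.0, A - W @ H)
    return np.linalg.norm(residual, 'fro')

rng = np.random.default_rng(0)
A_demo = rng.uniform(size=(4, 3))
A_demo[1, 2] = np.nan               # one missing entry
W_demo = rng.uniform(size=(4, 2))
H_demo = rng.uniform(size=(2, 3))

c_l2 = cost_masked_l2(A_demo, W_demo, H_demo)
c_fro = cost_zeroed_fro(A_demo, W_demo, H_demo)
print(np.isclose(c_l2, c_fro))
```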

In [48]:
from scipy.optimize import nnls

num_iter = 1000
num_display_cost = max(num_iter // 10, 1)

for i in range(num_iter):
    if i % 2 == 0:
        # Learn H, given A and W: one NNLS problem per column of A
        for j in range(N):
            mask_rows = ~np.isnan(A[:, j])  # observed rows in column j
            H[:, j] = nnls(W[mask_rows], A[mask_rows, j])[0]
    else:
        # Learn W, given A and H: one NNLS problem per row of A
        for j in range(M):
            mask_cols = ~np.isnan(A[j, :])  # observed columns in row j
            W[j, :] = nnls(H.T[mask_cols], A[j, mask_cols])[0]
    c = cost1(A, W, H)
    if i % num_display_cost == 0:
        print(i, c)

0 0.9785197553425572
100 0.0016018689192724485
200 0.001601868919272392
300 0.001601868919272395
400 0.0016018689192724227
500 0.001601868919272443
600 0.0016018689192724516
700 0.0016018689192724056
800 0.0016018689192724349
900 0.0016018689192724568
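
The printed cost is essentially flat from iteration 100 onward, so a fixed count of 1000 iterations does more work than needed. One possible variant, sketched below under assumptions, stops once the improvement in cost falls below a tolerance. The function `nmf_nnls`, its parameters `max_iter`, `tol`, and `seed`, and the demo matrix are all hypothetical; note that one iteration here updates both H and W, whereas the loop above updates only one of them per iteration.

```python
import numpy as np
from scipy.optimize import nnls

def masked_cost(A, W, H):
    """L2 norm of the residual over the observed (non-NaN) entries of A."""
    mask = ~np.isnan(A)
    return np.linalg.norm((A - W @ H)[mask], 2)

def nmf_nnls(A, K, max_iter=500, tol=1e-9, seed=0):
    """Alternate NNLS updates of H and W until the cost stops improving."""
    rng = np.random.default_rng(seed)
    M, N = A.shape
    W = rng.uniform(size=(M, K))
    H = rng.uniform(size=(K, N))
    observed = ~np.isnan(A)
    history = [masked_cost(A, W, H)]
    for _ in range(max_iter):
        for j in range(N):  # update column j of H from its observed rows
            rows = observed[:, j]
            H[:, j] = nnls(W[rows], A[rows, j])[0]
        for i in range(M):  # update row i of W from its observed columns
            cols = observed[i, :]
            W[i, :] = nnls(H.T[cols], A[i, cols])[0]
        history.append(masked_cost(A, W, H))
        if history[-2] - history[-1] < tol:  # improvement below tolerance
            break
    return W, H, history

rng = np.random.default_rng(42)
A_demo = rng.uniform(size=(5, 5))
A_demo[0, 0] = np.nan
A_demo[3, 1] = np.nan
W_fit, H_fit, history = nmf_nnls(A_demo, K=3)
print(len(history) - 1, history[-1])
```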


In [49]:
A_pred = pd.DataFrame(np.dot(W, H))
A_pred.head()
#A_pred.shape

          0         1         2         3         4
0  0.103401  0.950562  0.731903  0.599071  0.155792
1  0.155935  0.058062  0.866038  0.601327  0.708077
2  0.020433  0.970057  0.832207  0.642881  0.182134
3  0.183797  0.419027  0.525591  0.430919  0.290993
4  0.611755   0.13952  0.291952  0.366546  0.456168


In [50]:
A_pred.values[~pd.DataFrame(A).notnull().values]

array([0.10340053, 0.64288115, 0.41902671])

In [51]:
A_orig[~pd.DataFrame(A).notnull().values]

array([0.37454012, 0.21233911, 0.30424224])
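
To put a number on how far the three imputed entries are from the values that were hidden, the sketch below computes the per-entry error, RMSE, and MAE from the two arrays printed above. With only three held-out entries this is a sanity check rather than a real evaluation, and the variable names are chosen here for illustration.

```python
import numpy as np

# Values copied from the two outputs above, in the same (row-major) order
predicted = np.array([0.10340053, 0.64288115, 0.41902671])
actual = np.array([0.37454012, 0.21233911, 0.30424224])

errors = predicted - actual
rmse = np.sqrt(np.mean(errors ** 2))   # root mean squared error
mae = np.mean(np.abs(errors))          # mean absolute error

print("per-entry error:", errors)
print("RMSE:", rmse, "MAE:", mae)
```

These held-out errors are much larger than the final cost on the observed entries (about 0.0016), which is plausible here because A is random and has no low-rank structure for the factorization to recover.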

In [52]:
pd.DataFrame(W).head()

          0             1         2
0  0.036828    0.06490315  0.317083
1  0.743454     0.1215928   1.9e-05
2  0.126604  7.865314e-15  0.321262
3  0.187266     0.1366086  0.135312
4  0.104524     0.4748486  0.043953


In [53]:
pd.DataFrame(H).head()

             0             1         2         3         4
0  2.79271e-15    0.07802363   1.13766  0.734905  0.828767
1     1.282429  2.184242e-15  0.166134  0.451736  0.755985
2   0.06360111       2.98877  2.142096  1.711496  0.240329
