## SVD in Python

SciPy provides a straightforward SVD implementation, so we do not have to carry out the decomposition by hand. We can use the svds() function, which computes the k largest singular values of a sparse matrix together with their singular vectors, as shown below. We will use csc_matrix() to create a sparse matrix object.

In [1]:
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import svds

# Create a sparse matrix 
A = csc_matrix([[1, 0, 0], [5, 0, 2], [0, 1, 0], [0, 0, 3], [4, 0, 9]], dtype=float)

# Apply SVD
u, s, vt = svds(A, k=2) # k is the number of singular values (stretching factors) to keep

print('A:\n', A.toarray())
print('=')
print('\nU:\n', u)
print('\nΣ:\n', s)
print('\nV.T:\n', vt)

A:
 [[1. 0. 0.]
 [5. 0. 2.]
 [0. 1. 0.]
 [0. 0. 3.]
 [4. 0. 9.]]
=

U:
 [[-2.21829477e-01 -4.58445949e-02]
 [-8.50288016e-01 -3.86369035e-01]
 [ 1.76646871e-17  1.69188057e-18]
 [ 3.88289052e-01 -2.35719092e-01]
 [ 2.77549248e-01 -8.90535654e-01]]

Σ:
 [ 3.89366418 10.99269663]

V.T:
 [[-8.63729488e-01  6.87803594e-17  5.03955724e-01]
 [-5.03955724e-01  1.85983299e-17 -8.63729488e-01]]


Now we can recreate the original ratings matrix by multiplying the three factors back together: U, the diagonal matrix built from the singular values in s with np.diag(), and V.T. Let's look at the exact values and then the rounded values to get an idea of what our ratings should be.

In [2]:
import numpy as np
print('Approximation of Ratings Matrix')
u.dot(np.diag(s).dot(vt))

Approximation of Ratings Matrix


array([[ 1.00000000e+00, -6.87803594e-17, -3.33066907e-16],
       [ 5.00000000e+00, -3.06705137e-16,  2.00000000e+00],
       [-6.87803594e-17,  5.07663572e-33,  1.85983299e-17],
       [ 4.44089210e-16,  5.57949896e-17,  3.00000000e+00],
       [ 4.00000000e+00, -1.07736469e-16,  9.00000000e+00]])

In [3]:
print('Rounded Approximation of Ratings Matrix')
np.round(u.dot(np.diag(s).dot(vt)))

Rounded Approximation of Ratings Matrix


array([[ 1., -0., -0.],
       [ 5., -0.,  2.],
       [-0.,  0.,  0.],
       [ 0.,  0.,  3.],
       [ 4., -0.,  9.]])

As you can see, the reconstruction matches the original matrix almost exactly. Of the 15 user-item entries, only one is wrong (Row 3, Column 2): the original value is 1, but the approximation gives 0. This happens because we kept only k=2 singular values, and that entry belongs to the third component, which was discarded. A truncated SVD is not a perfect reconstruction, but when we have enough users and items, it lets us gain valuable insights about the underlying relationships in our data.
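
To see why a single entry is lost, the sketch below recomputes the decomposition with NumPy's dense np.linalg.svd, which returns every singular value rather than only the top k. The helper name rank_k_approx is a hypothetical name introduced here for illustration; it is not part of SciPy or NumPy.

```python
import numpy as np

# Same matrix as above, as a dense array
A = np.array([[1, 0, 0], [5, 0, 2], [0, 1, 0], [0, 0, 3], [4, 0, 9]], dtype=float)

# Dense SVD: all singular values, sorted from largest to smallest
U, s, Vt = np.linalg.svd(A, full_matrices=False)

def rank_k_approx(k):
    """Rebuild A from its k largest singular components (illustrative helper)."""
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A2 = rank_k_approx(2)  # what svds(A, k=2) reconstructs
A3 = rank_k_approx(3)  # all three components

err2 = np.abs(A - A2)
worst = np.unravel_index(err2.argmax(), err2.shape)

print('Singular values:', s)
print('Largest rank-2 error at (row, col) index:', worst)
print('Rank-3 reconstruction matches A:', np.allclose(A, A3))
```

The third singular value belongs to the component that carries the lone rating in Row 3, Column 2, so dropping it zeroes out exactly that entry while leaving the rest of the matrix intact.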

The example above demonstrates matrix factorization with SVD and relates it to a real-life problem, such as recommending a movie or a song. Next, we will look at implementing a simple recommendation system in Python to further strengthen our intuition around this idea.



### Matrix Factorization with Alternating Least Squares

In [4]:
import numpy as np

# users x factors: one row of latent factors per user
P = np.array([[-0.63274434,  1.33686735, -1.55128517], 
              [-2.23813661,  0.5123861 ,  0.14087293], 
              [-1.0289794 ,  1.62052691,  0.21027516], 
              [-0.06422255,  1.62892864,  0.33350709]])

In [5]:
# items x factors: one row of latent factors per item (Q.T is factors x items)
Q = np.array([[-2.09507374,  0.52351075,  0.01826269], 
              [-0.45078775, -0.07334991,  0.18731052], 
              [-0.34161766,  2.46215058, -0.18942263], 
              [-1.0925736 ,  1.04664756,  0.69963111], 
              [-0.78152923,  0.89189076, -1.47144019]])

In [6]:
# the original ratings matrix (users x items); np.nan marks a missing rating
R = np.array([[2, np.nan, np.nan, 1, 4], 
              [5, 1, 2, np.nan, 2], 
              [3, np.nan, np.nan, 3, np.nan], 
              [1, np.nan, 4, 2, 1]])
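
This section does not show how P and Q were fitted. As a rough sketch of the alternating least squares idea named in the heading, the function below holds one factor matrix fixed while solving a small ridge regression for each row of the other, then swaps roles. The function name als, its parameters, the random initialization, and the regularization strength are all illustrative assumptions; this is not the procedure that produced the P and Q above, and it will not return the same numbers.

```python
import numpy as np

def als(R, n_factors=3, n_iters=20, reg=0.1, seed=0):
    """Fit R ~ P @ Q.T on the observed (non-NaN) entries only."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    observed = ~np.isnan(R)
    P = rng.normal(size=(n_users, n_factors))
    Q = rng.normal(size=(n_items, n_factors))
    ridge = reg * np.eye(n_factors)

    for _ in range(n_iters):
        # Q fixed: each user's factors solve a ridge regression
        # over the items that user actually rated.
        for u in range(n_users):
            rated = observed[u]
            Qu = Q[rated]
            P[u] = np.linalg.solve(Qu.T @ Qu + ridge, Qu.T @ R[u, rated])
        # P fixed: the same step for each item over the users who rated it.
        for i in range(n_items):
            raters = observed[:, i]
            Pi = P[raters]
            Q[i] = np.linalg.solve(Pi.T @ Pi + ridge, Pi.T @ R[raters, i])
    return P, Q

# Usage on the ratings matrix from this section
R_demo = np.array([[2, np.nan, np.nan, 1, 4],
                   [5, 1, 2, np.nan, 2],
                   [3, np.nan, np.nan, 3, np.nan],
                   [1, np.nan, 4, 2, 1]])
P_fit, Q_fit = als(R_demo)
R_fit = P_fit @ Q_fit.T
mask = ~np.isnan(R_demo)
rmse_fit = np.sqrt(np.mean((R_demo[mask] - R_fit[mask]) ** 2))
print('RMSE on observed ratings:', rmse_fit)
```

The regularization term keeps each small linear system solvable even when a user or item has fewer ratings than there are factors.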

In [7]:
# latent factors of the third user (index 2)
print(P[2])

[-1.0289794   1.62052691  0.21027516]


In [8]:
# latent factors of the fifth item (index 4)
print(Q.T[:,4])

[-0.78152923  0.89189076 -1.47144019]


In [9]:
# predicted rating of user 2 for item 4, which is missing (NaN) in R
P[2].dot(Q.T[:,4])

1.9401031341455333

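Now we can do the same calculation for the entire ratings matrix. Wherever a rating is present in the original array R, the predicted value is very close to it. The remaining values, where R holds NaN, are new predictions. Note that a couple of them come out slightly negative (for example -0.10), so in practice predictions may need to be clipped to the valid rating range.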

In [10]:
P.dot(Q.T)

array([[ 1.99717984, -0.10339773,  3.80157388,  1.00522135,  3.96947118],
       [ 4.95987359,  0.99772807,  1.9994742 ,  3.08017572,  1.99887552],
       [ 3.00799117,  0.38437256,  4.30166793,  2.96747131,  1.94010313],
       [ 0.99340337, -0.02806164,  3.96943336,  2.00841398,  1.01228247]])
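
To put a number on "very close", one option is to compare predictions with ratings only where a rating exists. The sketch below masks out the NaN entries and computes the root-mean-square error on the rest, then lists the newly predicted entries. The variable names R_hat, observed, and rmse are introduced here for illustration.

```python
import numpy as np

P = np.array([[-0.63274434,  1.33686735, -1.55128517],
              [-2.23813661,  0.5123861 ,  0.14087293],
              [-1.0289794 ,  1.62052691,  0.21027516],
              [-0.06422255,  1.62892864,  0.33350709]])
Q = np.array([[-2.09507374,  0.52351075,  0.01826269],
              [-0.45078775, -0.07334991,  0.18731052],
              [-0.34161766,  2.46215058, -0.18942263],
              [-1.0925736 ,  1.04664756,  0.69963111],
              [-0.78152923,  0.89189076, -1.47144019]])
R = np.array([[2, np.nan, np.nan, 1, 4],
              [5, 1, 2, np.nan, 2],
              [3, np.nan, np.nan, 3, np.nan],
              [1, np.nan, 4, 2, 1]])

R_hat = P @ Q.T              # predicted ratings for every user-item pair
observed = ~np.isnan(R)      # True where an actual rating exists

# Error measured only on the entries we can check
rmse = np.sqrt(np.mean((R[observed] - R_hat[observed]) ** 2))
print('RMSE on observed ratings:', rmse)

# Entries with no original rating are the new predictions
for u, i in zip(*np.where(~observed)):
    print(f'user {u}, item {i}: predicted {R_hat[u, i]:.2f}')
```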