# The singular value decomposition (SVD)

Pros: Simplifies data, removes noise, may improve algorithm results.

Cons: Transformed data may be difficult to understand.

Works with: Numeric values

A well-known application of the SVD is in information retrieval, where the method is called
latent semantic indexing (LSI) or latent semantic analysis (LSA).

In LSI, a matrix is constructed from documents and the words they contain. Applying the SVD to
this matrix produces a set of singular values, which represent concepts
or topics contained in the documents.

Use cases include recommendation systems and efficient document search.

In late 2006 the movie company Netflix launched a contest offering $1M to anyone
who could provide recommendations 10% better than the state of the art. The
winning team used the SVD in their solution.

The SVD is a type of matrix factorization, which will break down our data matrix
into separate parts.
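
Written out in the conventional notation, the factorization has three parts. The symbols $m$ and $n$, and the name Data for an $m \times n$ matrix, are assumptions used here only for illustration:

$$
\text{Data}_{m \times n} = U_{m \times m} \, \Sigma_{m \times n} \, V^{T}_{n \times n}
$$

Here $U$ and $V^{T}$ are orthogonal matrices, and $\Sigma$ is zero everywhere except on its diagonal, which holds the singular values.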



## SVD in Python

In [1]:
from numpy import *

In [2]:
U,Sigma,VT=linalg.svd([[1, 1],[7, 7]])

In [3]:
U

array([[-0.14142136, -0.98994949],
       [-0.98994949,  0.14142136]])

Sigma holds the diagonal elements (the singular values), sorted from largest to smallest.

In [5]:
# NumPy returns only the diagonal entries, as a 1-D array rather than a full matrix
Sigma

array([  1.00000000e+01,   2.82797782e-16])

In [6]:
VT

array([[-0.70710678, -0.70710678],
       [ 0.70710678, -0.70710678]])
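
As a sanity check, the three factors can be multiplied back together to recover the original matrix. The sketch below is a minimal illustration, not part of the original session. The names `A` and `A_rebuilt` are introduced here, and it uses explicit `np.` imports rather than the star import above.

```python
import numpy as np

A = np.array([[1, 1], [7, 7]])
U, Sigma, VT = np.linalg.svd(A)

# Sigma is a 1-D array, so expand it into a diagonal matrix before multiplying.
A_rebuilt = U @ np.diag(Sigma) @ VT

# True if the three factors reproduce A up to floating-point error.
print(np.allclose(A, A_rebuilt))

# Number of singular values above NumPy's default tolerance.
print(np.linalg.matrix_rank(A))
```

The second singular value is on the order of 1e-16, which is floating-point noise: the two rows of the matrix are multiples of each other, so it carries only one independent direction.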

In [7]:
from numpy import linalg as la

def loadExData():
    return[[0, 0, 0, 2, 2],
           [0, 0, 0, 3, 3],
           [0, 0, 0, 1, 1],
           [1, 1, 1, 0, 0],
           [2, 2, 2, 0, 0],
           [5, 5, 5, 0, 0],
           [1, 1, 1, 0, 0]]

In [8]:
Data=loadExData()

In [9]:
U,Sigma,VT=linalg.svd(Data)

In [10]:
Sigma

array([  9.64365076e+00,   5.29150262e+00,   6.51609210e-16,
         2.14818942e-16,   5.18511491e-17])
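
Only the first two singular values are meaningfully larger than zero, which suggests the data can be rebuilt from just two of them. The following is a minimal sketch of that idea. The cutoff `1e-10` and the names `data`, `k`, and `approx` are assumptions chosen for illustration.

```python
import numpy as np

data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

U, Sigma, VT = np.linalg.svd(data)

# Assumed cutoff: treat singular values below 1e-10 as numerical noise.
k = int(np.sum(Sigma > 1e-10))

# Keep the first k columns of U, the first k singular values, and the first k rows of VT.
approx = U[:, :k] @ np.diag(Sigma[:k]) @ VT[:k, :]

print(k)
print(np.allclose(data, approx))  # True if the truncated factors still reproduce the data
```

The truncated factors are much smaller than the full ones, which is where the "simplifies data" benefit listed at the top comes from.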

In [11]:
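def ecludSim(inA,inB):
    # Euclidean distance mapped to (0, 1]; 1.0 means the vectors are identical
    return 1.0/(1.0 + la.norm(inA - inB))

def pearsSim(inA,inB):
    # Pearson correlation rescaled from [-1, 1] to [0, 1]
    if len(inA) < 3 : return 1.0
    return 0.5+0.5*corrcoef(inA, inB, rowvar = 0)[0][1]

def cosSim(inA,inB):
    # cosine similarity rescaled from [-1, 1] to [0, 1]
    num = (inA.T*inB).item()  # pull the scalar out of the 1x1 matrix product
    denom = la.norm(inA)*la.norm(inB)
    return 0.5+0.5*(num/denom)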

In [12]:
myMat=mat(loadExData())

In [13]:
ecludSim(myMat[:,0],myMat[:,4])

0.12973190755680383

In [14]:
ecludSim(myMat[:,0],myMat[:,0])

1.0

In [15]:
cosSim(myMat[:,0],myMat[:,4])

0.5

All three similarity functions assume their inputs are column vectors, which is why the calls above slice columns such as myMat[:,0].
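
If the vectors are plain 1-D arrays instead of matrix columns, the same three measures can be written without the column-vector assumption. This is a hedged sketch, not the original code. The names `euclid_sim`, `pears_sim`, and `cos_sim` are hypothetical, and it assumes neither vector is all zeros, so the cosine denominator is nonzero.

```python
import numpy as np

def euclid_sim(a, b):
    # Euclidean distance mapped to (0, 1]
    return 1.0 / (1.0 + np.linalg.norm(a - b))

def pears_sim(a, b):
    # Pearson correlation rescaled from [-1, 1] to [0, 1]
    if len(a) < 3:
        return 1.0
    return 0.5 + 0.5 * np.corrcoef(a, b)[0, 1]

def cos_sim(a, b):
    # Cosine similarity rescaled from [-1, 1] to [0, 1]
    return 0.5 + 0.5 * (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

data = np.array([[0, 0, 0, 2, 2],
                 [0, 0, 0, 3, 3],
                 [0, 0, 0, 1, 1],
                 [1, 1, 1, 0, 0],
                 [2, 2, 2, 0, 0],
                 [5, 5, 5, 0, 0],
                 [1, 1, 1, 0, 0]], dtype=float)

# 1-D slices work for columns and rows alike.
col0, col4 = data[:, 0], data[:, 4]
print(euclid_sim(col0, col4))
print(cos_sim(col0, col4))
print(pears_sim(col0, col0))
print(cos_sim(data[0], data[1]))  # comparing two rows instead of two columns
```

Because the functions take 1-D arrays, the caller decides whether to compare rows or columns by how it slices the data.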