## Latent content (concept) discovery and concept similarity mining

This technique can be used, for example, in recommender systems.

Based on https://www.youtube.com/watch?v=P5mlg91as1c.

Given a matrix A of users to items (one row per user, one column per item), factorize it with SVD into three matrices `u`, `s`, and `v`. The result can be used to analyze latent (hidden) concepts in the data and the relationships between users, items, and concepts.

* u = user-to-concept similarity matrix: entry (i, j) measures how similar user i is to concept j
* v = item-to-concept similarity matrix: entry (i, j) measures how similar item j is to concept i (`np.linalg.svd` returns this factor already transposed, so rows are concepts and columns are items)
* s = vector whose i-th element represents the "strength" of concept i

In matrices u and v, we use the magnitude (absolute value) of the elements to measure similarity to concepts.
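
As a quick sanity check of the factorization described above, the sketch below rebuilds the matrix from its three factors. It uses the toy matrix from this notebook; the name `A_rebuilt` and the choice of `full_matrices=False` (the reduced form, which keeps the shapes compatible for multiplication) are assumptions made for illustration.

```python
import numpy as np

# Toy users-by-items matrix used throughout this notebook.
A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

# Reduced SVD: u is (7, 5), s is (5,), vh is (5, 5).
# NumPy returns V already transposed, so each row of vh is one concept.
u, s, vh = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together should recover A
# up to floating-point error.
A_rebuilt = u @ np.diag(s) @ vh
print(np.allclose(A, A_rebuilt))
```

With `full_matrices=True`, as in the cells that follow, `u` has shape (7, 7), so the same product would first need `u[:, :5]`.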

In [1]:
import numpy as np
import pandas as pd

In [2]:
A = np.array([
    [1,1,1,0,0],
    [3,3,3,0,0],
    [4,4,4,0,0],
    [5,5,5,0,0],
    [0,2,0,4,4],
    [0,0,0,5,5],
    [0,1,0,2,2]])

In [3]:
u,s,v = np.linalg.svd(A,full_matrices=True)

In [4]:
u.shape

(7, 7)

In [5]:
s.shape

(5,)

In [6]:
v.shape

(5, 5)

In [7]:
v.T

array([[-5.62258405e-01,  1.26641382e-01,  4.09667482e-01,
        -7.07106781e-01, -0.00000000e+00],
       [-5.92859901e-01, -2.87705846e-02, -8.04791520e-01,
         3.72941547e-16,  1.27687359e-16],
       [-5.62258405e-01,  1.26641382e-01,  4.09667482e-01,
         7.07106781e-01, -1.27687359e-16],
       [-9.01335372e-02, -6.95376220e-01,  9.12571001e-02,
        -2.84242227e-17,  7.07106781e-01],
       [-9.01335372e-02, -6.95376220e-01,  9.12571001e-02,
         2.70869285e-17, -7.07106781e-01]])

In [8]:
s

array([1.24810147e+01, 9.50861406e+00, 1.34555971e+00, 1.84716760e-16,
       9.74452038e-33])

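Drop the weak concepts: keep only the singular values above a threshold (1.5 here), along with the corresponding columns of u and rows of v. This removes the two near-zero values and the small third value (about 1.35), leaving the two strongest concepts.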

In [9]:
which_singular_values = np.where(s>1.5)[0]
s = s[which_singular_values]

In [10]:
u = u[:,which_singular_values]

In [11]:
v = v[which_singular_values,:]
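
To get a feel for how much information the truncation discards, one option is to compare the original matrix with its rank-2 approximation. The sketch below recomputes the SVD from scratch; the names `k`, `A_k`, and `error` are illustrative and do not appear in the notebook.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

u, s, vh = np.linalg.svd(A, full_matrices=False)

k = 2  # number of concepts kept (assumed; matches the threshold of 1.5)

# Rank-k approximation built from the k strongest concepts only.
A_k = u[:, :k] @ np.diag(s[:k]) @ vh[:k, :]

# Frobenius norm of the difference. For a truncated SVD this equals
# the square root of the sum of the squared discarded singular values.
error = np.linalg.norm(A - A_k)
print(error, np.sqrt(np.sum(s[k:] ** 2)))
```

Here the error is dominated by the third singular value, since the remaining two are numerically zero.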

### Analyze the user-to-concept similarity matrix

In [12]:
pd.DataFrame(u)

|   | 0 | 1 |
|---|---|---|
| 0 | -0.137599 | 0.023611 |
| 1 | -0.412797 | 0.070834 |
| 2 | -0.550397 | 0.094446 |
| 3 | -0.687996 | 0.118057 |
| 4 | -0.152775 | -0.591101 |
| 5 | -0.072217 | -0.731312 |
| 6 | -0.076388 | -0.29555 |


In [13]:
pd.DataFrame(np.abs(u))

|   | 0 | 1 |
|---|---|---|
| 0 | 0.137599 | 0.023611 |
| 1 | 0.412797 | 0.070834 |
| 2 | 0.550397 | 0.094446 |
| 3 | 0.687996 | 0.118057 |
| 4 | 0.152775 | 0.591101 |
| 5 | 0.072217 | 0.731312 |
| 6 | 0.076388 | 0.29555 |


* Users 0, 1, 2, 3 are most similar to concept 0
* Users 4, 5, 6 are most similar to concept 1
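
The grouping above can also be read off programmatically by picking, for each user, the concept with the largest absolute value. This is a minimal sketch that recomputes the SVD; `user_concepts` and `dominant` are illustrative names.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

u, s, vh = np.linalg.svd(A, full_matrices=False)

k = 2  # keep the two strongest concepts, as in the notebook

# Magnitude of each user's association with each kept concept.
user_concepts = np.abs(u[:, :k])

# Index of the concept with the largest magnitude, one entry per user.
dominant = user_concepts.argmax(axis=1)
print(dominant)
```

Taking absolute values first matters because the sign of each singular vector is arbitrary.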

### Analyze the item-to-concept similarity matrix

In [14]:
pd.DataFrame(v)

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | -0.562258 | -0.59286 | -0.562258 | -0.090134 | -0.090134 |
| 1 | 0.126641 | -0.028771 | 0.126641 | -0.695376 | -0.695376 |


In [15]:
pd.DataFrame(np.abs(v))

|   | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 0.562258 | 0.59286 | 0.562258 | 0.090134 | 0.090134 |
| 1 | 0.126641 | 0.028771 | 0.126641 | 0.695376 | 0.695376 |


* Items 0, 1, 2 are very similar to concept 0
* Items 3 and 4 are very similar to concept 1
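
Once every item has coordinates in concept space, items can also be compared with each other. The sketch below uses cosine similarity between the items' two-dimensional concept vectors; this measure, the unscaled coordinates, and the names `item_vecs`, `unit`, and `sim` are assumptions for illustration, not something the notebook itself does.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

u, s, vh = np.linalg.svd(A, full_matrices=False)

k = 2  # keep the two strongest concepts

# One row per item, one column per kept concept: shape (5, 2).
item_vecs = vh[:k, :].T

# Normalize each item vector to unit length, then take dot products
# to get the cosine similarity between every pair of items.
unit = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
sim = unit @ unit.T

print(np.round(sim, 3))
```

Items 0 and 2 have identical columns in A, so their similarity is essentially 1, while items from different groups (for example 0 and 3) come out close to 0.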

### Analyze concept strength

In [16]:
pd.DataFrame(np.diag(s))

|   | 0 | 1 |
|---|---|---|
| 0 | 12.481015 | 0.0 |
| 1 | 0.0 | 9.508614 |


Concept 0 is the strongest; concept 1 is weaker.
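
One common way to quantify "strength" is the share of the total squared singular values (sometimes called energy) that each concept carries. The following sketch assumes that convention; `energy` is an illustrative name.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

# Only the singular values are needed here.
s = np.linalg.svd(A, compute_uv=False)

# Fraction of the total squared singular values held by each concept.
energy = s ** 2 / np.sum(s ** 2)

print(np.round(energy, 4))
print(energy[:2].sum())  # share captured by the two kept concepts
```

On this matrix the two kept concepts account for more than 99% of the total, which supports dropping the rest.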

### Possible conclusion of the analysis

The strongest concept is concept 0; the users most similar to it are users 0, 1, 2, 3, and the items most similar to it are items 0, 1, 2.

If A were a users-to-movies matrix and, for example, user 1 had not seen movie 2, we could recommend it, since both user 1 and movie 2 are strongly associated with concept 0.
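
A recommendation along these lines could be sketched by projecting a user's ratings into concept space and back, then ranking the items the user has not rated. The user `new_user` below is hypothetical (not a row of A), and the scoring rule is one simple possibility rather than the method of the referenced video.

```python
import numpy as np

A = np.array([
    [1, 1, 1, 0, 0],
    [3, 3, 3, 0, 0],
    [4, 4, 4, 0, 0],
    [5, 5, 5, 0, 0],
    [0, 2, 0, 4, 4],
    [0, 0, 0, 5, 5],
    [0, 1, 0, 2, 2],
])

u, s, vh = np.linalg.svd(A, full_matrices=False)

k = 2
V_k = vh[:k, :]  # (2, 5): rows are concepts, columns are items

# Hypothetical user who has rated only item 0.
new_user = np.array([5, 0, 0, 0, 0])

# Coordinates of the user in concept space, then mapped back to
# item space to get a score for every item.
concept_coords = V_k @ new_user
scores = concept_coords @ V_k

# Rank only the items the user has not rated yet, best first.
unseen = new_user == 0
ranked = [int(i) for i in np.argsort(-scores) if unseen[i]]
top2 = ranked[:2]
print(top2)
```

The two top-ranked unseen items are items 1 and 2, the other members of the concept 0 group, while items 3 and 4 receive low scores.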