# Collaborative based Recommender Systems

## Factorization Models

**Matrix factorization is a simple embedding model that decomposes the user-item interaction matrix** $R \in \mathbb{R}^{m\times n}$, where $m$ is the number of users and $n$ the number of items, into the product of two lower-dimensional rectangular matrices, $R \approx PQ^T$. The goal of factorization models is to learn:
* A user embedding (or user latent factor) matrix $P \in \mathbb{R}^{m\times k}$, where row $i$ is the embedding of user $i$.
* An item embedding (or item latent factor) matrix $Q \in \mathbb{R}^{n\times k}$, where row $j$ is the embedding of item $j$.

![alt factorization models](https://miro.medium.com/max/988/1*nIVWl2ROaxOY23hHajkTTg.png)
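
As a small illustration of the shapes involved, the sketch below builds random embeddings and reconstructs the interaction matrix. The sizes and the random values are assumptions made up for the example; in practice $P$ and $Q$ are learned from data.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n, k = 4, 5, 2            # users, items, latent dimensions (toy sizes)
P = rng.normal(size=(m, k))  # user embeddings, one row per user
Q = rng.normal(size=(n, k))  # item embeddings, one row per item

R_hat = P @ Q.T              # reconstructed m x n interaction matrix

# The predicted score for user i and item j is the dot product of their rows.
i, j = 1, 3
score = P[i] @ Q[j]
```

Each entry of `R_hat` is the dot product of one user row and one item row, which is why only $k(m+n)$ numbers are needed instead of $mn$.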


There are different approaches to solve a recommender problem using factorization models:

### SVD Decomposition

Singular Value Decomposition (SVD) is a well-established technique for **identifying latent semantic factors** by **factorizing the user-item rating matrix**.

The singular value decomposition is a method that decomposes a matrix into three other matrices:
$$ R = USV^T$$

where
* $R$ is the $m\times n$ rating matrix;
* $U$ is an $m\times k$ left singular matrix with orthonormal columns, which **represents the relationship between users and latent factors** and is known as the **user latent matrix**;
* $S$ is a $k\times k$ diagonal matrix of singular values, which describes the **strength of each latent factor** (when $k$ is smaller than the rank of $R$, the product only approximates $R$); and
* $V$ is an $n \times k$ right singular matrix with orthonormal columns, which represents the **relationship between items and latent factors** and is known as the **item latent matrix**.

The columns of $U$ and $V$ are constrained to be mutually orthogonal.

Mutual orthogonality has the advantage that the latent concepts are completely independent of one another, so they can be interpreted geometrically, for example in scatterplots.
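
A minimal sketch of a rank-$k$ SVD with NumPy, assuming a small, fully observed rating matrix whose values are made up for the example. It keeps the top $k$ singular values and checks the orthogonality property described above.

```python
import numpy as np

# Toy complete rating matrix (4 users x 3 items); values are made up.
R = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])

k = 2  # number of latent factors to keep
U, s, Vt = np.linalg.svd(R, full_matrices=False)  # s is sorted in descending order
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

R_k = U_k @ S_k @ V_k.T  # rank-k approximation of R

# Columns of U_k and V_k are orthonormal.
ortho_U = np.allclose(U_k.T @ U_k, np.eye(k))
ortho_V = np.allclose(V_k.T @ V_k, np.eye(k))
```

Row $i$ of `U_k` can be read as the coordinates of user $i$ in the latent space, and row $j$ of `V_k` as the coordinates of item $j$, which is what makes scatterplots of the first two factors meaningful.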

**Problem**: the $R$ matrix needs to be complete in order to be decomposed, but rating matrices usually have missing entries.
* Solution: fill the missing values with the mean rating of the user, then refine them iteratively:

1. Initialization: set the missing entries in the $i$-th row of $R$ to the mean $\mu_i$ of that row to create $R_f$.
2. Iterative step 1: perform a rank-$k$ SVD of $R_f$ in the form $U_k S_k V_k^T$.
3. Iterative step 2: readjust only the (originally) missing entries of $R_f$ to the corresponding values in $U_k S_k V_k^T$. Go to iterative step 1 and repeat until convergence.
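
The iterative procedure can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `iterative_svd_impute`, the stopping rule (`n_iters`, `tol`) and the toy matrix are assumptions, missing ratings are encoded as `NaN`, and every user is assumed to have at least one known rating.

```python
import numpy as np

def iterative_svd_impute(R, k, n_iters=50, tol=1e-6):
    """Fill NaN entries of R by repeated rank-k SVD (hypothetical helper)."""
    missing = np.isnan(R)
    # Initialization: replace missing entries with the mean of each user's row.
    row_means = np.nanmean(R, axis=1, keepdims=True)
    R_f = np.where(missing, row_means, R)

    for _ in range(n_iters):
        # Iterative step 1: rank-k SVD of the filled matrix.
        U, s, Vt = np.linalg.svd(R_f, full_matrices=False)
        R_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
        # Iterative step 2: overwrite only the originally missing entries.
        R_new = np.where(missing, R_k, R_f)
        change = np.linalg.norm(R_new - R_f)
        R_f = R_new
        if change < tol:  # stop once the imputed values no longer move
            break
    return R_f

# Toy rating matrix with missing entries (values are made up).
R = np.array([
    [5.0, 4.0, np.nan],
    [4.0, np.nan, 1.0],
    [1.0, 1.0, 5.0],
])
R_filled = iterative_svd_impute(R, k=2)
```

Known ratings are never modified; only the imputed cells move toward values that are consistent with a rank-$k$ structure.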

### The Vanilla Matrix Factorization Model 
* Also known as **Funk SVD**
  * Despite its name, no singular value decomposition is applied in Funk SVD.
  * https://sifter.org/simon/journal/20061211.html

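**A straightforward matrix factorization model maps both users and items to a joint latent factor space of dimensionality $D$. User-item interactions are modeled as inner products in that space:**
$$R \approx UV^T$$

where $U \in \mathbb{R}^{m\times D}$ and $V \in \mathbb{R}^{n\times D}$ hold one row per user and one row per item, respectively.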

Each item $j$ is associated with a vector $v_j$ from $V$, and each user $i$ is associated with a vector $u_i$ from $U$.
The resulting dot product $u_i\cdot v_j$ captures the interaction between user $i$ and item $j$:
$$ \hat{r}_{ij} = u_i\cdot v_j$$

The goal of matrix factorization is to find the mapping of each user and item to the factors $u_i$ and $v_j$. To do so, the squared error over the known entries $R_{ij}$ of the rating matrix is minimized:
$$ \sum_{(i,j)\ \text{known}}(R_{ij} - u_i\cdot v_j)^2$$

This factorization can be learnt using **only those known ratings**. We do not need to infer missing values.
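
One common way to minimize this error is stochastic gradient descent over the known ratings. The sketch below is an assumption about how that could look, not the method of any specific library: the function names, the learning rate `lr`, the number of epochs and the toy ratings are all made up, and no regularization is used because the objective above has none.

```python
import numpy as np

def funk_svd(ratings, n_users, n_items, D=2, lr=0.01, n_epochs=200, seed=0):
    """Fit U (n_users x D) and V (n_items x D) on known (i, j, r) triples."""
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(n_users, D))
    V = rng.normal(scale=0.1, size=(n_items, D))
    for _ in range(n_epochs):
        for i, j, r in ratings:
            err = r - U[i] @ V[j]       # error on this known rating
            u_old = U[i].copy()         # keep the pre-update user vector
            U[i] += lr * err * V[j]     # gradient step for the user factors
            V[j] += lr * err * u_old    # gradient step for the item factors
    return U, V

def squared_error(ratings, U, V):
    """Sum of squared errors over the known ratings only."""
    return sum((r - U[i] @ V[j]) ** 2 for i, j, r in ratings)

# Known ratings as (user, item, rating) triples; missing pairs are simply absent.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]

U0, V0 = funk_svd(ratings, n_users=3, n_items=3, n_epochs=0)  # initial factors
U, V = funk_svd(ratings, n_users=3, n_items=3)                # trained factors
error_before = squared_error(ratings, U0, V0)
error_after = squared_error(ratings, U, V)
```

After training, `U[i] @ V[j]` also gives a prediction for user-item pairs that never appeared in `ratings`, which is the point of the model.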

![alt Amazon](https://miro.medium.com/max/4800/1*b4M7o7W8bfRRxdMxtFoVBQ.png)

### The Vanilla Matrix Factorization Model with biases 

* This model is often referred to simply as **SVD**; despite the name, no singular value decomposition is applied.


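Now the model is defined as:
$$\hat{r}_{ui} = \bar{r} + b_{u_u} + b_{i_i} + \sum_{k = 1}^K P_{uk} Q_{ik}$$

where $\bar{r}$ is the global mean rating, $b_{u_u}$ is the bias of user $u$, $b_{i_i}$ is the bias of item $i$, and $P$ and $Q$ are the user and item latent factor matrices.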

To learn the model we can use stochastic gradient descent (SGD). For each known rating $r_{ui}$, the latent factors and biases are updated as follows:
* $\text{error} = r_{ui} -\hat{r}_{ui}$
* $b_{u_u} = b_{u_u} + \alpha \cdot (\text{error} -  \lambda \cdot b_{u_u})$
* $b_{i_i} = b_{i_i} + \alpha \cdot (\text{error} -  \lambda \cdot b_{i_i})$
* $P_{uk} = P_{uk} + \alpha \cdot (\text{error} \cdot Q_{ik} -  \lambda \cdot P_{uk})$
* $Q_{ik} = Q_{ik} + \alpha \cdot (\text{error} \cdot P_{uk} -  \lambda \cdot Q_{ik})$

where $\alpha$ is the learning rate and $\lambda$ is the regularization strength.
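
The update rules translate almost line by line into code. The sketch below is illustrative only: the function name `sgd_step`, the default values of `alpha` and `lam`, and the toy data are assumptions. It also assumes that the item update uses the user factors from before the step, which the rules above leave open.

```python
import numpy as np

def sgd_step(u, i, r, mean, b_u, b_i, P, Q, alpha=0.01, lam=0.02):
    """Apply one in-place SGD update for a known rating r of item i by user u."""
    pred = mean + b_u[u] + b_i[i] + P[u] @ Q[i]
    error = r - pred
    b_u[u] += alpha * (error - lam * b_u[u])
    b_i[i] += alpha * (error - lam * b_i[i])
    p_old = P[u].copy()  # assumption: item update uses pre-update user factors
    P[u] += alpha * (error * Q[i] - lam * P[u])
    Q[i] += alpha * (error * p_old - lam * Q[i])
    return error

# Toy setup (all values are made up for the example).
rng = np.random.default_rng(0)
n_users, n_items, K = 3, 4, 2
P = rng.normal(scale=0.1, size=(n_users, K))
Q = rng.normal(scale=0.1, size=(n_items, K))
b_u = np.zeros(n_users)
b_i = np.zeros(n_items)
mean = 3.5  # global mean rating

# Two consecutive updates on the same rating: the error should shrink.
first_error = sgd_step(0, 1, 5.0, mean, b_u, b_i, P, Q)
second_error = sgd_step(0, 1, 5.0, mean, b_u, b_i, P, Q)
```

A full training loop would call `sgd_step` for every known rating, for several passes over the data.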

### SVD++

SVD++ extends the biased model with an additional factor vector $y_i$ per item, which accounts for the importance of merely having observed that item, regardless of the rating given. The model learns how informative it is that a user has seen a particular item. If a user has seen many movies but few informative ones, those observations contribute little weight to the prediction.
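
In the usual formulation of SVD++, the user vector is augmented with the normalized sum of the $y_j$ vectors of the items the user has interacted with: $\hat{r}_{ui} = \bar{r} + b_u + b_i + q_i \cdot \left(p_u + |N(u)|^{-1/2} \sum_{j \in N(u)} y_j\right)$, where $N(u)$ is that set of items. The sketch below only computes this prediction; the function name `svdpp_predict`, the `rated_by_user` dictionary and the toy values are assumptions for illustration.

```python
import numpy as np

def svdpp_predict(u, i, mean, b_u, b_i, P, Q, Y, rated_by_user):
    """Predict r_ui; rated_by_user[u] lists the items user u has interacted with."""
    items = rated_by_user[u]
    if len(items) > 0:
        # Normalized sum of the implicit-feedback vectors y_j for j in N(u).
        implicit = Y[items].sum(axis=0) / np.sqrt(len(items))
    else:
        implicit = np.zeros(P.shape[1])  # no history: fall back to the biased model
    return mean + b_u[u] + b_i[i] + Q[i] @ (P[u] + implicit)

# Toy setup (all values are made up for the example).
rng = np.random.default_rng(0)
n_users, n_items, K = 2, 4, 3
P = rng.normal(size=(n_users, K))   # explicit user factors
Q = rng.normal(size=(n_items, K))   # item factors
Y = rng.normal(size=(n_items, K))   # implicit item factors y_j
b_u, b_i, mean = np.zeros(n_users), np.zeros(n_items), 3.5
rated_by_user = {0: [0, 2], 1: []}

pred_with_history = svdpp_predict(0, 1, mean, b_u, b_i, P, Q, Y, rated_by_user)
pred_cold_user = svdpp_predict(1, 1, mean, b_u, b_i, P, Q, Y, rated_by_user)
```

Items with a small learned $y_j$ barely shift the user representation, which matches the intuition that uninformative observations carry little weight.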


## Factorization Machines

Summary here

### Deep Factorization Machines