## Collaborative Filtering Model

Estimate unknown ratings from the ratings of similar items, found with the **k-nearest neighbours** algorithm.

> $\displaystyle \hat{r}_{xi} = \frac{\sum_{j \in N(i,x)} s_{ij} \cdot r_{xj}}{\sum_{j \in N(i,x)} s_{ij}}$

$s_{ij}$ is the **similarity** of items $i$ and $j$.

$r_{xj}$ is the **rating** of user $x$ on item $j$.

$N(i,x)$ is the set of **items similar** to item $i$ that were **rated** by user $x$.
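The weighted average above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `predict_rating`, the dictionary layout, and the choice of `k` are assumptions made for the example, and the similarities are taken as given.

```python
def predict_rating(user_ratings, sims_to_i, k=2):
    """Weighted average of user x's ratings on the k items most similar to i.

    user_ratings: {item j: r_xj} for the items user x has rated.
    sims_to_i:    {item j: s_ij}, similarity of each item j to the target item i.
    """
    # N(i, x): the k items rated by x with the highest similarity to i.
    rated = [j for j in user_ratings if j in sims_to_i]
    neighbours = sorted(rated, key=lambda j: sims_to_i[j], reverse=True)[:k]
    num = sum(sims_to_i[j] * user_ratings[j] for j in neighbours)
    den = sum(sims_to_i[j] for j in neighbours)
    return num / den if den else None  # no usable neighbours: no prediction


user_ratings = {"a": 4.0, "b": 2.0, "c": 5.0}
sims_to_i = {"a": 0.5, "b": 0.1, "c": 0.3}
# Neighbours are "a" and "c": (0.5 * 4 + 0.3 * 5) / (0.5 + 0.3), about 4.375.
prediction = predict_rating(user_ratings, sims_to_i, k=2)
```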

### Root Mean Squared Error (RMSE)

> $\displaystyle RMSE = \sqrt{\frac{\sum_{(i,x)}(\hat{r}_{xi} - r_{xi})^2}{|R|}}$

$R$ is the ratings matrix, $R \in \mathbb{R}^{m \times n}$.

$\hat{r}_{xi}$ is the **predicted** rating of user $x$ on item $i$.

$r_{xi}$ is the **true** rating of user $x$ on item $i$.

$|R|$ is the total number of ratings.
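As a quick sketch of the formula, RMSE can be computed over the known ratings only. The `{(x, i): rating}` dictionary layout and the function name `rmse` are assumptions for this example.

```python
import math


def rmse(predicted, actual):
    """predicted, actual: {(x, i): rating}; scored on the keys of `actual`."""
    squared_errors = [(predicted[key] - r) ** 2 for key, r in actual.items()]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


actual = {("u1", "a"): 4.0, ("u1", "b"): 2.0, ("u2", "a"): 5.0}
predicted = {("u1", "a"): 3.0, ("u1", "b"): 2.0, ("u2", "a"): 3.0}
# Squared errors are 1, 0 and 4, so the result is sqrt(5 / 3).
error = rmse(predicted, actual)
```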

### Item-Based Cosine Similarity

> $\displaystyle s_{ij} = \frac{r_i \cdot r_j}{\|r_i\|_2 \cdot \|r_j\|_2}$

$r_i$ is the vector of ratings of item $i$.

$r_j$ is the vector of ratings of item $j$.
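A minimal sketch of the cosine formula follows. It assumes each item vector has one entry per user and that a missing rating is stored as `0`; the notes do not specify how missing ratings are handled, so treat that as a simplification for the example.

```python
import math


def cosine_similarity(r_i, r_j):
    """Cosine of the angle between two item rating vectors of equal length."""
    dot = sum(a * b for a, b in zip(r_i, r_j))
    norm_i = math.sqrt(sum(a * a for a in r_i))
    norm_j = math.sqrt(sum(b * b for b in r_j))
    if norm_i == 0 or norm_j == 0:
        return 0.0  # an item with no ratings is treated as similar to nothing
    return dot / (norm_i * norm_j)


r_i = [4.0, 0.0, 3.0]  # ratings of item i by users 1..3 (0 = not rated)
r_j = [4.0, 0.0, 3.0]  # same pattern as item i
r_k = [0.0, 5.0, 0.0]  # rated only by the user who skipped i
sim_ij = cosine_similarity(r_i, r_j)
sim_ik = cosine_similarity(r_i, r_k)
```

Identical rating vectors give a similarity of 1, and vectors with no overlapping raters give 0.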

### Item-Based Adjusted Cosine Similarity

Takes into account the difference in rating scale between different users.

> $\displaystyle s_{ij} = \frac{\sum_{x} (r_{xi} - \bar{r}_x) (r_{xj} - \bar{r}_x)}{\sqrt{\sum_{x}(r_{xi} - \bar{r}_x)^2} \sqrt{\sum_{x}(r_{xj} - \bar{r}_x)^2}}$

$r_{xi}$ is a rating of user $x$ on item $i$.

$r_{xj}$ is a rating of user $x$ on item $j$.

$\bar{r}_{x}$ is the mean of all the ratings of user $x$.
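The adjusted cosine can be sketched as below. Two assumptions are made for the example: ratings are stored as `{user: {item: rating}}`, and the sums run only over users who rated both items, which is a common reading of $\sum_x$ but is not stated in the formula.

```python
import math


def adjusted_cosine(ratings, i, j):
    """ratings: {user x: {item: r_x,item}}; returns s_ij."""
    num = den_i = den_j = 0.0
    for user_ratings in ratings.values():
        if i not in user_ratings or j not in user_ratings:
            continue  # assumption: only users who rated both items count
        mean = sum(user_ratings.values()) / len(user_ratings)  # r-bar_x
        d_i = user_ratings[i] - mean
        d_j = user_ratings[j] - mean
        num += d_i * d_j
        den_i += d_i ** 2
        den_j += d_j ** 2
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (math.sqrt(den_i) * math.sqrt(den_j))


ratings = {
    "u1": {"a": 5.0, "b": 4.0, "c": 3.0},  # generous rater, mean 4
    "u2": {"a": 3.0, "b": 2.0, "c": 1.0},  # strict rater, mean 2
}
sim_ac = adjusted_cosine(ratings, "a", "c")
```

Both users place item `a` one point above their own mean and item `c` one point below it, so after centring the two items are perfectly anti-correlated even though the raw ratings sit on different scales.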

### Including Global Effects

Estimates improve when we model each rating's deviation from a baseline instead of the raw rating.

> $\displaystyle \hat{r}_{xi} = b_{xi} + \frac{\sum_{j \in N(i,x)} s_{ij} \cdot (r_{xj} - b_{xj})}{\sum_{j \in N(i,x)} s_{ij}}$

Here $b_{xi}$ is the **baseline** estimate of the rating of user $x$ on item $i$:

$b_{xi} = \mu + b_x + b_i$

$\mu$ is the **overall mean rating**.

$b_x$ is the **rating deviation** of user $x$:
> $b_x$ = (average rating of user $x$) - $\mu$

$b_i$ is the **rating deviation** of item $i$:
> $b_i$ = (average rating of item $i$) - $\mu$
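The baseline terms and the deviation-based prediction can be sketched as follows. The helper names `baselines` and `predict_with_baseline`, and the `{(x, i): rating}` layout, are assumptions for this example; the neighbour similarities are taken as given.

```python
def baselines(ratings):
    """ratings: {(x, i): r_xi}. Returns mu, {x: b_x}, {i: b_i}."""
    mu = sum(ratings.values()) / len(ratings)
    by_user, by_item = {}, {}
    for (x, i), r in ratings.items():
        by_user.setdefault(x, []).append(r)
        by_item.setdefault(i, []).append(r)
    b_user = {x: sum(rs) / len(rs) - mu for x, rs in by_user.items()}
    b_item = {i: sum(rs) / len(rs) - mu for i, rs in by_item.items()}
    return mu, b_user, b_item


def predict_with_baseline(b_xi, neighbours):
    """neighbours: list of (s_ij, r_xj, b_xj) for each j in N(i, x)."""
    num = sum(s * (r - b) for s, r, b in neighbours)
    den = sum(s for s, _, _ in neighbours)
    return b_xi + num / den if den else b_xi  # fall back to the baseline


ratings = {("u1", "a"): 5.0, ("u1", "b"): 3.0, ("u2", "a"): 3.0, ("u2", "b"): 1.0}
mu, b_user, b_item = baselines(ratings)
b_u1_a = mu + b_user["u1"] + b_item["a"]  # 3 + 1 + 1
# Two equally similar neighbours, rated 1 and 0 points above their baselines.
estimate = predict_with_baseline(3.5, [(0.5, 4.0, 3.0), (0.5, 3.0, 3.0)])
```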

## Latent Factor Model

Assume the rating matrix $R$ can be approximated by the product of two "thin" matrices, $Q$ (one row per item) and $P^{\top}$ (one column per user).

> $R \approx Q \cdot P^{\top}$
>
> $\displaystyle \hat{r}_{xi} = q_i \cdot p^{\top}_x$

$q_i$ is row $i$ of $Q$.

$p^{\top}_x$ is column $x$ of $P^{\top}$.
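In code, each predicted rating is just the dot product of an item's factor row and a user's factor row. The tiny matrices below are made-up values for illustration only.

```python
def predict(q_i, p_x):
    """Dot product of item factors q_i and user factors p_x."""
    return sum(q * p for q, p in zip(q_i, p_x))


Q = [[1.0, 0.5], [0.0, 2.0]]  # items x factors
P = [[2.0, 1.0], [1.0, 0.0]]  # users x factors
# R_hat = Q . P^T, so rows are items and columns are users.
R_hat = [[predict(q_i, p_x) for p_x in P] for q_i in Q]
```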

### Summed Squared Error (SSE)

Our goal is to find the $P$ and $Q$ that minimize the squared error over the known ratings:

> $\displaystyle \min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p^{\top}_x)^2$

### Summed Squared Error with Regularization

> $\displaystyle \min_{P,Q} \sum_{(i,x) \in R} (r_{xi} - q_i \cdot p^{\top}_x)^2 + \lambda\big(\sum_{x}\|p_{x}\|^2 + \sum_{i}\|q_{i}\|^2\big)$
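A small sketch of evaluating this objective, assuming known ratings are stored as `{(x, i): r_xi}` with integer indices into `P` and `Q`. The function name `regularized_sse` is hypothetical.

```python
def regularized_sse(ratings, Q, P, lam):
    """Squared error on known ratings plus lam times the squared factor norms."""
    sse = 0.0
    for (x, i), r in ratings.items():
        pred = sum(q * p for q, p in zip(Q[i], P[x]))
        sse += (r - pred) ** 2
    penalty = sum(p * p for row in P for p in row)
    penalty += sum(q * q for row in Q for q in row)
    return sse + lam * penalty


ratings = {(0, 0): 3.0}  # user 0 rated item 0 with a 3
Q = [[1.0, 0.0]]
P = [[2.0, 0.0]]
# Prediction is 2, so the error term is 1; the norms add 0.1 * (4 + 1).
loss = regularized_sse(ratings, Q, P, lam=0.1)
```

Setting `lam=0` recovers the plain SSE from the previous section.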

### Gradient Descent

- Initialize $P$ and $Q$ (using **SVD**, treating missing ratings as $0$).
- Do gradient descent with learning rate $\alpha$:
    - $P \leftarrow P - \alpha \cdot \nabla P$
    > $\displaystyle \nabla P = [\nabla p_{xk}]$
    >
    > $\displaystyle \nabla p_{xk} = \sum_{i : (i,x) \in R} -2(r_{xi} - q_{i} \cdot p^{\top}_{x})q_{ik} + 2\lambda p_{xk}$
    >
    > $p_{xk}$ is entry $k$ of row $p_x$ of matrix $P$.
    - $Q \leftarrow Q - \alpha \cdot \nabla Q$
    > $\displaystyle \nabla Q = [\nabla q_{ik}]$
    >
    > $\displaystyle \nabla q_{ik} = \sum_{x : (i,x) \in R} -2(r_{xi} - q_{i} \cdot p^{\top}_{x})p_{xk} + 2\lambda q_{ik}$
    >
    > $q_{ik}$ is entry $k$ of row $q_i$ of matrix $Q$.
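One full-batch step of these updates might look like the sketch below. For brevity it starts from small hand-picked factors rather than an SVD initialization, and the values of `lam`, `alpha`, and the number of steps are arbitrary choices for the example, not tuned settings.

```python
def sse(ratings, Q, P):
    """Squared error over the known ratings {(x, i): r_xi}."""
    return sum((r - sum(q * p for q, p in zip(Q[i], P[x]))) ** 2
               for (x, i), r in ratings.items())


def gradient_step(ratings, Q, P, lam, alpha):
    # Start each gradient with its regularization term, 2 * lambda * value.
    grad_Q = [[2 * lam * q for q in row] for row in Q]
    grad_P = [[2 * lam * p for p in row] for row in P]
    for (x, i), r in ratings.items():
        err = r - sum(q * p for q, p in zip(Q[i], P[x]))
        for k in range(len(Q[i])):
            grad_Q[i][k] += -2 * err * P[x][k]
            grad_P[x][k] += -2 * err * Q[i][k]
    # Both matrices are updated from the same (old) values.
    new_Q = [[q - alpha * g for q, g in zip(row, g_row)]
             for row, g_row in zip(Q, grad_Q)]
    new_P = [[p - alpha * g for p, g in zip(row, g_row)]
             for row, g_row in zip(P, grad_P)]
    return new_Q, new_P


ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}
Q = [[0.1, 0.2], [0.2, 0.1]]  # items x factors (not an SVD initialization)
P = [[0.1, 0.1], [0.2, 0.3]]  # users x factors
loss_before = sse(ratings, Q, P)
for _ in range(200):
    Q, P = gradient_step(ratings, Q, P, lam=0.1, alpha=0.01)
loss_after = sse(ratings, Q, P)
```

Every step touches all known ratings, which is what makes the full-batch variant expensive on large rating sets.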

### Including Global Effects

> $\hat{r}_{xi} = \mu + b_x + b_i + q_i \cdot p^{\top}_{x}$

$\mu$ is the **overall mean rating**.

$b_x$ is the **rating deviation** of user $x$:
> $b_x$ = (average rating of user $x$) - $\mu$

$b_i$ is the **rating deviation** of item $i$:
> $b_i$ = (average rating of item $i$) - $\mu$

### Stochastic Gradient Descent

Solve:

> $\displaystyle \min_{Q,P} \sum_{(x,i)\in R} \big(r_{xi} - (\mu + b_x + b_i + q_i \cdot p^{\top}_x)\big)^2 + \lambda\big(\sum_i \|q_i\|^2 + \sum_x \|p_x\|^2 + \sum_x b_x^2 + \sum_i b_i^2 \big)$
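A minimal sketch of stochastic gradient descent on this objective follows. It updates the parameters after each single rating instead of after a full pass. Several things here are assumptions for the example: the biases $b_x$ and $b_i$ are treated as learned parameters (as their regularization terms suggest), the constant factor 2 of the gradient is folded into the step size `eta`, and the hyperparameters and initial factors are arbitrary.

```python
import random


def predict(mu, b_user, b_item, Q, P, x, i):
    return mu + b_user[x] + b_item[i] + sum(q * p for q, p in zip(Q[i], P[x]))


def sgd_epoch(ratings, mu, b_user, b_item, Q, P, lam, eta, rng):
    """One pass over the known ratings in random order, updating in place."""
    samples = list(ratings.items())
    rng.shuffle(samples)
    for (x, i), r in samples:
        err = r - predict(mu, b_user, b_item, Q, P, x, i)
        b_user[x] += eta * (err - lam * b_user[x])
        b_item[i] += eta * (err - lam * b_item[i])
        for k in range(len(Q[i])):
            q, p = Q[i][k], P[x][k]  # use the old values for both updates
            Q[i][k] += eta * (err * p - lam * q)
            P[x][k] += eta * (err * q - lam * p)


def train_sse(ratings, mu, b_user, b_item, Q, P):
    return sum((r - predict(mu, b_user, b_item, Q, P, x, i)) ** 2
               for (x, i), r in ratings.items())


ratings = {(0, 0): 5.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 1.0}
mu = sum(ratings.values()) / len(ratings)
b_user, b_item = [0.0, 0.0], [0.0, 0.0]
Q = [[0.1, 0.2], [0.2, 0.1]]
P = [[0.1, 0.1], [0.2, 0.3]]
rng = random.Random(0)  # fixed seed so the run is repeatable
sse_before = train_sse(ratings, mu, b_user, b_item, Q, P)
for _ in range(50):
    sgd_epoch(ratings, mu, b_user, b_item, Q, P, lam=0.05, eta=0.05, rng=rng)
sse_after = train_sse(ratings, mu, b_user, b_item, Q, P)
```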