# Collaborative Filtering Intro

General idea: rank items by their scores.

## Average Rating

The basic approach for computing item scores is to take the average: sum all of the ratings for item *j* and divide by the number of ratings it received.

$$s(j) = \text{score for item j} = \frac{\sum_{i \in \Omega_j}{r_{ij}}}{\mid \Omega_j \mid}$$

$$\Omega_j = \text{set of all users that have rated item j}$$

$$r_{ij} = \text{rating user i gave to item j}$$
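As a minimal sketch of the formula above, the average can be computed column by column with NumPy. The toy matrix `R`, the use of `np.nan` for missing ratings, and the helper name `average_item_scores` are assumptions made for illustration only.

```python
import numpy as np

# Hypothetical toy ratings: rows are users, columns are items.
# np.nan marks a missing rating (the user has not rated that item).
R = np.array([
    [5.0, np.nan, 3.0],
    [4.0, 2.0, np.nan],
    [np.nan, 1.0, 4.0],
])

def average_item_scores(R):
    """Return s(j), the mean of the observed ratings in each column."""
    rated = ~np.isnan(R)                         # True where r_ij exists
    counts = rated.sum(axis=0)                   # |Omega_j| for each item
    sums = np.where(rated, R, 0.0).sum(axis=0)   # sum of r_ij over Omega_j
    # Items nobody has rated get nan instead of a division by zero.
    out = np.full(R.shape[1], np.nan)
    return np.divide(sums, counts, out=out, where=counts > 0)

scores = average_item_scores(R)
```

Every user sees the same `scores` vector, which is exactly why this baseline is not personalized.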

But this isn't personalized. We can do better!

## Personalizing Item Scores

Item score should depend on both user *i* and item *j*.

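*i'* is just an index for iterating over the users who rated item *j*; it is kept distinct from the target user *i*. As written, the right-hand side is still the plain item average, so personalization has to come from making the terms of the sum depend on *i*.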

$$s(i,j) = \frac{\sum_{i' \in \Omega_j}{r_{i'j}}}{\mid \Omega_j \mid}$$

$$i = 1 \dots N, N = \text{number of users}$$

$$j = 1 \dots M, M = \text{number of items}$$

$$R_{N \times M} = \text{user-item ratings matrix}$$

![User-item ratings matrix](images/user-item-matrix.png)
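One way to build such a matrix from raw data is sketched below. It assumes ratings arrive as `(user, item, rating)` triples with 0-based indices (the notation above is 1-based) and that missing entries are stored as `np.nan`; the data and the helper name `build_ratings_matrix` are hypothetical.

```python
import numpy as np

# Hypothetical (user, item, rating) triples with 0-based indices.
ratings = [
    (0, 0, 5.0), (0, 2, 3.0),
    (1, 0, 4.0), (1, 1, 2.0),
    (2, 1, 1.0), (2, 2, 4.0),
]

def build_ratings_matrix(ratings, n_users, n_items):
    """Return the N x M matrix R, with np.nan where no rating exists."""
    R = np.full((n_users, n_items), np.nan)
    for i, j, r in ratings:
        R[i, j] = r
    return R

R = build_ratings_matrix(ratings, n_users=3, n_items=3)
```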

The user-item matrix is very similar to the term-document matrix from NLP.

Many of the techniques used in collaborative filtering are analogous to NLP techniques, e.g. Matrix Factorization and SVD.

### Sparsity

Many of the entries in R are empty because users only rate or interact with a small subset of items.

This is actually good. If users had already tried every item, we would have nothing left to recommend to them!
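Sparsity also suggests storing only the ratings that exist. The sketch below assumes a plain dictionary keyed by `(user, item)` and measures the fraction of empty entries; the data, the names `observed` and `sparsity`, and the 0-based indices are illustrative assumptions.

```python
# Hypothetical observed ratings keyed by (user, item); absent keys are unrated.
observed = {
    (0, 0): 5.0, (0, 2): 3.0,
    (1, 0): 4.0, (1, 1): 2.0,
    (2, 1): 1.0, (2, 2): 4.0,
}
N, M = 3, 3  # number of users, number of items

def sparsity(observed, n_users, n_items):
    """Return the fraction of the N x M entries that have no rating."""
    return 1.0 - len(observed) / (n_users * n_items)

frac_empty = sparsity(observed, N, M)
```

Memory use here grows with the number of ratings rather than with N × M, which matters because real datasets usually have far fewer ratings than user-item pairs.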

### Goal of Collaborative Filtering

To predict scores for items not yet seen by a user.

$$s(i,j) = \hat{r}(i,j) = \text{predicted score for user i and item j}$$

We can then suggest the unseen items with the highest predicted scores.
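The recommendation step can be sketched as follows, assuming predicted scores are already available in a matrix `R_hat` with the same shape as `R`. How `R_hat` is produced is left open here; the numbers and the helper name `recommend` are hypothetical.

```python
import numpy as np

# Hypothetical observed ratings (np.nan = not yet seen) and predicted scores.
R = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 2.0, np.nan, 4.0],
])
R_hat = np.array([
    [4.8, 2.1, 3.2, 4.0],
    [3.5, 2.2, 4.6, 3.9],
])

def recommend(R, R_hat, user, k=2):
    """Return up to k unseen item indices for `user`, best predicted score first."""
    unseen = np.isnan(R[user])
    # Items the user has already rated are pushed to the bottom of the ranking.
    scores = np.where(unseen, R_hat[user], -np.inf)
    order = np.argsort(-scores, kind="stable")
    return [int(j) for j in order[:k] if unseen[j]]

top_for_user_0 = recommend(R, R_hat, user=0, k=2)
```

For user 0 only items 1 and 3 are unseen, so the ranking is restricted to those two even though item 0 has the highest predicted score overall.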

### Evaluating Models

Since scores are continuous, we will be using regression techniques.

Mean squared error (MSE) is a popular error metric for regression.

$$\text{MSE} = \frac{1}{\mid \Omega \mid} \sum_{(i,j) \in \Omega}{(r_{ij} - \hat{r}_{ij})^2}$$

$$\Omega = \text{set of (i,j) pairs where user i has rated item j}$$
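A minimal sketch of this metric is shown below. It averages the squared error over the observed entries only, again assuming missing ratings are stored as `np.nan`; the toy matrices and the function name `mse` are illustrative.

```python
import numpy as np

def mse(R, R_hat):
    """Mean squared error over Omega, the set of observed (i, j) pairs."""
    observed = ~np.isnan(R)   # boolean mask for Omega
    if not observed.any():
        raise ValueError("no observed ratings to evaluate on")
    errors = R[observed] - R_hat[observed]
    return float(np.mean(errors ** 2))

# Hypothetical true ratings and predictions.
R = np.array([[5.0, np.nan], [np.nan, 2.0]])
R_hat = np.array([[4.0, 3.0], [1.0, 4.0]])

error = mse(R, R_hat)
```

Predictions for unrated pairs, such as `R_hat[0, 1]`, do not affect the result because there is no true rating to compare them against.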