# Recommender Systems


## When feature vectors for items are available
- Examples of features for movies are genres, tags, actors, length, etc.
- Examples of features for food are cuisine, hotness, texture, etc.

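$n_u$ = number of users

$n_m$ = number of items

$r(i,j) = 1$ if user $j$ has rated item $i$, else 0

$y^{(i,j)}$ = rating given by user $j$ to item $i$ (defined only if $r(i,j) = 1$)

$w^{(j)}, b^{(j)}$ = parameters for user $j$

$x^{(i)}$ = feature vector for item $i$

Predicted rating of item $i$ by user $j$ = $w^{(j)} \cdot x^{(i)} + b^{(j)}$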

$J(w^{(j)}, b^{(j)}) = \frac{1}{2} \sum_{i:r(i,j)=1} [w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}]^2 + \frac{\lambda}{2} \sum_{k=1}^{n}(w_k^{(j)})^2$

Here $n$ is the number of features in $x^{(i)}$.

Apply gradient descent or any other optimisation algorithm to learn $w^{(j)}$, $b^{(j)}$ for each user.
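As a minimal sketch of the per-user fit, the snippet below runs plain gradient descent on the cost for a single user with NumPy. The function name `fit_user`, the hyperparameters, and the toy data are assumptions chosen for illustration, not part of the original notes.

```python
import numpy as np

def fit_user(X, y, lam=0.1, lr=0.01, steps=5000):
    """Gradient descent on J(w, b) for one user.

    X: (num_rated, n) feature vectors of the items this user rated.
    y: (num_rated,) the user's ratings for those items.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        err = X @ w + b - y            # prediction error on each rated item
        grad_w = X.T @ err + lam * w   # gradient of squared error + L2 term
        grad_b = err.sum()             # the bias is not regularized
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy data (made up): two features per movie, e.g. [romance, action].
X = np.array([[0.9, 0.0], [1.0, 0.1], [0.1, 1.0], [0.0, 0.9]])
y = np.array([5.0, 5.0, 0.0, 0.0])
w, b = fit_user(X, y)
pred = X @ w + b   # predicted ratings for the four movies
```

Only the items the user has actually rated are passed in, which plays the role of the $r(i,j)=1$ condition in the sum.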

## Collaborative Filtering

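When item features are not available, learn the item features $x^{(i)}$ together with the user parameters $w^{(j)}, b^{(j)}$ from the ratings.

Cost function = squared-error cost + regularization of $w_k$ + regularization of $x_k$

$J(w,x,b) = \frac{1}{2} \sum_{(i,j):r(i,j)=1} [w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)}]^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n}(w_k^{(j)})^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n}(x_k^{(i)})^2$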
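The cost can be evaluated in vectorized form. The sketch below assumes a matrix layout that is not stated in the notes: items along rows, users along columns, a 0/1 mask `R` for rated entries, and unrated entries of `Y` stored as 0 (not NaN). The name `cofi_cost` is hypothetical.

```python
import numpy as np

def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost J(w, x, b).

    X: (n_m, n) item features      W: (n_u, n) user weights
    b: (n_u,)   user biases        Y: (n_m, n_u) ratings, 0 where unrated
    R: (n_m, n_u) mask, R[i, j] = 1 if user j rated item i
    """
    err = (X @ W.T + b - Y) * R        # zero out the unrated (i, j) pairs
    squared_error = 0.5 * np.sum(err ** 2)
    regularization = 0.5 * lam * (np.sum(W ** 2) + np.sum(X ** 2))
    return squared_error + regularization
```

Minimizing this with respect to `X`, `W` and `b` at the same time is what distinguishes collaborative filtering from the per-user regression in the previous section, where `X` is given.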

### Binary labels: favs, likes and clicks

With binary labels, predict the probability that $y^{(i,j)} = 1$ using the logistic function $g(z) = \frac{1}{1+e^{-z}}$,

$f(w,b,x) = g(w^{(j)} \cdot x^{(i)} + b^{(j)})$

Loss for a single example,

$L(f(w,x,b), y^{(i,j)})= - y^{(i,j)} \log(f(w,x,b)) - (1-y^{(i,j)})\log(1-f(w,x,b))$

Therefore the total cost $J$ is,

$J(w,x,b) = \sum_{(i,j):r(i,j)=1} L(f(w,x,b), y^{(i,j)})$
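A small sketch of this cost in NumPy follows. It assumes the same hypothetical layout as a ratings matrix with items along rows and users along columns, uses the natural logarithm (a different base only rescales the cost), and clips probabilities to avoid `log(0)`; the function names and the `eps` value are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_cofi_cost(X, W, b, Y, R, eps=1e-12):
    """Sum of binary cross-entropy losses over observed (i, j) pairs.

    X: (n_m, n) item features, W: (n_u, n) user weights, b: (n_u,) biases,
    Y: (n_m, n_u) labels in {0, 1}, R: (n_m, n_u) mask of observed labels.
    """
    f = sigmoid(X @ W.T + b)           # predicted probability of y = 1
    f = np.clip(f, eps, 1.0 - eps)     # guard against log(0)
    loss = -Y * np.log(f) - (1.0 - Y) * np.log(1.0 - f)
    return np.sum(loss * R)            # unobserved pairs contribute nothing
```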

## Practical Tips

### Mean normalization

|M/U|User 1|User 2 | User 3| User 4|
|---|------|-------|-------|-------|
|Movie 1| 4.5 | 3.5 | 2.0 | ? |
|Movie 2| 1.5 | 2.0 | 2.0 | ? |
|Movie 3| 0 | 5 | 2.0 | ? |
|Movie 4| ? | ? | ? | ? |


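For a new user who has rated nothing (User 4), normalize each row of the ratings matrix by subtracting that row's mean $\mu$, computed over the rated entries only. Add $\mu$ back when predicting, so that a new user whose learned parameters stay near zero gets each movie's mean rating as the default prediction rather than 0.

- Row_new = Row - $\mu$

For a new movie that no user has rated yet (Movie 4), normalize each column of the ratings matrix by subtracting that column's mean instead.

- Column_new = Column - $\mu$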
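Row-wise mean normalization of the table above can be sketched as follows. Using `np.nan` for the `?` entries is an assumption about storage, and Movie 4 is left out because a row with no ratings has no mean to subtract.

```python
import numpy as np

# Ratings for Movies 1-3 (rows) by Users 1-4 (columns); np.nan marks "?".
Y = np.array([
    [4.5, 3.5, 2.0, np.nan],
    [1.5, 2.0, 2.0, np.nan],
    [0.0, 5.0, 2.0, np.nan],
])

mu = np.nanmean(Y, axis=1, keepdims=True)   # per-movie mean over rated entries
Y_norm = Y - mu                             # Row_new = Row - mu

# For a new user with w = 0 and b = 0, the prediction w.x + b + mu
# reduces to the movie's mean rating.
pred_new_user = 0.0 + mu[:, 0]
```

For the new-movie case, the same idea applies with `axis=0`, so that each user's mean is subtracted from their column.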



### Finding Related Items

Feature vectors for items 1, 2 and 3 are $x^{(1)}, x^{(2)}, x^{(3)}$, respectively.

To find items related to item $i$, look for items $k$ whose feature vectors $x^{(k)}$ are similar to $x^{(i)}$.

That is, pick the items with the smallest squared distance to $x^{(i)}$,

$\sum_{l=1}^{n}(x_l^{(k)} - x_l^{(i)})^2 $
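A minimal sketch of this nearest-neighbour lookup is shown below. The function name `related_items` and the parameter `num` (how many neighbours to return) are hypothetical; `X` is assumed to hold one learned feature vector per row.

```python
import numpy as np

def related_items(X, i, num=5):
    """Indices of the `num` items closest to item i in squared distance.

    X: (n_m, n) matrix whose row k is the feature vector x^(k).
    """
    dist = np.sum((X - X[i]) ** 2, axis=1).astype(float)
    dist[i] = np.inf                  # never return the query item itself
    return np.argsort(dist)[:num]     # smallest distances first

# Toy feature vectors (made up) for four items.
X_items = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.1, 0.0]])
nearest = related_items(X_items, 0, num=2)
```

This brute-force version compares item $i$ with every other item, which is fine for small catalogs; larger ones typically need an approximate nearest-neighbour index.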

### Limitations
#### Cold Start Problem
- How to rank a new item that very few users have rated?
- How to show something reasonable to new users who have rated few items?

Use other information about items/users:
- users: demographics (age, gender, location), expressed preferences, other signals (e.g. web browser)
- items: genre, actors, studio, director

## Deep-learning approach for Content Based Filtering
- Recommend items based on features of user and item to find a match.

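$x_u^{(j)}$ = feature vector for user $j$

$x_m^{(i)}$ = feature vector for item $i$

$v_u^{(j)}$ = user **vector** computed from $x_u^{(j)}$ by the user network

$v_m^{(i)}$ = item **vector** computed from $x_m^{(i)}$ by the item network

$v_u^{(j)}$ and $v_m^{(i)}$ must have the same dimension. Combine the two NNs into a single model whose prediction is the dot product $v_u^{(j)} \cdot v_m^{(i)}$,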

![](figures/NN_recommender.png)


Cost function is,

$J = \sum_{(i,j):r(i,j)=1} (v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)})^2 + \text{NN Regularization}$
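To make the structure concrete, here is a forward-pass sketch of the two networks and the squared-error part of the cost in plain NumPy. Everything about the architecture (one ReLU hidden layer, the layer sizes, the random toy data) is an assumption for illustration, and the regularization term is omitted; in practice a deep-learning framework would also handle training.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_tower(in_dim, hidden, out_dim):
    """Random parameters for a one-hidden-layer network (hypothetical sizes)."""
    return {
        "W1": rng.normal(0, 0.1, (in_dim, hidden)), "b1": np.zeros(hidden),
        "W2": rng.normal(0, 0.1, (hidden, out_dim)), "b2": np.zeros(out_dim),
    }

def tower(params, x):
    """Map raw features x to an embedding vector (v_u or v_m)."""
    h = np.maximum(0.0, x @ params["W1"] + params["b1"])   # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def cost(user_net, item_net, Xu, Xm, Y, R):
    """Squared error over rated pairs; pred[i, j] = v_m^(i) . v_u^(j)."""
    Vu = tower(user_net, Xu)          # (n_u, d)
    Vm = tower(item_net, Xm)          # (n_m, d)
    pred = Vm @ Vu.T                  # (n_m, n_u)
    return np.sum(R * (pred - Y) ** 2)

# Toy problem (made up): 3 users, 4 items, embeddings of size d = 8.
n_u, n_m, d = 3, 4, 8
Xu = rng.normal(size=(n_u, 5))        # 5 raw user features
Xm = rng.normal(size=(n_m, 6))        # 6 raw item features
Y = rng.integers(1, 6, size=(n_m, n_u)).astype(float)
R = (rng.random((n_m, n_u)) < 0.7).astype(float)
user_net = init_tower(5, 16, d)
item_net = init_tower(6, 16, d)
J = cost(user_net, item_net, Xu, Xm, Y, R)
```

The two towers may take inputs of different sizes, but both must output vectors of the same length `d` so that the dot product is defined.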

### Recommendation from a Large Catalog

#### Retrieval
1. Generate a large list of plausible item candidates.
    - For each of the last 10 movies watched, find the 10 most similar movies -> 100 movies
    - For the 3 most viewed genres, find the top 10 movies in each -> 30 movies
    - Top 20 movies in the country
2. Combine retrieved items into a list, remove duplicates and items already watched/purchased.

**Trade-offs**: Retrieving more items tends to give better recommendations but makes the system slower. Carry out offline experiments with different numbers of retrieved items to find a good balance.
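The merge step of retrieval can be sketched as below. The function name `retrieve` and the integer item ids are hypothetical; how each candidate list is produced (similar movies, top by genre, top in country) is outside this snippet.

```python
from itertools import chain

def retrieve(candidate_lists, already_seen):
    """Merge candidate lists, dropping duplicates and already-seen items.

    candidate_lists: iterable of lists of item ids, one per retrieval source.
    already_seen: item ids the user has already watched or purchased.
    """
    seen = set(already_seen)
    merged = []
    for item in chain.from_iterable(candidate_lists):
        if item not in seen:          # skip duplicates and watched items
            seen.add(item)
            merged.append(item)
    return merged

# Made-up ids: three retrieval sources, item 4 already watched.
candidates = retrieve([[1, 2, 3], [3, 4], [2, 5]], already_seen={4})
```

The order of first appearance is preserved, although it does not matter much because the ranking step reorders the list anyway.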

#### Ranking
1. Rank the list using the trained model. 
    - Use $v_u$ for the user and each movie's $v_m$ to compute a predicted rating
2. Display ranked items to the user.
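The ranking step can be sketched as a dot product followed by a sort. The function name `rank` and the toy vectors are assumptions; the item vectors are taken as already computed, since they do not depend on the user.

```python
import numpy as np

def rank(v_u, V_m, item_ids):
    """Order retrieved items by predicted score v_u . v_m, highest first.

    v_u: (d,) user vector, V_m: (num_items, d) item vectors,
    item_ids: ids aligned with the rows of V_m.
    """
    scores = V_m @ v_u                # one predicted score per candidate
    order = np.argsort(-scores)       # descending by score
    return [item_ids[k] for k in order]

# Made-up vectors for one user and three retrieved items.
v_user = np.array([1.0, 0.0])
V_items = np.array([[0.2, 1.0], [0.9, 0.0], [0.5, 0.5]])
ranked = rank(v_user, V_items, ["a", "b", "c"])
```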

