## Problem definition

Given observations of a user's past behavior, predict which other items that same user will like.

### Input

- set of users
- set of items
- users' ratings for the items they interacted with

### Outputs

- predicted ratings for items each user hasn't interacted with yet

## Common approaches

### Collaborative filtering

- based on the premise that similar people like similar things
- core data is the user-item matrix, also called the utility matrix
- each number in the matrix represents a user's rating of a given item
- goal is to fill in the blank cells of the matrix
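
To make the utility matrix concrete, here is a minimal sketch with made-up data. The three users, four items, and the ratings themselves are assumptions for illustration only; `np.nan` stands in for a blank cell.

```python
import numpy as np

# Hypothetical toy data: 3 users (rows) x 4 items (columns), ratings 1-5.
# np.nan marks an item the user has not rated yet.
utility = np.array([
    [5.0, np.nan, 3.0, np.nan],
    [np.nan, 4.0, np.nan, 1.0],
    [2.0, np.nan, np.nan, 5.0],
])

# The blank cells are exactly the values a recommender has to predict.
missing = np.isnan(utility)
n_missing = int(missing.sum())
print(f"{n_missing} of {utility.size} cells are blank")
```

Real utility matrices are far larger and far emptier than this, which is why a sparse representation is normally used instead of a dense array.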

### Content-based filtering

- takes into account features of users (age, gender, spoken language, ...) as well as features of items

## Recommender pipeline

### Pre-processing

- transforming user ratings into utility matrix
  - rows represent users
  - columns represent items
  - we can use `scipy.sparse.csr_matrix`
- sanity checks
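  - a known weakness of collaborative filtering: it doesn't perform well when the matrix is too sparse
    - it is important to calculate the sparsity of our matrix: the number of ratings we have divided by the total number of cells in the matrix (strictly speaking this ratio measures how full the matrix is, so a lower value means a sparser matrix)
    - if our data is too sparse (e.g. only 0.5% of cells are filled), collaborative filtering might not be the best option
  - it is also worth looking at how many new users and new items we have
    - collaborative filtering does not work for them because they have no ratings yet (the cold-start problem)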
- normalization
  - we need to normalize ratings by accounting for user and item bias
    - optimists: rate everything 4 or 5
    - pessimists: rate everything 1 or 2
  - mean normalization
    - take average rating of an item and subtract that average from every rating of that item
    - the same for users - take an average user's rating and subtract it from all ratings that user gave
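
The pre-processing steps above can be sketched as follows. The rating triples, matrix shape, and variable names are assumptions for illustration; only `scipy.sparse.csr_matrix` comes from the notes. The sketch centers ratings per user; item bias can be handled the same way by grouping on `item_idx` instead.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical (user, item, rating) triples with 0-based indices.
user_idx = np.array([0, 0, 1, 1, 2, 2])
item_idx = np.array([0, 2, 1, 3, 0, 3])
ratings = np.array([5.0, 3.0, 4.0, 1.0, 2.0, 5.0])
n_users, n_items = 3, 4

# Utility matrix: rows are users, columns are items.
X = csr_matrix((ratings, (user_idx, item_idx)), shape=(n_users, n_items))

# Share of cells that hold a rating (the ratio the notes call sparsity).
filled_ratio = X.nnz / (n_users * n_items)

# Mean normalization per user, computed over rated items only.
user_sums = np.bincount(user_idx, weights=ratings, minlength=n_users)
user_counts = np.bincount(user_idx, minlength=n_users)
user_means = user_sums / np.maximum(user_counts, 1)  # avoid division by zero
centered = ratings - user_means[user_idx]

X_centered = csr_matrix((centered, (user_idx, item_idx)), shape=(n_users, n_items))
```

Computing the means from the triples rather than from the sparse matrix keeps unrated cells out of the average, which would otherwise be treated as zeros.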

### Model picking

- matrix factorization is a commonly used technique in collaborative filtering
  - we start with user-item matrix and we factorize it into two latent factor matrices
  - we have a user factor matrix and item factor matrix
  - we don't actually know what each latent feature represents, but we could imagine that one feature represents a user who likes indie scary movies from the 90s
  - once we have these two latent factor matrices, we can reconstruct the user-item matrix by taking the inner product of the two
  - this reconstructed matrix populates the empty cells of our original matrix with our predicted ratings
- there are many algorithms for matrix factorization
  - examples
    - Alternating Least Squares (ALS)
    - Stochastic Gradient Descent (SGD)
    - Singular Value Decomposition (SVD)
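
As a rough illustration of the idea, the sketch below factorizes a tiny set of ratings with plain SGD. The function name `factorize_sgd`, the toy triples, and all default values (number of factors, learning rate, regularization, epochs) are assumptions chosen for this example, not a tuned or production implementation.

```python
import numpy as np

def factorize_sgd(triples, n_users, n_items, k=2, lam=0.05, lr=0.02,
                  epochs=1000, seed=0):
    """Learn user factors P (n_users x k) and item factors Q (n_items x k)."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))
    Q = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(epochs):
        for u, i, r in triples:
            err = r - P[u] @ Q[i]          # error on one known rating
            p_old = P[u].copy()            # keep old value for the Q update
            P[u] += lr * (err * Q[i] - lam * P[u])
            Q[i] += lr * (err * p_old - lam * Q[i])
    return P, Q

# Hypothetical (user, item, rating) triples.
triples = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0),
           (1, 3, 1.0), (2, 0, 2.0), (2, 3, 5.0)]
P, Q = factorize_sgd(triples, n_users=3, n_items=4)

# Inner product of the factor matrices fills in every cell, including blanks.
predicted = P @ Q.T
```

Only the known ratings contribute to the updates; the values in the other cells of `predicted` are the model's guesses.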

### Evaluation metric picking

- what are we trying to optimize here?
- a popular metric for recommenders is `precision@k`
  - it looks at the top `k` recommendations and calculates what proportion of them is relevant to a user
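
A minimal sketch of the metric, assuming recommendations arrive as a ranked list and relevance is given as a set of item ids. The function name and the sample items are made up for illustration.

```python
def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = recommended[:k]
    hits = sum(1 for item in top_k if item in relevant)
    return hits / k

# Hypothetical ranked recommendations and ground-truth relevant items.
recommended = ["a", "b", "c", "d", "e"]
relevant = {"a", "c", "f"}
score = precision_at_k(recommended, relevant, k=3)  # 2 hits out of 3
```

What counts as "relevant" is a modeling decision, for example a held-out rating above some threshold.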

### Hyperparameter tuning

- hyperparameters are external properties of a model: they are set before training rather than learned from the data
- we need to try different combinations and find out which one gives us the best results
- let's say we are using ALS for matrix factorization
  - we need to find two hyperparameters
    - k (# of factors)
    - λ (regularization parameter)
  - goal is to find the hyperparameters that give us the best precision@k (or any other metric that we want to optimize)
- there are several ways to do that
  - Grid search
    - `sklearn.model_selection.GridSearchCV`
    - iterates over all combinations of provided values of hyperparameters
  - Random search
    - `sklearn.model_selection.RandomizedSearchCV`
    - evaluates randomly selected combinations of hyperparameters
    - this approach is less exhaustive and often more efficient than grid search
  - Sequential Model-based optimization
    - it takes into consideration results of previous iterations when picking values of hyperparameters to try in this iteration
    - `scikit-optimize`, `hyperopt`, `Metric Optimization Engine`
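
The difference between grid search and random search can be sketched with the standard library alone. This is a hand-rolled illustration, not the scikit-learn classes named above; `evaluate` is a hypothetical stand-in for "train the model with these hyperparameters and return the validation score", and the candidate values are assumptions.

```python
import itertools
import random

def evaluate(k, lam):
    """Hypothetical stand-in: train with (k, lam), return a validation score."""
    return 1.0 / (1.0 + abs(k - 20) / 20 + abs(lam - 0.1))

factor_values = [5, 10, 20, 40]     # candidate k (# of factors)
lambda_values = [0.01, 0.1, 1.0]    # candidate regularization strengths

# Grid search: score every combination.
grid = list(itertools.product(factor_values, lambda_values))
best_grid = max(grid, key=lambda params: evaluate(*params))

# Random search: score only a fixed budget of randomly drawn combinations.
rng = random.Random(0)
sampled = [(rng.choice(factor_values), rng.choice(lambda_values))
           for _ in range(5)]
best_random = max(sampled, key=lambda params: evaluate(*params))
```

Grid search costs one training run per combination (12 here), while random search caps the cost at the chosen budget (5 here) at the risk of missing the best combination.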

### Model training and Prediction

- let's say we tuned our hyperparameters and found ones that give us the best precision
- we can train our model with these optimal hyperparameters to get our predicted ratings
- we can use these results to generate our recommendations

### Post-processing

- we have to sort all predicted ratings and get top N
- we might want to filter out items that a user has already interacted with
- we can also generate item-item recommendations with this infrastructure
  - we can apply some similarity metric (e.g. cosine similarity) to get the most similar movies for a given movie
  - "Because you watched movie X, you might also like..."
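
Both post-processing steps can be sketched with NumPy. The function names, the predicted scores, and the item factor matrix below are assumptions for illustration; in practice the scores and factors would come from the trained model.

```python
import numpy as np

def top_n_for_user(predicted_row, seen_items, n):
    """Indices of the n highest-scoring items the user has not seen."""
    scores = predicted_row.astype(float).copy()
    seen = np.fromiter(seen_items, dtype=int)
    scores[seen] = -np.inf                      # filter out seen items
    order = np.argsort(-scores)                 # best score first
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

def most_similar_items(item_factors, item, n):
    """Indices of the n items closest to `item` by cosine similarity."""
    norms = np.linalg.norm(item_factors, axis=1)
    sims = item_factors @ item_factors[item] / (norms * norms[item] + 1e-12)
    sims[item] = -np.inf                        # exclude the item itself
    return [int(i) for i in np.argsort(-sims)[:n]]

# Hypothetical predicted ratings for one user; item 3 was already watched.
predicted_row = np.array([4.5, 2.0, 3.9, 4.8, 1.0])
recs = top_n_for_user(predicted_row, seen_items={3}, n=2)

# Hypothetical item factor matrix (4 items x 2 latent factors).
item_factors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
similar = most_similar_items(item_factors, item=0, n=2)
```

Item 3 has the highest predicted score but is excluded because the user already interacted with it.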

### Evaluation

- the best way to evaluate any recommender is to test it in the wild
  - do A/B testing, usability testing, feedback from real users
- if that is not possible, we can do off-line evaluation
  - in traditional ML, we would split our dataset into two parts: some users would be in the training set and some in the test set
  - this does not work for recommender models, because the model will not work if it is trained on a different user population than the one in the validation set
  - for recommenders, we instead mask random interactions in our matrix
  - we pretend that we don't know a user's rating of some items so we can compare the predicted rating with the actual rating
- metrics
  - Precision@K
    - of the top k recommendations, what proportion is actually relevant?
    - tries to minimize the number of false positives: irrelevant items that did get into the top k recommendations
  - Recall@K
    - what proportion of the items that are actually relevant got into the top k recommendations?
    - tries to minimize the number of false negatives: relevant items that didn't get into the top k recommendations
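
Masking and Recall@K can be sketched as below. The function names, the toy triples, and the 33% hold-out fraction are assumptions for illustration. This simple version does not guarantee that every user keeps at least one rating in the training set, which a real split would likely need to check.

```python
import numpy as np

def mask_interactions(triples, test_fraction=0.2, seed=0):
    """Randomly hold out a fraction of the known ratings as a test set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(triples))
    n_test = max(1, int(round(test_fraction * len(triples))))
    test = [triples[i] for i in order[:n_test]]
    train = [triples[i] for i in order[n_test:]]
    return train, test

def recall_at_k(recommended, relevant, k):
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / len(relevant)

# Hypothetical (user, item, rating) triples.
triples = [(0, 0, 5.0), (0, 2, 3.0), (1, 1, 4.0),
           (1, 3, 1.0), (2, 0, 2.0), (2, 3, 5.0)]
train, test = mask_interactions(triples, test_fraction=0.33)

# Hypothetical ranked recommendations versus held-out relevant items.
recall = recall_at_k(["a", "b", "c"], {"a", "c", "f"}, k=3)  # 2 of 3 found
```

The model is trained on `train` only, and its predictions for the masked cells in `test` are compared against the real ratings.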

### Important considerations

- interpretability
- efficiency and scalability
- diversity
- serendipity
