# Netflix

## Scope

- Given a user and context (time, location, etc.), predict probability of engagement for each movie, and order movies by it.
- Will use implicit feedback (user watched the movie or not) rather than explicit feedback (user rated the movie) to gather large training data.

## Metric

### Online
- Engagement rate: (sessions in which user clicked a movie / total number of sessions)
- Videos watched: number of videos the user watched for at least some minimum time.
- Session watch time: overall time that user spent watching movies based on recommendation in a session.

### Offline
- Mean Average Precision (mAP @ N)
    - $AP@N = \dfrac{1}{m}\displaystyle\sum_{k=1}^{N}P(k)\,rel(k)$
    - $P(k)$ = precision up to $k$
    - Precision = number of relevant recommendations / total number of recommendations
    - $rel(k)$ = whether $k^{th}$ item is relevant or not (1 or 0)
    - $N$ = length of recommendation list
    - $m$ = number of movies relevant to user based on historical data
    - mAP @ N is the mean of AP @ N over all users.
    - Measures how system performs overall.

- Mean Average Recall (mAR @ N)
    - Recall = number of relevant recommendations / number of all relevant movies
    - Measures how many of the user's relevant movies (based on historical data) the system can put in the recommendation list.
    
- F1 score = 2 * (mAP*mAR) / (mAP+mAR)
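
The offline metrics above can be sketched in a few lines for a single user. This is a minimal illustration, assuming AP@N is normalized by $m$ (the number of relevant movies) as defined above; the function names and the sample movie ids are hypothetical. mAP and mAR would be the averages of these per-user values over all users.

```python
def average_precision_at_n(recommended, relevant, n):
    """AP@N = (1/m) * sum_{k=1..N} P(k) * rel(k), with m = len(relevant)."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for k, item in enumerate(recommended[:n], start=1):
        if item in relevant:       # rel(k) = 1
            hits += 1
            total += hits / k      # P(k): precision over the first k items
    return total / len(relevant)


def recall_at_n(recommended, relevant, n):
    """Fraction of the user's relevant movies that appear in the top N."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    return len(set(recommended[:n]) & relevant) / len(relevant)


def f1(precision, recall):
    """Harmonic mean of the two scores (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical recommendation list and historical relevance for one user.
recommended = ["m1", "m2", "m3", "m4", "m5"]
relevant = ["m1", "m3", "m6"]
ap = average_precision_at_n(recommended, relevant, n=5)
rec = recall_at_n(recommended, relevant, n=5)
score = f1(ap, rec)
```

Here the relevant items sit at positions 1 and 3, so AP@5 is (1/1 + 2/3) / 3 and recall@5 is 2/3.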

## Architecture

<img src="img/recommendation-system-a1.png" style="width:800px;height:600px;">

### Candidate generation

- Select top $k$ movies to recommend to user.
- Focuses on recall.

### Collaborative filtering
- Find similar users to active user based on historical watches.
- User and media profiles do not require domain knowledge.
- Has cold start problem.
    - It is hard to find similar users for a new user who has no historical interactions.
    - Cannot recommend new movies because they do not have user feedback yet.

<img src="img/recommendation-system-a3.png" style="width:300px;height:300px;">

#### 1. Nearest neighbors
- Computationally expensive.

<img src="img/recommendation-system-a4.png" style="width:500px;height:300px;">

- Consider $n$ by $m$ matrix of user $n_{i}$ and movie $m_{j}$
- 1: user watched the movie.
- 0: user ignored the movie.
- empty: no impression yet.

<img src="img/recommendation-system-a5.png" style="width:500px;height:300px;">

- Task is to predict the feedback for movies that users haven't watched.
- Compute (for example) cosine similarity between user $i$ and other users, and select top $k$ similar users. (Nearest neighbors)
- Then, take weighted average of feedback from top $k$ similar users for movie $j$.
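
The two steps above can be sketched as follows. This is a minimal illustration on a tiny hypothetical matrix, assuming similarity is computed only over movies that both users have feedback for and that `None` stands for "no impression yet"; a production system would use sparse matrices or an approximate nearest-neighbor index instead of this brute-force loop.

```python
import math

# Rows are users, columns are movies.
# 1 = watched, 0 = ignored, None = no impression yet.
feedback = [
    [1, 0, 1, None],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, None, 1],
]


def cosine(u, v):
    """Cosine similarity over movies that both users have feedback for."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(feedback, i, j, k=2):
    """Weighted average of movie j feedback from the k users most similar to user i."""
    neighbors = [
        (cosine(feedback[i], row), row[j])
        for idx, row in enumerate(feedback)
        if idx != i and row[j] is not None
    ]
    top = sorted(neighbors, key=lambda pair: pair[0], reverse=True)[:k]
    weight = sum(sim for sim, _ in top)
    if weight == 0:
        return 0.0
    return sum(sim * value for sim, value in top) / weight


# Predict the missing feedback of user 0 for movie 3.
score = predict(feedback, i=0, j=3, k=2)
```

Scanning every other user for each prediction is what makes this approach computationally expensive as the number of users grows.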

#### 2. Matrix factorization

- Choose a latent dimension $M$ such that
    - User profile matrix is $n$ by $M$.
    - Media profile matrix is $M$ by $m$.
- The $M$ latent dimensions can be considered as learned features of users or movies.
- Initialize user and movie vectors randomly.
- For each known feedback value $f_{ij}$, predict feedback by taking dot product between user profile vector $n_{i}$ and movie profile vector $m_{j}$.
- Difference between actual and predicted will be the error.
    - $e_{ij} = f_{ij} - n_{i} \cdot m_{j}$
- Use stochastic gradient descent to update user and movie latent vectors.
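
A minimal sketch of this training loop in plain Python is shown below. The hyperparameters (`num_factors`, `lr`, `epochs`), the small L2 regularization term `reg`, and the toy feedback matrix are assumptions added for illustration and are not prescribed by the notes above.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def factorize(feedback, num_factors=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Learn user and movie latent vectors from known feedback with SGD."""
    rng = random.Random(seed)
    # Random initialization of both profile matrices.
    users = [[rng.random() for _ in range(num_factors)] for _ in feedback]
    movies = [[rng.random() for _ in range(num_factors)] for _ in feedback[0]]
    known = [(i, j, f) for i, row in enumerate(feedback)
             for j, f in enumerate(row) if f is not None]
    for _ in range(epochs):
        rng.shuffle(known)
        for i, j, f in known:
            err = f - dot(users[i], movies[j])   # e_ij = f_ij - n_i . m_j
            for d in range(num_factors):
                u, m = users[i][d], movies[j][d]
                users[i][d] += lr * (err * m - reg * u)
                movies[j][d] += lr * (err * u - reg * m)
    return users, movies


def mean_squared_error(feedback, users, movies):
    """Mean squared error over the known feedback entries only."""
    errors = [(f - dot(users[i], movies[j])) ** 2
              for i, row in enumerate(feedback)
              for j, f in enumerate(row) if f is not None]
    return sum(errors) / len(errors)


# 1 = watched, 0 = ignored, None = no impression yet (hypothetical data).
feedback = [
    [1, 0, 1, None],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 0, None, 1],
]
init_users, init_movies = factorize(feedback, epochs=0)  # same seed, no training
users, movies = factorize(feedback)
mse_before = mean_squared_error(feedback, init_users, init_movies)
mse_after = mean_squared_error(feedback, users, movies)
predicted = dot(users[0], movies[3])  # score for an entry with no impression yet
```

Comparing `mse_before` and `mse_after` gives a quick check that the updates reduce the error on known entries; the dot product for an empty cell is then used as the predicted feedback.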

<img src="img/recommendation-system-a6.png" style="width:500px;height:300px;">

### Content-based filtering
- Make recommendations based on content of media that user had already interacted with.
- User and media profiles require some domain knowledge. (Can get this by asking user preference when they sign up)
- Does not have cold start problem.

#### Two options for recommending media to user (Given TF-IDF representation for each movie)
1. Similarity with historical interactions.
    - Recommend movies similar to movies that user watched in the past.
    - Compute by taking dot product between the TF-IDF vectors of the movies.
2. Similarity between media and user profiles.
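
Both options can be sketched with a small hand-rolled TF-IDF. This is only an illustration: the movie descriptions are hypothetical, the smoothed IDF formula is one common variant rather than the one the notes necessarily intend, and building the user profile as the mean of watched-movie vectors is an assumption.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalized {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed IDF (one common variant): log((1 + N) / (1 + df)) + 1.
        vec = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def mean_vector(vectors):
    """Average several sparse vectors into one user profile vector."""
    total = Counter()
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


# Hypothetical movie descriptions, used only for illustration.
movies = {
    "heist_thriller": "crime heist thriller",
    "heist_comedy": "crime heist comedy",
    "romcom": "romantic comedy drama",
}
names = list(movies)
vectors = dict(zip(names, tfidf_vectors(list(movies.values()))))

# Option 1: compare candidates with a movie the user watched.
sim_to_watched = {
    name: dot(vectors["heist_thriller"], vectors[name])
    for name in ("heist_comedy", "romcom")
}

# Option 2: compare candidates with a user profile built from watch history.
user_profile = mean_vector([vectors["heist_thriller"], vectors["heist_comedy"]])
profile_scores = {name: dot(user_profile, vectors[name]) for name in names}
```

Because the vectors are L2-normalized, the dot product here equals cosine similarity.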

### Embedding-based similarity
- Use deep learning to generate latent vectors/embeddings to represent both movies and users.
- Then, use KNN to find movies to recommend.
- Has cold start problem.
    - If either the user or the movie is new, little feedback is available.
    - In other words, there are too few training examples to update user and movie embedding vectors.

### Ranking

- Probability of user watching a media.
- Focuses on precision.

### Logistic regression
- When training data is limited.
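
As a rough sketch of how a logistic regression ranker scores and orders candidates: the weights, bias, and feature values below are hypothetical placeholders (in practice they would be learned from training data), and the three feature names are borrowed from the feature lists later in this document, with `time_passed_since_release_date` assumed to be in days.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def watch_probability(features, weights, bias):
    """P(watch) = sigmoid(bias + sum_i w_i * x_i)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical learned parameters.
weights = {
    "user_language_match": 1.5,
    "user_and_movie_embedding_similarity": 2.0,
    "time_passed_since_release_date": -0.01,
}
bias = -1.0

# Hypothetical candidates coming out of candidate generation.
candidates = {
    "A": {"user_language_match": 1, "user_and_movie_embedding_similarity": 0.8,
          "time_passed_since_release_date": 30},
    "B": {"user_language_match": 0, "user_and_movie_embedding_similarity": 0.3,
          "time_passed_since_release_date": 400},
    "C": {"user_language_match": 1, "user_and_movie_embedding_similarity": 0.2,
          "time_passed_since_release_date": 10},
}

scores = {cid: watch_probability(f, weights, bias) for cid, f in candidates.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
```

The ranked list is simply the candidates sorted by predicted watch probability, highest first.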

<img src="img/recommendation-system-a7.png" style="width:500px;height:300px;">

### Deep learning
- When a large amount of training data (on the order of 100M examples) is available.

#### Two sparse features to consider
1. Videos user watched in the past.
2. User's search terms.

#### How to feed these into network
<img src="img/recommendation-system-a8.png" style="width:1000px;height:700px;">

#### Start with 2-3 hidden layers with ReLU.

<img src="img/recommendation-system-a9.png" style="width:1000px;height:700px;">

## Feature engineering

<img src="img/recommendation-system-a2.png" style="width:1000px;height:200px;">

### User
- age
- gender
- language
- country
- average_session_time
- last_genre_watched
- user_actor_histogram: histogram showing historical interaction between users and actors in movies.
- user_genre_histogram
- user_language_histogram

### Context
- season_of_the_year
- upcoming_holiday
- days_to_upcoming_holiday
- time_of_day
- day_of_week
- device

### Media
- public_platform_rating
- revenue
- time_passed_since_release_date
- time_on_platform
- media_watch_history
- genre
- movie_duration
- content_set_time_period
- content_tags
- show_season_number
- country_of_origin
- release_country
- release_year
- release_type
- maturity_rating

### User-media
- user_genre_historical_interaction_3months
- user_genre_historical_interaction_1year
- user_and_movie_embedding_similarity
- user_actor
- user_director
- user_language_match
- user_age_match

### Sparse feature
- movie_id
- title_of_media
- synopsis
- original_title
- distributor
- creator
- original_language
- director
- first_release_year
- music_composer
- actors

## Training data generation

- User watched 80% or more of the movie? positive example
- User watched 10% or less of the movie? negative example
- Between 10% and 80%? uncertain example
- Make sure to downsample over-represented class.
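
The labeling thresholds above translate directly into a small helper. This is a sketch only; the function name and the choice to represent watch progress as a fraction between 0 and 1 are assumptions.

```python
def label_example(watched_fraction):
    """Map the fraction of a movie watched (0.0 to 1.0) to a training label."""
    if not 0.0 <= watched_fraction <= 1.0:
        raise ValueError("watched_fraction must be between 0 and 1")
    if watched_fraction >= 0.8:
        return "positive"
    if watched_fraction <= 0.1:
        return "negative"
    return "uncertain"


labels = [label_example(x) for x in (0.95, 0.05, 0.5)]
```

Whether uncertain examples are dropped or handled separately is a design decision the notes leave open.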

# YouTube

## Problem
- Recommend YouTube videos.

## Metric

### Offline
- Precision, recall, log loss.

### Online
- Click through rate, watch time.

### Training
- Many times a day.

### Inference
- Latency between 100ms and 200ms.

## Model

### Candidate generation
- Inverted index, FAISS, ScaNN
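
To make the first option concrete, here is a minimal inverted-index sketch. The notes do not say what is indexed, so mapping tags to video ids and ranking by the number of matching tags are assumptions, as are all names and sample data. FAISS and ScaNN serve the same retrieval role for dense embedding vectors and are not shown here.

```python
from collections import Counter, defaultdict


def build_inverted_index(video_tags):
    """Map each tag to the set of video ids that carry it."""
    index = defaultdict(set)
    for video_id, tags in video_tags.items():
        for tag in tags:
            index[tag].add(video_id)
    return index


def generate_candidates(index, interest_tags, top_k=3):
    """Return the videos matching the most interest tags (ties broken by id)."""
    counts = Counter()
    for tag in interest_tags:
        for video_id in index.get(tag, ()):
            counts[video_id] += 1
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [video_id for video_id, _ in ordered[:top_k]]


# Hypothetical catalog and user interests.
video_tags = {
    "v1": {"cooking", "travel"},
    "v2": {"cooking"},
    "v3": {"gaming"},
    "v4": {"travel", "gaming"},
}
index = build_inverted_index(video_tags)
candidates = generate_candidates(index, ["cooking", "travel"], top_k=2)
```

Lookup cost depends on the size of the matching posting lists rather than the full catalog, which is why this structure suits the candidate generation stage.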

### Ranking

## Estimation

### Assume
- 1.3B users.
- 150B video views per month.
- 15B out of 150B are watched based on recommendations.
- User sees 100 recommendations.
- User watches 2 videos out of 100 recommendations.
- If user does not watch recommended video in 10 minutes, then it is failed recommendation.

### Assume data size
- 15B positive and 750B negative examples per month.
- Each data point comes with hundreds of features.
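
The data-size figures can be checked against the assumptions above with a little arithmetic. Under those assumptions the 750B figure corresponds to total recommendation impressions per month, and the negatives come to roughly 735B, which is close enough for a back-of-the-envelope estimate. Variable names below are illustrative.

```python
# Assumptions copied from the list above.
views_from_recommendations = 15_000_000_000   # positive examples per month
recommendations_per_list = 100                # recommendations a user sees
watched_per_list = 2                          # videos watched out of those

lists_shown = views_from_recommendations // watched_per_list
impressions = lists_shown * recommendations_per_list
negatives = impressions - views_from_recommendations
negatives_per_positive = negatives // views_from_recommendations
```

This gives 7.5B recommendation lists, 750B impressions, and about 49 negatives for every positive, which is the imbalance that down-sampling has to address.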

### Assume bandwidth
- Generate recommendation for 10M users every second.

## Design

<img src="img/recommendation-system-b1.png" style="width:600px;height:400px;">

- User watch history: stores which videos are watched by users.
- Search Query DB: stores historical queries that users searched in the past.
- User/Video DB: stores users and their profiles with video metadata.
- User historical recommendations: stores past recommendations for users.
- Resampling data: downsample negative examples.
- Feature pipeline: generate all required features for training a model.
- Model Repos: stores all models.

### Huge data size
- Pick 1 month or 6 months of recent data.

### Imbalance data
- Random negative down-sampling.
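
A minimal sketch of random negative down-sampling follows. The target of a fixed number of negatives per positive, the `label` field, and the function name are assumptions made for illustration.

```python
import random


def downsample_negatives(examples, negatives_per_positive=1, seed=0):
    """Keep all positives and a random subset of negatives."""
    rng = random.Random(seed)
    positives = [ex for ex in examples if ex["label"] == 1]
    negatives = [ex for ex in examples if ex["label"] == 0]
    keep = min(len(negatives), negatives_per_positive * len(positives))
    sampled = rng.sample(negatives, keep)
    result = positives + sampled
    rng.shuffle(result)  # avoid all positives appearing first
    return result


# Hypothetical impressions: 2 watched out of 100 recommended.
examples = [{"video_id": i, "label": 1 if i < 2 else 0} for i in range(100)]
balanced = downsample_negatives(examples, negatives_per_positive=3)
```

Note that training on down-sampled data shifts the predicted probabilities upward relative to the true base rate, so they generally need recalibration if absolute values matter.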

### High availability
- Kubernetes to auto-scale the number of pods.