# Twitter feed

## Scope

- Reverse chronological order fails to surface the most engaging Tweets because of the sheer volume of Tweets.

<img src="img/twitter_feed1.png" style="width:500px;height:300px;">

## Scale

Assume
- 500M daily active users.
- Each user is connected to 100 users on average.
- Each user fetches the feed 10 times a day.
    - So the Tweet ranking algorithm runs 500M × 10 = 5B times per day.
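
The arithmetic behind the 5B figure can be checked with a few lines. This is only a back-of-the-envelope sketch; the per-second average is an added derivation that assumes traffic is spread evenly over the day, which real traffic is not.

```python
# Back-of-the-envelope load estimate from the assumptions above.
daily_active_users = 500_000_000
feed_fetches_per_user_per_day = 10

# One ranking run per feed fetch.
ranking_runs_per_day = daily_active_users * feed_fetches_per_user_per_day

# Assumption: load is uniform over the day (peaks will be higher in practice).
seconds_per_day = 24 * 60 * 60
avg_ranking_runs_per_second = ranking_runs_per_day / seconds_per_day

print(ranking_runs_per_day)                # 5000000000
print(round(avg_ranking_runs_per_second))  # 57870
```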

<img src="img/twitter_feed2.png" style="width:400px;height:200px;">

## Metrics

### Positive user actions
- Time spent viewing Tweets.
- Liking Tweets.
- Re-Tweeting.
- Commenting on Tweets.

### Negative user actions
- Hiding Tweets.
- Reporting Tweets as inappropriate.

<img src="img/twitter_feed3.png" style="width:800px;height:600px;">

### Weighted user actions
- Not all actions are of equal value, so each action type is given its own weight.

<img src="img/twitter_feed4.png" style="width:600px;height:400px;">
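
A weighted engagement metric could be sketched as follows. The weight values and the `ACTION_WEIGHTS` and `weighted_engagement` names are hypothetical and chosen only for illustration; in practice the weights would come from business priorities.

```python
# Hypothetical weights: positive actions add value, negative actions subtract it.
ACTION_WEIGHTS = {
    "like": 1.0,
    "comment": 3.0,
    "retweet": 5.0,
    "hide": -5.0,
    "report": -10.0,
}


def weighted_engagement(action_counts, weights=ACTION_WEIGHTS):
    """Sum the observed action counts, each scaled by its weight."""
    return sum(weights[action] * count for action, count in action_counts.items())


# 10 likes, 2 comments and 1 hide: 10*1 + 2*3 + 1*(-5)
print(weighted_engagement({"like": 10, "comment": 2, "hide": 1}))  # 11.0
```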

## Architecture

<img src="img/twitter_feed5.png" style="width:800px;height:400px;">

### Tweet selection

<img src="img/twitter_feed6.png" style="width:600px;height:600px;">

#### Consider new Tweets
- Tweets generated between the user's last logout and the current login.
- Tweets the user viewed previously that were not popular then but are popular now.
- Earlier Tweets the user did not view while logged in.

<img src="img/twitter_feed7.png" style="width:400px;height:400px;">
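
The three selection rules above can be sketched as a single filter. The `Tweet` fields, the `seen` mapping (Tweet id to the interaction count at the time the user viewed it), and the `popularity_threshold` value are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tweet:
    tweet_id: int
    created_at: float   # Unix timestamp in seconds
    interactions: int   # current interaction count


def select_candidates(tweets, last_logout, now, seen, popularity_threshold=100):
    """Return Tweets that are new, were missed earlier, or became popular after viewing.

    `seen` maps tweet_id -> interaction count when the user viewed the Tweet.
    """
    candidates = []
    for tweet in tweets:
        was_seen = tweet.tweet_id in seen
        # Rule 1: generated between logout and login.
        is_new = last_logout < tweet.created_at <= now
        # Rule 3: older Tweet the user never viewed.
        missed_earlier = tweet.created_at <= last_logout and not was_seen
        # Rule 2: viewed while unpopular, popular now.
        became_popular = (
            was_seen
            and seen[tweet.tweet_id] < popularity_threshold <= tweet.interactions
        )
        if is_new or missed_earlier or became_popular:
            candidates.append(tweet)
    return candidates
```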

#### User comes back after a while
- There is a limit on how much Tweet data can be fetched.
- Fetch only a certain number of Tweets from the candidate pool.

<img src="img/twitter_feed8.png" style="width:400px;height:400px;">

#### Tweets outside the user network
- Aligns with user interests.
- Locally/globally trending.
- Tweet is relevant to user's network.

## Feature engineering

<img src="img/twitter_feed9.png" style="width:1000px;height:400px;">

### User-author historical relations
- author_liked_posts_3months: percentage of the author's Tweets that the user liked in the last 3 months.
- author_liked_posts_count_1year: number of the author's Tweets that the user liked in the past year.

### User-author similarity
- common_followees: number of users and hashtags followed by both the user and the author.
- topic_similarity: similarity between the hashtags in the posts that each of them interacted with.
- tweet_content_embedding_similarity: generate a bag-of-words embedding for each user and take the dot product between the two.
- social_embedding_similarity: each user is represented by a bag-of-ids (rather than a bag-of-words).
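
A minimal sketch of the two embedding similarities, assuming the simplest possible representation: raw counts in a `Counter`. The tokenization (lowercase, whitespace split) and the helper names are assumptions; a production system would likely use learned dense embeddings instead.

```python
from collections import Counter


def bag_of_words(texts):
    """Count word occurrences across all Tweets a user interacted with."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    return counts


def dot_product(a, b):
    """Sparse dot product: only keys present in both bags contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller bag
    return sum(value * b.get(key, 0) for key, value in a.items())


# tweet_content_embedding_similarity: bags of words.
user_bag = bag_of_words(["ml is fun", "ml ranking"])
author_bag = bag_of_words(["ranking ml models"])
print(dot_product(user_bag, author_bag))  # 3

# social_embedding_similarity: bags of ids (e.g. followed account ids).
print(dot_product(Counter([1, 2, 3]), Counter([2, 3, 4])))  # 2
```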

### Author influence
- is_verified: whether the author is verified.
- author_social_rank: similar to Google PageRank.
- author_num_followers: number of followers that the author has.
- follower_to_following_ratio

### Author Tweets historical trend
- author_engagement_rate_3months: (Tweet interactions) / (Tweet views) over the last 3 months.
- author_topic_engagement_rate_3months: the same engagement rate as above, but computed per topic.
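
Both rates can be computed from the same log of per-Tweet counts. The `(topic, interactions, views)` tuple layout is an assumed input format for this sketch, and the log is assumed to be filtered to the 3-month window already.

```python
from collections import defaultdict


def engagement_rate(interactions, views):
    """Interactions divided by views; 0.0 when there are no views."""
    return interactions / views if views else 0.0


def engagement_rates_by_topic(events):
    """events: iterable of (topic, interactions, views), one entry per Tweet."""
    totals = defaultdict(lambda: [0, 0])
    for topic, interactions, views in events:
        totals[topic][0] += interactions
        totals[topic][1] += views
    return {topic: engagement_rate(i, v) for topic, (i, v) in totals.items()}


author_events = [("ml", 30, 1000), ("ml", 20, 1000), ("sports", 5, 500)]
overall = engagement_rate(
    sum(e[1] for e in author_events), sum(e[2] for e in author_events)
)
per_topic = engagement_rates_by_topic(author_events)
```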

### User-tweet
- topic_similarity: similarity between the Tweet itself and the hashtags and content the user tweeted in the past.
- embedding_similarity: dot product between the user vector and the Tweet vector.

### Tweet content
- Tweet_length: concise Tweets have a higher chance of getting likes.
- Tweet_recency: people are interested in the latest Tweets.
- is_image_video: Tweets with an image or video are more eye-catching.
- is_URL: Tweets with a URL have a higher probability of engagement.

### Tweet interaction
- num_total_interactions: use a time decay model so that recent interactions count more and trending Tweets get proper attention.
- likes_in_last_3_days:
- comments_in_last_1_day:
- reshares_in_last_2_hours:
- likes_in_last_3_days_user’s_network_only:
- comments_in_last_1_day_user’s_network_only:
- reshares_in_last_2_hours_user’s_network_only:
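
One common way to apply time decay to `num_total_interactions` is exponential decay with a half-life. The notes do not say which decay model is used, so the exponential form and the 24-hour half-life below are assumptions.

```python
import math


def decayed_interactions(interaction_ages_hours, half_life_hours=24.0):
    """Sum of interactions where each one is worth 0.5 ** (age / half_life).

    interaction_ages_hours: age of every individual interaction, in hours.
    """
    decay_rate = math.log(2) / half_life_hours
    return sum(math.exp(-decay_rate * age) for age in interaction_ages_hours)


# Three interactions: just now, one day ago, two days ago.
# They contribute about 1 + 0.5 + 0.25.
score = decayed_interactions([0.0, 24.0, 48.0])
```

The fixed-window features in the list (last 3 days, last 1 day, last 2 hours) are a coarser alternative that achieves a similar effect with hard cutoffs.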

### Context based features
- day_of_week:
- time_of_day:
- current_user_location:
- season:
- latest_k_tag_interactions:
- approaching_holiday:

### Sparse features
- unigrams/bigrams of a Tweet:
- user_id:
- tweets_id:

<img src="img/twitter_feed10.png" style="width:500px;height:500px;">

## Training data generation

If single model
- All Tweets with user interaction are positive examples.
- All Tweets with only impressions are negative examples.

If many models
- Tweets with "likes" will be positive and Tweets without "likes" will be negative.
- Tweets with "comments" will be positive and Tweets without "comments" will be negative.
- And so on.

<img src="img/twitter_feed11.png" style="width:700px;height:700px;">

Balancing positive and negative examples
- Randomly downsample the larger class so that the numbers of positive and negative examples match.
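
A minimal sketch of random downsampling, assuming each example is a `(features, label)` pair with label 1 for positive and 0 for negative. The function name and the fixed seed are illustrative choices.

```python
import random


def downsample_to_balance(examples, seed=0):
    """Randomly drop examples from the larger class until both classes match."""
    rng = random.Random(seed)  # seeded for reproducibility
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    n = min(len(positives), len(negatives))
    balanced = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(balanced)  # avoid all positives coming first
    return balanced


examples = [({"id": i}, 1) for i in range(3)] + [({"id": i}, 0) for i in range(3, 13)]
balanced = downsample_to_balance(examples)
print(len(balanced))  # 6
```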

Train/dev/test
- Train on one time interval and validate on the next time interval.

<img src="img/twitter_feed12.png" style="width:500px;height:300px;">
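
The time-based split could look like the sketch below. It assumes each example carries a timestamp as its first element; the boundary names `train_end` and `validation_end` are hypothetical.

```python
def split_by_time(examples, train_end, validation_end):
    """Split examples into an earlier training interval and the following one.

    examples: list of (timestamp, features, label).
    Training uses timestamps before train_end; validation uses
    [train_end, validation_end), so validation never precedes training.
    """
    train = [e for e in examples if e[0] < train_end]
    validation = [e for e in examples if train_end <= e[0] < validation_end]
    return train, validation


examples = [(day, {"x": day}, day % 2) for day in range(10)]
train, validation = split_by_time(examples, train_end=7, validation_end=9)
print(len(train), len(validation))  # 7 2
```

Splitting by time rather than at random keeps future interactions out of the training set, which mirrors how the model is used in production.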

## Ranking

- Given Tweets, predict probabilities of likes, comments, and re-Tweets.
- This is a classification problem.

Logistic regression
- Features must be created manually in the training data. (Tree models and NNs are able to learn features on their own.)
- Use either a single model to predict overall engagement or separate models to predict different types of engagement.

<img src="img/twitter_feed13.png" style="width:700px;height:500px;">

Deep learning
- Hyperparameters
    - Learning rate.
    - Number of hidden layers.
    - Batch size.
    - Number of epochs.
    - Dropout rate.
- Multi-task NN where total_loss = like_loss + comment_loss + retweet_loss
- Better than training a separate network for each task because shared layers make training faster.
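
The combined loss can be illustrated without a deep learning framework. The sketch assumes each task head outputs a probability and uses binary cross-entropy per task; the notes only state that the three losses are summed, so the choice of per-task loss is an assumption.

```python
import math

TASKS = ("like", "comment", "retweet")


def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy over a batch of labels and probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def total_loss(labels, predictions):
    """total_loss = like_loss + comment_loss + retweet_loss."""
    return sum(binary_cross_entropy(labels[t], predictions[t]) for t in TASKS)


labels = {"like": [1, 0], "comment": [0, 0], "retweet": [1, 1]}
# An uninformed model predicting 0.5 everywhere pays ln(2) per task.
predictions = {task: [0.5, 0.5] for task in TASKS}
loss = total_loss(labels, predictions)
```

In a real multi-task network the gradients of this summed loss flow back through the shared layers, which is where the three tasks influence each other.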

<img src="img/twitter_feed14.png" style="width:500px;height:300px;">

<img src="img/twitter_feed15.png" style="width:900px;height:900px;">

Stacking models
- Use tree models and NNs to generate features for logistic regression.
    - For example, use the outputs of the last hidden layer as inputs to logistic regression.
    - Online learning: update the model based on user actions.
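
The online learning step can be sketched as one stochastic gradient update of the logistic regression layer per observed user action. Here `features` stands in for the last-hidden-layer outputs, and the learning rate is an arbitrary illustrative value; the notes do not specify the update rule.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights, bias, features):
    """Predicted engagement probability for one Tweet."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def online_update(weights, bias, features, label, learning_rate=0.1):
    """One SGD step on the log loss for a single (features, label) observation."""
    error = predict(weights, bias, features) - label  # gradient w.r.t. the logit
    new_weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
    new_bias = bias - learning_rate * error
    return new_weights, new_bias


# The user engaged (label = 1) with a Tweet whose stacked features are [1.0, 2.0].
weights, bias = [0.0, 0.0], 0.0
before = predict(weights, bias, [1.0, 2.0])
weights, bias = online_update(weights, bias, [1.0, 2.0], label=1)
after = predict(weights, bias, [1.0, 2.0])
```

After the update the predicted probability for the same features moves toward the observed label.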

<img src="img/twitter_feed16.png" style="width:1000px;height:900px;">

## Diversity

- Introduce a penalty for repeated authors and similar content.
    - For example, add a negative score for each repeated author or piece of similar content.
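
One way to apply the penalty is a greedy re-rank: repeatedly pick the Tweet with the best penalized score, where the penalty grows with the number of Tweets already chosen from the same author. The penalty size and the tuple layout are assumptions; a penalty for similar content would follow the same pattern with a content-similarity term.

```python
def rerank_with_author_penalty(ranked, penalty=0.1):
    """Greedy re-ranking that discourages repeated authors.

    ranked: list of (tweet_id, author_id, score).
    Returns tweet ids in their new order. Runs in O(n^2), which is fine
    for the short list that reaches the final ranking stage.
    """
    remaining = list(ranked)
    times_shown = {}  # author_id -> number of Tweets already placed
    order = []
    while remaining:
        best = max(
            remaining,
            key=lambda t: t[2] - penalty * times_shown.get(t[1], 0),
        )
        remaining.remove(best)
        times_shown[best[1]] = times_shown.get(best[1], 0) + 1
        order.append(best[0])
    return order


ranked = [(1, "a", 0.9), (2, "a", 0.85), (3, "b", 0.8), (4, "a", 0.7)]
print(rerank_with_author_penalty(ranked))  # [1, 3, 2, 4]
```

Without the penalty, author `a` would take the top two slots; with it, author `b` moves up to second place.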
    
## Online experimentation

- Use training and validation data to train 15 different models.
- Use test data to select the best model offline.
- Run an A/B test between the best offline model and the current online model.
    - Before testing, retrain the best offline model with the latest data.
    - Select 1% of users. Serve the existing model to half of them and the best offline model to the other half.
    - Compare user engagement.
        - Check statistical significance (e.g. p-value).
        - Also consider whether the new model makes the system more complex.
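
For the significance check, a two-proportion z-test is one standard option when engagement is measured as "engaged or not" per user. The notes do not name a specific test, so this choice and the example counts below are assumptions.

```python
import math


def two_proportion_z_test(engaged_a, users_a, engaged_b, users_b):
    """Two-sided z-test for a difference in engagement rate between two groups.

    Returns (z, p_value). Group A is the existing model, group B the new one.
    """
    rate_a = engaged_a / users_a
    rate_b = engaged_b / users_b
    pooled = (engaged_a + engaged_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1.0 - pooled) * (1.0 / users_a + 1.0 / users_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: 10.0% engagement for the old model, 11.0% for the new one.
z, p_value = two_proportion_z_test(1000, 10_000, 1100, 10_000)
is_significant = p_value < 0.05
```

A significant p-value alone is not a reason to ship; the added system complexity noted above still has to be weighed against the size of the gain.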