# AD prediction

## Scope
- Predict the probability of user engagement with an AD given its context (query, device, etc.).

## Metrics

### Offline
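- AUC (area under the ROC curve)
    - Common metric for binary classification.
    - Measures only how well scores rank positives above negatives; it does not penalize how far a predicted score is from the actual label.
    - Insensitive to calibration, so it cannot tell whether predicted probabilities are well calibrated.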
- Log loss (cross-entropy loss)
    - Calibration-sensitive metric.
    - Captures the degree to which predicted probabilities diverge from the class labels.
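
To see why the two offline metrics disagree, the sketch below scores the same labels twice: once with some scores, and once with every score divided by 10. Ranking is unchanged, so AUC stays the same, while log loss gets much worse because the probabilities are no longer close to the labels. The data is made up for illustration, and AUC is computed with the pairwise definition (probability that a random positive outranks a random negative).

```python
import math

def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)

labels = [1, 0, 1, 0, 0]
scores = [0.8, 0.3, 0.6, 0.2, 0.1]
# Same ordering, every score squashed toward 0: ranking unchanged, calibration ruined.
squashed = [p / 10 for p in scores]

print(auc(labels, scores), auc(labels, squashed))
print(log_loss(labels, scores), log_loss(labels, squashed))
```

Both AUC values are 1.0, while log loss rises from about 0.28 to about 1.08, which is why log loss is the metric to watch when the auction consumes the predicted probability directly.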

### Online
- Revenue, which is the sum of the amounts charged for winning ADs.
    - If the bid is $1 and the user clicks the AD, the advertiser is charged up to $1; the auction determines the exact price.
    - The advertiser is not charged unless the user clicks the AD.
- Counter metrics (negative user feedback)
    - Hide the AD.
    - Never see this AD again.
    - Report AD as inappropriate.
    
## Architecture

<img src="img/ad-prediction2.png" style="width:1000px;height:600px;">

### Auction
- AD rank score = AD predicted score * bid
- Cost per engagement = (AD rank score of the AD below / AD predicted score) + 0.01
- The AD is charged the minimum price that still wins its position in the auction (a second-price style auction).
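
The auction rules can be sketched as follows: rank by predicted score times bid, then charge each AD the smallest bid that would still keep its rank score above the AD below it, plus $0.01. The `Ad` type, the `reserve_price` charged to the last AD (which has nobody below it), and the cap at the advertiser's own bid are assumptions added for illustration, not part of the original notes.

```python
from dataclasses import dataclass

@dataclass
class Ad:
    ad_id: str
    predicted_score: float  # calibrated engagement probability from AD prediction
    bid: float              # advertiser's bid per engagement, in dollars

def run_auction(ads, increment=0.01, reserve_price=0.01):
    """Return (ad_id, rank_score, cost_per_engagement) in rank order."""
    ranked = sorted(ads, key=lambda a: a.predicted_score * a.bid, reverse=True)
    results = []
    for i, ad in enumerate(ranked):
        rank_score = ad.predicted_score * ad.bid
        if i + 1 < len(ranked):
            below = ranked[i + 1]
            below_rank = below.predicted_score * below.bid
            # Smallest bid with predicted_score * bid >= below_rank, plus a small increment.
            cost = below_rank / ad.predicted_score + increment
        else:
            cost = reserve_price  # assumption: hypothetical floor price for the last AD
        # Assumption: never charge more than the advertiser bid.
        results.append((ad.ad_id, rank_score, min(cost, ad.bid)))
    return results

ads = [Ad("A", 0.10, 2.0), Ad("B", 0.05, 3.0), Ad("C", 0.20, 0.5)]
for ad_id, rank_score, cost in run_auction(ads):
    print(ad_id, round(rank_score, 3), round(cost, 2))
```

Here A (rank score 0.2) pays about $1.51 per engagement despite bidding $2.00, because 0.15 / 0.10 = 1.5 is the bid at which it would tie with B. Dividing by the AD's own predicted score, not its rank score, is what turns the competitor's rank score back into a price.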

### When user issues a query
- AD selection selects all ADs matching the targeting criteria and predicts an AD relevance score using a simple model.
- AD selection also ranks the ADs and sends the top ADs to AD prediction.
- AD prediction uses a more complex ML model to predict a precisely calibrated engagement probability.
- AD auction uses the bid and the predicted score to pick the most relevant ADs to show to the user.

## Feature engineering

### Ad
- ad_id
- ad_content_raw_terms
- historical_engagement_rate
    - ad_engagement_history_last_24_hrs
    - ad_engagement_history_last_7_days
- ad_impression
- ad_negative_engagement_rate
- ad_embedding
- ad_age
- ad_bid
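
Windowed features such as `ad_engagement_history_last_24_hrs` and `ad_engagement_history_last_7_days` can be computed as the share of impressions in the window that led to an engagement. The sketch below assumes a hypothetical log format of `(timestamp, engaged)` pairs, one per impression; a production system would typically maintain these as streaming aggregates rather than scanning raw events.

```python
from datetime import datetime, timedelta

def engagement_rate(impressions, now, window):
    """Share of impressions in (now - window, now] that led to an engagement.

    impressions: list of (timestamp, engaged) pairs, one per AD impression (hypothetical format).
    Returns None when the AD had no impressions in the window, so "no data" is not confused with 0%.
    """
    recent = [engaged for ts, engaged in impressions if now - window < ts <= now]
    if not recent:
        return None
    return sum(recent) / len(recent)

now = datetime(2024, 1, 8, 12, 0)
impressions = [
    (now - timedelta(hours=1), True),
    (now - timedelta(hours=2), False),
    (now - timedelta(days=3), True),
    (now - timedelta(days=5), False),
    (now - timedelta(days=6), False),
    (now - timedelta(days=10), True),  # outside both windows
]
features = {
    "ad_engagement_history_last_24_hrs": engagement_rate(impressions, now, timedelta(hours=24)),
    "ad_engagement_history_last_7_days": engagement_rate(impressions, now, timedelta(days=7)),
}
print(features)
```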

### Advertiser
- advertiser_domain
- historical_engagement_rate
- region_wise_engagement

### User
- user_previous_search_terms
- user_search_terms
- age
- gender
- language
- embedding_last_k_ads
- engagement_content_type
- engagement_days
- platform_time_spent
- region

### Context
- current_region
- time
- device
    - screen_size
    
### User-ad
- embedding_similarity
- region_wise_engagement
- user_ad_category_histogram
- user_ad_subcategory_histogram
- user_gender_ad_histogram
- user_age_ad_histogram
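
Two of the user-ad features can be sketched directly. `embedding_similarity` is taken here to be the cosine similarity between a user embedding (assumed, for example, to be derived from `embedding_last_k_ads`) and the AD embedding. `user_ad_category_histogram` is read as the share of the user's past engagements that fall in the candidate AD's category; that interpretation, and the toy vectors and categories, are assumptions for illustration.

```python
import math
from collections import Counter

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def category_histogram(engaged_categories):
    """Fraction of the user's past AD engagements in each category."""
    counts = Counter(engaged_categories)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}

user_embedding = [0.2, 0.8, 0.1]   # hypothetical user vector
ad_embedding = [0.1, 0.9, 0.0]     # hypothetical AD vector
history = ["sports", "sports", "travel", "sports"]
candidate_category = "sports"

features = {
    "embedding_similarity": cosine_similarity(user_embedding, ad_embedding),
    # Share of past engagements in the candidate AD's category (0.0 if never engaged).
    "user_ad_category_histogram": category_histogram(history).get(candidate_category, 0.0),
}
print(features)
```

The same pattern extends to the other histograms (by subcategory, gender, age) and to the user-advertiser features by swapping the key being counted.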

### User-advertiser
- embedding_similarity
- user_gender_advertiser_histogram
- user_age_advertiser_histogram

## Training data generation

### Positive
- User clicks the AD.
- User adds the item to the cart.

### Negative
- User ignores the AD.
- User gives negative feedback on the AD.
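
Labeling an impression from its logged actions might look like the sketch below. The action names are hypothetical, and letting explicit negative feedback override a click on the same impression is an assumption; the notes do not say how conflicting signals are resolved.

```python
POSITIVE_ACTIONS = {"click", "add_to_cart"}  # hypothetical action names
NEGATIVE_ACTIONS = {"hide", "report"}        # explicit negative feedback

def label_impression(actions):
    """Return 1 (positive) or 0 (negative) for one AD impression given its set of user actions."""
    if actions & NEGATIVE_ACTIONS:
        return 0  # assumption: negative feedback wins over a click on the same impression
    if actions & POSITIVE_ACTIONS:
        return 1
    return 0      # no action logged: the user ignored the AD

print(label_impression({"click"}), label_impression(set()))
```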

### Model recalibration
- Downsample negative examples, since roughly 98% of the data is likely to be negative.
- Downsampling inflates the predicted probabilities, so predictions must be recalibrated:
    - $q = \dfrac{p}{p+(1-p)/w}$
    - $q$ is the recalibrated prediction score.
    - $p$ is the prediction in the downsampled space.
    - $w$ is the negative downsampling rate (the fraction of negative examples kept).
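
The recalibration formula translates directly into code. The check below uses the 98% negative figure from above: if 2% of impressions are positive and only 10% of negatives are kept (`w = 0.1`), a perfectly trained model sees a positive rate of 0.02 / (0.02 + 0.098) ≈ 0.169 in the downsampled data, and recalibration maps that back to the true 2%.

```python
def recalibrate(p, w):
    """Map a prediction p from the negatively downsampled space back to the original space.

    w is the fraction of negative examples kept during downsampling (0 < w <= 1).
    """
    return p / (p + (1 - p) / w)

w = 0.1
p_downsampled = 0.02 / (0.02 + 0.98 * w)  # positive rate seen after keeping 10% of negatives
print(p_downsampled, recalibrate(p_downsampled, w))
```

With `w = 1` (no downsampling) the formula returns `p` unchanged, as expected.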
    
### Train/test
- Train on the first two weeks of data and test on the third week.
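
A time-based split like this keeps every test example strictly later than every training example, so the model is never evaluated on a period it has already seen. A minimal sketch, assuming examples are stored as hypothetical `(event_date, example)` pairs:

```python
from datetime import date, timedelta

def temporal_split(examples, start, train_days=14, test_days=7):
    """Split (event_date, example) pairs into the first `train_days` and the following `test_days`."""
    train_end = start + timedelta(days=train_days)
    test_end = train_end + timedelta(days=test_days)
    train = [ex for d, ex in examples if start <= d < train_end]
    test = [ex for d, ex in examples if train_end <= d < test_end]
    return train, test

start = date(2024, 1, 1)
examples = [(start + timedelta(days=i), i) for i in range(21)]  # one toy example per day
train, test = temporal_split(examples, start)
print(len(train), len(test))
```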

## AD selection

### 1. Selection
- Build an in-memory index of ADs keyed by their targeting criteria.
- Issue a query to fetch all ADs that target the current user.
    - E.g. use the search terms, age, location, gender, etc. to fetch the results.
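
One way to sketch the in-memory index is an inverted index from search term to AD ids, followed by a filter that keeps only ADs whose targeting constraints the user satisfies. The `Ad` fields, the term-based retrieval, and the rule that an AD must match all of its targeting attributes are assumptions for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Ad:
    ad_id: str
    terms: set       # search terms the AD targets
    targeting: dict  # attribute -> allowed values, e.g. {"region": {"US"}}

class AdIndex:
    """Hypothetical in-memory index: term -> AD ids, plus per-AD targeting constraints."""

    def __init__(self):
        self._by_term = defaultdict(set)
        self._ads = {}

    def add(self, ad):
        self._ads[ad.ad_id] = ad
        for term in ad.terms:
            self._by_term[term].add(ad.ad_id)

    def query(self, search_terms, user):
        candidates = set()
        for term in search_terms:
            candidates |= self._by_term.get(term, set())
        # Keep ADs whose every targeting attribute is satisfied by the user's value.
        return {
            ad_id for ad_id in candidates
            if all(user.get(attr) in allowed for attr, allowed in self._ads[ad_id].targeting.items())
        }

index = AdIndex()
index.add(Ad("a1", {"shoes"}, {"region": {"US"}}))
index.add(Ad("a2", {"shoes", "running"}, {"region": {"US", "CA"}, "gender": {"F"}}))
index.add(Ad("a3", {"hats"}, {}))  # no targeting constraints
print(index.query(["shoes"], {"region": "US", "gender": "M"}))
```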

### 2. Narrow down selection
- Use (bid * prior cost per engagement score) to pick the top selections.
    - If an AD has no prior score because it is new, give it a slightly higher score so it still gets a chance to be shown.
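
A sketch of this narrowing step: score each candidate by bid times its prior score and keep the top `top_n`. How much higher a new AD's score should be is not specified, so the default here (the mean prior score of the other candidates times a hypothetical `new_ad_boost` of 1.1) is purely an assumption.

```python
def narrow_down(candidates, top_n, new_ad_boost=1.1):
    """Keep the top_n candidates by bid * prior score.

    candidates: dicts with "ad_id", "bid" and "prior_score" (None for new ADs without history).
    new_ad_boost: hypothetical multiplier so new ADs score slightly above the average prior.
    """
    known = [c["prior_score"] for c in candidates if c["prior_score"] is not None]
    # Assumption: fall back to 1.0 when no candidate has a prior score at all.
    default = (sum(known) / len(known) if known else 1.0) * new_ad_boost

    def selection_score(c):
        prior = c["prior_score"] if c["prior_score"] is not None else default
        return c["bid"] * prior

    return sorted(candidates, key=selection_score, reverse=True)[:top_n]

candidates = [
    {"ad_id": "a", "bid": 1.0, "prior_score": 0.05},
    {"ad_id": "b", "bid": 2.0, "prior_score": 0.01},
    {"ad_id": "c", "bid": 0.5, "prior_score": 0.02},
    {"ad_id": "d", "bid": 1.0, "prior_score": None},  # new AD
]
top = narrow_down(candidates, top_n=2)
print([c["ad_id"] for c in top])
```

The new AD `d` makes the cut ahead of `b`, even though `b` bids twice as much, because its default score sits slightly above the average prior.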

### 3. Rank using simple model
- Select top $k$ ADs from the result of the previous step. 
- Use logistic regression or additive trees.
- At evaluation time, ADs are ranked based on (bid * cost per engagement score).
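
A sketch of the simple-model ranking with logistic regression: score each candidate with pre-trained weights and order by bid times the model's score. Using the predicted engagement probability as the score multiplied by the bid is our reading of the line above (an assumption), and the weights, features, and `rank_top_k` helper are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def rank_top_k(ads, weights, bias, k):
    """Score ADs with a pre-trained logistic regression and return the top k ids by bid * score."""
    scored = []
    for ad in ads:
        z = bias + sum(w * x for w, x in zip(weights, ad["features"]))
        scored.append((ad["bid"] * sigmoid(z), ad["ad_id"]))
    scored.sort(reverse=True)
    return [ad_id for _, ad_id in scored[:k]]

weights, bias = [1.0, -0.5], -2.0  # hypothetical pre-trained parameters
ads = [
    {"ad_id": "x", "bid": 1.0, "features": [1.0, 0.0]},
    {"ad_id": "y", "bid": 3.0, "features": [0.0, 0.0]},
    {"ad_id": "z", "bid": 0.5, "features": [2.0, 1.0]},
]
print(rank_top_k(ads, weights, bias, k=2))
```

A simple linear model keeps this stage cheap enough to score many candidates per query, leaving the expensive, precisely calibrated model for the few ADs that reach AD prediction.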

## AD prediction

### Online learning
- Refresh the model with the latest impressions and engagements at a regular interval (e.g. every 15 or 30 minutes).
    - Train a base model, then keep adding new examples on top of it.
    - Updates use stochastic gradient descent (SGD).

<img src="img/ad-prediction1.png" style="width:1000px;height:600px;">

- An online joiner generates the latest training examples by joining AD impressions with user engagements.
- The training data generator takes those examples and computes the right feature set for each one.
- The model trainer runs SGD on those examples to update the model.
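
The online-learning loop can be sketched as a logistic regression whose weights persist between refreshes: a base model is trained once, and each refresh applies SGD only to the newest joined examples. The class name and learning rate are hypothetical, and the toy one-feature data stands in for real feature sets.

```python
import math

class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with SGD (hypothetical helper)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = learning_rate

    def predict(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-35.0, min(35.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # For log loss, the gradient with respect to the logit is (p - y).
        err = self.predict(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err

    def train_on_batch(self, examples):
        """One refresh: SGD over the latest (features, label) examples, starting from current weights."""
        for x, y in examples:
            self.update(x, y)

model = OnlineLogisticRegression(n_features=1)
model.train_on_batch([([1.0], 1), ([-1.0], 0)] * 50)  # base training on historical data
model.train_on_batch([([1.0], 1), ([-1.0], 0)] * 5)   # a later refresh with only new examples
print(model.predict([1.0]), model.predict([-1.0]))
```

Because each refresh starts from the current weights rather than retraining from scratch, the model can track recent engagement patterns at a 15 to 30 minute cadence without reprocessing all history.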

### Non-linear feature generation    
- Use additive trees and neural networks to generate non-linear features.
- Feed these generated features into a logistic regression model.
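
The tree half of this idea can be sketched as follows: each tree in a pre-trained ensemble maps an example to one of its leaves, the leaf index is one-hot encoded, and the concatenated indicators become the input to logistic regression. To stay self-contained, the "ensemble" here is two hypothetical hand-written stumps `(feature_index, threshold)` standing in for trained additive trees, and the neural-network side is not shown.

```python
import math

# Hypothetical pre-trained "additive trees": depth-1 stumps as (feature_index, threshold).
STUMPS = [(0, 0.5), (1, 10.0)]

def tree_leaf_features(x, stumps=STUMPS):
    """One-hot encode the leaf each tree sends x to: [left, right] per tree, concatenated."""
    encoded = []
    for feature_index, threshold in stumps:
        goes_right = x[feature_index] > threshold
        encoded.extend([0.0, 1.0] if goes_right else [1.0, 0.0])
    return encoded

def predict(w, b, x):
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic_regression(rows, labels, epochs=2000, lr=0.5):
    """Plain SGD on log loss over the transformed rows."""
    w, b = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

raw = [[0.7, 12.0], [0.7, 3.0], [0.2, 12.0], [0.2, 3.0]]
labels = [1, 0, 0, 0]  # toy data: engagement only when both thresholds are exceeded
X = [tree_leaf_features(x) for x in raw]
w, b = train_logistic_regression(X, labels)
print([round(predict(w, b, x), 3) for x in X])
```

The trees turn raw thresholds and interactions into binary indicators, so the linear model only has to learn a weight per leaf, which keeps the final model cheap to update with SGD.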