https://book.douban.com/subject/30243136/
# Part I: Modelling Approaches
## Performance Metric
  - F1 score: 2/F = 1/P + 1/R
  - Other interpretations of AUC:
    - Equals the normalized Wilcoxon rank-sum (Mann-Whitney U) statistic
    - Gini-index: Gini + 1 = 2*AUC
    - Insensitive to the absolute predicted scores; only their ranking matters
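A minimal Python sketch of these metrics (the function names are illustrative, not from the book). It computes AUC two ways. The first counts correctly ordered (positive, negative) pairs, with ties counted as 1/2. The second uses the Wilcoxon rank-sum formula. The two should agree, and any order-preserving transform of the scores leaves AUC unchanged.

```python
from itertools import product


def f1_score(precision, recall):
    # Harmonic mean of P and R: 2/F = 1/P + 1/R
    if precision == 0 or recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc_by_pairs(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def auc_by_ranks(scores, labels):
    # Wilcoxon / Mann-Whitney form: sum of the positives' ranks (ties get average ranks)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 1-based average rank of the tie group
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini(auc):
    return 2 * auc - 1


scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc_by_pairs(scores, labels), auc_by_ranks(scores, labels))  # 0.75 0.75
```

Cubing these (positive) scores keeps their order, so `auc_by_ranks([s ** 3 for s in scores], labels)` is still 0.75.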

## Feature Engineering and Feature Selection

***Continuous Variables***
  - Bucketing continuous variables, e.g., for logistic regression (equal-width or equal-frequency/percentile bins)
  - Missing value treatment (imputation, or a dummy variable that flags missingness)
  - Feed the leaf nodes of a tree ensemble (e.g., RF) to linear models as features
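A small Python sketch of the two bucketing schemes plus a missing-value dummy. The helper names are hypothetical, and the percentile cut uses a simple nearest-rank rule, which is one of several conventions.

```python
import bisect
import math


def width_edges(values, n_buckets):
    # Equal width: split [min, max] into n_buckets intervals of the same length
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_buckets
    return [lo + step * k for k in range(1, n_buckets)]


def percentile_edges(values, n_buckets):
    # Equal frequency: cut at the k/n_buckets quantiles (nearest-rank rule)
    s = sorted(values)
    return [s[len(s) * k // n_buckets] for k in range(1, n_buckets)]


def bucketize(x, edges):
    # Returns (bucket id, missing flag); a missing value gets its own dummy flag
    if x is None or (isinstance(x, float) and math.isnan(x)):
        return -1, 1
    return bisect.bisect_right(edges, x), 0


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # one large outlier
by_width = [bucketize(v, width_edges(values, 2))[0] for v in values]
by_pct = [bucketize(v, percentile_edges(values, 2))[0] for v in values]
```

The outlier stretches the equal-width range, so nine of the ten values fall into bucket 0. The percentile edges split the values 5/5.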

***Discrete Variables***
- Feature crossing (combine two categorical features into a new one)
- Group statistics (e.g., number of unique values of B for each value of A)
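A sketch of both ideas in Python. The record fields `user`, `shop`, and `city` are made up for illustration.

```python
from collections import defaultdict


def cross_feature(a, b):
    # Combine two categorical values into one new categorical value
    return f"{a}_x_{b}"


def unique_count_by(rows, key_a, key_b):
    # For each value of A, count how many distinct values of B co-occur with it
    seen = defaultdict(set)
    for row in rows:
        seen[row[key_a]].add(row[key_b])
    return {a: len(bs) for a, bs in seen.items()}


rows = [
    {"user": "u1", "shop": "s1", "city": "bj"},
    {"user": "u1", "shop": "s2", "city": "bj"},
    {"user": "u1", "shop": "s1", "city": "sh"},
    {"user": "u2", "shop": "s3", "city": "bj"},
]
n_shops = unique_count_by(rows, "user", "shop")           # {"u1": 2, "u2": 1}
city_shop = [cross_feature(r["city"], r["shop"]) for r in rows]
```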

***Time, Space, Text Features***

## Popular Models
***Logistic Regression***
- Why not OLS: the least-squares fit is sensitive to outliers
- How to solve: GD, or stochastic GD (online variants such as Google's FTRL)
- Advantages: fast, scalable
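A minimal pure-Python sketch of logistic regression trained with plain SGD on the log loss. This is not FTRL, which is a different online update. Learning rate, epoch count, and function names are arbitrary choices for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_lr_sgd(X, y, lr=0.1, epochs=100, seed=0):
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a random order each epoch
        for i in order:
            g = predict_proba(w, b, X[i]) - y[i]  # d(log loss)/d(logit)
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = train_lr_sgd(X, y)
```

Each update touches only one sample, which is what makes SGD-style solvers scale to large, streaming data.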

***FM***
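- Motivation:
  - Learn feature interactions automatically instead of crafting them by hand
  - A degree-2 polynomial kernel has too many parameters, and most pairwise weights cannot be estimated from a very sparse feature matrix
- Approach:
  - Instead of learning a separate weight for every co-occurring pair i and j, FM computes the weight w_ij as the dot product of latent vectors v_i and v_j of dimension k.
  - This imposes a low-rank assumption on the interaction matrix W so that it can be decomposed.
  - The parameters for different combinations are no longer independent, so interactions of rarely co-occurring pairs can still be estimated.
- Improvement:
  - FFM groups similar features into a *field*; each feature learns a separate latent vector for each field it interacts with
- Application:
  - Serves as an embedding for NNs (e.g., user and ad similarity)
  - Outperforms GBDT at learning complicated feature interactions over sparse feature combinations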
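For reference, the degree-2 FM model is

$$\hat y(\mathbf x) = w_0 + \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n}\sum_{j=i+1}^{n} \langle \mathbf v_i, \mathbf v_j \rangle x_i x_j$$

The pairwise term can be rewritten so that it costs $O(kn)$ instead of $O(kn^2)$:

$$\sum_{i<j} \langle \mathbf v_i, \mathbf v_j \rangle x_i x_j = \frac{1}{2}\sum_{f=1}^{k}\left[\Big(\sum_{i=1}^{n} v_{i,f} x_i\Big)^2 - \sum_{i=1}^{n} v_{i,f}^2 x_i^2\right]$$

The Python sketch below computes the prediction both ways. Only inference is shown, with random parameters; training is omitted.

```python
import random


def fm_predict_naive(x, w0, w, V):
    # O(k n^2): explicit sum over all pairs i < j
    n, k = len(x), len(V[0])
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for i in range(n):
        for j in range(i + 1, n):
            y += sum(V[i][f] * V[j][f] for f in range(k)) * x[i] * x[j]
    return y


def fm_predict_fast(x, w0, w, V):
    # O(k n): 1/2 * sum_f [(sum_i v_if x_i)^2 - sum_i v_if^2 x_i^2]
    k = len(V[0])
    y = w0 + sum(wi * xi for wi, xi in zip(w, x))
    for f in range(k):
        s = sum(V[i][f] * xi for i, xi in enumerate(x))
        s2 = sum((V[i][f] * xi) ** 2 for i, xi in enumerate(x))
        y += 0.5 * (s * s - s2)
    return y


rng = random.Random(42)
n, k = 6, 3
x = [rng.random() for _ in range(n)]
w0, w = 0.1, [rng.gauss(0, 1) for _ in range(n)]
V = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]  # row i is v_i
```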

***GBDT***
- Advantages over linear models: handles missing values, insensitive to differing feature ranges, robust to outliers, captures feature interactions, fits non-linear decision boundaries



***GBRank***
- A pair-wise ranking method.
- For each returned pair $(x_i, y_i)$ whose correct order is $x_i \succ y_i$, we want the score function $h$ to satisfy $h(x_i) > h(y_i)$; $h$ is expressed by a series of regression trees.


- Define the loss function over the $N$ preference pairs:
$$L = \sum_{i=1}^N L_i, \qquad L_i = \frac{1}{2}\left\{\max\left(0,\ \tau + h(y_i) - h(x_i)\right)\right\}^2$$
where $\tau$ is a margin: a pair has zero loss only if $h(x_i) - h(y_i) \ge \tau$.


- Define Gradient for pairs with non-zero loss:
$$r_m(x_i) = \frac{\partial L_i}{\partial h_{m-1}(x_i)} = -\tau - h_{m-1}(y_i) + h_{m-1}(x_i)$$
$$r_m(y_i) = \frac{\partial L_i}{\partial h_{m-1}(y_i)} = \tau + h_{m-1}(y_i) - h_{m-1}(x_i)$$


- **Note:** instead of fitting $r_m$ directly, fit the updated target values $h_{m-1} - \rho r_m$
- For simplicity, set $\rho = 1$, which gives the targets
$$h_m(x_i) = h_{m-1}(x_i) - r_m(x_i) = \tau + h_{m-1}(y_i)$$
$$h_m(y_i) = h_{m-1}(y_i) - r_m(y_i) = -\tau + h_{m-1}(x_i)$$


- Fit a regression tree $\Delta_m$ on the sample points $\{(x_i, h_m(x_i)), (y_i, h_m(y_i))\}$ from all pairs with non-zero loss
- Get the score function $h$ for next iteration:
$$h_m = \frac{mh_{m-1} + \eta \Delta_m}{m+1}$$
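A compact Python sketch of the GBRank loop above, under simplifying assumptions:

- The regression tree $\Delta_m$ is replaced by a depth-1 stump.
- The initial guess is $h_0 = 0$.
- $\rho = 1$.

`fit_stump`, `make_scorer`, and `gbrank` are hypothetical helper names, and feature vectors are plain lists.

```python
def fit_stump(X, t):
    # Depth-1 regression tree: choose the (feature, threshold) split with the lowest squared error
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = [ti for xi, ti in zip(X, t) if xi[j] <= thr]
            right = [ti for xi, ti in zip(X, t) if xi[j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # no valid split: predict the mean target
        mean = sum(t) / len(t)
        return lambda x: mean
    _, j, thr, ml, mr = best
    return lambda x: ml if x[j] <= thr else mr


def make_scorer(trees, eta):
    # h_m = (m * h_{m-1} + eta * Delta_m) / (m + 1), starting from h_0 = 0
    def h(x):
        value = 0.0
        for m, tree in enumerate(trees, start=1):
            value = (m * value + eta * tree(x)) / (m + 1)
        return value
    return h


def gbrank(pairs, n_iter=20, tau=1.0, eta=1.0):
    # pairs: list of (x, y) feature vectors where x should rank above y
    trees = []
    for _ in range(n_iter):
        h = make_scorer(trees, eta)
        X, t = [], []
        for x, y in pairs:
            hx, hy = h(x), h(y)
            if tau + hy - hx > 0:  # non-zero loss: margin violated
                X += [x, y]
                t += [tau + hy, hx - tau]  # targets h_{m-1} - r_m with rho = 1
        if not X:  # every pair satisfies the margin
            break
        trees.append(fit_stump(X, t))
    return make_scorer(trees, eta), trees


pairs = [([1.0], [0.0]), ([1.0], [-1.0])]
h, trees = gbrank(pairs)
```

On this toy data a single stump already separates the pairs by the margin $\tau$, so the loop stops after one tree.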



***RankNet***
- A pair-wise ranking method
- For each returned pair $(x, y)$ whose correct order is $x \succ y$ (i.e., target probability $P^0 = 1$), we want the score function $f$ to satisfy $f(x) > f(y)$; $f$ is expressed by a neural network.


- Define the forward network (example: two linear layers with sizes 10, 5, 1, no activation for simplicity)
$$\underset{5 \times 1}{f_1} = \underset{5 \times 10}{\mathbf W_1} \cdot \underset{10 \times 1}{\mathbf x}$$
$$\underset{1 \times 1}{f(x)} = \underset{1 \times 5}{\mathbf W_2} \cdot \underset{5 \times 1}{f_1} =\underset{1 \times 5}{\mathbf W_2} \cdot  \underset{5 \times 10}{\mathbf W_1} \cdot \underset{10 \times 1}{\mathbf x}$$
$$f_{xy} = f(x) - f(y)$$

- Define predicted probability
$$P_{xy} = P(f(x) > f(y)) = \sigma(f_{xy}) = \frac{1}{1 + e^{-f_{xy}}}$$


- Loss is the cross entropy between the target $P^0$ and the prediction $P_{xy}$
$$L = -P^0 \log P_{xy} - (1-P^0) \log(1-P_{xy})$$

- Define gradients of the score with respect to the weights (writing $f_x = f(x)$)
$$\underset{1 \times 5}{\frac{\partial f_x}{\partial \mathbf W_2}} = f_1^T$$
$$\underset{5 \times 10}{\frac{\partial f_x}{\partial \mathbf W_1}} = \underset{5 \times 1}{\mathbf W_2^T} \cdot \underset{1 \times 10}{\mathbf x^T}$$

- Define the gradient of the loss with respect to $f_{xy}$
$$\frac{\partial L}{\partial f_{xy}} = P_{xy} - P^0 = \sigma(f_{xy}) - P^0$$

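- Define the update rule: since $\partial f_{xy}/\partial f_x = 1$ and $\partial f_{xy}/\partial f_y = -1$,
$$\frac{\partial L}{\partial \mathbf W} = \frac{\partial L}{\partial f_{xy}} \cdot \frac{\partial f_{xy}}{\partial f_x} \cdot \frac{\partial f_x}{\partial \mathbf W} + \frac{\partial L}{\partial f_{xy}} \cdot \frac{\partial f_{xy}}{\partial f_y} \cdot \frac{\partial f_y}{\partial \mathbf W} = \left(P_{xy} - P^0\right)\left(\frac{\partial f_x}{\partial \mathbf W} - \frac{\partial f_y}{\partial \mathbf W}\right)$$
and each weight matrix is updated by gradient descent with learning rate $\eta$: $\mathbf W \leftarrow \mathbf W - \eta \, \partial L / \partial \mathbf W$.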
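A pure-Python sketch of the RankNet derivation above for the 10-5-1 linear network. It covers forward pass, cross-entropy loss, the chain-rule gradients, and one SGD step. The names, learning rate, and random initialization are illustrative assumptions. The loss is written with softplus, which equals the cross entropy but avoids `log(0)`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def softplus(z):
    # log(1 + e^z), computed stably; -log sigma(s) = softplus(-s)
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def matvec(W, v):
    return [sum(w * u for w, u in zip(row, v)) for row in W]


def forward(W1, W2, x):
    f1 = matvec(W1, x)            # f_1 = W_1 x, shape 5
    return matvec(W2, f1)[0], f1  # f(x) = W_2 f_1, plus f_1 for the gradient


def loss_and_grads(W1, W2, x, y, p0=1.0):
    fx, f1x = forward(W1, W2, x)
    fy, f1y = forward(W1, W2, y)
    s = fx - fy                                        # f_xy
    loss = p0 * softplus(-s) + (1 - p0) * softplus(s)  # cross entropy
    g = sigmoid(s) - p0                                # dL/df_xy = P_xy - P^0
    # dL/dW2 = g * (f_1(x) - f_1(y))^T, shape 1 x 5
    dW2 = [[g * (a - b) for a, b in zip(f1x, f1y)]]
    # dL/dW1 = g * W2^T (x - y)^T, shape 5 x 10
    dW1 = [[g * W2[0][r] * (xc - yc) for xc, yc in zip(x, y)] for r in range(len(W1))]
    return loss, dW1, dW2


def sgd_step(W1, W2, x, y, lr=0.1, p0=1.0):
    _, dW1, dW2 = loss_and_grads(W1, W2, x, y, p0)
    W1 = [[w - lr * d for w, d in zip(rw, rd)] for rw, rd in zip(W1, dW1)]
    W2 = [[w - lr * d for w, d in zip(rw, rd)] for rw, rd in zip(W2, dW2)]
    return W1, W2


rng = random.Random(0)
W1 = [[rng.gauss(0, 0.1) for _ in range(10)] for _ in range(5)]
W2 = [[rng.gauss(0, 0.1) for _ in range(5)]]
x = [rng.random() for _ in range(10)]  # item that should rank higher
y = [rng.random() for _ in range(10)]
initial_loss = loss_and_grads(W1, W2, x, y)[0]
for _ in range(50):
    W1, W2 = sgd_step(W1, W2, x, y)
final_loss = loss_and_grads(W1, W2, x, y)[0]
```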

# Part II: Notes
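# User Profiling
## Tag Taxonomy
- LBS, e.g., usual city of residence, residential business district
- Demographics, e.g., gender, age, marital status
- Business tags, e.g., food delivery, beauty services
- Interests and preferences, e.g., categories, brands
- Specific audience groups, e.g., families with children, car owners
- User tiers, e.g., membership, user value
- User behavior, e.g., browsing, transactions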

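## Processing Pipeline
- User identification and feature engineering
- Tag generation platform
    - Feature store and sample store
    - Feature customization
    - Model building
    - Model deployment
- Modeling approaches
    - Rule-based (e.g., a user's category preferences derived from historical behavior)
    - Clearly labeled samples (e.g., predicting age)
    - Small samples (precision marketing)
- Application scenarios
    - Look up a user's profile by user ID and use the profile for personalized recommendation
    - Use profile attributes as query conditions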
    
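# POI Entity Linking
Two approaches:
- Clustering: records in the same cluster are the same entity (poor real-time performance; new entities cannot be linked in real time)
- Build an index to narrow down the candidate set (a trade-off among index granularity, recall, and candidate set size)

Example: hotel entities
- Select key fields: name, address, phone number, latitude/longitude
- Train an offline model
- Use the index to find potential links, then apply the model to solve a binary classification problem (same entity or not)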
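A Python sketch of the index-then-classify flow for hotel records, under several assumptions:

- The blocking keys are a coarse lat/lng grid cell and the exact phone number.
- The pair features use the standard-library `difflib.SequenceMatcher` and a haversine distance.
- `is_same_entity` is a hypothetical hand-written rule standing in for the trained offline model.

```python
import math
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations

CELL_DEG = 0.01  # hypothetical grid size (~1 km); coarser cells raise recall and candidate count


def block_keys(rec):
    keys = [("grid", math.floor(rec["lat"] / CELL_DEG), math.floor(rec["lng"] / CELL_DEG))]
    if rec.get("phone"):
        keys.append(("phone", rec["phone"]))
    return keys


def candidate_pairs(records):
    index = defaultdict(list)  # block key -> record ids
    for rid, rec in records.items():
        for key in block_keys(rec):
            index[key].append(rid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def haversine_m(lat1, lng1, lat2, lng2):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lng2 - lng1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371000.0 * math.asin(math.sqrt(a))


def pair_features(r1, r2):
    return {
        "name_sim": SequenceMatcher(None, r1["name"], r2["name"]).ratio(),
        "addr_sim": SequenceMatcher(None, r1["address"], r2["address"]).ratio(),
        "phone_eq": float(bool(r1.get("phone")) and r1.get("phone") == r2.get("phone")),
        "dist_m": haversine_m(r1["lat"], r1["lng"], r2["lat"], r2["lng"]),
    }


def is_same_entity(f):
    # Placeholder rule; in practice a trained binary classifier scores these features
    return f["phone_eq"] == 1.0 or (f["name_sim"] > 0.8 and f["dist_m"] < 200)


records = {
    "a": {"name": "Home Inn Wangjing", "address": "1 Wangjing St", "phone": "010-1234", "lat": 39.9961, "lng": 116.4750},
    "b": {"name": "Home Inn (Wangjing)", "address": "No.1 Wangjing Street", "phone": "010-1234", "lat": 39.9962, "lng": 116.4751},
    "c": {"name": "Hanting Hotel", "address": "88 Nanjing Rd", "phone": "021-5678", "lat": 31.2300, "lng": 121.4750},
}
```

Grid blocking can miss true matches that sit on opposite sides of a cell boundary. Adding the phone key, or checking neighboring cells, trades a larger candidate set for better recall.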

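# Review Mining
- Chinese NLP preprocessing
    - Remove or replace noise and typos
    - Chinese word segmentation
        - Word sense disambiguation (e.g., use word2vec to replace a word with a semantically similar one)
        - Predefined vocabulary (proper nouns)
        - Out-of-vocabulary words
        - PoS tagging
        - NER
    - Long sentence splitting
    - Stopword removal
    - Word embeddings
- Unsupervised vs. supervised problems
- Sentiment analysis
    - What makes review sentiment analysis special
        - Reviews vary in length and may refer to time-sensitive events
        - Sentiment is complex: positive and negative opinions can appear in the same sample
        - Imbalanced ratio of positive to negative samples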
        
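# Query Understanding
- Inverted index concept (build an index table from terms to documents, then intersect the results for the query conditions)
- Intent recognition
    - Food, hotels, attractions, etc.
    - Build multiple binary classifiers. Example: for 黄鹤楼 (Yellow Crane Tower), is the intent a hotel or an attraction?
- Entity recognition
- Retrieval strategies
    - Certain queries are intersected only within specific text fields
    - Map different query components to different text fields for retrieval, using the output of entity recognition (e.g., 望京的涮羊肉, "instant-boiled mutton in Wangjing" -> business-district index field + category index field)
- Query rewriting
    - Synonyms: 宾馆 / 客栈 (hotel / inn)
    - Hyponyms: 陕西美食 (Shaanxi cuisine) -> 羊肉泡馍 (mutton soup with bread)
    - Methods: session mining, bipartite graphs, semantic vectors
- Term weighting
    - TF-IDF
    - Rule-based
    - Statistical learning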
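A minimal Python sketch of a fielded inverted index with AND retrieval. Each (field, token) pair owns a posting list, and a query's components are looked up in their own fields and intersected. The field names and toy documents are hypothetical.

```python
from collections import defaultdict


def build_fielded_index(docs):
    # (field, token) -> set of doc ids: one posting list per field and token
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field, tokens in fields.items():
            for tok in tokens:
                index[(field, tok)].add(doc_id)
    return index


def search_and(index, terms):
    # terms: list of (field, token); AND semantics = intersect posting lists,
    # starting from the shortest one
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    if not postings:
        return set()
    result = set(postings[0])  # copy so the index itself is not modified
    for p in postings[1:]:
        result &= p
        if not result:
            break
    return result


docs = {
    1: {"district": ["wangjing"], "category": ["hotpot"], "name": ["lamb", "house"]},
    2: {"district": ["wangjing"], "category": ["bbq"], "name": ["lamb", "grill"]},
    3: {"district": ["sanlitun"], "category": ["hotpot"], "name": ["old", "beijing"]},
}
index = build_fielded_index(docs)
hits = search_and(index, [("district", "wangjing"), ("category", "hotpot")])  # {1}
```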

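# User Guidance
- Before search: e.g., search history, recommendation system
- During search: query auto-completion
- After search: related queries, merchant recommendations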
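A small Python sketch of query auto-completion. Past queries are kept sorted so that all completions of a prefix form a contiguous run found with `bisect`, and the matches are ranked by historical frequency. The class name and the frequency-only ranking are assumptions.

```python
import bisect


class PrefixCompleter:
    def __init__(self, query_counts):
        self.queries = sorted(query_counts)  # strings sharing a prefix are contiguous
        self.counts = dict(query_counts)

    def complete(self, prefix, k=3):
        i = bisect.bisect_left(self.queries, prefix)
        matches = []
        while i < len(self.queries) and self.queries[i].startswith(prefix):
            matches.append(self.queries[i])
            i += 1
        # most frequent first, alphabetical as a tie-breaker
        return sorted(matches, key=lambda q: (-self.counts[q], q))[:k]


completer = PrefixCompleter({"hotel": 50, "hotpot": 80, "hot dog": 10, "museum": 30})
suggestions = completer.complete("hot")  # ["hotpot", "hotel", "hot dog"]
```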

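# Ranking
- Key characteristics: mobile, scenario-specific, local, personalized
- Ranking models: point-wise and pair-wise
- Scenario-specific: different scenarios use different models
- Evaluation metrics: Prec@K, AP, MAP (AUC is not suitable because it does not directly reflect ranking quality)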
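A Python sketch of the three ranking metrics. AP here is normalized by the number of relevant items, which is one common convention.

```python
def precision_at_k(ranked, relevant, k):
    # Fraction of the top-k results that are relevant
    return sum(1 for d in ranked[:k] if d in relevant) / k


def average_precision(ranked, relevant):
    # Mean of Prec@i over the positions i where a relevant item appears
    hits, total = 0, 0.0
    for i, d in enumerate(ranked, start=1):
        if d in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    # runs: list of (ranked list, set of relevant items), one per query
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


runs = [(["a", "b", "c", "d"], {"a", "c"}), (["x", "y"], {"y"})]
ap_first = average_precision(*runs[0])  # (1/1 + 2/3) / 2 = 5/6
map_score = mean_average_precision(runs)  # (5/6 + 1/2) / 2 = 2/3
```

Both AP and Prec@K reward relevant items near the top of the list, which is where a user looks first.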

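# Recommendation
- Key characteristics: nearby locations, user history, real-time recommendation
- Recall (candidate generation)
    - Collaborative filtering
    - Real-time location
    - Real-time behavior
    - Fallback strategy
- Recommendation ranking
    - Main features: user, item, context
    - Sample selection:
        - How to select samples? Skip-Above (clicked items are positives; unclicked items ranked above the last click are negatives)
        - Supplementing negatives? Impression (exposure) data
        - Sample weights: payment, order, click
- Evaluation metrics for recommender systems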
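A Python sketch of Skip-Above sample selection for a single impression list. Items shown below the last click are dropped, since the user may never have seen them. The function name is hypothetical.

```python
def skip_above_samples(impressions, clicked):
    """impressions: item ids in display order; clicked: set of clicked item ids."""
    click_positions = [i for i, item in enumerate(impressions) if item in clicked]
    if not click_positions:
        return [], []  # no click: this list gives no Skip-Above samples
    last = click_positions[-1]
    positives = [impressions[i] for i in click_positions]
    # Negatives: unclicked items shown above the last click (the user skipped them)
    negatives = [item for item in impressions[:last] if item not in clicked]
    return positives, negatives


pos, neg = skip_above_samples(["a", "b", "c", "d", "e"], {"b", "d"})
# pos == ["b", "d"], neg == ["a", "c"]; "e" is discarded
```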

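# Computational Advertising
- Characteristic: interests of three parties (platform, merchants, users)
- Different evaluation criteria:
    - Merchants: visibility, online incremental revenue, overall incremental revenue, ROI
    - Users: CTR and CVR
    - Platform: Revenue = # of impressions × CTR × CPC
- Different recall mechanisms
    - Search ads
        - Use the query terms to find matching merchants in the ad index
        - The business goal differs from organic ranking: besides ranking correctly, the predicted CTR itself must be accurate, since it is multiplied by the bid in the rank score
    - Recommendation ads
        - Pushed ads: targeted by time, user history, location, (known and inferred) demographics, user preferences, Look-Alike, etc. (p188)
        - Main consideration: balance clicks, conversion rate, and coverage; targeting should be neither too narrow nor too broad
- Ranking
    - $\text{Rank Score} = \text{bid} \times \text{CTR}$
    - $\text{CTR} = P(\text{Click} \mid \text{Ad}, \text{User}, \text{Query})$
    - Common CTR models: LR, FM, FFM, DL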
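A Python sketch of the ranking and revenue formulas above. The predicted CTRs are made-up numbers standing in for a CTR model's output.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    ad_id: str
    bid: float  # bid per click
    ctr: float  # predicted P(Click | Ad, User, Query)


def rank_score(ad):
    return ad.bid * ad.ctr  # expected revenue per impression


def rank_ads(ads):
    return sorted(ads, key=rank_score, reverse=True)


def expected_revenue(impressions, ctr, cpc):
    # Revenue = # of impressions x CTR x CPC
    return impressions * ctr * cpc


ads = [Ad("A", 2.0, 0.01), Ad("B", 1.0, 0.03), Ad("C", 5.0, 0.002)]
order = [a.ad_id for a in rank_ads(ads)]  # ["B", "A", "C"]
```

The highest bidder C ranks last because its predicted CTR is low. A CTR estimate that is off by a constant factor for only some ads would reorder them, which is why calibration matters here and not only the relative order.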
