Factorization Machines  
This section introduces another common model for estimating user behavior in recommender systems: the factorization machine (FM).  

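- Click-through rate (CTR) prediction estimates the probability that a user will click on other items.
- Clicked versus not clicked is a binary classification problem, which can be solved with logistic regression.
- In logistic regression the features $x_i$ and $x_j$ never interact; each contributes independently. In the actual problem, however, features may be correlated, so a bilinear term is introduced to improve the prediction:
$$\hat{y}(x)=\theta_0+\mathbf{\theta^T x}+\frac{1}{2}\mathbf{x^T Wx}$$
- One-hot encoding. In practice, the discrete features of an item are represented with one-hot encoding. Because one-hot feature vectors are high-dimensional and very sparse, $x_i x_j = 0$ for most pairs, and in that case $w_{ij}$ receives no gradient update.
- Factorization machine model: factorize the weight matrix as $$\mathbf{W = VV^T}, \mathbf{V}\in \mathbb{R}^{d\times k}$$  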
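
As a hedged sketch of how $\mathbf{W = VV^T}$ leads to the prediction formula that follows, write $v_{il}$ for the entries of $\mathbf{V}$, so that $w_{ij}=\sum_{l=1}^{k}v_{il}v_{jl}$. The derivation assumes that only interactions between distinct features ($i \neq j$) are kept, which is why the diagonal terms are subtracted:

$$\begin{aligned}
\frac{1}{2}\sum_{i=1}^{d}\sum_{j\neq i} w_{ij}x_ix_j
&= \frac{1}{2}\sum_{i=1}^{d}\sum_{j\neq i}\Big(\sum_{l=1}^{k}v_{il}v_{jl}\Big)x_ix_j \\
&= \frac{1}{2}\sum_{l=1}^{k}\Big(\sum_{i=1}^{d}\sum_{j=1}^{d}v_{il}x_i\,v_{jl}x_j-\sum_{i=1}^{d}v_{il}^2x_i^2\Big) \\
&= \frac{1}{2}\sum_{l=1}^{k}\Big(\big(\sum_{i=1}^{d}v_{il}x_i\big)^2-\sum_{i=1}^{d}v_{il}^2x_i^2\Big)
\end{aligned}$$

The last line costs $O(dk)$ operations per sample, compared with $O(d^2k)$ for the explicit double sum.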

This finally gives the FM prediction formula:
$$\hat{y}(x)=\theta_0+\sum^{d}_{i=1}\theta_ix_i+\frac{1}{2}\sum^{k}_{l=1}((\sum^{d}_{i=1}v_{il}x_i)^2-\sum^{d}_{i=1}v^2_{il}x^2_i)$$
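
The bilinear part of this formula can be checked numerically. The following is a minimal sketch, not part of the original notebook: the helper names `fm_pairwise_naive` and `fm_pairwise_fast`, the toy sizes `d` and `k`, and the random inputs are all assumptions chosen for illustration. It compares the explicit sum over feature pairs with the square-of-sum minus sum-of-squares form.

```python
import numpy as np

def fm_pairwise_naive(x, v):
    """Sum over i < j of <v_i, v_j> * x_i * x_j, computed pair by pair."""
    d = len(x)
    total = 0.0
    for i in range(d):
        for j in range(i + 1, d):
            total += (v[i] @ v[j]) * x[i] * x[j]
    return total

def fm_pairwise_fast(x, v):
    """Same quantity via the square-of-sum minus sum-of-squares form."""
    square_of_sum = np.square(x @ v)             # shape (k,)
    sum_of_square = np.square(x) @ np.square(v)  # shape (k,)
    return 0.5 * np.sum(square_of_sum - sum_of_square)

rng = np.random.default_rng(0)  # arbitrary seed, illustration only
d, k = 6, 3                     # toy sizes, not those of the dataset
x = rng.normal(size=d)
v = rng.normal(size=(d, k))

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
print(np.isclose(naive, fast))
```

Both functions compute the same value up to floating-point rounding. The fast form avoids the double loop over feature pairs and extends naturally to a batch of samples.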

The dataset used in this section is the sample dataset fm_dataset.csv, which records the features of items that a user has viewed and whether the user clicked on each item. Each row of the dataset describes one item: the first 24 columns are its features, and the last column is 0 or 1, indicating that the user did not or did click on the item, respectively. Our goal is to predict user behavior on the test set from the input features, which is a binary classification problem.

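In NumPy, `np.random.seed(0)` sets the seed of the random number generator so that the same sequence of random numbers is produced on every run. This is useful for machine learning and other tasks that rely on randomness, because it makes experimental results reproducible. Without a fixed seed, each run draws a different random sequence, which can make results vary from run to run. With a fixed seed, every run draws the same sequence, which makes results easier to compare and debug.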
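
A minimal sketch of this behavior, assuming the legacy global generator that `np.random.seed` controls (the variable names `first`, `second`, and `third` are illustrative only):

```python
import numpy as np

np.random.seed(0)
first = np.random.normal(size=3)

np.random.seed(0)                  # reset to the same seed
second = np.random.normal(size=3)  # repeats the first draw

third = np.random.normal(size=3)   # no reset: continues the stream

print(np.array_equal(first, second))
print(np.array_equal(second, third))
```

Resetting the seed replays the same sequence, while drawing again without a reset continues from where the generator left off.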

In [2]:
import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics # evaluation metrics from sklearn
from tqdm import tqdm

# load dataset
data = np.loadtxt('fm_dataset.csv', delimiter=',')

# divide dataset
np.random.seed(0)
ratio = 0.8
split = int(ratio * len(data))
x_train = data[:split, :-1]
y_train = data[:split, -1]
x_test = data[split:, :-1]
y_test = data[split:, -1]

# number of features
feature_num = x_train.shape[1]
print('size of x_train:', len(x_train))
print('size of x_test:', len(x_test))
print('feature numbers:', feature_num)

size of x_train: 800
size of x_test: 200
feature numbers: 24


In [4]:
class FM:
    def __init__(self, feature_num, vector_dim):
        # vector_dim is the k in the formula,
        # i.e. the dimension of each vector v
        self.theta0 = 0.0 # constant term
        self.theta = np.zeros(feature_num) # linear parameters
        self.v = np.random.normal(size=(feature_num, vector_dim)) # bilinear parameters
        self.eps = 1e-6 # clipping threshold for predicted probabilities
        
    def _logistic(self, x):
        # utility function for converting predictions into probabilities
        return 1 / (1 + np.exp(-x))
    
    def pred(self, x):
        # linear term
        linear_term = self.theta0 + x @ self.theta
        # bilinear term
        square_of_sum = np.square(x @ self.v)
        sum_of_square = np.square(x) @ np.square(self.v)
        # final prediction
        y_pred = self._logistic(linear_term
                               + 0.5 * np.sum(square_of_sum - sum_of_square, axis=1))
        # In order to prevent the following gradients from being too large,
        # the predicted values are clipped and limited to a certain range
        y_pred = np.clip(y_pred, self.eps, 1 - self.eps)
        return y_pred
    
    def update(self, grad0, grad_theta, grad_v, lr):
        self.theta0 -= lr * grad0
        self.theta -= lr * grad_theta
        self.v -= lr * grad_v

In [None]:
vector_dim = 16
learning_rate = 0.01
lbd = 0.05
max_training_step = 200
batch_size = 32

np.random.seed(0)
model = FM(feature_num, vector_dim)

train_acc = []
test_acc = []
train_auc = []
test_auc = []

with tqdm(range(max_training_step)) as pbar:
    for epoch in pbar:
        st = 0
        while st < len(x_train):
            ed = min(st + batch_size, len(x_train))
            X = x_train[st: ed]
            Y = y_train[st: ed]
            st += batch_size
            # compute model predictions
            y_pred = model.pred(X)
            # compute cross-entropy loss
            cross_entropy = -Y * np.log(y_pred) \
                - (1 - Y) * np.log(1 - y_pred)
            loss = np.sum(cross_entropy)