# Factorization Machine

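The Linear Regression model discussed earlier has the form $\hat{y}= w^T x$. A Factorization Machine extends Linear Regression with pairwise interaction (cross) terms $c_{ij}x_ix_j$, giving $\hat{y} = w_1x_1 + \ldots + w_fx_f + \sum_{p=1}^{f-1} \sum_{q=p+1}^{f} c_{pq}x_px_q$. Because some feature pairs never actually occur together in practice, their coefficients cannot be fitted directly, so the coefficients are factorized instead: an $f \times k$ matrix holds one $k$-dimensional vector per feature, and the coefficient of each cross term is the inner product of two of its rows $v_i, v_j$. Choosing two of the $f$ vectors yields $C_f^2 = \frac{f(f-1)}{2}$ coefficients, exactly the number of cross terms.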

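The cross terms can then be rewritten as

$\begin{align*}
\sum_{p=1}^{f-1} \sum_{q=p+1}^{f} c_{pq}x_px_q & = \frac{1}{2}(\sum_{p=1}^f \sum_{q=1}^f c_{pq}x_px_q - \sum_{p=1}^f c_{pp}x_p^2) \\
& = \frac{1}{2} \sum_{u=1}^k[(\sum_{p=1}^fv_{p,u}x_p)(\sum_{q=1}^f v_{q,u}x_q) - (\sum_{p=1}^f v_{p, u}^2x_p^2)] \\
& = \frac{1}{2} \sum_{u=1}^k[(\sum_{p=1}^fv_{p,u}x_p)^2- (\sum_{p=1}^f v_{p, u}^2x_p^2)]
\end{align*}$

where $c_{pq} = v_p \cdot v_q = \sum_{u=1}^k v_{p,u}v_{q,u}$, so that $c_{pp} = \sum_{u=1}^k v_{p,u}^2$. This lowers the cost of evaluating the cross terms from $O(f^2k)$ to $O(fk)$.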
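As a quick sanity check of this identity, the NumPy sketch below compares the naive pairwise sum with the factorized form on random data. The function names `cross_naive` and `cross_fast` and the sizes `f = 6`, `k = 3` are illustrative assumptions and do not come from the notebook.

```python
import numpy as np

def cross_naive(v, x):
    """Sum over p < q of (v_p . v_q) * x_p * x_q, in O(f^2 * k)."""
    f = x.shape[0]
    total = 0.0
    for p in range(f - 1):
        for q in range(p + 1, f):
            total += np.dot(v[p], v[q]) * x[p] * x[q]
    return total

def cross_fast(v, x):
    """The same quantity via the factorized identity, in O(f * k)."""
    vx = v.T @ x                       # shape (k,): sum_p v[p, u] * x[p]
    return 0.5 * np.sum(vx ** 2 - (v ** 2).T @ (x ** 2))

rng = np.random.default_rng(0)
f, k = 6, 3                            # hypothetical sizes for the check
v = rng.normal(size=(f, k))
x = rng.normal(size=f)

naive, fast = cross_naive(v, x), cross_fast(v, x)
print(np.isclose(naive, fast))
```

The two functions should agree up to floating-point rounding for any `v` and `x`.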

The loss function is still MSE, i.e. $loss = \frac{1}{2n} \sum_{i=1}^n(\hat{y}_i - y_i)^2$

Gradient computation and update, where $x_{i,j}$ denotes feature $j$ of sample $i$:

$\begin{align*}
w_j & = w_j - \eta \cdot \frac{1}{n} \cdot \sum_{i=1}^n [(\hat{y}_i - y_i) \cdot x_{i,j}] \\
v_{p,u} & = v_{p,u} - \eta \cdot \frac{1}{n} \cdot \sum_{i=1}^n [(\hat{y}_i - y_i) \cdot x_{i, p} \cdot (\sum_{q=1}^f v_{q, u}x_{i,q} - v_{p, u}x_{i,p})]
\end{align*}$
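One way to gain confidence in gradient formulas like these is a finite-difference check. The sketch below assumes the loss is averaged over the $n$ samples (matching the $\frac{1}{n}$ factor in the updates) and uses the derivative $\partial \hat{y} / \partial v_{p,u} = x_p \sum_q v_{q,u}x_q - v_{p,u}x_p^2$ obtained by differentiating the factorized cross term. The helper names `fm_predict`, `fm_loss` and `fm_grads`, the sample-per-row layout, and the problem sizes are all assumptions made for illustration.

```python
import numpy as np

def fm_predict(w, v, X):
    """FM prediction for X of shape (n, f): linear part plus factorized cross part."""
    vx = X @ v                                         # (n, k)
    cross = 0.5 * np.sum(vx ** 2 - (X ** 2) @ (v ** 2), axis=1)
    return X @ w + cross

def fm_loss(w, v, X, y):
    """1/(2n) times the sum of squared residuals."""
    r = fm_predict(w, v, X) - y
    return 0.5 * np.mean(r ** 2)

def fm_grads(w, v, X, y):
    """Analytic gradients of fm_loss with respect to w and v."""
    n = X.shape[0]
    r = fm_predict(w, v, X) - y                        # residuals, shape (n,)
    grad_w = X.T @ r / n
    # d y_hat / d v[p, u] = x_p * sum_q v[q, u] * x_q - v[p, u] * x_p^2
    grad_v = ((X * r[:, None]).T @ (X @ v) - v * ((X ** 2).T @ r)[:, None]) / n
    return grad_w, grad_v

rng = np.random.default_rng(1)
n, f, k = 5, 4, 3                                      # hypothetical sizes
X, y = rng.normal(size=(n, f)), rng.normal(size=n)
w, v = rng.normal(size=f), rng.normal(size=(f, k))

grad_w, grad_v = fm_grads(w, v, X, y)

# Central differences on every entry of v.
eps = 1e-6
num_grad_v = np.zeros_like(v)
for p in range(f):
    for u in range(k):
        v_plus, v_minus = v.copy(), v.copy()
        v_plus[p, u] += eps
        v_minus[p, u] -= eps
        num_grad_v[p, u] = (fm_loss(w, v_plus, X, y) - fm_loss(w, v_minus, X, y)) / (2 * eps)

print(np.allclose(grad_v, num_grad_v, atol=1e-6))
```

If the analytic and numerical gradients disagree, the derivative (or its implementation) is the first place to look before tuning the learning rate.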

A Factorization Machine is suited to regression problems (it can also be used for classification). To demonstrate the algorithm, the Boston Housing dataset is used.

In [93]:
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split


boston = load_boston()
boston_data = pd.DataFrame(boston.data,  columns=boston.feature_names)
boston_data['target'] = boston.target

# standardize the features and the target (zero mean, unit variance)
ss = StandardScaler()
boston_data = ss.fit_transform(boston_data)

# append the bias column after scaling: standardizing a constant column would turn it into zeros
features = np.hstack([boston_data[:, 0:-1], np.ones((boston_data.shape[0], 1))])
target = boston_data[:, -1]

X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.25,
                                                    random_state=33)

# transpose so that rows are features and columns are samples
X_train = X_train.T
X_test = X_test.T

In [169]:
# numpy version

w = np.random.rand(X_train.shape[0])
k = 10
v = np.random.rand(X_train.shape[0], k)

BATCH_SIZE = 8
LEARNING_RATE = 0.00001263

PRINT_STEP = 1

EPOCH = 1000
for epoch in range(EPOCH):
    # sample a random mini-batch (columns are samples)
    index = np.random.randint(0, X_train.shape[1], size=BATCH_SIZE)
    X_batch = X_train[:, index]
    y_batch = y_train[index]

    # linear part: w^T x for every sample in the batch
    linear_part = np.dot(w, X_batch)

    # cross part via the factorized identity:
    # 0.5 * sum_u [(sum_p v_pu * x_p)^2 - sum_p v_pu^2 * x_p^2]
    vx = np.dot(v.T, X_batch)  # shape (k, BATCH_SIZE)
    cross_part = 0.5 * ((vx ** 2).sum(axis=0) - np.dot((v ** 2).T, X_batch ** 2).sum(axis=0))

    y_hat = linear_part + cross_part
    residual = y_hat - y_batch

    # gradient of the linear weights
    w_grad = np.multiply(residual, X_batch).sum(axis=1) / BATCH_SIZE

    # gradient of the factor matrix:
    # d y_hat / d v_pu = x_p * sum_q v_qu * x_q - v_pu * x_p^2
    v_grad = (np.dot(X_batch * residual, vx.T)
              - v * np.dot(X_batch ** 2, residual)[:, None]) / BATCH_SIZE

    # update both parameter sets from gradients computed at the same point
    w = w - LEARNING_RATE * w_grad
    v = v - LEARNING_RATE * v_grad

    if epoch % PRINT_STEP == 0:
        # sum of squared residuals over the current mini-batch
        print('EPOCH: %d, loss: %f' % (epoch, (residual * residual).sum()))

ss: 2421.068171
EPOCH: 334, loss: 2943.515062
EPOCH: 335, loss: 17877.261137
EPOCH: 336, loss: 8889.630842
EPOCH: 337, loss: 2661.412424
EPOCH: 338, loss: 629.099950
EPOCH: 339, loss: 10880.182918
EPOCH: 340, loss: 4256.975474
EPOCH: 341, loss: 5394.365113
EPOCH: 342, loss: 1482.767862
EPOCH: 343, loss: 12826.270713
EPOCH: 344, loss: 1095.472496
EPOCH: 345, loss: 4635.479408
EPOCH: 346, loss: 9378.882344
EPOCH: 347, loss: 6092.925269
EPOCH: 348, loss: 14057.590136
EPOCH: 349, loss: 3319.328224
EPOCH: 350, loss: 1172.559911
EPOCH: 351, loss: 5595.512883
EPOCH: 352, loss: 20884.348089
EPOCH: 353, loss: 4007.364540
EPOCH: 354, loss: 9719.476123
EPOCH: 355, loss: 6248.360166
EPOCH: 356, loss: 17496.993057
EPOCH: 357, loss: 13018.473582
EPOCH: 358, loss: 636.394697
EPOCH: 359, loss: 2034.323169
EPOCH: 360, loss: 17326.999071
EPOCH: 361, loss: 7617.521790
EPOCH: 362, loss: 4921.514630
EPOCH: 363, loss: 3449.532465
EPOCH: 364, loss: 716.568809
EPOCH: 365, loss: 2264.796892
EPOCH: 366, loss: 8
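
The per-batch loss printed above is noisy because each step sees only a handful of random samples, so a held-out score is usually easier to read. The sketch below shows one way to compute it in the notebook's layout (rows are features, columns are samples). The helper names `fm_predict_columns` and `mse` and the stand-in arrays are assumptions; in the notebook, the trained `w`, `v` and `X_test`, `y_test` would take their place.

```python
import numpy as np

def fm_predict_columns(w, v, X):
    """FM prediction for X of shape (f, n), i.e. one sample per column."""
    vx = v.T @ X                                        # (k, n)
    cross = 0.5 * (np.sum(vx ** 2, axis=0) - np.sum((v ** 2).T @ (X ** 2), axis=0))
    return w @ X + cross

def mse(y_hat, y):
    """Mean squared error between predictions and targets."""
    return float(np.mean((y_hat - y) ** 2))

# Stand-in data for illustration only; sizes are hypothetical.
rng = np.random.default_rng(2)
f, k, n = 14, 10, 20
w_demo = rng.normal(size=f)
v_demo = rng.normal(scale=0.1, size=(f, k))
X_demo = rng.normal(size=(f, n))
y_demo = rng.normal(size=n)

print('held-out MSE: %f' % mse(fm_predict_columns(w_demo, v_demo, X_demo), y_demo))
```

Calling `mse(fm_predict_columns(w, v, X_test), y_test)` every few hundred steps would give a smoother picture of progress than the mini-batch sums.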

In [None]:
# PyTorch Version

import torch