# Factorization Machine

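The Linear Regression model seen earlier has the form $\hat{y} = w^T x$. A Factorization Machine builds on Linear Regression by adding pairwise cross terms $c_{ij}x_ix_j$, so that $\hat{y} = w_1x_1 + \ldots + w_fx_f + \sum_{p=1}^{f-1} \sum_{q=p+1}^{f} c_{pq}x_px_q$. Because some feature pairs never actually occur together in real data, their coefficients cannot be learned directly. Instead, the coefficients are obtained from vector products: take a $k \times f$ matrix whose columns are the vectors $v_1, \ldots, v_f$, pick any two of them, $v_i$ and $v_j$, and use their product as the coefficient. Choosing two of the $f$ vectors yields $C_f^2 = \frac{f(f-1)}{2}$ coefficients, which is exactly the number of cross terms.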
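The counting argument can be checked with a short sketch. The sizes `f` and `k` and the random matrix `V` are illustrative assumptions and do not come from the notebook; the columns of `V` play the role of the vectors $v_1, \ldots, v_f$.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
f, k = 5, 3
rng = np.random.default_rng(0)

# V is the k x f matrix; column p is the latent vector v_p.
V = rng.normal(size=(k, f))

# All pairwise coefficients at once: C[p, q] = <v_p, v_q>.
C = V.T @ V

# The cross terms only use the pairs with p < q.
pairs = [(p, q) for p in range(f) for q in range(p + 1, f)]
n_pairs = len(pairs)

# The factorized form stores k * f numbers instead of one per pair.
n_factor_params = k * f
print(n_pairs, n_factor_params)
```

For small `f` the factorized form may hold more numbers than there are pairs; the saving appears once `f` is much larger than `k`.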

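The cross terms can then be rewritten as $\sum_{p=1}^{f-1} \sum_{q=p+1}^{f} c_{pq}x_px_q = \frac{1}{2}\left(\sum_{p=1}^f \sum_{q=1}^f c_{pq}x_px_q - \sum_{p=1}^f c_{pp}x_p^2\right) = \frac{1}{2} \sum_{u=1}^k\left[\left(\sum_{p=1}^f v_{p,u}x_p\right)\left(\sum_{q=1}^f v_{q,u}x_q\right) - \sum_{p=1}^f v_{p,u}^2x_p^2\right] = \frac{1}{2} \sum_{u=1}^k\left[\left(\sum_{p=1}^f v_{p,u}x_p\right)^2 - \sum_{p=1}^f v_{p,u}^2x_p^2\right]$, which reduces the computational cost from $O(kf^2)$ to $O(kf)$.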

Here $c_{ij} = \langle v_i, v_j \rangle = \sum_{u=1}^k v_{i,u}v_{j,u}$ is the inner product of the vectors $v_i$ and $v_j$.
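The rewritten cross term can be checked numerically against the naive double sum. This is a minimal sketch: the function names `cross_naive` and `cross_fast` and the random test data are assumptions made for illustration, with `V` of shape $k \times f$ and column `p` standing for $v_p$.

```python
import numpy as np


def cross_naive(V, x):
    """Double loop over all pairs p < q, using c_pq = <v_p, v_q>."""
    f = x.shape[0]
    total = 0.0
    for p in range(f - 1):
        for q in range(p + 1, f):
            total += np.dot(V[:, p], V[:, q]) * x[p] * x[q]
    return total


def cross_fast(V, x):
    """Rewritten form: 1/2 * sum_u [(sum_p v_pu x_p)^2 - sum_p v_pu^2 x_p^2]."""
    vx = V @ x  # shape (k,), one entry per latent dimension u
    return 0.5 * np.sum(vx ** 2 - (V ** 2) @ (x ** 2))


rng = np.random.default_rng(0)
V = rng.normal(size=(4, 7))  # hypothetical k = 4, f = 7
x = rng.normal(size=7)

print(cross_naive(V, x), cross_fast(V, x))
```

The two printed values should agree up to floating-point rounding.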

The loss is still the MSE-style squared error, $loss = \frac{1}{2} \sum_{i=1}^n(\hat{y}_i - y_i)^2$, where $n$ is the number of samples.

Gradient computation and update rules:

$\begin{align*}
w_j & = w_j - \eta \cdot \sum_{i=1}^n (\hat{y}_i - y_i) \cdot x_{i, j} \\
v_{p,u} & = v_{p,u} - \eta \cdot \sum_{i=1}^n \left[(\hat{y}_i - y_i) \cdot \left(x_{i, p} \sum_{q=1}^f v_{q, u} x_{i, q} - x_{i, p}^2 v_{p, u}\right)\right]
\end{align*}$
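The update rules can be sanity-checked by comparing the analytic gradient with a finite-difference estimate. This is a sketch under assumptions: the helper names `fm_predict`, `fm_loss` and `fm_grads`, the random data, and the omission of a bias term are all illustrative choices and not part of the notebook. `V` has shape $k \times f$, so `V[u, p]` corresponds to $v_{p,u}$.

```python
import numpy as np


def fm_predict(w, V, X):
    """w: (f,), V: (k, f), X: (n, f). Returns predictions of shape (n,)."""
    XV = X @ V.T  # (n, k): sum_p v_pu * x_ip for every sample i and factor u
    cross = 0.5 * np.sum(XV ** 2 - (X ** 2) @ (V ** 2).T, axis=1)
    return X @ w + cross


def fm_loss(w, V, X, y):
    return 0.5 * np.sum((fm_predict(w, V, X) - y) ** 2)


def fm_grads(w, V, X, y):
    err = fm_predict(w, V, X) - y  # (n,)
    XV = X @ V.T                   # (n, k)
    grad_w = X.T @ err             # sum_i err_i * x_ij
    # grad_V[u, p] = sum_i err_i * (x_ip * XV[i, u] - V[u, p] * x_ip^2)
    grad_V = (XV * err[:, None]).T @ X - V * (err @ (X ** 2))
    return grad_w, grad_V


rng = np.random.default_rng(0)
n, f, k = 8, 5, 3  # hypothetical sizes
X = rng.normal(size=(n, f))
y = rng.normal(size=n)
w = rng.normal(size=f)
V = rng.normal(size=(k, f))

grad_w, grad_V = fm_grads(w, V, X, y)

# Central finite differences on every entry of V.
eps = 1e-6
num_V = np.zeros_like(V)
for u in range(k):
    for p in range(f):
        V_plus, V_minus = V.copy(), V.copy()
        V_plus[u, p] += eps
        V_minus[u, p] -= eps
        num_V[u, p] = (fm_loss(w, V_plus, X, y) - fm_loss(w, V_minus, X, y)) / (2 * eps)

max_abs_diff = np.max(np.abs(num_V - grad_V))
print(max_abs_diff)
```

A small `max_abs_diff` suggests the analytic gradient matches the loss; a gradient step is then `w - eta * grad_w` and `V - eta * grad_V`.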

A Factorization Machine can be applied to regression problems. To demonstrate the algorithm, we use the classic Boston Housing dataset.

In [1]:
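# load the Boston Housing dataset
# NOTE: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires scikit-learn < 1.2.

from sklearn.datasets import load_boston
import numpy as np


boston = load_boston()
X = boston.data    # feature matrix, one row per sample
y = boston.target  # regression target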

In [2]:
import pandas as pd

boston_df = pd.DataFrame(X, columns=boston.feature_names)

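# z-score: zero mean and unit standard deviation per feature
for feature in boston.feature_names:
    boston_df[feature] = (boston_df[feature] - boston_df[feature].mean()) / boston_df[feature].std()

# min-max: rescale every feature to [0, 1]
# (min-max scaling is unaffected by the z-score step above, so the final
# values equal min-max scaling applied to the raw data)
for feature in boston.feature_names:
    boston_df[feature] = (boston_df[feature] - boston_df[feature].min()) / (boston_df[feature].max() - boston_df[feature].min())

# append a column of ones so that the last weight acts as the bias term
X = np.c_[boston_df.values, np.ones(X.shape[0])]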

In [1]:
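k = 6  # length of each latent vector

LEARNING_RATE = 1e-7
EPOCH = 10

BATCH_SIZE = 10

PRINT_NUMS = 20
# integer interval of at least 1, so that `epoch % PRINT_INTERVAL` is well defined
PRINT_INTERVAL = max(1, EPOCH // PRINT_NUMS)

f = X.shape[1] - 1  # number of real features; the last column of X is the bias

w = np.random.uniform(0, 1, size=(1, f + 1))  # linear weights, including the bias weight
v = np.random.uniform(0, 1, size=(k, f))      # v[u, p] corresponds to v_{p,u}


for epoch in range(EPOCH):
    # draw a random mini-batch (sampling with replacement)
    index = np.random.randint(0, X.shape[0], size=BATCH_SIZE)
    sample_x = X[index]          # shape (BATCH_SIZE, f + 1)
    sample_y = y[index]          # shape (BATCH_SIZE,)
    features = sample_x[:, :f]   # shape (BATCH_SIZE, f), bias column dropped

    # linear part
    linear_part = np.dot(w, sample_x.T).ravel()

    # cross part: 1/2 * sum_u [(sum_p v_pu x_p)^2 - sum_p v_pu^2 x_p^2]
    vx = np.dot(features, v.T)   # shape (BATCH_SIZE, k)
    cross_part = 0.5 * np.sum(vx ** 2 - np.dot(features ** 2, (v ** 2).T), axis=1)

    y_hat = linear_part + cross_part
    error = y_hat - sample_y     # residual per sample

    # update gradient
    grad_w = np.dot(error, sample_x)
    # grad_v[u, p] = sum_i error_i * (x_ip * vx[i, u] - v[u, p] * x_ip^2)
    grad_v = np.dot((vx * error[:, None]).T, features) - v * np.dot(error, features ** 2)
    w = w - LEARNING_RATE * grad_w
    v = v - LEARNING_RATE * grad_v

    if epoch % PRINT_INTERVAL == 0:
        print('EPOCH: %d, loss: %f' % (epoch, 0.5 * np.sum(error ** 2)))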
