# Factorization Machine

之前的　Linear Regression 的形式是形如　$\hat{y}= w^T x$，Factorization Machine 在　Linear Regression　的基础之上添加了所谓的交叉项，即　$w_{ij}x_ix_j$, 即　$\hat{y} = w_1x_1 + \ldots + w_nx_n + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij}x_ix_j$，由于有些交叉项在实际中并不存在，所以使用矩阵分解的办法，用一个　$k \times n$　的矩阵，从中任选两个向量 $v_i, v_j$　相乘作为系数，从　$n$　个向量中任选两个相乘构成的系数个数一共有 $C_n^2 = \frac{n(n-1)}{2}$　个，刚好等于后面交叉项的数量。

最后的交叉项可以写成　$\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} w_{ij}x_ix_j = \frac{1}{2}(\sum_{i=1}^n \sum_{j = 1}^n w_{ij}x_ix_j - \sum_i^n w_{ii}x_ix_i) = \frac{1}{2} \sum_{f}^k[(\sum_i^n v_{fi}x_i)^2 - \sum_i^nv_{fi}^2 x_i^2]$，降低计算复杂度。

其中 $w = v \times v$

计算　Loss 采用的函数仍是　MSE，即　$loss = (\hat{y} - y)^2$

Gradient 的计算及更新：

$\begin{align*}
w_i & = w_i - \eta \cdot (\hat{y} - y) \cdot x_i \\
v_{fi} & = v_{fi} - \eta \cdot (\hat{y} - y) \cdot (x_i \sum_j^n v_{fj}x_j - x_i^2v_{fi})
\end{align*}$

Factorization Machine 是一个适用于回归场景的算法。为了演示这个算法，采用典型的　Boston Housing 的数据集。

In [1]:
# load boston housing 

from sklearn.datasets import load_boston
import numpy as np


boston = load_boston()
x = boston.data
y = boston.target

x = np.c_[x, np.ones(x.shape[0])]

In [33]:
k = 20

LEARNING_RATE = 1e-5
EPOCH = 5

BATCH_SIZE = 1

PRINT_NUMS = 5
PRINT_INTERVAL = EPOCH / PRINT_NUMS

n = x.shape[1] - 1

w = np.random.uniform(0, 1, size=(1, n + 1))
v = np.random.uniform(0, 1, size=(k, n))


for epoch in range(EPOCH):
    index = np.random.randint(0, x.shape[0], size=BATCH_SIZE)
    sample_x = x[index]
    sample_y = y[index]

    # linear part
    linear_part = np.dot(w, sample_x.T)

    # cross part
    cross_part = 0
    for f in range(k):
        part_one = 0
        part_two = 0
        for i in range(n):
            part_one = v[f,i] * sample_x[0,i]
            part_two = v[f,i]** 2 * sample_x[0,i] ** 2

        cross_part = part_one + part_two
    cross_part = cross_part / 2

    y_hat = linear_part +cross_part
    loss = y_hat - sample_y

    if epoch % PRINT_INTERVAL == 0:
        print('EPOCH: %d, loss: %f' % (epoch, loss**2))

    # update gradient
    for i in range(n):
        w[0,i] = w[0,i] - LEARNING_RATE * loss * sample_x[0,i]
        
        for f in range(k):
            sum_part = 0
            for j in range(n):
                sum_part += v[f,j] * sample_x[0,j]
            v[f,i] = v[f,i] - LEARNING_RATE * loss * (sample_x[0,i] * sum_part - v[f,i] * sample_x[0,i]**2)

EPOCH: 0, loss: 728485.367060
EPOCH: 1, loss: 598136874976026169503505928176508715576923783168.000000
EPOCH: 2, loss: inf
EPOCH: 3, loss: nan
EPOCH: 4, loss: nan
