## Gradient Descent

Consider the problem of minimizing a loss function $L(w)$ by varying the model parameters $w=(w_0, \dots, w_d)^\top$:

$$
\min_{w} L(w)
$$

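Gradient descent uses the gradient $\nabla L(w) = \partial L(w) / \partial w$ and a learning rate $\eta$ to update the weights iteratively. Because the gradient points in the direction in which $L$ increases fastest, each step moves in the opposite direction. The weights at iteration $t$ are computed as

$$
w^{(t)} = w^{(t - 1)} - \eta \nabla L(w^{(t - 1)})
$$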
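
As a minimal sketch of this update rule, the loop below repeats $w \leftarrow w - \eta \nabla L(w)$ (with a minus sign, so each step moves against the gradient) for a fixed number of steps. The helper `gradient_descent`, its parameters, and the toy objective $L(w) = (w - 3)^2$ are assumptions made for illustration and are not part of the original notebook.

```python
def gradient_descent(grad_fn, w0, eta=0.1, n_iters=100):
    """Repeat w <- w - eta * grad_fn(w) for n_iters steps (hypothetical helper)."""
    w = w0
    for _ in range(n_iters):
        w = w - eta * grad_fn(w)  # step against the gradient
    return w


# Toy objective (assumed): L(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0)
print(w_min)
```

With this learning rate, each step shrinks the distance to the minimizer $w = 3$ by a constant factor of $1 - 2\eta = 0.8$, so the iterate approaches 3 geometrically.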

In [1]:
import numpy as np

np.random.seed(0)
w_true = np.array([3])
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
e = np.random.normal(scale=0.1, size=X.shape[0]).round(1)
y_true = X @ w_true + e


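def model(w):  # linear model: y_pred = X w
    return X @ w

def loss(y_true, y_pred):  # per-sample squared loss (with a factor of 1/2)
    return (1/2) * (y_true - y_pred)**2

def grad(y_true, y_pred):
    # residual y_true - y_pred, i.e. the negative derivative of the loss
    # with respect to y_pred (the chain-rule factor X is not applied here)
    return (y_true - y_pred)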


w = np.random.normal(size=X.shape[1]) # initialize
eta = 0.1
for i in range(5):  # in practice, use a while loop that runs until the loss falls below a threshold
    y_pred = model(w)
    print(f"""
Iteration {i}
    MSE = {np.mean(loss(y_true, y_pred)):.3f}
    gradients for each sample = {grad(y_true, y_pred).round(1)}
""".strip())

    # update: grad() returns the negative derivative, so adding it moves downhill
    w = w + eta * sum(grad(y_true, y_pred))

print(f"""
Final Model:
    estimated weights: {w}
    true weights: {w_true}
    predicted values : {model(w).round(1)}
    true target values: {y_true.round(1)}
""")

Iteration 0
    MSE = 88.846
    gradients for each sample = [ 4.2  8.  12.  16.1 20.1]
Iteration 1
    MSE = 22.375
    gradients for each sample = [ -1.9  -4.1  -6.1  -8.  -10.1]
Iteration 2
    MSE = 5.516
    gradients for each sample = [1.2 1.9 3.  4.  5. ]
Iteration 3
    MSE = 1.422
    gradients for each sample = [-0.3 -1.1 -1.5 -2.  -2.5]
Iteration 4
    MSE = 0.338
    gradients for each sample = [0.4 0.4 0.7 1.  1.2]

Final Model:
    estimated weights: [3.17241493]
    true weights: [3]
    predicted values : [ 3.2  6.3  9.5 12.7 15.9]
    true target values: [ 3.2  6.   9.1 12.2 15.2]
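
The cell above steps along the summed residuals $y_i - \hat{y}_i$. For the linear model $\hat{y}_i = x_i^\top w$, the chain rule gives the gradient of the mean squared loss with respect to $w$ as $\nabla L(w) = -\frac{1}{n} X^\top (y - Xw)$, which also carries the factor $X$. The following is a sketch of that variant, together with a stopping rule in the spirit of the loop comment. It is not the notebook's original method: the starting point, learning rate `eta = 0.05`, tolerance `tol`, and iteration cap `max_iters` are all assumed values, and the targets are copied from the printed output above.

```python
import numpy as np

# Same inputs as the cell above; targets copied from its printed output.
X = np.array([1, 2, 3, 4, 5], dtype=float).reshape(-1, 1)
y_true = np.array([3.2, 6.0, 9.1, 12.2, 15.2])


def loss(w):
    # mean of the per-sample squared loss (1/2) * (y - Xw)^2
    return np.mean(0.5 * (y_true - X @ w) ** 2)


def grad_w(w):
    # chain rule: dL/dw = -(1/n) * X^T (y - Xw)
    return -X.T @ (y_true - X @ w) / len(y_true)


w = np.zeros(X.shape[1])  # assumed starting point
eta = 0.05                # assumed learning rate
tol = 1e-12               # assumed threshold on the loss improvement
max_iters = 1000          # assumed safety cap on the number of steps

prev = loss(w)
for n_iters in range(1, max_iters + 1):
    w = w - eta * grad_w(w)  # w <- w - eta * grad L(w)
    cur = loss(w)
    if prev - cur < tol:     # stop once the loss no longer improves
        break
    prev = cur

print(f"stopped after {n_iters} iterations, w = {w}")
```

Because the noise keeps the minimum loss above zero, this sketch stops when the loss stops improving rather than when it drops below a fixed value. The estimate can be compared with the closed-form least-squares solution $\sum_i x_i y_i / \sum_i x_i^2$.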

