### numerical gradient

由全部的变量的偏导数汇总而成的向量成为梯度（gradient）

导数：（函数值增量与自变量增量的比值的极限）<p/>
$ \dfrac{\Delta{f(x)}}{\Delta{x}} = \lim_{h\to0} \dfrac{f(x+h)-f(x-h)}{2h} $

梯度：（偏导数向量 - 偏导数只多个自变量中的其中一个的导数）<p/>
$ \Bigg(\dfrac{\Delta{f}}{\Delta{x_0}},\dfrac{\Delta{f}}{\Delta{x_1}}\Bigg) $

梯度下降：（沿梯度方向前进，逐渐减小函数值的过程，用来求损失函数最小值时的权重和偏置）<p/>
$
\begin{align} 
x_0 = x_0 - \eta\dfrac{\Delta{f}}{\Delta{x_0}} \\
x_1 = x_1 - \eta\dfrac{\Delta{f}}{\Delta{x_1}} 
\end{align}
$

In [1]:
import numpy as np

In [8]:
# 求导数 - 中心差分
def numerical_diff(f, x):
    h = 1e-4
    return (f(x + h) - f(x - h)) / (2 * h)

# 求梯度 - 全部变量的偏导数向量
def numerical_gradient(f, x):
    h = 1e-4
    grad = np.zeros_like(x)
    for idx in range(x.size):
        tmp_val = x[idx]
        # f(x+h)
        x[idx] = tmp_val + h
        fxh1 = f(x)
        # f(x-h)
        x[idx] = tmp_val - h
        fxh2 = f(x)
        grad[idx] = (fxh1 - fxh2) / (2*h)
    return grad

# 求梯度下降 - 迭代（全部变量 - 学习率 * 梯度）
def gradient_descent(f, init_x, lr=0.01, step_num=100):
    x = init_x
    for i in range(step_num):
        grad = numerical_gradient(f, x)
        x -= lr * grad
        print(i, 'x=', x, 'grad=', grad)
    return x

$ f(x_0 + x_1) = x_0^2 + x_1^2 $

In [3]:
def foo(x):
    return np.sum(x ** 2)

In [4]:
# 求函数的梯度
print(numerical_gradient(foo, np.array([3.0, 4.0])))
print(numerical_gradient(foo, np.array([0.0, 2.0])))
print(numerical_gradient(foo, np.array([3.0, 0.0])))

[6. 8.]
[0. 4.]
[6. 0.]


In [9]:
# 求函数foo的最小值 - 即求函数 f(x1+x2) = x0**2 + x1**2，x0和x1的最小值
init_x = np.array([-3.0, 4.0])
des = gradient_descent(foo, init_x=init_x, lr=0.1, step_num=100)
print(des)

0 x= [-2.4001  3.1999] grad= [-6.  8.]
1 x= [-1.92018  2.55982] grad= [-4.8002  6.3998]
2 x= [-1.536244  2.047756] grad= [-3.84036  5.11964]
3 x= [-1.2290952  1.6381048] grad= [-3.072488  4.095512]
4 x= [-0.98337616  1.31038384] grad= [-2.4581904  3.2762096]
5 x= [-0.78680093  1.04820707] grad= [-1.96675232  2.62076768]
6 x= [-0.62954074  0.83846566] grad= [-1.57360186  2.09641414]
7 x= [-0.50373259  0.67067253] grad= [-1.25908148  1.67693132]
8 x= [-0.40308608  0.53643802] grad= [-1.00746519  1.34134505]
9 x= [-0.32256886  0.42905042] grad= [-0.80617215  1.07287604]
10 x= [-0.25815509  0.34314033] grad= [-0.64513772  0.85810083]
11 x= [-0.20662407  0.27441227] grad= [-0.51631018  0.68628067]
12 x= [-0.16539926  0.21942981] grad= [-0.41324814  0.54882453]
13 x= [-0.13241941  0.17544385] grad= [-0.33079851  0.43885963]
14 x= [-0.10603552  0.14025508] grad= [-0.26483881  0.3508877 ]
15 x= [-0.08492842  0.11210406] grad= [-0.21207105  0.28051016]
16 x= [-0.06804274  0.08958325] grad= [-0.