## Gradient descent

### Basic idea

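- Input: objective function $f(x)$, gradient function $g(x)=\nabla f(x)$, tolerance $\varepsilon$, step size $\eta$
- Output: an approximate local minimum of $f(x)$ and the point $x^*$ where it is attained
---
- 1 Choose an initial value $x^{(0)} \in \mathbb{R}^n$ and set $k=0$
- 2 Compute $f(x^{(k)})$
- 3 Compute the gradient $g_k = g(x^{(k)})$; if $\|g_k\| < \varepsilon$, stop iterating and return $x^{(k)}$
- 4 Otherwise ($\|g_k\| \geq \varepsilon$), take a step $x^{(k+1)} = x^{(k)} - \eta g_k$
- 5 Set $k = k+1$ and repeat steps 2, 3, and 4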
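
The steps above can be sketched for a general n-dimensional input as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: the helper name `gradient_descent`, its default parameters, and the quadratic test function are all introduced here for illustration.

```python
import numpy as np


def gradient_descent(grad, x0, eta=0.1, epsilon=1e-6, max_iter=1000):
    """Minimal sketch of the steps above (hypothetical helper, not from the original)."""
    x = np.asarray(x0, dtype=float)      # step 1: initial value x^(0)
    for k in range(max_iter):
        g = grad(x)                      # step 3: gradient g_k
        if np.linalg.norm(g) < epsilon:  # stop when ||g_k|| < epsilon
            break
        x = x - eta * g                  # step 4: x^(k+1) = x^(k) - eta * g_k
    return x


# Assumed test function: f(x) = ||x||^2 with gradient 2x and minimum at the origin.
x_min = gradient_descent(lambda x: 2 * x, x0=[3.0, -4.0])
print(x_min)
```

The `max_iter` cap is a safeguard added in this sketch so that the loop ends even when the tolerance is never reached, for example with a step size that is too large.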

# One-dimensional gradient descent

$$
\begin{aligned}
f(x) &= x^2+1 \\
f'(x) &= 2x
\end{aligned}
$$

In [4]:
def target_func(x):
    """Objective function."""
    return x**2 + 1

def grad_target_func(x):
    """Gradient of the objective function."""
    return x*2

def gradient_descent_algrithm(func, grad, current_x=0.1, learning_rate=0.01, precision=0.01, max_iter=10):
    """Gradient descent: stop when |gradient| < precision or after max_iter updates."""
    for i in range(max_iter):
        current_grad = grad(current_x)
        if abs(current_grad) < precision:
            break
            
        # update x
        current_x = current_x - current_grad * learning_rate
        print("Number ", i, ", current_x", current_x, ", f(x): ", func(current_x))
    
    print("At x = ", current_x, ", the approximate local minimum of f(x) is: ", func(current_x))
    
    return current_x

In [6]:
gradient_descent_algrithm(target_func, grad_target_func, current_x=10, learning_rate=0.2, precision=0.01, max_iter=10)

Number  0 , current_x 6.0 , f(x):  37.0
Number  1 , current_x 3.5999999999999996 , f(x):  13.959999999999997
Number  2 , current_x 2.1599999999999997 , f(x):  5.665599999999999
Number  3 , current_x 1.2959999999999998 , f(x):  2.6796159999999993
Number  4 , current_x 0.7775999999999998 , f(x):  1.6046617599999997
Number  5 , current_x 0.46655999999999986 , f(x):  1.2176782335999998
Number  6 , current_x 0.2799359999999999 , f(x):  1.078364164096
Number  7 , current_x 0.16796159999999993 , f(x):  1.0282110990745599
Number  8 , current_x 0.10077695999999996 , f(x):  1.0101559956668416
Number  9 , current_x 0.06046617599999997 , f(x):  1.003656158440063
At x =  0.06046617599999997 , the approximate local minimum of f(x) is:  1.003656158440063


0.06046617599999997
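
For this objective the update simplifies to $x^{(k+1)} = x^{(k)} - \eta \cdot 2x^{(k)} = (1-2\eta)\,x^{(k)}$, so each step multiplies $x$ by $0.6$ when $\eta = 0.2$, which matches the printed sequence. Note also that the run above ended because `max_iter` was reached, not because the precision was met: the final gradient is about $0.12$, which is still larger than $0.01$. The sketch below checks both observations; the helper names `closed_form_iterate` and `updates_until_converged` are hypothetical and not part of the original notebook.

```python
def closed_form_iterate(x0, learning_rate, k):
    """k-th iterate on f(x) = x**2 + 1, using x_k = x0 * (1 - 2*learning_rate)**k."""
    return x0 * (1 - 2 * learning_rate) ** k


def updates_until_converged(x0, learning_rate, precision):
    """Count updates until |f'(x)| = |2x| drops below precision (hypothetical helper)."""
    x, k = x0, 0
    while abs(2 * x) >= precision:
        x = x - learning_rate * 2 * x
        k += 1
    return k


x10 = closed_form_iterate(10, 0.2, 10)
print(x10)                                     # close to the last x printed above
print(abs(2 * x10) < 0.01)                     # False: precision not reached after 10 updates
print(updates_until_converged(10, 0.2, 0.01))  # 15
```

Under these settings, raising `max_iter` to at least 15 would let the precision test end the loop instead of the iteration cap.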

# Two-dimensional gradient descent
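$$
\begin{aligned}
f(x, y) &= -e^{-(x^2+y^2)} \\
\nabla f(x, y) &= \left(2x\,e^{-(x^2+y^2)},\; 2y\,e^{-(x^2+y^2)}\right)
\end{aligned}
$$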
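
Before running gradient descent on this function, it can help to confirm that a hand-derived gradient is correct. The sketch below compares the analytic partial derivatives of $-e^{-(x^2+y^2)}$ with a central-difference approximation. The names `f2d`, `grad_f2d`, and `numeric_grad`, and the step `h`, are assumptions made for this illustration.

```python
import math


def f2d(x, y):
    """f(x, y) = -exp(-(x^2 + y^2))."""
    return -math.exp(-(x ** 2 + y ** 2))


def grad_f2d(x, y):
    """Analytic gradient: (2x * exp(-(x^2+y^2)), 2y * exp(-(x^2+y^2)))."""
    e = math.exp(-(x ** 2 + y ** 2))
    return 2 * x * e, 2 * y * e


def numeric_grad(f, x, y, h=1e-6):
    """Central-difference approximation of the gradient (step h is an assumed value)."""
    dx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    dy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return dx, dy


analytic = grad_f2d(1.0, 1.0)
numeric = numeric_grad(f2d, 1.0, 1.0)
print(analytic)
print(numeric)
```

The two printed pairs should agree to several decimal places; a large mismatch would point to a sign or factor error in the analytic gradient.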

In [21]:
import math
import numpy as np

def target_func_2d(x, y):
    """Objective function."""
    return -math.exp(-(x **2 + y **2))

def gradient_target_func_2d(x, y):
    """Gradient of the 2D objective function."""
    deriv_x = 2 * x * math.exp(-(x **2 + y ** 2))
    deriv_y = 2 * y * math.exp(-(x **2 + y ** 2))
    return deriv_x, deriv_y

def gradient_descent_2d(target_func, grad_func, current_x=0.1, current_y=0.1, learning_rate=0.01, precision=0.01, max_iters=20):
    """Two-dimensional gradient descent."""
    for i in range(max_iters):
        grad_x, grad_y = grad_func(current_x, current_y)
        if np.linalg.norm([grad_x, grad_y], ord=2) < precision:
            break
        
        # update x, y
        current_x = current_x - learning_rate * grad_x
        current_y = current_y - learning_rate * grad_y
        
        print("Number ", i, ", Current x, y:", current_x," , ", current_y )
        
    print("\nAt x = ", current_x, ", y = ", current_y, ", the approximate local minimum of f(x, y) is: ", target_func(current_x, current_y))

In [22]:
gradient_descent_2d(target_func_2d, gradient_target_func_2d, current_x=1, current_y=1, learning_rate=0.2, precision=0.1, max_iters=20)

Number  0 , Current x, y: 0.9458658867053549  ,  0.9458658867053549
Number  1 , Current x, y: 0.8826544334549  ,  0.8826544334549
Number  2 , Current x, y: 0.8083266112542866  ,  0.8083266112542866
Number  3 , Current x, y: 0.7208044838602468  ,  0.7208044838602468
Number  4 , Current x, y: 0.6188058941145235  ,  0.6188058941145235
Number  5 , Current x, y: 0.5037222225452176  ,  0.5037222225452176
Number  6 , Current x, y: 0.3824227965845662  ,  0.3824227965845662
Number  7 , Current x, y: 0.26824673335239607  ,  0.26824673335239607
Number  8 , Current x, y: 0.17532999068693128  ,  0.17532999068693128
Number  9 , Current x, y: 0.10937992229287938  ,  0.10937992229287938
Number  10 , Current x, y: 0.06666242193107458  ,  0.06666242193107458
Number  11 , Current x, y: 0.04023339487195043  ,  0.04023339487195043
Number  12 , Current x, y: 0.024192054151996364  ,  0.024192054151996364

At x =  0.024192054151996364 , y =  0.024192054151996364 , the approximate local minimum of f(x, y) is:  -0.9988301738125699
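
One caveat of the gradient-norm stopping rule for this particular function: the gradient is small near the minimum at the origin, but it is also small far from the origin, because $e^{-(x^2+y^2)}$ decays quickly. The sketch below compares the gradient norm at the start point used above with a hypothetical far-away start $(3, 3)$. The helper `grad_norm_2d` and the second start point are assumptions made for illustration.

```python
import math
import numpy as np


def grad_norm_2d(x, y):
    """Norm of the gradient of f(x, y) = -exp(-(x^2 + y^2)) (hypothetical helper)."""
    e = math.exp(-(x ** 2 + y ** 2))
    return float(np.linalg.norm([2 * x * e, 2 * y * e]))


precision = 0.1  # same threshold as the run above
for start in [(1.0, 1.0), (3.0, 3.0)]:  # (3, 3) is an assumed far-away start
    norm = grad_norm_2d(*start)
    print(start, "gradient norm:", norm, "stops before any update:", norm < precision)
```

From $(3, 3)$ the loop would stop at its first check with $f$ close to $0$ rather than close to $-1$, so a small gradient norm alone does not guarantee that the iterate is near the minimum.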
