# Gradient Descent
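- Extending the earlier one-dimensional reasoning to higher dimensions, the gradient can be read as a direction: the direction in which $J$ increases. The difference is that before we worked with a single derivative, whereas in a high-dimensional space we work with a vector of partial derivatives.
- The update applied to $\theta$ at each step is therefore $-\eta\nabla J$, where $\eta$ is the learning rate.
- $\nabla J = (\frac{\partial J}{\partial\theta_0},\frac{\partial J}{\partial\theta_1},...,\frac{\partial J}{\partial\theta_n})$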
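
To make the update rule concrete, here is a minimal sketch of a single step in two dimensions. The loss `J`, its gradient `grad_J`, the starting point, and the value of `eta` are all hypothetical, chosen only so the numbers are easy to follow; they are not the regression loss used later.

```python
import numpy as np

def J(theta):
    # Hypothetical bowl-shaped loss: J(theta) = theta_0^2 + 2 * theta_1^2
    return theta[0] ** 2 + 2.0 * theta[1] ** 2

def grad_J(theta):
    # Vector of partial derivatives (dJ/dtheta_0, dJ/dtheta_1)
    return np.array([2.0 * theta[0], 4.0 * theta[1]])

eta = 0.1                       # learning rate (assumed value)
theta = np.array([3.0, -2.0])   # arbitrary starting point

# One gradient descent step: move by -eta * gradient
new_theta = theta - eta * grad_J(theta)

print("before:", theta, "J =", J(theta))
print("after: ", new_theta, "J =", J(new_theta))
```

Every component of $\theta$ moves at once, each by an amount proportional to its own partial derivative, and the loss is smaller after the step.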

![Gradient descent](../../pic/multiGradientDescent.png)

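- The gradient points in the direction in which $J$ increases fastest, so the negative gradient is the direction of steepest descent
- The goal of linear regression: make the loss $\sum_{i=1}^m(y^i - \hat y^i)^2$ ··· (1) as small as possible
- $\hat y^i = \theta_0 + \theta_1X^i_1 + \theta_2X^i_2+...+\theta_nX^i_n$ ··· (2)
- Substituting (2) into (1) and dividing by $m$, so that the loss does not grow with the number of samples, gives the loss function $J = \frac{1}{m}\sum_{i=1}^m(y^i - \theta_0 - \theta_1X^i_1 - \theta_2X^i_2 - ... - \theta_nX^i_n)^2$
- The gradient of $J$ is obtained by differentiating $J$ with respect to each component of $\theta$:

$$\nabla J(\theta) = (\frac{\partial J}{\partial\theta_0},\frac{\partial J}{\partial\theta_1},...,\frac{\partial J}{\partial\theta_n})^T$$

Writing $X_b$ for $X$ with a column of ones prepended, and $X^i_b$ for its $i$-th row, this works out to:

$$\nabla J(\theta) = \frac{2}{m}(\sum_{i=1}^m(X^i_b\theta - y^i),\sum_{i=1}^m(X^i_b\theta - y^i)·X^i_1,...,\sum_{i=1}^m(X^i_b\theta - y^i)·X^i_n)^T$$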
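
The step from $J$ to this expression is the chain rule applied to each squared term. A sketch of that step, assuming $J$ is the mean of the squared errors (the sum divided by $m$, which is where the $\frac{2}{m}$ factor comes from) and using the convention $X^i_0 = 1$ so that the intercept follows the same pattern as the other components:

```latex
\begin{aligned}
\frac{\partial J}{\partial \theta_j}
  &= \frac{1}{m}\sum_{i=1}^m \frac{\partial}{\partial \theta_j}\,(X^i_b\theta - y^i)^2 \\
  &= \frac{1}{m}\sum_{i=1}^m 2\,(X^i_b\theta - y^i)\cdot\frac{\partial (X^i_b\theta)}{\partial \theta_j} \\
  &= \frac{2}{m}\sum_{i=1}^m (X^i_b\theta - y^i)\cdot X^i_j,
  \qquad j = 0, 1, \dots, n
\end{aligned}
```

Since $(y^i - X^i_b\theta)^2 = (X^i_b\theta - y^i)^2$, the sign convention inside the square does not affect the result. Stacking the $n+1$ components gives the equivalent matrix form $\frac{2}{m}X_b^T(X_b\theta - y)$.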

In [1]:
import numpy as np

np.random.seed(666)
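# Synthetic data: 100 samples, true slope 6 and intercept 9, plus Gaussian noise
x = 2 * np.random.random(size=100)
y = x * 6. + 9. + np.random.normal(size=100)
X = x.reshape(-1, 1)  # feature matrix with a single column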

## Training with gradient descent, using the formula derived above

In [2]:
def J(theta, X_b, y):
    """Mean squared error of the linear model for parameters theta."""
    try:
        return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)
    except Exception:
        # Treat any numerical failure as an infinitely bad loss
        return float('inf')


def dJ(theta, X_b, y):
    """Gradient of J, computed component by component from the derived formula."""
    residual = X_b.dot(theta) - y
    result = np.empty(len(theta))
    result[0] = np.sum(residual)  # intercept term: the column of ones
    for i in range(1, len(theta)):
        result[i] = residual.dot(X_b[:, i])
    return result * 2 / len(X_b)


def gradient_descent(X_b, y, initial_theta, eta, n_iter=1e4, epsilon=1e-8):
    """Step along -eta * gradient until the loss changes by less than epsilon."""
    theta = initial_theta
    cur_iter = 0
    while cur_iter < n_iter:
        gradient = dJ(theta, X_b, y)
        last_theta = theta
        theta = theta - eta * gradient
        if abs(J(last_theta, X_b, y) - J(theta, X_b, y)) < epsilon:
            break
        cur_iter += 1
    print("current iter:", cur_iter)
    return theta

In [3]:
# Prepend a column of ones so that theta[0] acts as the intercept
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])
initial_theta = np.zeros(X_b.shape[1])
eta = 0.01
theta = gradient_descent(X_b, y, initial_theta, eta)
theta

current iter: 2037


array([9.02145676, 6.0070637 ])
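
The loop in `dJ` computes one dot product per component of $\theta$. The same gradient can be written as a single matrix product, $\frac{2}{m}X_b^T(X_b\theta - y)$. The sketch below is one possible way to check that claim, not part of the original notebook: `dJ_vectorized` and `dJ_numeric` are hypothetical helper names, and the data is regenerated with its own random generator, so the values differ from the cells above.

```python
import numpy as np

def J(theta, X_b, y):
    # Mean squared error, same definition as in the notebook
    return np.sum((y - X_b.dot(theta)) ** 2) / len(X_b)

def dJ_loop(theta, X_b, y):
    # Component-by-component gradient, mirroring the notebook's dJ
    residual = X_b.dot(theta) - y
    result = np.empty(len(theta))
    result[0] = np.sum(residual)
    for i in range(1, len(theta)):
        result[i] = residual.dot(X_b[:, i])
    return result * 2 / len(X_b)

def dJ_vectorized(theta, X_b, y):
    # (2/m) * X_b^T (X_b theta - y), all components in one matrix product
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(X_b)

def dJ_numeric(theta, X_b, y, h=1e-6):
    # Central finite differences, used only to sanity-check the formula
    grad = np.empty(len(theta))
    for j in range(len(theta)):
        step = np.zeros(len(theta))
        step[j] = h
        grad[j] = (J(theta + step, X_b, y) - J(theta - step, X_b, y)) / (2 * h)
    return grad

# Assumed synthetic data with the same shape as the notebook's example
rng = np.random.default_rng(0)
x = 2 * rng.random(100)
y = 6.0 * x + 9.0 + rng.normal(size=100)
X_b = np.hstack([np.ones((len(x), 1)), x.reshape(-1, 1)])

theta = np.array([1.0, 1.0])  # arbitrary point at which to compare gradients
print("loop:      ", dJ_loop(theta, X_b, y))
print("vectorized:", dJ_vectorized(theta, X_b, y))
print("numeric:   ", dJ_numeric(theta, X_b, y))
```

The three results should agree up to floating-point error. The vectorized form avoids the Python-level loop, which matters more as the number of features grows.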
