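Gradient descent is an optimization algorithm for minimizing an objective function, such as a loss function; in machine learning and deep learning it is used to train models. Its core idea is to adjust the parameters step by step in the direction opposite to the gradient of the objective function.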
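In symbols, with parameters $w$, learning rate $\eta$ (`lr` in the code below) and objective $J(w)$, each step of gradient descent applies the update:

$$ w \leftarrow w - \eta \, \nabla J(w) $$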

In [23]:
import numpy as np 

In [25]:
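def gradient_descent(x, y, lr=0.01, n_iters=1000):
    m, n = x.shape      # m samples, n features
    w = np.zeros(n)     # start from all-zero weights

    for _ in range(n_iters):
        # Compute predictions
        y_pred = np.dot(x, w)
        # Compute the gradient of the squared-error loss with respect to w
        gradient = np.dot(x.T, (y_pred - y)) / m
        # Update the parameters
        w = w - lr * gradient
    return w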
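For the linear model $\hat y = Xw$ used here, the `gradient` line matches the gradient of the squared-error loss below, with $m$ samples. The notebook does not name its loss, so this pairing is our reading of the code (the factor $\tfrac{1}{2}$ only cancels the 2 produced by differentiation):

$$ J(w) = \frac{1}{2m} \lVert Xw - y \rVert^2, \qquad \nabla J(w) = \frac{1}{m} X^\top (Xw - y) $$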

In [26]:
x = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2)
y = np.array([1, 2, 3])                 # shape (3,)
w = gradient_descent(x, y)              # w has shape (2,)
print("Learned parameters:", w)

Learned parameters: [0.10065471 0.42053884]
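As a sanity check (our addition, not part of the original notebook), the same system can be solved in closed form with `np.linalg.lstsq`. Since $y$ is exactly $0.5$ times the second column of `x`, the least-squares solution is $w = [0, 0.5]$, so the gradient-descent result above has not fully converged after 1000 iterations at `lr = 0.01`. A plausible explanation is the large condition number of $x^\top x$, which makes gradient descent slow along one direction:

```python
import numpy as np

x = np.array([[1, 2], [3, 4], [5, 6]], dtype=float)
y = np.array([1, 2, 3], dtype=float)

# Closed-form least-squares solution for reference
# (lstsq returns solution, residuals, rank, singular values; only the solution is kept)
w_exact, *_ = np.linalg.lstsq(x, y, rcond=None)
print("least-squares solution:", w_exact)  # approximately [0, 0.5]

# Condition number of x^T x; a large value means gradient descent converges slowly
cond = np.linalg.cond(x.T @ x)
print("condition number of x^T x:", cond)  # about 343
```

Raising `n_iters` or `lr` (within the stable range) moves the gradient-descent estimate closer to `w_exact`.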


### Computing gradients
NumPy itself does not provide automatic differentiation, but gradients can be approximated numerically, for example with finite differences.
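A minimal central-difference sketch of that idea. `numerical_gradient` and the step size `h` are our own illustrative choices, not a NumPy API, and the sketch assumes `x` is a 1-D array. Each partial derivative is approximated as $(f(x + h e_i) - f(x - h e_i)) / 2h$; for the two-variable function used in the PyTorch example further down, the result should be approximately $[2, 3]$ at $x = [1, 2]$:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Central-difference approximation of the gradient of f at a 1-D point x (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h                                   # perturb only coordinate i
        grad[i] = (f(x + step) - f(x - step)) / (2 * h)
    return grad

def fun(x):
    return x[0]**2 + 3*x[1]

result = numerical_gradient(fun, [1.0, 2.0])
print(result)  # approximately [2. 3.]
```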

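PyTorch provides automatic differentiation: set `requires_grad=True` when defining a tensor; after `.backward()` is called on a scalar result, the gradient is stored in the tensor's `.grad` attribute.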

Find the gradient of the function $$ y = x^2 + 3x $$
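Worked by hand, the derivative and the minimizer of this function are as follows (the second derivative is $2 > 0$, so the stationary point is a minimum, and gradient descent on this function should approach it):

$$ \frac{dy}{dx} = 2x + 3, \qquad \frac{dy}{dx} = 0 \;\Rightarrow\; x^* = -\tfrac{3}{2}, \qquad y(x^*) = \tfrac{9}{4} - \tfrac{9}{2} = -\tfrac{9}{4} $$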


In [27]:
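c = np.random.randint(0, 50, (3, 5))  # example 2-D array of random integers (output differs between runs)
# np.gradient uses central differences in the interior and one-sided differences at the edges;
# for a 2-D array it returns one gradient array per axis (axis 0, then axis 1)
grad_c = np.gradient(c)
for grad in grad_c:
    print(grad)  # print the gradient along each axis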

[[ 16.   13.   31.    9.   13. ]
 [ -7.    2.   10.5  10.5   9. ]
 [-30.   -9.  -10.   12.    5. ]]
[[ -8.   -8.5   1.    0.  -11. ]
 [-11.   -1.   -1.   -9.   -7. ]
 [ 10.    9.    9.5  -1.5 -14. ]]
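Applied to samples of $y = x^2 + 3x$, `np.gradient` approximates the derivative $2x + 3$. This sketch assumes unit spacing from `np.linspace(-3, 3, 7)`; because central differences are exact for quadratics, the interior values match $2x + 3$ exactly, while the two one-sided edge estimates are off by 1:

```python
import numpy as np

xs = np.linspace(-3, 3, 7)        # sample points -3, -2, ..., 3 (spacing 1)
ys = xs**2 + 3 * xs               # samples of y = x^2 + 3x
approx = np.gradient(ys, xs)      # pass the coordinates so the spacing is taken into account
exact = 2 * xs + 3                # analytic derivative

print(approx)
print(exact)
```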


In [20]:
import torch

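x = torch.tensor([1.0, 2.0], requires_grad=True)  # autograd tracks operations on x

def fun(x):
    return x[0]**2 + 3*x[1]   # f(x0, x1) = x0^2 + 3*x1

y = fun(x)

y.backward()   # compute df/dx and store it in x.grad

gradient = x.grad   # [2*x0, 3], i.e. [2., 3.] at x = [1., 2.]
print(gradient)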

tensor([2., 3.])
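The same autograd machinery can drive a gradient-descent loop, so the derivative never has to be written by hand. A minimal sketch for $y = x^2 + 3x$, assuming the starting point 10.0 and learning rate 0.1 used in the next cell. Note that `.grad` accumulates across `backward()` calls, so it is reset after every step:

```python
import torch

x = torch.tensor(10.0, requires_grad=True)  # parameter to optimize
learning_rate = 0.1

for i in range(100):
    y = x**2 + 3*x            # forward pass: y = x^2 + 3x
    y.backward()              # autograd writes dy/dx = 2x + 3 into x.grad
    with torch.no_grad():     # update in place without recording it in the graph
        x -= learning_rate * x.grad
    x.grad.zero_()            # reset the accumulated gradient before the next step

print(x.item())  # close to -1.5
```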


In [28]:
import numpy as np

# Define the function and its derivative
def function(x):
    return x**2 + 3*x

def gradient(x):
    return 2 * x + 3  # derivative of x^2 + 3x

# Gradient descent
def gradient_descent(initial_x, learning_rate, num_iterations):
    x = initial_x
    for i in range(num_iterations):
        grad = gradient(x)  # compute the gradient
        x = x - learning_rate * grad  # update the parameter
        if i % 10 == 0:
            # print with 6 decimal places for readability
            print(f"Iteration {i}: x = {x:.6f}, f(x) = {function(x):.6f}")
    return x

# Initial values
initial_x = 10.0
learning_rate = 0.1
num_iterations = 100
optimal_x = gradient_descent(initial_x, learning_rate, num_iterations)


Iteration 0: x = 7.700000, f(x) = 82.390000
Iteration 10: x = -0.512158, f(x) = -1.274167
Iteration 20: x = -1.393931, f(x) = -2.238749
Iteration 30: x = -1.488611, f(x) = -2.249870
Iteration 40: x = -1.498777, f(x) = -2.249999
Iteration 50: x = -1.499869, f(x) = -2.250000
Iteration 60: x = -1.499986, f(x) = -2.250000
Iteration 70: x = -1.499998, f(x) = -2.250000
Iteration 80: x = -1.500000, f(x) = -2.250000
Iteration 90: x = -1.500000, f(x) = -2.250000
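For $f(x) = x^2 + 3x$, with derivative $2x + 3$, one update gives $x_{k+1} + 1.5 = (1 - 2\eta)(x_k + 1.5)$, so the distance to the minimum $-1.5$ shrinks by the factor $|1 - 2\eta|$ per step. The iteration therefore converges only for $0 < \eta < 1$: with $\eta = 0.1$ the factor is $0.8$, with $\eta = 0.5$ it lands on $-1.5$ in a single step, and for $\eta > 1$ it diverges. A small sketch to try a few values; `gd_final_x` is a hypothetical helper, not part of the original notebook:

```python
def gd_final_x(initial_x, learning_rate, num_iterations):
    """Run gradient descent on f(x) = x^2 + 3x and return the final x (hypothetical helper)."""
    x = initial_x
    for _ in range(num_iterations):
        x = x - learning_rate * (2 * x + 3)  # f'(x) = 2x + 3
    return x

for lr in (0.1, 0.5, 0.9, 1.1):
    print(f"lr = {lr}: x after 50 steps = {gd_final_x(10.0, lr, 50):.6g}")
```

With `lr = 0.9` the factor is $-0.8$, so `x` oscillates around $-1.5$ while converging; with `lr = 1.1` the factor is $-1.2$ and `x` grows without bound.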
