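# Gradient Descent in Linear Regression (Vectorized)
> Building on the gradient descent routine we wrapped up in the previous chapter, we now factor the gradient into a vectorized form, as shown in the figure below.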

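![Vectorizing dJ, the gradient of J: step 1](images/J的导数dJ的向量化简化1.png)

> The next step rewrites the sums as a linear algebra (matrix-vector) operation.

![Vectorizing dJ, the gradient of J: step 2](images/J的导数dJ的向量化简化2.png)

> The final vectorized form of dJ, the gradient of the loss function J:

![Final vectorized form of dJ, the gradient of the loss function J](images/最终的损失函数J的导数dJ的向量化表示.png)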
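
The vectorized expression can be checked numerically. The sketch below assumes the mean-squared-error loss J(θ) = (1/m)·Σ(y − X_b·θ)², whose gradient is (2/m)·X_bᵀ·(X_b·θ − y). The names `dJ_loop` and `dJ_vectorized` are illustrative only and are not taken from playML.

```python
import numpy as np

def dJ_loop(theta, X_b, y):
    """Gradient of the MSE loss, computed one component at a time."""
    m = len(y)
    residual = X_b.dot(theta) - y
    grad = np.empty(len(theta))
    for j in range(len(theta)):
        # j-th partial derivative: sum over samples of residual * j-th column
        grad[j] = np.sum(residual * X_b[:, j])
    return grad * 2.0 / m

def dJ_vectorized(theta, X_b, y):
    """Same gradient as a single matrix-vector product."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(y)

# Small synthetic problem (assumed data, for illustration only)
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the intercept column
y = rng.normal(size=20)
theta = rng.normal(size=4)

grads_match = np.allclose(dJ_loop(theta, X_b, y), dJ_vectorized(theta, X_b, y))
print(grads_match)
```

Both functions compute the same quantity; the vectorized one replaces the Python loop with one call into NumPy's optimized linear algebra, which is where the speedup comes from.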

> Here is the concrete implementation.

In [1]:
import numpy as np
from sklearn import datasets

In [2]:
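# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older scikit-learn release
boston = datasets.load_boston()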

In [3]:
X = boston.data
y = boston.target

X = X[y < 50.0]
y = y[y < 50.0]
print(X.shape)
print(y.shape)

(490, 13)
(490,)


In [4]:
from playML.model_selection import train_test_split

In [5]:
X_train, X_test, y_train, y_test = train_test_split(X, y, seed=666)

## Using the Normal Equation (`fit_normal`)

In [6]:
from playML.LinearRegression import LinearRegression

In [7]:
lin_reg1 = LinearRegression()

In [8]:
%time lin_reg1.fit_normal(X_train, y_train)

Wall time: 321 ms


LinearRegression()

In [9]:
lin_reg1.score(X_test, y_test)

0.8129794056212907
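
The source of playML's `fit_normal` is not shown in this chapter. As a rough illustration of what a closed-form fit usually does, the sketch below solves the normal equation (X_bᵀ·X_b)·θ = X_bᵀ·y directly. The function name `fit_normal_sketch` and the synthetic data are assumptions made for this example, not playML code.

```python
import numpy as np

def fit_normal_sketch(X, y):
    """Closed-form least squares via the normal equation (illustrative only)."""
    X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend the intercept column
    # Solve the linear system instead of forming the matrix inverse explicitly
    theta = np.linalg.solve(X_b.T.dot(X_b), X_b.T.dot(y))
    return theta[0], theta[1:]  # intercept, coefficients

# Noiseless synthetic data with known parameters (assumed for the demo)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(50, 2))
y_demo = 3.0 + X_demo.dot(np.array([2.0, -1.0]))

intercept, coef = fit_normal_sketch(X_demo, y_demo)
print(intercept, coef)
```

Because the solution comes from a single linear solve, there is no learning rate and no iteration count to tune, which is why this approach needs no preprocessing in the cells above.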

## Using Gradient Descent (Vectorized)

In [10]:
lin_reg2 = LinearRegression()

In [11]:
lin_reg2.fit_gd(X_train, y_train) # the features differ widely in scale, so with the default eta the iteration fails to converge

  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
  return np.sum((y - X_b.dot(theta)) ** 2) / len(y)
  if (abs(J(theta, X_b, y) - J(last_theta, X_b, y)) < epsilon):


LinearRegression()

In [12]:
lin_reg2.fit_gd(X_train, y_train, eta=0.000001) # eta is now very small, so with the default number of iterations the fit is still poor

LinearRegression()

In [13]:
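lin_reg2.score(X_test, y_test) # low score: the iteration has not converged yet (the MSE loss is convex, so this is not a local optimum); run more iterations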

0.27586818724477224

In [14]:
%time lin_reg2.fit_gd(X_train, y_train, eta=0.000001, n_iters=1e6)

Wall time: 22.7 s


LinearRegression()

In [15]:
lin_reg2.score(X_test, y_test)

0.7542932581943915
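
The three runs above follow a pattern that can be reproduced on a tiny synthetic problem. The sketch below is a minimal batch gradient descent loop, not playML's `fit_gd`; the function names, the data, and the two `eta` values are assumptions chosen for illustration. One feature has a scale of about 1 and the other about 1000, which mimics unscaled real-world columns.

```python
import numpy as np

def J(theta, X_b, y):
    """Mean squared error loss."""
    return np.sum((y - X_b.dot(theta)) ** 2) / len(y)

def dJ(theta, X_b, y):
    """Vectorized gradient of the MSE loss."""
    return X_b.T.dot(X_b.dot(theta) - y) * 2.0 / len(y)

def gradient_descent(X_b, y, eta, n_iters=1000):
    """Plain batch gradient descent starting from theta = 0."""
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        theta = theta - eta * dJ(theta, X_b, y)
    return theta

rng = np.random.default_rng(2)
m = 100
x_small = rng.normal(size=m)                # feature with scale ~1
x_large = rng.normal(scale=1000.0, size=m)  # feature with scale ~1000
X_b = np.column_stack([np.ones(m), x_small, x_large])
y = 1.0 + 2.0 * x_small + 0.003 * x_large + rng.normal(scale=0.1, size=m)

# A "normal-looking" step size blows up because of the large-scale feature
with np.errstate(over="ignore", invalid="ignore"):
    theta_diverged = gradient_descent(X_b, y, eta=0.01)

# A tiny step size is stable but moves the small-scale directions very slowly
theta_slow = gradient_descent(X_b, y, eta=1e-7)

loss_start = J(np.zeros(3), X_b, y)
loss_slow = J(theta_slow, X_b, y)
print(np.isfinite(theta_diverged).all(), loss_start, loss_slow)
```

The step size has to be small enough for the largest-scale feature, and that same step size is then far too small for the others. This is the trade-off that standardizing the features removes.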

## Standardize the Data Before Using Gradient Descent (Strongly Recommended)
![Standardize the data before using gradient descent](images/使用梯度下降法之前最好要先进行数据归一化.png)

In [16]:
from sklearn.preprocessing import StandardScaler

In [17]:
standardScaler = StandardScaler()

In [18]:
standardScaler.fit(X_train)

StandardScaler(copy=True, with_mean=True, with_std=True)

In [19]:
X_train_standard = standardScaler.transform(X_train)

In [20]:
lin_reg3 = LinearRegression()

In [21]:
%time lin_reg3 = lin_reg3.fit_gd(X_train_standard, y_train) # after standardization all features share one scale, so the default eta converges and there is no need to raise n_iters

Wall time: 104 ms


In [22]:
X_test_standard = standardScaler.transform(X_test)

In [23]:
lin_reg3.score(X_test_standard, y_test)

0.8129873310487505
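
Note that the test set is transformed with the mean and standard deviation learned from the training set, not with its own statistics. The sketch below shows the same idea in plain NumPy on made-up data; the variable names are illustrative, and it is meant as an approximation of what `StandardScaler` does, not a replacement for it.

```python
import numpy as np

# Synthetic train/test split with two features on very different scales
rng = np.random.default_rng(3)
X_tr = rng.normal(loc=[0.0, 500.0], scale=[1.0, 100.0], size=(80, 2))
X_te = rng.normal(loc=[0.0, 500.0], scale=[1.0, 100.0], size=(20, 2))

# Statistics are computed on the training data only
mean_ = X_tr.mean(axis=0)
scale_ = X_tr.std(axis=0)  # population standard deviation (ddof=0)

# The same training statistics are applied to both sets
X_tr_std = (X_tr - mean_) / scale_
X_te_std = (X_te - mean_) / scale_

print(X_tr_std.mean(axis=0), X_tr_std.std(axis=0))
print(X_te_std.mean(axis=0), X_te_std.std(axis=0))
```

The standardized training columns have mean 0 and standard deviation 1 by construction. The test columns are only approximately so, which is expected: the model must see test data through the same transformation it was trained with.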

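## Advantages of Gradient Descent

> The larger the data set, the greater the speed advantage gradient descent typically has over the normal equation. The gap matters most when the number of features is large, as in the example below, which uses n = 5000 features.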

In [None]:
import numpy as np
m = 1000
n = 5000

big_X = np.random.normal(size=(m, n)) # m rows are samples, n columns are features; each row is one sample
true_theta = np.random.uniform(0.0, 100.0, size=n+1) # n+1 true coefficients drawn from [0, 100): the intercept plus n weights
big_y = big_X.dot(true_theta[1:]) + true_theta[0] + np.random.normal(0., 10.0, size=m) # targets y: the linear model plus Gaussian noise (std 10)

In [None]:
# First train with the normal equation
big_reg1 = LinearRegression()
%time big_reg1.fit_normal(big_X, big_y)

In [None]:
big_reg2 = LinearRegression()
%time big_reg2.fit_gd(big_X, big_y)

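> Drawback of (batch) gradient descent: every sample takes part in each gradient computation, so computing the gradient becomes slow when the number of samples is large. This motivates `stochastic gradient descent`, covered in the next section.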