# Multiple Linear Regression
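## Goal:
- Find a parameter vector $\theta$ such that the predictions $\hat y^i = \theta_0 + \theta_1X^i_1 + \theta_2X^i_2+...+\theta_nX^i_n$ make the loss $\sum_{i=1}^m (y^i - \hat y^i)^2$ as small as possible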
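## Finding $\theta$:
1. Let the feature vector of the $i$-th sample (row) be $X^i = (X_0^i,X_1^i,...,X_n^i)$, with $X_0^i ≡ 1$
2. The prediction for a single sample can then be written as a dot product: $\hat y^i = X^i · \theta$
3. Stacking all $m$ samples gives the vector of predictions $\hat y = X_b · \theta$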
4. $X_b = \left[
\begin{matrix}
1 & X_1^1 & ... & X_n^1 \\
1 & X_1^2 & ... & X_n^2 \\
\vdots & \vdots & \ddots & \vdots \\
1 & X_1^m & ... & X_n^m \\
\end{matrix} \right]$
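5. The loss function can then be written in matrix form: $(y - X_b·\theta)^T(y - X_b·\theta)$
6. Setting the gradient of this loss with respect to $\theta$ to zero gives $X^T_bX_b\theta = X^T_by$, which yields the **normal equation solution** of multiple linear regression (assuming $X^T_bX_b$ is invertible): $\theta = (X^T_bX_b)^{-1}X^T_by$
7. Drawback: the time complexity is high, roughly $O(n^3)$ in the number of features for inverting $X^T_bX_b$
8. Advantage: no feature normalization is required, because the closed-form solution gives the same predictions whatever the scale of each feature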
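
The steps above can be sketched in NumPy on a small synthetic dataset. This is a minimal illustration, not the notebook's own implementation: the helper name `fit_normal_equation` and the toy data are assumptions, and `np.linalg.solve` is used in place of an explicit inverse because it solves the same normal equation with better numerical behaviour.

```python
import numpy as np

def fit_normal_equation(X, y):
    """Hypothetical helper: solve (X_b^T X_b) theta = X_b^T y for theta."""
    # Prepend a column of ones so that theta[0] plays the role of the intercept.
    X_b = np.hstack([np.ones((len(X), 1)), X])
    # Solve the linear system instead of forming the inverse explicitly.
    return np.linalg.solve(X_b.T @ X_b, X_b.T @ y)

# Toy data generated from y = 4 + 3*x1 - 2*x2 with no noise (made up for illustration).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 2))
y_demo = 4.0 + 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1]

theta_demo = fit_normal_equation(X_demo, y_demo)
print(theta_demo)

# Rescaling a feature changes its coefficient but not the fitted predictions.
X_scaled = X_demo * np.array([1000.0, 1.0])
theta_scaled = fit_normal_equation(X_scaled, y_demo)
print(theta_scaled)
```

On this noise-free data the first fit should recover approximately $(4, 3, -2)$. After multiplying the first feature by 1000, only its coefficient shrinks by the same factor, which is the sense in which the normal equation does not need normalized inputs.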

In [1]:
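# Load the Boston housing data
# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older version
from sklearn.datasets import load_boston

boston = load_boston()
X = boston.data
y = boston.target
# Keep only samples whose target is below 50.0 (filter X first, while y is still unfiltered)
X = X[y < 50.0]
y = y[y < 50.0]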

In [2]:

# Split into training and test sets (default test_size=0.25) with a fixed seed
from sklearn.model_selection import train_test_split

X_train,X_test,y_train,y_test = train_test_split(X, y, random_state=666)

## Building $X_b$

In [3]:
import numpy as np
# Prepend a column of ones to the training features (the X_0 ≡ 1 column)
X_b = np.hstack([np.ones((len(X_train), 1)), X_train])


In [4]:
X_b

array([[1.00000e+00, 9.25200e-02, 3.00000e+01, ..., 1.66000e+01,
        3.83780e+02, 7.37000e+00],
       [1.00000e+00, 8.66400e-02, 4.50000e+01, ..., 1.52000e+01,
        3.90490e+02, 2.87000e+00],
       [1.00000e+00, 2.87500e-02, 2.80000e+01, ..., 1.82000e+01,
        3.96330e+02, 6.21000e+00],
       ...,
       [1.00000e+00, 1.06718e+01, 0.00000e+00, ..., 2.02000e+01,
        4.30600e+01, 2.39800e+01],
       [1.00000e+00, 9.59571e+00, 0.00000e+00, ..., 2.02000e+01,
        3.76110e+02, 2.03100e+01],
       [1.00000e+00, 2.06080e-01, 2.20000e+01, ..., 1.91000e+01,
        3.72490e+02, 1.25000e+01]])

In [5]:
X.shape

(490, 13)

In [6]:
X_b.shape

(367, 14)
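
The two shapes differ because `X_b` is built from `X_train` only: with the default `test_size=0.25`, 367 of the 490 rows go to the training set, and the ones column turns 13 features into 14 columns. The same construction on a tiny hand-written array (`X_small` is made up for illustration) shows the effect more directly:

```python
import numpy as np

# Made-up feature matrix: 3 samples, 2 features.
X_small = np.array([[2.0, 5.0],
                    [3.0, 7.0],
                    [4.0, 9.0]])

# Prepend a column of ones, matching X_0^i = 1 for every sample.
X_b_small = np.hstack([np.ones((len(X_small), 1)), X_small])

print(X_b_small.shape)  # (3, 3): same number of rows, one extra column
```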

In [7]:
from moon.linear_model import MultipleLinearRegression

regression = MultipleLinearRegression()
regression.fit_normal(X_train,y_train)
regression.score(X_test,y_test)

0.8009390227581066
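
The source of `moon.linear_model` is not shown here, so the following is only a guess at what a class with this interface could look like. The class name `NormalEquationRegression`, its attributes, the `predict` method, and the choice of $R^2$ as the score are all assumptions for illustration, not the actual `MultipleLinearRegression` implementation.

```python
import numpy as np

class NormalEquationRegression:
    """Hypothetical stand-in; not the actual moon.linear_model class."""

    def __init__(self):
        self.intercept_ = None
        self.coef_ = None

    def fit_normal(self, X_train, y_train):
        # Build X_b by prepending a column of ones, then solve the normal equation.
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])
        theta = np.linalg.solve(X_b.T @ X_b, X_b.T @ y_train)
        self.intercept_ = theta[0]
        self.coef_ = theta[1:]
        return self

    def predict(self, X):
        return X @ self.coef_ + self.intercept_

    def score(self, X_test, y_test):
        # Assumed metric: R^2 = 1 - SS_residual / SS_total.
        residual = y_test - self.predict(X_test)
        total = y_test - np.mean(y_test)
        return 1.0 - (residual @ residual) / (total @ total)

# Synthetic check on made-up data: y = 1.5 + 2*x1 - x2 + 0.5*x3 + small noise.
rng = np.random.default_rng(42)
X_toy = rng.normal(size=(200, 3))
y_toy = 1.5 + X_toy @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

model = NormalEquationRegression().fit_normal(X_toy[:150], y_toy[:150])
r2 = model.score(X_toy[150:], y_toy[150:])
print(model.intercept_, model.coef_, r2)
```

Because the toy data is almost perfectly linear, the score here should be close to 1; on real data such as the cell above, a lower value like 0.80 is to be expected.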