Introduction:

A summary of common evaluation metrics for regression algorithms, together with the corresponding functions in sklearn.

## Mean Squared Error (MSE)
$$MSE = \frac{1}{m}\sum\limits_{i=1}^m(y^{(i)}-\hat y^{(i)})^2$$
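Because MSE is expressed in the squared units of the target (for example, dollars squared), the root mean squared error (RMSE) is often used instead.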
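
The unit issue can be illustrated with a minimal sketch. The arrays below are hypothetical toy values (not taken from the dataset used in this notebook), and `mse` is a helper defined here only for illustration. Rescaling the target by 1000 multiplies MSE by 1000 squared, while RMSE stays in the same units as the target.

```python
import numpy as np

# Hypothetical toy data: house prices in thousands of dollars.
y_true_k = np.array([200.0, 310.0, 150.0, 420.0])
y_pred_k = np.array([210.0, 300.0, 170.0, 400.0])

def mse(y_true, y_pred):
    """Mean of the squared differences between targets and predictions."""
    return np.mean((y_true - y_pred) ** 2)

mse_k = mse(y_true_k, y_pred_k)   # units: (thousand dollars)^2
rmse_k = np.sqrt(mse_k)           # units: thousand dollars

# The same data expressed in dollars.
mse_d = mse(y_true_k * 1000, y_pred_k * 1000)   # units: dollars^2
rmse_d = np.sqrt(mse_d)                          # units: dollars

print(mse_k, mse_d)     # MSE grows by a factor of 1000**2
print(rmse_k, rmse_d)   # RMSE grows only by a factor of 1000
```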

In [5]:
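import numpy as np
# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell requires an older scikit-learn release.
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Load the features and the target (median house value)
data = load_boston()
X = data.data
y = data.target
# No random_state is set, so the split (and every metric below) varies between runs
X_train, X_test, y_train, y_test = train_test_split(X, y)

# Fit a linear regression and predict on the held-out test set
reg = LinearRegression()
reg.fit(X_train, y_train)
predict = reg.predict(X_test)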

In [6]:
# Mean of the squared residuals on the test set
MSE = np.sum((predict - y_test)**2) / len(predict)

In [7]:
MSE

19.76819049113384

In [12]:
from sklearn.metrics import mean_squared_error
# Argument order is (y_true, y_pred); MSE is symmetric, so the value is unchanged
mean_squared_error(y_test, predict)

19.76819049113384

## Root Mean Squared Error (RMSE)
$$RMSE = \sqrt{\frac{1}{m}\sum\limits_{i=1}^m(y^{(i)}-\hat y^{(i)})^2}=\sqrt{MSE}$$

In [8]:
RMSE = np.sqrt(MSE)

In [13]:
RMSE

4.446143327776764

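The scikit-learn version used here has no dedicated RMSE function, so compute the MSE and take its square root yourself. Newer releases (1.4 and later) provide `sklearn.metrics.root_mean_squared_error`.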
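
A small helper along these lines may be convenient. This is only a sketch: the name `rmse` and the toy arrays are assumptions made for illustration, and the helper simply wraps `mean_squared_error` in a square root.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

def rmse(y_true, y_pred):
    """Root mean squared error, computed as the square root of sklearn's MSE."""
    return np.sqrt(mean_squared_error(y_true, y_pred))

# Hypothetical toy values, used only to exercise the helper.
toy_true = np.array([3.0, -0.5, 2.0, 7.0])
toy_pred = np.array([2.5, 0.0, 2.0, 8.0])
toy_rmse = rmse(toy_true, toy_pred)
print(toy_rmse)
```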

## Mean Absolute Error (MAE)
$$MAE = \frac{1}{m}\sum\limits_{i=1}^m |y^{(i)}- \hat y^{(i)}|$$
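Why loss functions generally use the squared error rather than the absolute error:
1. It is differentiable everywhere, whereas the absolute value is not differentiable at zero.
2. It amplifies larger errors.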
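
The second point can be seen in a minimal sketch. The arrays are hypothetical toy values, and `mae` and `mse` are helpers defined here only for illustration. Two sets of predictions are compared: one where every error is 1, and one where a single error is 10.

```python
import numpy as np

# Hypothetical targets and two sets of predictions (illustrative values only).
y_true = np.array([10.0, 20.0, 30.0, 40.0])
pred_even = y_true + np.array([1.0, 1.0, 1.0, 1.0])      # every error is 1
pred_outlier = y_true + np.array([1.0, 1.0, 1.0, 10.0])  # one error is 10

def mae(y, y_hat):
    """Mean absolute error."""
    return np.mean(np.abs(y - y_hat))

def mse(y, y_hat):
    """Mean squared error."""
    return np.mean((y - y_hat) ** 2)

mae_even, mse_even = mae(y_true, pred_even), mse(y_true, pred_even)
mae_outlier, mse_outlier = mae(y_true, pred_outlier), mse(y_true, pred_outlier)

print(mae_even, mae_outlier)  # absolute error grows linearly with the outlier
print(mse_even, mse_outlier)  # squared error grows quadratically
```

In this toy case the single error of 10 raises MAE from 1.0 to 3.25 but raises MSE from 1.0 to 25.75, which is the amplification the list refers to.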

In [10]:
# Mean of the absolute residuals on the test set
MAE = np.sum(np.abs(predict - y_test)) / len(predict)

In [11]:
MAE

3.2311500100934514

In [14]:
from sklearn.metrics import mean_absolute_error
# Argument order is (y_true, y_pred); MAE is symmetric, so the value is unchanged
mean_absolute_error(y_test, predict)

3.2311500100934514

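## R Squared (the best metric for evaluating linear regression)
Why use R Squared:
MSE, RMSE, and MAE all depend on the scale of the target, so they cannot be used to compare how well an algorithm performs in different scenarios. For example, the same algorithm can predict both student grades and house prices, but in which scenario does it predict better?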
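
As a rough illustration of this point, the sketch below builds two hypothetical tasks from synthetic data: task B is task A rescaled by a factor of 5000 (the factor and all variable names are assumptions made for this example). By construction the predictions are equally good relative to the spread of the target, yet the MSE values differ by many orders of magnitude while R² is the same.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical task A: exam scores on a 0-100 scale, with noisy predictions.
y_scores = rng.uniform(0, 100, size=200)
pred_scores = y_scores + rng.normal(0, 10, size=200)

# Hypothetical task B: the same pattern on a house-price scale (x 5000).
scale = 5000.0
y_prices = y_scores * scale
pred_prices = pred_scores * scale

mse_scores = mean_squared_error(y_scores, pred_scores)
mse_prices = mean_squared_error(y_prices, pred_prices)
r2_scores = r2_score(y_scores, pred_scores)
r2_prices = r2_score(y_prices, pred_prices)

print(mse_scores, mse_prices)  # differ by a factor of scale**2
print(r2_scores, r2_prices)    # equal up to floating-point error
```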

$$R^2 = 1-\frac{SS_{residual}}{SS_{total}}$$
$$R^2 = 1 - \frac{\sum\limits_{i=1}^m(\hat y^{(i)} - y^{(i)})^2}{\sum\limits_{i=1}^m (\bar y - y^{(i)})^2}$$
$$R^2 = 1-\frac{MSE(\hat y, y)}{Var(y)}$$
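
The step from the sum form to the MSE/variance form can be spelled out as follows. Dividing the numerator and the denominator by the number of samples $m$ leaves the ratio unchanged; the variance here is the population variance (divisor $m$), which is also the default of `np.var`.

$$R^2 = 1 - \frac{\frac{1}{m}\sum\limits_{i=1}^m(\hat y^{(i)} - y^{(i)})^2}{\frac{1}{m}\sum\limits_{i=1}^m(\bar y - y^{(i)})^2} = 1 - \frac{MSE(\hat y, y)}{Var(y)}$$

Read this way, $R^2 = 1$ means the model makes no errors, $R^2 = 0$ means it does no better than always predicting the mean $\bar y$, and $R^2 < 0$ means it does worse than that baseline.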

In [16]:
# np.var uses the population variance (ddof=0) by default, matching the formula above
R_squared = 1 - MSE / np.var(y_test)

In [17]:
R_squared

0.7506797717494971

In [18]:
from sklearn.metrics import r2_score
r2_score(y_test, predict)  # argument order: y_true, y_pred

0.7506797717494971

In [19]:
reg.score(X_test,y_test)

0.7506797717494971

In sklearn, a regressor's built-in `score` method returns the R² value.
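
The check below is a self-contained sketch on synthetic data from `make_regression` (an assumption made here so the example runs without the Boston dataset; the variable names are illustrative). It compares `score` with `r2_score`, and also shows that a baseline which always predicts the mean of the test targets gets an R² of 0.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (assumed stand-in for a real dataset).
X_syn, y_syn = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)

score_value = model.score(X_te, y_te)             # built-in score
r2_value = r2_score(y_te, model.predict(X_te))    # explicit R^2

# Baseline: always predict the mean of the test targets.
baseline_pred = np.full_like(y_te, y_te.mean())
r2_baseline = r2_score(y_te, baseline_pred)

print(score_value, r2_value)  # the two values agree
print(r2_baseline)            # zero up to floating-point error
```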