# R²: The Best Metric for Evaluating Linear Regression
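As the previous section showed, MSE, RMSE, and MAE all depend on the scale of the target, so none of them has an intuitive "reasonable" range. R² fixes this by measuring the regression error relative to a baseline: 1 is the best possible value, 0 means the model is no better than the baseline, and a value below 0 means it is worse than the baseline.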

1. The formula for the metric is: $R^2 = 1 - \frac{SS_{residual}}{SS_{total}}$
2. $SS_{residual}$ is the Residual Sum of Squares, i.e. $SS_{residual} = \sum_{i=1}^{m}(y^i - y^i_h) ^ 2$, where $y^i_h$ is the predicted value for the $i$-th sample
3. $SS_{total}$ is the Total Sum of Squares, i.e. $SS_{total} = \sum_{i=1}^m(y_{mean} - y^i) ^ 2$
$$ R^2 = 1 - \frac {\sum_{i=1}^{m}(y^i - y^i_h) ^ 2}{\sum_{i=1}^m(y_{mean} - y^i) ^ 2}$$
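4. Interpretation of R²: the numerator $SS_{residual}$ is the error produced by our model's predictions
5. The denominator $SS_{total}$ is the error of a baseline that ignores the features and always predicts $y = y_{mean}$ (the Baseline Model); this baseline usually makes large errors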
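6. $R^2 \le 1$
7. The larger $R^2$ is, the better; $R^2 = 1$ if and only if the model makes no errors at all
8. $R^2 = 0$ means our model is equivalent to the baseline, so training gained nothing
9. $R^2 < 0$ means the learned model is worse than the baseline model; in that case it is likely that the data has no linear relationship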
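10. Dividing both the numerator and the denominator by $m$ gives an equivalent form: $$ R^2 = 1 - \frac {{\sum_{i=1}^{m}(y^i - y^i_h) ^ 2}\div{m}}{{\sum_{i=1}^m(y_{mean} - y^i) ^ 2} \div {m}}$$
11. We can see that ${\sum_{i=1}^{m}(y^i - y^i_h) ^ 2}\div{m}$ is exactly the MSE
12. ${\sum_{i=1}^m(y_{mean} - y^i) ^ 2} \div {m}$ is exactly the variance of $y$
13. That is, $R ^ 2 = 1 - \frac{MSE(y_h,y)}{var(y)}$
14. $R^2$ therefore has a statistical interpretation: it is the proportion of the variance of $y$ that the model explains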
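
The points above can be checked numerically. The following is a minimal NumPy sketch, not part of the original notebook: the helper names `r2_from_sums` and `r2_from_mse` and the sample values are made up for illustration. It computes $R^2$ both from the sums of squares and from the MSE/variance form, and evaluates the three special cases (perfect predictions, the baseline model, and a model worse than the baseline).

```python
import numpy as np


def r2_from_sums(y_true, y_pred):
    """R^2 = 1 - SS_residual / SS_total (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_residual = np.sum((y_true - y_pred) ** 2)
    ss_total = np.sum((y_true.mean() - y_true) ** 2)
    return 1 - ss_residual / ss_total


def r2_from_mse(y_true, y_pred):
    """R^2 = 1 - MSE / Var(y), the form obtained by dividing by m."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    # np.var divides by m by default (ddof=0), matching the formula
    return 1 - mse / np.var(y_true)


# Made-up target values, for illustration only
y = np.array([3.0, 5.0, 7.0, 9.0])

perfect = y.copy()                     # no errors at all
baseline = np.full_like(y, y.mean())   # always predicts y_mean
worse = y[::-1].copy()                 # predictions in reversed order
typical = np.array([2.5, 5.5, 7.5, 8.0])

r2_perfect = r2_from_sums(y, perfect)    # equals 1
r2_baseline = r2_from_sums(y, baseline)  # equals 0
r2_worse = r2_from_sums(y, worse)        # negative
r2_typical_sums = r2_from_sums(y, typical)
r2_typical_mse = r2_from_mse(y, typical)

print(r2_perfect, r2_baseline, r2_worse)
print(r2_typical_sums, r2_typical_mse)   # the two forms agree
```

Because both forms differ only by the common factor $m$, they agree up to floating-point rounding for any predictions.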

In [None]:

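# Note: load_boston was removed in scikit-learn 1.2, so this cell needs an older release
from sklearn.datasets import load_boston
# moon is not part of scikit-learn (presumably the author's own package)
from moon.model_selection import train_test_split
from moon.linear_model import LinearRegression
from moon import metrics
import numpy as np

boston = load_boston()
X = boston.data[:, 5]  # average number of rooms per dwelling
y = boston.target
# Keep only samples whose target is below 50
X = X[y < 50]
y = y[y < 50]
X_train, X_test, y_train, y_test = train_test_split(X, y)

regression = LinearRegression()
regression.fit(X_train, y_train)
y_predict = regression.predict(X_test)

# R² via the variant formula: 1 - MSE / Var(y)
mse = metrics.mean_squared_error(y_test, y_predict)
var_y = np.var(y_test)  # population variance (divides by m)

r_s = 1 - mse / var_y
r_s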

In [2]:
r_s = metrics.r2_score(y_test, y_predict)

r_s

0.4254103857097776

In [None]:
from sklearn.metrics import r2_score
r2_score(y_test, y_predict)
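
Since `load_boston` is no longer available in scikit-learn 1.2 and later, here is a hedged, self-contained sketch of the same comparison on synthetic data. It is not the original experiment: the data-generating numbers, the `_syn` variable names, and the fixed seeds are assumptions made for illustration, and it uses scikit-learn's own `LinearRegression` and `train_test_split` rather than the `moon` package.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic one-feature data (made-up coefficients and noise level)
rng = np.random.default_rng(0)
X_syn = rng.uniform(3, 9, size=(200, 1))
y_syn = 8.0 * X_syn[:, 0] - 30.0 + rng.normal(0, 5.0, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X_syn, y_syn, random_state=0)

model = LinearRegression().fit(X_tr, y_tr)
y_pred_syn = model.predict(X_te)

# Manual form: 1 - MSE / Var(y)
r2_manual = 1 - mean_squared_error(y_te, y_pred_syn) / np.var(y_te)
# scikit-learn's metric
r2_sklearn = r2_score(y_te, y_pred_syn)
# For regressors, score() also returns R² on the given data
r2_model = model.score(X_te, y_te)

print(r2_manual, r2_sklearn, r2_model)
```

All three values should agree up to floating-point rounding, which is a quick way to confirm that a hand-written metric matches the library's.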