---
> Stepwise regression is a method for efficiently selecting the important variables from a large set of candidate variables.

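#### 1. Forward selection
At each step, introduce the independent variable with the largest $F$ statistic. Stop when every independent variable not yet in the equation has an $F$ value below $F_\alpha(1, n-p-1)$, where $p$ is the number of selected variables.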
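
A minimal sketch of forward selection, assuming the statsmodels formula interface and synthetic data made up for illustration. The helper names `fit_ols` and `forward_select`, the level `alpha = 0.05`, and the data are assumptions, not part of the original. The partial $F$ test of each candidate comes from `compare_f_test`; comparing its p-value with `alpha` is equivalent to comparing $F$ with $F_\alpha(1, n-p-1)$, where $p$ here counts the candidate as already included.

```python
import numpy as np
import statsmodels.formula.api as smf


def fit_ols(data, response, terms):
    """Fit response ~ terms by OLS; an empty term list gives the intercept-only model."""
    rhs = " + ".join(terms) if terms else "1"
    return smf.ols(f"{response} ~ {rhs}", data).fit()


def forward_select(data, response, candidates, alpha=0.05):
    """Add the candidate with the largest partial F until none is significant."""
    selected, remaining = [], list(candidates)
    current = fit_ols(data, response, selected)
    while remaining:
        best = None  # (F value, p-value, name, fitted model)
        for name in remaining:
            trial = fit_ols(data, response, selected + [name])
            # Partial F test of the trial model against the current (nested) model
            f_value, p_value, _ = trial.compare_f_test(current)
            if best is None or f_value > best[0]:
                best = (f_value, p_value, name, trial)
        f_value, p_value, name, trial = best
        if p_value > alpha:  # equivalent to F < F_alpha(1, n - p - 1)
            break
        selected.append(name)
        remaining.remove(name)
        current = trial
    return selected, current


# Synthetic data for illustration only: y depends on x1 and x3, not on x2 or x4.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
y = 1.0 + 3.0 * x1 - 2.0 * x3 + rng.normal(scale=0.5, size=n)
demo = {"x1": x1, "x2": x2, "x3": x3, "x4": x4, "y": y}

selected, final_model = forward_select(demo, "y", ["x1", "x2", "x3", "x4"])
print("selected:", selected)
```

Because a variable is never removed once it has entered, this procedure can keep a variable that later additions make redundant.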
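#### 2. Backward elimination
Fit the equation with all independent variables as explanatory variables. At each step, among the variables that fail the $t$ test, remove from the model the one with the smallest $\left|t_j\right|$. Repeat until every remaining independent variable passes the $t$ test.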
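
The elimination loop can be sketched as follows, again assuming the statsmodels formula interface. The function name `backward_eliminate`, the level `alpha = 0.05`, and the synthetic data are illustrative assumptions, not part of the original.

```python
import numpy as np
import statsmodels.formula.api as smf


def backward_eliminate(data, response, predictors, alpha=0.05):
    """Drop the failing variable with the smallest |t| until all pass the t test."""
    kept = list(predictors)
    while kept:
        model = smf.ols(f"{response} ~ {' + '.join(kept)}", data).fit()
        p_values = model.pvalues.drop("Intercept")
        failing = p_values[p_values > alpha]  # variables that fail the t test
        if failing.empty:
            return kept, model
        # Among the failing variables, remove the one with the smallest |t_j|
        weakest = model.tvalues[failing.index].abs().idxmin()
        kept.remove(weakest)
    return kept, None  # every predictor was eliminated


# Synthetic data for illustration only: y depends on x1 and x2, not on x3 or x4.
rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
y = 2.0 + 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)
demo = {"x1": x1, "x2": x2, "x3": x3, "x4": x4, "y": y}

kept, final_model = backward_eliminate(demo, "y", ["x1", "x2", "x3", "x4"])
print("kept:", kept)
```

The intercept is excluded from the comparison, so only the independent variables are candidates for removal.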
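#### 3. Stepwise regression
The basic idea is that variables can both enter and leave the model. Variables are introduced one at a time. After each introduction, the variables already in the model are tested again, and any earlier variable that has become non-significant because of later additions is removed. Introducing one variable, or removing one from the regression equation, counts as one step of stepwise regression. An $F$ test is carried out at every step, so that the equation contains only significant variables before each new variable is introduced. The process repeats until no significant variable remains to enter the regression equation and no non-significant variable remains to be removed from it.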
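
A minimal sketch of this enter-and-remove loop, under stated assumptions: the names `fit_ols` and `stepwise_select`, the thresholds `alpha_in = 0.05` and `alpha_out = 0.10`, the `max_iter` guard, and the synthetic data are all illustrative and not from the original. For a single variable the partial $F$ test and the $t$ test give the same p-value ($F = t^2$), so the sketch reads the coefficient p-values directly.

```python
import numpy as np
import statsmodels.formula.api as smf


def fit_ols(data, response, terms):
    """Fit response ~ terms by OLS; an empty term list gives the intercept-only model."""
    rhs = " + ".join(terms) if terms else "1"
    return smf.ols(f"{response} ~ {rhs}", data).fit()


def stepwise_select(data, response, candidates,
                    alpha_in=0.05, alpha_out=0.10, max_iter=100):
    """Alternate entry and removal steps until the model stops changing."""
    selected = []
    current = fit_ols(data, response, selected)
    for _ in range(max_iter):
        changed = False
        # Entry step: among variables outside the model, find the most significant one.
        best_name, best_fit, best_p = None, None, alpha_in
        for name in candidates:
            if name in selected:
                continue
            trial = fit_ols(data, response, selected + [name])
            p_value = trial.pvalues[name]
            if p_value < best_p:
                best_name, best_fit, best_p = name, trial, p_value
        if best_name is not None:
            selected.append(best_name)
            current = best_fit
            changed = True
        # Removal step: drop the least significant variable if it is no longer significant.
        if selected:
            p_values = current.pvalues.drop("Intercept")
            worst = p_values.idxmax()
            if p_values[worst] > alpha_out:
                selected.remove(worst)
                current = fit_ols(data, response, selected)
                changed = True
        if not changed:
            break
    return selected, current


# Synthetic data for illustration only: y depends on x1 and x2, not on x3 or x4.
rng = np.random.default_rng(2)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = rng.normal(size=n)
y = 0.5 + 3.0 * x1 - 2.0 * x2 + rng.normal(scale=0.5, size=n)
demo = {"x1": x1, "x2": x2, "x3": x3, "x4": x4, "y": y}

selected, final_model = stepwise_select(demo, "y", ["x1", "x2", "x3", "x4"])
print("selected:", selected)
```

Choosing `alpha_in` no larger than `alpha_out` keeps a variable that has just entered from being removed in the same step.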

In [1]:
import numpy as np
import statsmodels.formula.api as smf

# Each row is one observation; the columns are x1, x2, x3, x4, y
data = np.loadtxt('../../10第10章  回归分析/data10_6.txt')
x1, x2, x3, x4, y = data.T
mod_dic = {'x1': x1, 'x2': x2, 'x3': x3, 'x4': x4, 'y': y}
# Full model with all four independent variables
mod1 = smf.ols('y~x1+x2+x3+x4', mod_dic).fit()
print(mod1.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.982
Model:                            OLS   Adj. R-squared:                  0.974
Method:                 Least Squares   F-statistic:                     111.5
Date:                Sun, 21 Aug 2022   Prob (F-statistic):           4.76e-07
Time:                        16:11:15   Log-Likelihood:                -26.918
No. Observations:                  13   AIC:                             63.84
Df Residuals:                       8   BIC:                             66.66
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     62.4054     70.071      0.891      0.3



The statistic $t_3$ for variable $x_3$ has the smallest absolute value, so $x_3$ is removed.

In [2]:
# Refit without x3
mod2 = smf.ols('y~x1+x2+x4', mod_dic).fit()
print(mod2.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.982
Model:                            OLS   Adj. R-squared:                  0.976
Method:                 Least Squares   F-statistic:                     166.8
Date:                Sun, 21 Aug 2022   Prob (F-statistic):           3.32e-08
Time:                        16:11:15   Log-Likelihood:                -26.933
No. Observations:                  13   AIC:                             61.87
Df Residuals:                       9   BIC:                             64.13
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     71.6483     14.142      5.066      0.0



The statistic $t_4$ for variable $x_4$ has the smallest absolute value, so $x_4$ is removed.

In [3]:
# Refit without x3 and x4
mod3 = smf.ols('y~x1+x2', mod_dic).fit()
print(mod3.summary())
print("\nResidual variance:", mod3.mse_resid)

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.979
Model:                            OLS   Adj. R-squared:                  0.974
Method:                 Least Squares   F-statistic:                     229.5
Date:                Sun, 21 Aug 2022   Prob (F-statistic):           4.41e-09
Time:                        16:11:15   Log-Likelihood:                -28.156
No. Observations:                  13   AIC:                             62.31
Df Residuals:                      10   BIC:                             64.01
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     52.5773      2.286     22.998      0.0
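
For reference, `mse_resid` is the residual sum of squares divided by the residual degrees of freedom, $\hat\sigma^2 = \mathrm{SSE}/(n-p-1)$ for a model with $p$ independent variables and an intercept. The short check below illustrates this on synthetic data; the data and variable names are made up for illustration and are not the values from `data10_6.txt`.

```python
import numpy as np
import statsmodels.formula.api as smf

# Synthetic data for illustration only
rng = np.random.default_rng(3)
n = 30
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

fit = smf.ols("y ~ x1 + x2", {"x1": x1, "x2": x2, "y": y}).fit()

p = 2                                  # number of independent variables
sse = float(np.sum(fit.resid ** 2))    # residual sum of squares
sigma2_hat = sse / (n - p - 1)         # residual variance estimate

print(sigma2_hat, fit.mse_resid)       # the two values should agree
```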

