<div align='center'>Multiple Linear Regression (MLR) Analysis</div>

In [1]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

from scipy import stats

import warnings

warnings.filterwarnings('ignore')
    
%matplotlib inline

plt.style.use('ggplot')

In [2]:
# Jia Junping (贾俊平): multiple regression chapter
db = '/home/lidong/Datasets/'
bad_loans_df = pd.read_excel(os.path.join(db, "Statistics/bad-loans.xls"), usecols="B:F")
bad_loans_df[-5:]

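| | Non-performing loans (100M CNY) | Total loan balance (100M CNY) | Accumulated receivable loans this year (100M CNY) | Number of loan projects | Fixed-asset investment this year (100M CNY) |
|--:|--:|--:|--:|--:|--:|
| 20 | 11.6 | 368.2 | 16.8 | 32 | 163.9 |
| 21 | 1.6 | 95.7 | 3.8 | 10 | 44.5 |
| 22 | 1.2 | 109.6 | 10.3 | 14 | 67.9 |
| 23 | 7.2 | 196.2 | 15.8 | 16 | 39.7 |
| 24 | 3.2 | 102.2 | 12.0 | 10 | 97.1 |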


In [3]:
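# Known bug: this formula fails because the raw column names contain whitespace,
# line breaks and parentheses, which the formula parser cannot read as variable
# names (cell 5 works around this by renaming the columns).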
# bad_loans_df.columns
# formula_model = smf.ols(
#     formula='不良贷款\n(亿元）~各项贷款余额\n(亿元)+本年累计应收贷款\n(亿元)+贷款项目个数\n(个)+本年固定资产投资额\n(亿元)',
#     data=bad_loans_df)
# 
# formula_model_result = formula_model.fit()
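One possible fix for the cell above, sketched under the assumption that statsmodels uses its patsy-based formula interface (statsmodels 0.14 and earlier): wrap every column name in patsy's `Q()` quoting function, writing each name with `repr()` so embedded line breaks become `\n` escapes. `quoted_formula` is a hypothetical helper, and the toy DataFrame is made-up data whose headers merely imitate the Excel ones.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def quoted_formula(columns):
    """Build "Q('y') ~ Q('x1') + ..." so patsy accepts arbitrary column names.

    The first column is treated as the response, the rest as predictors.
    repr() escapes line breaks so the formula stays on one logical line.
    """
    cols = list(columns)
    lhs = "Q({!r})".format(cols[0])
    rhs = " + ".join("Q({!r})".format(c) for c in cols[1:])
    return lhs + " ~ " + rhs


# Made-up toy data with awkward headers (newline, parentheses, CJK text)
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "不良贷款\n(亿元）": rng.normal(size=20),
    "各项贷款余额\n(亿元)": rng.normal(size=20),
    "贷款项目个数\n(个)": rng.normal(size=20),
})

formula = quoted_formula(toy.columns)
result = smf.ols(formula=formula, data=toy).fit()
print(formula)
print(result.params)
```

Renaming the columns, as cell 5 does, is the simpler choice when the original names are not needed in the output.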

In [4]:
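# Dependent variable: non-performing loans
y = bad_loans_df.iloc[:, 0]
# Predictors: total loan balance + accumulated receivable loans this year
#             + number of loan projects + fixed-asset investment this year
x = bad_loans_df.iloc[:, [1,2,3,4]]
# Prepend a column of ones ('const') so the model has an intercept
x = sm.add_constant(x)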

model_result = sm.OLS(endog=y, exog=x).fit()
model_result.summary()

| | | | |
|:--|:--|:--|--:|
| Dep. Variable: | Non-performing loans (100M CNY) | R-squared: | 0.798 |
| Model: | OLS | Adj. R-squared: | 0.757 |
| Method: | Least Squares | F-statistic: | 19.7 |
| Date: | Tue, 25 Dec 2018 | Prob (F-statistic): | 1.04e-06 |
| Time: | 22:02:27 | Log-Likelihood: | -47.082 |
| No. Observations: | 25 | AIC: | 104.2 |
| Df Residuals: | 20 | BIC: | 110.3 |
| Df Model: | 4 | | |
| Covariance Type: | nonrobust | | |

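| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|:--|--:|--:|--:|--:|--:|--:|
| const | -1.0216 | 0.782 | -1.306 | 0.206 | -2.654 | 0.610 |
| Total loan balance (100M CNY) | 0.0400 | 0.010 | 3.837 | 0.001 | 0.018 | 0.062 |
| Accumulated receivable loans this year (100M CNY) | 0.1480 | 0.079 | 1.879 | 0.075 | -0.016 | 0.312 |
| Number of loan projects | 0.0145 | 0.083 | 0.175 | 0.863 | -0.159 | 0.188 |
| Fixed-asset investment this year (100M CNY) | -0.0292 | 0.015 | -1.937 | 0.067 | -0.061 | 0.002 |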

| | | | |
|:--|--:|:--|--:|
| Omnibus: | 0.316 | Durbin-Watson: | 2.626 |
| Prob(Omnibus): | 0.854 | Jarque-Bera (JB): | 0.442 |
| Skew: | 0.22 | Prob(JB): | 0.802 |
| Kurtosis: | 2.52 | Cond. No. | 352.0 |
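The summary's F-statistic tests whether all four slope coefficients are jointly zero. As a hedged sketch, it can be recomputed from $R^2$ as $F = \dfrac{R^2/k}{(1-R^2)/(n-k-1)}$ with $k$ and $n-k-1$ degrees of freedom, using `scipy.stats` (already imported in cell 1). `overall_f_test` is a hypothetical helper; a fitted statsmodels result exposes the same numbers as `model_result.fvalue` and `model_result.f_pvalue`. Because $R^2$ is rounded to 0.798 in the summary, the sketch gives $F \approx 19.75$ rather than exactly 19.7.

```python
from scipy import stats


def overall_f_test(r2: float, n: int, k: int):
    """Overall F-test of H0: all k slope coefficients are zero."""
    df_model, df_resid = k, n - k - 1
    f_stat = (r2 / df_model) / ((1 - r2) / df_resid)
    # Upper-tail probability of the F(df_model, df_resid) distribution
    p_value = stats.f.sf(f_stat, df_model, df_resid)
    return f_stat, p_value


# Rounded R^2 = 0.798, n = 25 observations, k = 4 predictors (from the summary)
f_stat, p_value = overall_f_test(0.798, 25, 4)
print(round(f_stat, 2), p_value)
```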


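Adjusted coefficient of multiple determination (Adj. R-squared):

$R_a^2 = 1 - (1 - R^2)\left(\dfrac{n - 1}{n - k - 1}\right)$

where $n$ is the sample size and $k$ is the number of independent variables.

Here $R_a^2$ is 75.7%. This means that, after adjusting $R^2$ for the sample size and the number of independent variables, the multiple regression equation explains 75.7% of the variation in Y.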
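A quick numeric check of the adjusted $R^2$ formula, using the rounded values from the summary ($R^2 = 0.798$, $n = 25$, $k = 4$). The helper name `adjusted_r2` is hypothetical; on a fitted statsmodels result the same quantity is available as `model_result.rsquared_adj`. Since $R^2$ is rounded in the summary, the sketch gives about 0.7576 instead of the displayed 0.757.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 for n observations and k predictors (intercept not counted)."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1 observations")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# 1 - (1 - 0.798) * 24 / 20 = 1 - 0.2424
print(round(adjusted_r2(0.798, 25, 4), 4))  # 0.7576
```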

-----

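Looking at the p-values of the independent variables, only **total loan balance** is significant at α = 0.05 (p = 0.001 < 0.05). The other three independent variables are not significant at this level: given the other predictors, their contribution to predicting **non-performing loans** cannot be distinguished from zero.

| Independent variable | p-value |
|:------:|:----:|
| Total loan balance (100M CNY) | 0.001 |
| Accumulated receivable loans this year (100M CNY) | 0.075 |
| Number of loan projects | 0.863 |
| Fixed-asset investment this year (100M CNY) | 0.067 |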
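A minimal sketch of the significance check, assuming the usual α = 0.05. The p-values are copied from the coefficient table and the short English labels are only for illustration; in the notebook, `model_result.pvalues.drop('const')` returns the corresponding Series from the fitted statsmodels result, so the filtering lines apply to it unchanged.

```python
import pandas as pd

# p-values copied from the multiple-regression coefficient table
pvalues = pd.Series({
    "Total loan balance": 0.001,
    "Accumulated receivable loans": 0.075,
    "Number of loan projects": 0.863,
    "Fixed-asset investment": 0.067,
})

alpha = 0.05  # assumed significance level
significant = pvalues[pvalues < alpha]
print(list(significant.index))  # ['Total loan balance']
```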

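The coefficient of **fixed-asset investment this year (100M CNY)** is -0.0292, a negative sign that somewhat contradicts practical expectations. This is most likely caused by **multicollinearity** among the independent variables. To check, run a simple linear regression of non-performing loans on this variable alone.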

In [5]:
# Rename to short ASCII names so the formula parser can read them
bad_loans_df.columns = ['y', 'x1', 'x2', 'x3', 'x4']
formula_model_fitted = smf.ols(formula='y ~ x4', data=bad_loans_df).fit()
formula_model_fitted.summary()

| | | | |
|:--|:--|:--|--:|
| Dep. Variable: | y | R-squared: | 0.269 |
| Model: | OLS | Adj. R-squared: | 0.237 |
| Method: | Least Squares | F-statistic: | 8.458 |
| Date: | Tue, 25 Dec 2018 | Prob (F-statistic): | 0.00792 |
| Time: | 22:02:28 | Log-Likelihood: | -63.137 |
| No. Observations: | 25 | AIC: | 130.3 |
| Df Residuals: | 23 | BIC: | 132.7 |
| Df Model: | 1 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|:--|--:|--:|--:|--:|--:|--:|
| Intercept | 0.9800 | 1.136 | 0.863 | 0.397 | -1.370 | 3.330 |
| x4 | 0.0466 | 0.016 | 2.908 | 0.008 | 0.013 | 0.080 |

| | | | |
|:--|--:|:--|--:|
| Omnibus: | 10.58 | Durbin-Watson: | 2.047 |
| Prob(Omnibus): | 0.005 | Jarque-Bera (JB): | 8.772 |
| Skew: | 1.224 | Prob(JB): | 0.0125 |
| Kurtosis: | 4.559 | Cond. No. | 128.0 |


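Regressing non-performing loans on **fixed-asset investment** alone gives a positive coefficient (0.0466, p = 0.008), versus -0.0292 in the multiple regression. This is consistent with **multicollinearity** affecting the multiple regression; here it even flips the sign of the coefficient.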
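The same check can be run for every predictor at once. A minimal sketch, assuming formula-friendly column names such as the `y` and `x1` to `x4` introduced in cell 5: fit one simple regression per predictor and compare each slope's sign with the multiple-regression coefficient. `compare_signs` is a hypothetical helper, and the demo data are made up so that `x4` flips sign; on the notebook's data it would be called as `compare_signs(bad_loans_df, 'y', ['x1', 'x2', 'x3', 'x4'])`.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def compare_signs(df: pd.DataFrame, response: str, predictors: list) -> pd.DataFrame:
    """Compare simple-regression slopes with multiple-regression coefficients."""
    full = smf.ols(f"{response} ~ " + " + ".join(predictors), data=df).fit()
    rows = []
    for p in predictors:
        simple = smf.ols(f"{response} ~ {p}", data=df).fit()
        rows.append({"predictor": p,
                     "simple_coef": simple.params[p],
                     "multiple_coef": full.params[p]})
    out = pd.DataFrame(rows).set_index("predictor")
    out["sign_flip"] = np.sign(out["simple_coef"]) != np.sign(out["multiple_coef"])
    return out


# Made-up data: x4 is strongly correlated with x1 and has a negative partial effect,
# but a positive marginal association with y
rng = np.random.default_rng(1)
n = 500
x1 = rng.normal(size=n)
x4 = x1 + rng.normal(scale=0.5, size=n)
y = 2 * x1 - 0.5 * x4 + rng.normal(scale=0.5, size=n)
demo = pd.DataFrame({"y": y, "x1": x1, "x4": x4})

table = compare_signs(demo, "y", ["x1", "x4"])
print(table)
```

A sign flip alone does not prove multicollinearity (a confounder can produce the same pattern), so it is best read together with the correlations or VIFs of the predictors.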