### AI-05 Quiz  

#### Import libraries  

In [15]:
import pandas as pd
import statsmodels.api as sm
from sklearn import preprocessing

#### Read CSV file  

In [16]:
csv_in = 'reg1.csv'
df = pd.read_csv(csv_in, delimiter=',', skiprows=0, header=0)
print(df.shape)
print(df.info())
display(df.head())

(50, 5)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   x1      50 non-null     float64
 1   x2      50 non-null     float64
 2   x3      50 non-null     float64
 3   x4      50 non-null     float64
 4   y       50 non-null     float64
dtypes: float64(5)
memory usage: 2.1 KB
None


    x1   x2   x3   x4     y
0  6.7  3.1  3.1  3.6 -17.3
1  4.8  3.4  9.1  2.0  27.7
2  4.4  3.7  4.5  1.0   3.5
3  5.7  3.8  3.2  0.7  14.8
4  5.3  0.9  8.0  2.6  12.8


#### Separate explanatory variables and objective variable  
Split the DataFrame into the explanatory variables (`x1` to `x4`) and the objective variable (`y`).  

In [17]:
X = df.loc[:, 'x1':'x4']  # explanatory variables
y = df['y']  # objective variable
print('X:', X.shape)
display(X.head())
print('y:', y.shape)
print(y.head())

X: (50, 4)


    x1   x2   x3   x4
0  6.7  3.1  3.1  3.6
1  4.8  3.4  9.1  2.0
2  4.4  3.7  4.5  1.0
3  5.7  3.8  3.2  0.7
4  5.3  0.9  8.0  2.6


y: (50,)
0   -17.3
1    27.7
2     3.5
3    14.8
4    12.8
Name: y, dtype: float64
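
The label slice `'x1':'x4'` passed to `df.loc` includes both endpoints, unlike positional slicing, which is why `X` has four columns. The sketch below illustrates this on a small made-up frame (`toy` and its values are assumptions for illustration, not the quiz data) and shows an equivalent selection that drops `y` instead.

```python
import pandas as pd

# Hypothetical toy frame with the same column layout as reg1.csv
toy = pd.DataFrame({
    'x1': [1.0, 2.0],
    'x2': [3.0, 4.0],
    'x3': [5.0, 6.0],
    'x4': [7.0, 8.0],
    'y': [9.0, 10.0],
})

# Label-based slice: both 'x1' and 'x4' are included
X_toy = toy.loc[:, 'x1':'x4']
print(list(X_toy.columns))

# Equivalent selection: keep everything except the objective variable
X_alt = toy.drop(columns='y')
print(X_toy.equals(X_alt))
```

Dropping `y` may be the more robust choice if the explanatory columns are not contiguous in the file.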


#### MLR calculation using all variables  
Multiple linear regression using all explanatory variables, without standardization.  

In [18]:
X_c = sm.add_constant(X)
model = sm.OLS(y, X_c)
results = model.fit()
print(results.summary())

                            OLS Regression Results                            
Dep. Variable:                      y   R-squared:                       0.854
Model:                            OLS   Adj. R-squared:                  0.841
Method:                 Least Squares   F-statistic:                     65.66
Date:                Sat, 03 Apr 2021   Prob (F-statistic):           3.33e-18
Time:                        23:20:39   Log-Likelihood:                -178.20
No. Observations:                  50   AIC:                             366.4
Df Residuals:                      45   BIC:                             376.0
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         19.1721      7.322      2.618      0.0

#### Adj R2 and partial regression coefficients   
Adjusted coefficient of determination (adjusted for degrees of freedom) and partial regression coefficients.  

In [19]:
print('Adj R2:', results.rsquared_adj)
print(results.params)

Adj R2: 0.8407195895168194
const    19.172142
x1       -5.274581
x2        1.546305
x3        3.489137
x4       -4.393240
dtype: float64


**Ans.1: Adj R2 = 0.841 (a rather good fit)**  
**Ans.2: x2 = 1.55 (positive, so y increases as x2 increases with the other variables held fixed)**  
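
As a cross-check on Ans.1, the adjusted R2 can be recomputed from the plain R2 with the standard formula 1 - (1 - R2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of explanatory variables. The sketch below plugs in the rounded values printed in the summary (R2 = 0.854, n = 50, p = 4); `adjusted_r2` is a helper name introduced here for illustration.

```python
def adjusted_r2(r2, n_obs, n_features):
    """Adjusted R2 for a linear model that includes an intercept."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_features - 1)

# Rounded values taken from the OLS summary above
adj = adjusted_r2(0.854, 50, 4)
print(round(adj, 3))  # 0.841
```

Because the input R2 is rounded to three decimals, the result agrees with `results.rsquared_adj` only to about three decimals.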

#### Standardization of variables  
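Compare coefficients for explanatory variables  
Standardize all explanatory variables and the objective variable, then run the multiple linear regression again.  
Comparing the resulting standardized partial regression coefficients shows how strongly each explanatory variable influences the objective variable.  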

In [20]:
# NOTE: preprocessing.scale() returns an ndarray, so wrap the results
# back into a DataFrame / Series to keep the column names and the index.
df_X_scaled = pd.DataFrame(preprocessing.scale(X), columns=X.columns)
ser_y_scaled = pd.Series(preprocessing.scale(y), index=y.index)
model_scaled = sm.OLS(ser_y_scaled, df_X_scaled)
results_scaled = model_scaled.fit()
print(results_scaled.summary())

                                 OLS Regression Results                                
Dep. Variable:                      y   R-squared (uncentered):                   0.854
Model:                            OLS   Adj. R-squared (uncentered):              0.841
Method:                 Least Squares   F-statistic:                              67.12
Date:                Sat, 03 Apr 2021   Prob (F-statistic):                    1.30e-18
Time:                        23:20:39   Log-Likelihood:                         -22.891
No. Observations:                  50   AIC:                                      53.78
Df Residuals:                      46   BIC:                                      61.43
Df Model:                           4                                                  
Covariance Type:            nonrobust                                                  
                 coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------
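
The standardized model is fit without `sm.add_constant`, which is why the summary reports "uncentered" R-squared: after standardization every variable has mean zero, so the intercept would be zero anyway. The standardized coefficients should also equal the raw slopes multiplied by std(x_j)/std(y). The sketch below checks this with NumPy alone on synthetic data (the random data, seed, and variable names are assumptions for illustration, not the quiz data).

```python
import numpy as np

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
n = 50
X = rng.normal(size=(n, 3)) * np.array([1.0, 5.0, 0.2])
y = 2.0 + X @ np.array([1.5, -0.4, 8.0]) + rng.normal(scale=0.5, size=n)

# Raw fit with an intercept column
A = np.column_stack([np.ones(n), X])
slopes_raw = np.linalg.lstsq(A, y, rcond=None)[0][1:]

# Standardize with the population std (ddof=0), as sklearn's scale() does
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

# Fit without an intercept on the standardized data
slopes_std = np.linalg.lstsq(Xs, ys, rcond=None)[0]

# Convert the raw slopes to standardized ones and compare
converted = slopes_raw * X.std(axis=0) / y.std()
print(np.allclose(slopes_std, converted))
```

This also explains why R-squared is unchanged between the two fits while the coefficients differ in scale.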

#### Standardized partial regression coefficients   
Partial regression coefficients of the standardized model.  

In [21]:
print(results_scaled.params)

x1   -0.236746
x2    0.133598
x3    0.507625
x4   -0.661768
dtype: float64


**Ans.3: x4 (largest absolute standardized coefficient, 0.662)**  
**Order of influence by absolute value: x4 (negative) > x3 (positive) > x1 (negative) > x2 (positive)**  
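
The ordering above compares magnitudes, not signed values. A minimal sketch of that ranking, with the coefficients copied from the printed `results_scaled.params` output (`std_coefs` and `ranking` are names introduced here):

```python
import pandas as pd

# Standardized coefficients copied from the output above
std_coefs = pd.Series({
    'x1': -0.236746,
    'x2': 0.133598,
    'x3': 0.507625,
    'x4': -0.661768,
})

# Rank by absolute value, largest influence first
ranking = std_coefs.abs().sort_values(ascending=False)
print(ranking.index.tolist())  # ['x4', 'x3', 'x1', 'x2']
```

In the notebook itself, the same ranking could be obtained with `results_scaled.params.abs().sort_values(ascending=False)`.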