# Chapter 3. Multiple Regression Analysis

In [1]:
import statsmodels.api as sm
import statsmodels.formula.api as smf

from wooldridge import *

### Example 3.1. Determinants of College GPA

In [2]:
dataWoo()

  J.M. Wooldridge (2016) Introductory Econometrics: A Modern Approach,
  Cengage Learning, 6th edition.

  401k       401ksubs    admnrev       affairs     airfare
  alcohol    apple       approval      athlet1     athlet2
  attend     audit       barium        beauty      benefits
  beveridge  big9salary  bwght         bwght2      campus
  card       catholic    cement        census2000  ceosal1
  ceosal2    charity     consump       corn        countymurders
  cps78_85   cps91       crime1        crime2      crime3
  crime4     discrim     driving       earns       econmath
  elem94_95  engin       expendshares  ezanders    ezunem
  fair       fertil1     fertil2       fertil3     fish
  fringe     gpa1        gpa2          gpa3        happiness
  hprice1    hprice2     hprice3       hseinv      htv
  infmrt     injury      intdef        intqrt      inven
  jtrain     jtrain2     jtrain3       kielmc      lawsch85
  loanapp    lowbrth     mathpnl       meap00_01   meap01
  meap93    

In [3]:
df = dataWoo('gpa1')
dataWoo('gpa1', description=True)

name of dataset: gpa1
no of variables: 29
no of observations: 141

+----------+--------------------------------+
| variable | label                          |
+----------+--------------------------------+
| age      | in years                       |
| soph     | =1 if sophomore                |
| junior   | =1 if junior                   |
| senior   | =1 if senior                   |
| senior5  | =1 if fifth year senior        |
| male     | =1 if male                     |
| campus   | =1 if live on campus           |
| business | =1 if business major           |
| engineer | =1 if engineering major        |
| colGPA   | MSU GPA                        |
| hsGPA    | high school GPA                |
| ACT      | 'achievement' score            |
| job19    | =1 if job <= 19 hours          |
| job20    | =1 if job >= 20 hours          |
| drive    | =1 if drive to campus          |
| bike     | =1 if bicycle to campus        |
| walk     | =1 if walk to campus           |
| voluntr  | 

In [4]:
gpa_multiple = smf.ols(formula='colGPA ~ hsGPA + ACT + 1', data=df).fit()
print(gpa_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.53e-06
Time:                        18:48:18   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.2863      0.341      3.774      0.0
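The slope on ACT in `gpa_multiple` can also be obtained by "partialling out" hsGPA (the Frisch-Waugh-Lovell result): regress ACT on a constant and hsGPA, keep the residuals, then regress colGPA on those residuals. The sketch below checks this with numpy on simulated data, not on the `gpa1` sample. `x1` stands in for ACT, `x2` for hsGPA, and the helper `ols_coefs` is a hypothetical name introduced here.

```python
import numpy as np

def ols_coefs(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Simulated data (NOT gpa1): x2 plays the role of hsGPA, x1 of ACT.
rng = np.random.default_rng(0)
n = 200
x2 = rng.normal(3.0, 0.3, n)
x1 = 10 + 4 * x2 + rng.normal(0, 2, n)
y = 1.3 + 0.01 * x1 + 0.45 * x2 + rng.normal(0, 0.3, n)

# Full multiple regression of y on (1, x1, x2)
b_full = ols_coefs(y, x1, x2)

# Step 1: regress x1 on (1, x2) and keep the residuals r1
g = ols_coefs(x1, x2)
r1 = x1 - (g[0] + g[1] * x2)

# Step 2: slope from regressing y on r1 (r1 has mean zero, so no constant is needed)
b_partial = (r1 @ y) / (r1 @ r1)

print(b_full[1], b_partial)  # the two slopes agree up to floating-point error
```

The same check works on the real data by replacing the simulated arrays with `df['colGPA']`, `df['ACT']` and `df['hsGPA']`.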

In [5]:
gpa_simple = smf.ols(formula='colGPA ~ ACT + 1', data=df).fit()
print(gpa_simple.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.043
Model:                            OLS   Adj. R-squared:                  0.036
Method:                 Least Squares   F-statistic:                     6.207
Date:                Sun, 30 Jun 2024   Prob (F-statistic):             0.0139
Time:                        18:48:18   Log-Likelihood:                -57.177
No. Observations:                 141   AIC:                             118.4
Df Residuals:                     139   BIC:                             124.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.4030      0.264      9.095      0.0

In [6]:
from statsmodels.iolib.summary2 import summary_col

print(summary_col([gpa_multiple, gpa_simple], stars=True, float_format='%0.2f',
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)}))


               colGPA I colGPA II
---------------------------------
ACT            0.01     0.03**   
               (0.01)   (0.01)   
Intercept      1.29***  2.40***  
               (0.34)   (0.26)   
R-squared      0.18     0.04     
R-squared Adj. 0.16     0.04     
hsGPA          0.45***           
               (0.10)            
N              141      141      
R2             0.18     0.04     
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
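The ACT coefficient is larger in column II than in column I. An exact in-sample identity explains the gap: the simple-regression slope equals the multiple-regression slope on ACT plus the hsGPA coefficient times the slope from regressing hsGPA on ACT, $\tilde\beta_1 = \hat\beta_1 + \hat\beta_2\tilde\delta_1$. The sketch below verifies the identity with numpy on simulated data. `x1` and `x2` are stand-ins for ACT and hsGPA, not the `gpa1` values.

```python
import numpy as np

def ols_coefs(y, *regressors):
    """OLS coefficients of y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0]

rng = np.random.default_rng(1)
n = 300
x1 = rng.normal(24, 3, n)                       # stands in for ACT (simulated)
x2 = 1.5 + 0.06 * x1 + rng.normal(0, 0.3, n)    # stands in for hsGPA, correlated with x1
y = 1.3 + 0.01 * x1 + 0.45 * x2 + rng.normal(0, 0.3, n)

b_hat = ols_coefs(y, x1, x2)   # multiple regression: (b0, b1, b2)
b_tilde = ols_coefs(y, x1)     # simple regression omitting x2: (b0~, b1~)
delta = ols_coefs(x2, x1)      # auxiliary regression of x2 on x1

lhs = b_tilde[1]
rhs = b_hat[1] + b_hat[2] * delta[1]
print(lhs, rhs)  # equal up to floating-point error
```

Because hsGPA has a positive coefficient and is positively correlated with ACT, leaving it out pushes the ACT slope up.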


### Example 3.2. Wage equation

In [7]:
df = dataWoo('wage1')
wage_multiple = smf.ols(formula='lwage ~ educ + exper + tenure + 1', data=df).fit()
print(wage_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           9.13e-43
Time:                        18:48:18   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2844      0.104      2.729      0.0
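Because the dependent variable is `lwage`, 100 times a slope approximates the percentage change in wage from a one-unit increase in that regressor, holding the other regressors fixed. The exact change is $100\,[\exp(\Delta \widehat{lwage}) - 1]$. The sketch below computes both forms. The coefficient values are hypothetical placeholders, not the estimates above, and `wage_pct_change` is a helper introduced only for illustration. In the notebook the real estimates are in `wage_multiple.params`.

```python
import math

def wage_pct_change(coefs, deltas):
    """Percentage change in predicted wage implied by a log(wage) regression.

    coefs  : dict of slope estimates, e.g. {'educ': b_educ, ...}
    deltas : dict of changes in regressors; regressors not listed are held fixed
    Returns (approximate %, exact %).
    """
    d_lwage = sum(coefs[name] * dx for name, dx in deltas.items())
    approx = 100 * d_lwage
    exact = 100 * (math.exp(d_lwage) - 1)
    return approx, exact

# Hypothetical coefficients for illustration only
b = {'educ': 0.09, 'exper': 0.004, 'tenure': 0.02}
approx, exact = wage_pct_change(b, {'educ': 1})
print(f"one more year of educ: approx {approx:.2f}%, exact {exact:.2f}%")
```

The gap between the two numbers grows with the size of $\Delta \widehat{lwage}$. For small changes the approximation is usually adequate.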

### Example 3.3. Participation in 401(k) pension plans

In [8]:
df = dataWoo('401k')
pension_multiple = smf.ols(formula='prate ~ mrate + age + 1', data=df).fit()
print(pension_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                  prate   R-squared:                       0.092
Model:                            OLS   Adj. R-squared:                  0.091
Method:                 Least Squares   F-statistic:                     77.79
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           6.67e-33
Time:                        18:48:18   Log-Likelihood:                -6422.3
No. Observations:                1534   AIC:                         1.285e+04
Df Residuals:                    1531   BIC:                         1.287e+04
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     80.1190      0.779    102.846      0.0
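When two regressors are uncorrelated in the sample, the simple and multiple regression slopes on one of them are identical. So dropping `age` from this regression would move the `mrate` coefficient only to the extent that `mrate` and `age` are correlated. The sketch below shows this with numpy on simulated data, where `x2` is constructed to have exactly zero sample correlation with `x1`. The distributions and coefficients are arbitrary, not taken from the `401k` sample.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.exponential(0.7, n)      # stands in for mrate (simulated)
raw = rng.normal(13, 9, n)        # stands in for age (simulated)

# Remove the part of `raw` linearly explained by x1, then add back its mean,
# so that x2 has zero sample covariance with x1.
X1 = np.column_stack([np.ones(n), x1])
x2 = raw - X1 @ np.linalg.lstsq(X1, raw, rcond=None)[0] + raw.mean()

y = 80 + 5 * x1 + 0.25 * x2 + rng.normal(0, 15, n)

b_multiple = np.linalg.lstsq(np.column_stack([np.ones(n), x1, x2]), y, rcond=None)[0]
b_simple = np.linalg.lstsq(X1, y, rcond=None)[0]

print(np.corrcoef(x1, x2)[0, 1])  # zero up to floating-point noise
print(b_multiple[1], b_simple[1]) # identical slopes on x1
```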

### Example 3.4. Determinants of College GPA: R-squared

In [9]:
df = dataWoo('gpa1')
gpa_multiple = smf.ols(formula='colGPA ~ hsGPA + ACT + 1', data=df).fit()
print(gpa_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.176
Model:                            OLS   Adj. R-squared:                  0.164
Method:                 Least Squares   F-statistic:                     14.78
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.53e-06
Time:                        18:48:18   Log-Likelihood:                -46.573
No. Observations:                 141   AIC:                             99.15
Df Residuals:                     138   BIC:                             108.0
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.2863      0.341      3.774      0.0
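The R-squared reported above can be written three equivalent ways when the model has an intercept: explained over total sum of squares (SSE/SST), one minus residual over total (1 - SSR/SST), and the squared sample correlation between $y$ and $\hat y$. statsmodels exposes the pieces on the results object as `ess`, `ssr` and `centered_tss`. The numpy sketch below checks the equivalences on simulated data, not on `gpa1`.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 141
x1 = rng.normal(3.4, 0.3, n)
x2 = rng.normal(24, 3, n)
y = 1.3 + 0.45 * x1 + 0.01 * x2 + rng.normal(0, 0.34, n)

X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
y_hat = X @ beta
u_hat = y - y_hat

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
sse = np.sum((y_hat - y.mean()) ** 2)   # explained sum of squares
ssr = np.sum(u_hat ** 2)                # residual sum of squares

r2_explained = sse / sst
r2_residual = 1 - ssr / sst
r2_corr = np.corrcoef(y, y_hat)[0, 1] ** 2

print(r2_explained, r2_residual, r2_corr)  # three equivalent definitions
```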

### Example 3.5. Arrest records

In [10]:
df = dataWoo('crime1')
crime_multiple = smf.ols(formula='narr86 ~ pcnv + ptime86 + qemp86 + 1', data=df).fit()
print(crime_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                 narr86   R-squared:                       0.041
Model:                            OLS   Adj. R-squared:                  0.040
Method:                 Least Squares   F-statistic:                     39.10
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           9.91e-25
Time:                        18:48:18   Log-Likelihood:                -3394.7
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2721   BIC:                             6821.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7118      0.033     21.565      0.0

In [11]:
crime_multiple_2 = smf.ols(formula='narr86 ~ avgsen + pcnv + ptime86 + qemp86 + 1', data=df).fit()
print(crime_multiple_2.summary())

                            OLS Regression Results                            
Dep. Variable:                 narr86   R-squared:                       0.042
Model:                            OLS   Adj. R-squared:                  0.041
Method:                 Least Squares   F-statistic:                     29.96
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           2.01e-24
Time:                        18:48:18   Log-Likelihood:                -3393.5
No. Observations:                2725   AIC:                             6797.
Df Residuals:                    2720   BIC:                             6826.
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.7068      0.033     21.319      0.0
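R-squared rises from 0.041 to 0.042 when `avgsen` is added. This is mechanical: with the same observations, adding a regressor can never increase the residual sum of squares, so R-squared cannot fall. The numpy sketch below illustrates it by adding a regressor that is pure noise by construction. The data are simulated, and `r_squared` is a helper introduced only for this example.

```python
import numpy as np

def r_squared(y, X):
    """R-squared from an OLS fit of y on X (X includes a constant column)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return 1 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(4)
n = 2725
X_small = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X_small @ np.array([0.7, -0.15, -0.03, -0.1]) + rng.normal(size=n)

noise = rng.normal(size=n)  # irrelevant regressor, unrelated to y by construction
X_big = np.column_stack([X_small, noise])

r2_small, r2_big = r_squared(y, X_small), r_squared(y, X_big)
print(r2_small, r2_big)  # r2_big >= r2_small even though `noise` is irrelevant
```

So a higher R-squared alone says little about whether an added variable such as `avgsen` belongs in the model.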

In [12]:
print(summary_col([crime_multiple, crime_multiple_2], stars=True, float_format='%0.2f',
                  info_dict={'N': lambda x: "{0:d}".format(int(x.nobs)),
                             'R2': lambda x: "{:.2f}".format(x.rsquared)}))


               narr86 I narr86 II
---------------------------------
Intercept      0.71***  0.71***  
               (0.03)   (0.03)   
R-squared      0.04     0.04     
R-squared Adj. 0.04     0.04     
avgsen                  0.01     
                        (0.00)   
pcnv           -0.15*** -0.15*** 
               (0.04)   (0.04)   
ptime86        -0.03*** -0.04*** 
               (0.01)   (0.01)   
qemp86         -0.10*** -0.10*** 
               (0.01)   (0.01)   
N              2725     2725     
R2             0.04     0.04     
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
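Holding the other regressors fixed, the predicted change in `narr86` is the sum of each coefficient times the change in its regressor. The sketch below uses the column I estimates from the table above. Those estimates are rounded to two decimals, so the results are approximate. `crime_multiple.params` holds the unrounded values, and `predicted_change` is a helper introduced only for illustration.

```python
# Column I coefficients from the summary_col table, rounded to two decimals
coef_I = {'pcnv': -0.15, 'ptime86': -0.03, 'qemp86': -0.10}

def predicted_change(coefs, deltas):
    """Change in predicted narr86 when the regressors in `deltas` change and all others stay fixed."""
    return sum(coefs[name] * dx for name, dx in deltas.items())

print(predicted_change(coef_I, {'pcnv': 0.5}))               # conviction proportion up by 0.5
print(predicted_change(coef_I, {'ptime86': 12}))             # ptime86 from 0 to 12
print(predicted_change(coef_I, {'pcnv': 0.5, 'qemp86': 1}))  # two regressors changing at once
```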


### Example 3.6. Wage equation

In [13]:
df = dataWoo('wage1')
wage_simple = smf.ols(formula='lwage ~ educ + 1', data=df).fit()
print(wage_simple.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.186
Model:                            OLS   Adj. R-squared:                  0.184
Method:                 Least Squares   F-statistic:                     119.6
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           3.27e-25
Time:                        18:48:18   Log-Likelihood:                -359.38
No. Observations:                 526   AIC:                             722.8
Df Residuals:                     524   BIC:                             731.3
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5838      0.097      5.998      0.0
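Comparing this simple regression with the multiple regression in Example 3.2 is an omitted-variable comparison. If an omitted factor has a positive effect on the outcome and is positively correlated with the included regressor, the short-regression slope is biased upward on average: $E[\tilde\beta_1] = \beta_1 + \beta_2\delta_1$. The Monte Carlo sketch below illustrates this with numpy. All parameters are arbitrary, and the simulated omitted factor is only a stand-in for something like unobserved ability.

```python
import numpy as np

rng = np.random.default_rng(5)
beta1, beta2 = 0.08, 0.05   # true partial effects (arbitrary, for illustration)
delta1 = 0.5                # population slope of x2 on x1 (arbitrary)
n, reps = 500, 2000

slopes = np.empty(reps)
for r in range(reps):
    x1 = rng.normal(12, 2, n)                # e.g. years of schooling (simulated)
    x2 = delta1 * x1 + rng.normal(0, 1, n)   # omitted factor, correlated with x1
    y = 0.5 + beta1 * x1 + beta2 * x2 + rng.normal(0, 0.4, n)
    # Short regression omitting x2: slope = sample cov(x1, y) / sample var(x1)
    slopes[r] = np.cov(x1, y)[0, 1] / np.var(x1, ddof=1)

print(slopes.mean(), beta1 + beta2 * delta1)  # average short slope vs. bias formula
```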