# Chapter 4. Multiple Regression Analysis: Inference

In [1]:
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

from wooldridge import dataWoo  # loader for the Wooldridge textbook datasets

### Example 4.1. Wage equation

In [2]:
df = dataWoo('wage1')
wage_multiple = smf.ols(formula='lwage ~ educ + exper + tenure + 1', data=df).fit()
print(wage_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.316
Model:                            OLS   Adj. R-squared:                  0.312
Method:                 Least Squares   F-statistic:                     80.39
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           9.13e-43
Time:                        18:51:19   Log-Likelihood:                -313.55
No. Observations:                 526   AIC:                             635.1
Df Residuals:                     522   BIC:                             652.2
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2844      0.104      2.729      0.0

### Example 4.2. Student performance

In [3]:
df = dataWoo('meap93')
math_lin_lin = smf.ols(formula='math10 ~ totcomp + staff + enroll + 1', data=df).fit()
print(math_lin_lin.summary())

                            OLS Regression Results                            
Dep. Variable:                 math10   R-squared:                       0.054
Model:                            OLS   Adj. R-squared:                  0.047
Method:                 Least Squares   F-statistic:                     7.697
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           5.18e-05
Time:                        18:51:19   Log-Likelihood:                -1526.2
No. Observations:                 408   AIC:                             3060.
Df Residuals:                     404   BIC:                             3076.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      2.2740      6.114      0.372      0.7

In [4]:
math_lin_log = smf.ols(formula='math10 ~ ltotcomp + lstaff + lenroll + 1', data=df).fit()
print(math_lin_log.summary())

                            OLS Regression Results                            
Dep. Variable:                 math10   R-squared:                       0.065
Model:                            OLS   Adj. R-squared:                  0.058
Method:                 Least Squares   F-statistic:                     9.420
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           4.97e-06
Time:                        18:51:19   Log-Likelihood:                -1523.7
No. Observations:                 408   AIC:                             3055.
Df Residuals:                     404   BIC:                             3072.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   -207.6648     48.703     -4.264      0.0

In [5]:
from statsmodels.iolib.summary2 import summary_col

print(summary_col([math_lin_lin,math_lin_log],stars=True,float_format='%0.3f',
                  model_names=['math10\n(Lin_Lin)','math10\n(Lin_Log)'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.3f}".format(x.rsquared)}))


                 math10     math10  
               (Lin_Lin)  (Lin_Log) 
------------------------------------
Intercept      2.274     -207.665***
               (6.114)   (48.703)   
R-squared      0.054     0.065      
R-squared Adj. 0.047     0.058      
enroll         -0.000               
               (0.000)              
lenroll                  -1.268*    
                         (0.693)    
lstaff                   3.980      
                         (4.190)    
ltotcomp                 21.155***  
                         (4.056)    
staff          0.048                
               (0.040)              
totcomp        0.000***             
               (0.000)              
N              408       408        
R2             0.054     0.065      
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
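
In the level-log specification, the coefficient on a logged regressor is read as the change in `math10` (in its own units) for a given proportional change in that regressor. As a rough illustration, the sketch below takes the `ltotcomp` estimate of 21.155 from the table above and computes the implied change in `math10` for a 10% rise in total compensation, using both the exact log difference and the common beta/100 approximation. The helper name `level_log_effect` is hypothetical and not part of the notebook.

```python
import math

beta_ltotcomp = 21.155  # coefficient on ltotcomp, copied from the table above


def level_log_effect(beta, pct_change):
    """Change in y when x rises by pct_change percent in y = b0 + beta*log(x)."""
    exact = beta * math.log(1 + pct_change / 100)  # exact log difference
    approx = beta * pct_change / 100               # small-change approximation
    return exact, approx


exact, approx = level_log_effect(beta_ltotcomp, 10)
print(f"exact: {exact:.3f}, approximation: {approx:.3f}")
```

The two figures differ because the beta/100 rule is only accurate for small percentage changes; the input coefficient is also rounded to three decimals.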


### Example 4.3. College GPA

In [6]:
df = dataWoo('gpa1')
gpa_mols = smf.ols(formula='colGPA ~ hsGPA + ACT + skipped + 1', data=df).fit()
print(gpa_mols.summary())

                            OLS Regression Results                            
Dep. Variable:                 colGPA   R-squared:                       0.234
Model:                            OLS   Adj. R-squared:                  0.217
Method:                 Least Squares   F-statistic:                     13.92
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           5.65e-08
Time:                        18:51:19   Log-Likelihood:                -41.501
No. Observations:                 141   AIC:                             91.00
Df Residuals:                     137   BIC:                             102.8
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      1.3896      0.332      4.191      0.0

### Example 4.4. Campus crime & enrollment

In [7]:
df = dataWoo('campus')
crime_ols = smf.ols(formula='lcrime ~ lenroll + 1', data=df).fit()
print(crime_ols.summary())

                            OLS Regression Results                            
Dep. Variable:                 lcrime   R-squared:                       0.585
Model:                            OLS   Adj. R-squared:                  0.580
Method:                 Least Squares   F-statistic:                     133.8
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           7.83e-20
Time:                        18:51:19   Log-Likelihood:                -125.83
No. Observations:                  97   AIC:                             255.7
Df Residuals:                      95   BIC:                             260.8
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -6.6314      1.034     -6.416      0.0

### Example 4.5. Housing prices

In [8]:
df = dataWoo('hprice2')
df['ldist'] = np.log(df['dist'])  # store log(dist) in df so the formula finds it in the data
hprice_mols = smf.ols(formula='lprice ~ lnox + ldist + rooms + stratio + 1', data=df).fit()
print(hprice_mols.summary())

                            OLS Regression Results                            
Dep. Variable:                 lprice   R-squared:                       0.584
Model:                            OLS   Adj. R-squared:                  0.581
Method:                 Least Squares   F-statistic:                     175.9
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           5.53e-94
Time:                        18:51:20   Log-Likelihood:                -43.495
No. Observations:                 506   AIC:                             96.99
Df Residuals:                     501   BIC:                             118.1
Df Model:                           4                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     11.0839      0.318     34.843      0.0

### Example 4.6. Participation rates in 401k plans

In [9]:
df = dataWoo('401k')
pension_multiple = smf.ols(formula='prate ~ mrate + age + totemp + 1', data=df).fit()
print(pension_multiple.summary())

                            OLS Regression Results                            
Dep. Variable:                  prate   R-squared:                       0.100
Model:                            OLS   Adj. R-squared:                  0.098
Method:                 Least Squares   F-statistic:                     56.38
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.45e-34
Time:                        18:51:20   Log-Likelihood:                -6416.1
No. Observations:                1534   AIC:                         1.284e+04
Df Residuals:                    1530   BIC:                         1.286e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     80.2941      0.778    103.242      0.0

### Example 4.7. Job training (only for the year 1987 and for nonunionized firms)

In [10]:
df = dataWoo('jtrain')
dataWoo('jtrain', description=True)

name of dataset: jtrain
no of variables: 30
no of observations: 471

+----------+---------------------------------+
| variable | label                           |
+----------+---------------------------------+
| year     | 1987, 1988, or 1989             |
| fcode    | firm code number                |
| employ   | # employees at plant            |
| sales    | annual sales, $                 |
| avgsal   | average employee salary         |
| scrap    | scrap rate (per 100 items)      |
| rework   | rework rate (per 100 items)     |
| tothrs   | total hours training            |
| union    | =1 if unionized                 |
| grant    | = 1 if received grant           |
| d89      | = 1 if year = 1989              |
| d88      | = 1 if year = 1988              |
| totrain  | total employees trained         |
| hrsemp   | tothrs/totrain                  |
| lscrap   | log(scrap)                      |
| lemploy  | log(employ)                     |
| lsales   | log(sales)               

In [11]:
df = df[(df['year'] == 1987) & (df['union'] == 0)]  # keep 1987 observations for nonunionized firms

job_multiple = smf.ols(formula='lscrap ~ hrsemp + lsales + lemploy + 1', data=df).fit()
print(job_multiple.summary())


                            OLS Regression Results                            
Dep. Variable:                 lscrap   R-squared:                       0.262
Model:                            OLS   Adj. R-squared:                  0.174
Method:                 Least Squares   F-statistic:                     2.965
Date:                Sun, 30 Jun 2024   Prob (F-statistic):             0.0513
Time:                        18:51:20   Log-Likelihood:                -48.254
No. Observations:                  29   AIC:                             104.5
Df Residuals:                      25   BIC:                             110.0
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     12.4584      5.687      2.191      0.0

### Example 4.8. R&D and sales

In [12]:
df = dataWoo('rdchem')
lrd_ols = smf.ols(formula='lrd ~ lsales + profmarg + 1', data=df).fit()
print(lrd_ols.summary())

                            OLS Regression Results                            
Dep. Variable:                    lrd   R-squared:                       0.918
Model:                            OLS   Adj. R-squared:                  0.912
Method:                 Least Squares   F-statistic:                     162.2
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.79e-16
Time:                        18:51:20   Log-Likelihood:                -22.511
No. Observations:                  32   AIC:                             51.02
Df Residuals:                      29   BIC:                             55.42
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -4.3783      0.468     -9.355      0.0

### Example 4.9. Parents' education and birth weight

In [13]:
df = dataWoo('bwght')
birthw_ols = smf.ols(formula='bwght ~ cigs + parity + faminc + motheduc + fatheduc + 1', data=df).fit()
print(birthw_ols.summary())

                            OLS Regression Results                            
Dep. Variable:                  bwght   R-squared:                       0.039
Model:                            OLS   Adj. R-squared:                  0.035
Method:                 Least Squares   F-statistic:                     9.553
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           5.99e-09
Time:                        18:51:20   Log-Likelihood:                -5242.2
No. Observations:                1191   AIC:                         1.050e+04
Df Residuals:                    1185   BIC:                         1.053e+04
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    114.5243      3.728     30.716      0.0

In [14]:
birthw_ols_r = smf.ols(formula='bwght ~ cigs + parity + faminc + 1', data=df).fit()
print(birthw_ols_r.summary())

                            OLS Regression Results                            
Dep. Variable:                  bwght   R-squared:                       0.035
Model:                            OLS   Adj. R-squared:                  0.033
Method:                 Least Squares   F-statistic:                     16.63
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.28e-10
Time:                        18:51:20   Log-Likelihood:                -6126.8
No. Observations:                1388   AIC:                         1.226e+04
Df Residuals:                    1384   BIC:                         1.228e+04
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept    114.2143      1.469     77.734      0.0

In [15]:
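# Caution: the restricted model uses 1388 observations and the unrestricted one 1191
# (motheduc and fatheduc have missing values), so df_diff is 199, not the 2 restrictions tested.
sm.stats.anova_lm(birthw_ols_r, birthw_ols)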

   df_resid            ssr  df_diff       ss_diff         F    Pr(>F)
0    1384.0  554615.198655      0.0           NaN       NaN       NaN
1    1185.0  464041.135130    199.0  90574.063525  1.162285  0.075222
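
The F statistic reported by `anova_lm` follows the usual formula F = [(SSR_r - SSR_ur)/q] / [SSR_ur/df_ur]. A minimal sketch that plugs in the values printed above (the variable names are illustrative, not part of the notebook) shows that the reported F uses q = 199, the difference in residual degrees of freedom, rather than the two exclusion restrictions on `motheduc` and `fatheduc`. This happens because the two models were estimated on different samples (1388 versus 1191 observations).

```python
# Values copied from the anova_lm output above.
ssr_r, df_r = 554615.198655, 1384.0   # restricted model (without parents' education)
ssr_ur, df_ur = 464041.13513, 1185.0  # unrestricted model

q = df_r - df_ur  # numerator degrees of freedom as used by anova_lm
f_stat = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_ur)

print(f"q = {q:.0f}, F = {f_stat:.4f}")
```

Because the formula assumes both sums of squared residuals come from the same observations, the comparison above is not a valid test of the two restrictions. One way to get a valid test, assuming the same data frame, would be to refit the restricted model on the rows where `motheduc` and `fatheduc` are both observed before calling `anova_lm`.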


### Exploring further example 4.5

In [16]:
df = dataWoo('attend')
attend_ols_r = smf.ols(formula='atndrte ~ priGPA + 1', data=df).fit()
print(attend_ols_r.summary())

                            OLS Regression Results                            
Dep. Variable:                atndrte   R-squared:                       0.182
Model:                            OLS   Adj. R-squared:                  0.181
Method:                 Least Squares   F-statistic:                     151.3
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           1.54e-31
Time:                        18:51:20   Log-Likelihood:                -2824.3
No. Observations:                 680   AIC:                             5653.
Df Residuals:                     678   BIC:                             5662.
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     47.1270      2.873     16.406      0.0

In [17]:
attend_ols = smf.ols(formula='atndrte ~ priGPA + ACT + 1', data=df).fit()
print(attend_ols.summary())

                            OLS Regression Results                            
Dep. Variable:                atndrte   R-squared:                       0.291
Model:                            OLS   Adj. R-squared:                  0.288
Method:                 Least Squares   F-statistic:                     138.7
Date:                Sun, 30 Jun 2024   Prob (F-statistic):           3.39e-51
Time:                        18:51:20   Log-Likelihood:                -2776.1
No. Observations:                 680   AIC:                             5558.
Df Residuals:                     677   BIC:                             5572.
Df Model:                           2                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     75.7004      3.884     19.490      0.0

In [18]:
print(summary_col([attend_ols_r, attend_ols], stars=True,float_format='%0.3f',
                  model_names=['attend_ols_r','attend_ols'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.3f}".format(x.rsquared)}))


               attend_ols_r attend_ols
--------------------------------------
ACT                         -1.717*** 
                            (0.169)   
Intercept      47.127***    75.700*** 
               (2.873)      (3.884)   
R-squared      0.182        0.291     
R-squared Adj. 0.181        0.288     
priGPA         13.369***    17.261*** 
               (1.087)      (1.083)   
N              680          680       
R2             0.182        0.291     
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
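
Both attendance models use the same 680 observations, so adding `ACT` can be checked with the R-squared form of the F statistic, F = [(R2_ur - R2_r)/q] / [(1 - R2_ur)/df_ur]. With a single restriction, F should equal the square of the t statistic on `ACT`. The sketch below uses only the rounded figures from the outputs above (variable names are illustrative), so the two numbers agree only approximately.

```python
# Figures copied from the outputs above.
r2_r, r2_ur = 0.182, 0.291         # R-squared without and with ACT
df_ur = 677                        # residual degrees of freedom, unrestricted model
q = 1                              # one exclusion restriction (ACT)
coef_act, se_act = -1.717, 0.169   # ACT coefficient and standard error

f_stat = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
t_act = coef_act / se_act

print(f"F = {f_stat:.2f}, t = {t_act:.2f}, t^2 = {t_act ** 2:.2f}")
```

The small gap between F and t squared comes from rounding the R-squared values and the standard error to three decimals.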


### Example 4.10. Salary-pension tradeoff for teachers

In [19]:
df = dataWoo('meap93')

meap_ols1 = smf.ols(formula='lsalary ~ bensal + 1', data=df).fit()
meap_ols2 = smf.ols(formula='lsalary ~ bensal + lenroll + lstaff + 1', data=df).fit()
meap_ols3 = smf.ols(formula='lsalary ~ bensal + lenroll + lstaff + droprate + gradrate + 1', data=df).fit()

print(summary_col([meap_ols1, meap_ols2, meap_ols3], stars=True,float_format='%0.2f',
                  model_names=['meap_ols1','meap_ols2', 'meap_ols3'],
                  info_dict={'N':lambda x: "{0:d}".format(int(x.nobs)),
                             'R2':lambda x: "{:.2f}".format(x.rsquared)}))


               meap_ols1 meap_ols2 meap_ols3
--------------------------------------------
Intercept      10.52***  10.84***  10.74*** 
               (0.04)    (0.25)    (0.26)   
R-squared      0.04      0.35      0.36     
R-squared Adj. 0.04      0.35      0.35     
bensal         -0.83***  -0.60***  -0.59*** 
               (0.20)    (0.17)    (0.16)   
droprate                           -0.00    
                                   (0.00)   
gradrate                           0.00     
                                   (0.00)   
lenroll                  0.09***   0.09***  
                         (0.01)    (0.01)   
lstaff                   -0.22***  -0.22*** 
                         (0.05)    (0.05)   
N              408       408       408      
R2             0.04      0.35      0.36     
Standard errors in parentheses.
* p<.1, ** p<.05, ***p<.01
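
The stars in the table refer to the default null hypothesis that a coefficient is zero. For a one-for-one salary-benefits tradeoff, the hypothesis of interest would instead be that the coefficient on `bensal` equals -1 (this null value is an assumption taken from the textbook's framing of the example, not something the notebook states). A minimal sketch of the corresponding t statistics, using the rounded coefficients and standard errors from the table above:

```python
# Coefficient and standard error for bensal in each model, copied from the table above.
bensal = {
    "meap_ols1": (-0.83, 0.20),
    "meap_ols2": (-0.60, 0.17),
    "meap_ols3": (-0.59, 0.16),
}
beta_null = -1.0  # hypothesized value under H0 (assumption, see the text above)

# t = (estimate - hypothesized value) / standard error
t_stats = {name: (coef - beta_null) / se for name, (coef, se) in bensal.items()}

for name, t in t_stats.items():
    print(f"{name}: t = {t:.2f}")
```

Since the inputs are rounded to two decimals, these t statistics are approximate; with the fitted models at hand, `meap_ols1.t_test('bensal = -1')` would give the same test from unrounded estimates.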
