# Chapter 17. Limited Dependent Variable Models and Sample Selection


In [1]:
import statsmodels
import statsmodels.api as sm
import statsmodels.formula.api as smf
from statsmodels.iolib.summary2 import summary_col

from wooldridge import dataWoo  # only the data loader is used in this chapter

### Example 17.1. Married Women’s Labor Force Participation

In [2]:
df = dataWoo('mroz')
print(smf.ols('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                   inlf   R-squared:                       0.264
Model:                            OLS   Adj. R-squared:                  0.257
Method:                 Least Squares   F-statistic:                     38.22
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           6.90e-46
Time:                        18:19:47   Log-Likelihood:                -423.89
No. Observations:                 753   AIC:                             863.8
Df Residuals:                     745   BIC:                             900.8
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5855      0.154      3.798      0.0

In [3]:
mLogit = sm.Logit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mLogit.summary())
print(mLogit.get_margeff().summary())

Optimization terminated successfully.
         Current function value: 0.533553
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:                   inlf   No. Observations:                  753
Model:                          Logit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Tue, 02 Jul 2024   Pseudo R-squ.:                  0.2197
Time:                        18:19:47   Log-Likelihood:                -401.77
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 3.159e-45
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.4255      0.860      0.494      0.621      -1.261       2.112
nwifeinc      -0.0213      0.

In [4]:
mProbit = sm.Probit.from_formula('inlf ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit()
print(mProbit.summary())
print(mProbit.get_margeff().summary())

Optimization terminated successfully.
         Current function value: 0.532938
         Iterations 5
                          Probit Regression Results                           
Dep. Variable:                   inlf   No. Observations:                  753
Model:                         Probit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Tue, 02 Jul 2024   Pseudo R-squ.:                  0.2206
Time:                        18:19:47   Log-Likelihood:                -401.30
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 2.009e-45
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.2701      0.509      0.531      0.595      -0.727       1.267
nwifeinc      -0.0120      0.

### Example 17.2. Married Women’s Annual Labor Supply

In [5]:
print(smf.ols('hours ~ nwifeinc + educ + exper + expersq + age + kidslt6 + kidsge6', data = df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                  hours   R-squared:                       0.266
Model:                            OLS   Adj. R-squared:                  0.259
Method:                 Least Squares   F-statistic:                     38.50
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           3.42e-46
Time:                        18:19:47   Log-Likelihood:                -6049.5
No. Observations:                 753   AIC:                         1.212e+04
Df Residuals:                     745   BIC:                         1.215e+04
Df Model:                           7                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept   1330.4824    270.785      4.913      0.0

### Tobit Model

In [6]:
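# Keep all 753 women: a bare dropna() would drop every non-participant,
# because wage and lwage are missing for women who do not work.
cols = ['nwifeinc', 'educ', 'exper', 'expersq', 'age', 'kidslt6', 'kidsge6']
df = dataWoo("mroz").dropna(subset=cols + ['hours'])
X = sm.add_constant(df[cols])
y = df['hours']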
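
The cell above only prepares the design matrix. To my knowledge statsmodels does not ship a ready-made Tobit estimator, so one option is to maximize the Tobit log-likelihood directly. The following is a minimal sketch under stated assumptions: censoring from below at zero, normal homoskedastic errors, and synthetic data. The function name `tobit_negloglik` and all simulated values are hypothetical.

```python
import numpy as np
from scipy import optimize, stats

def tobit_negloglik(theta, y, X, lower=0.0):
    """Negative log-likelihood of a Tobit model censored from below at `lower`."""
    beta, sigma = theta[:-1], np.exp(theta[-1])  # optimize log(sigma) so sigma > 0
    xb = X @ beta
    censored = y <= lower
    ll = np.where(
        censored,
        stats.norm.logcdf((lower - xb) / sigma),              # P(y* <= lower)
        stats.norm.logpdf((y - xb) / sigma) - np.log(sigma),  # density of observed y
    )
    return -ll.sum()

# Synthetic censored data (an assumption for illustration).
rng = np.random.default_rng(42)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = X @ np.array([0.5, 1.0]) + rng.normal(size=n)  # latent outcome
y = np.maximum(y_star, 0.0)                             # observed outcome

# OLS estimates serve as starting values.
b0, *_ = np.linalg.lstsq(X, y, rcond=None)
start = np.append(b0, np.log(np.std(y - X @ b0)))

res = optimize.minimize(tobit_negloglik, start, args=(y, X), method="BFGS")
beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
print("beta:", np.round(beta_hat, 3), "sigma:", round(float(sigma_hat), 3))
```

With the `mroz` data, the same function could be called with `X.to_numpy()` and `y.to_numpy().ravel()`. Because `hours` is measured in the thousands, rescaling the dependent variable before optimizing may help convergence.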

### Example 17.3. Poisson Regression for Number of Arrests

In [7]:
df = dataWoo("crime1")
print(smf.ols('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                 narr86   R-squared:                       0.072
Model:                            OLS   Adj. R-squared:                  0.069
Method:                 Least Squares   F-statistic:                     23.57
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           3.72e-39
Time:                        18:19:47   Log-Likelihood:                -3349.7
No. Observations:                2725   AIC:                             6719.
Df Residuals:                    2715   BIC:                             6778.
Df Model:                           9                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      0.5766      0.038     15.215      0.0

In [8]:
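# GEE with an independence working correlation and a Poisson family reproduces
# the Poisson point estimates. Caution: grouping on 'black' defines only two
# clusters, so the cluster-robust standard errors reported below are unreliable.
from statsmodels.genmod.generalized_estimating_equations import GEE
from statsmodels.genmod.cov_struct import Independence
from statsmodels.genmod.families import Poisson

print(GEE.from_formula('narr86 ~ pcnv + avgsen + tottime + ptime86 + qemp86 + inc86 + black + hispan + born60', groups='black', data=df, cov_struct=Independence(), family=Poisson()).fit().summary())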

                               GEE Regression Results                              
Dep. Variable:                      narr86   No. Observations:                 2725
Model:                                 GEE   No. clusters:                        2
Method:                        Generalized   Min. cluster size:                 439
                      Estimating Equations   Max. cluster size:                2286
Family:                            Poisson   Mean cluster size:              1362.5
Dependence structure:         Independence   Num. iterations:                     2
Date:                     Tue, 02 Jul 2024   Scale:                           1.000
Covariance type:                    robust   Time:                         18:19:47
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5996      0.007    -81.445      0.000      -0.614      -0.585
pcnv   

### Example 17.5. Wage Offer Equation for Married Women

In [9]:
df = dataWoo("mroz")
print(smf.ols('lwage ~ educ + exper + expersq', data=df).fit().summary())

                            OLS Regression Results                            
Dep. Variable:                  lwage   R-squared:                       0.157
Model:                            OLS   Adj. R-squared:                  0.151
Method:                 Least Squares   F-statistic:                     26.29
Date:                Tue, 02 Jul 2024   Prob (F-statistic):           1.30e-15
Time:                        18:19:47   Log-Likelihood:                -431.60
No. Observations:                 428   AIC:                             871.2
Df Residuals:                     424   BIC:                             887.4
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -0.5220      0.199     -2.628      0.0