# Stochastic Modeling

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm

The data come from the example in Wooldridge's textbook (page 249).

In [None]:
df = pd.read_stata("MROZ.DTA")
df

Unnamed: 0,inlf,hours,kidslt6,kidsge6,age,educ,wage,repwage,hushrs,husage,...,faminc,mtr,motheduc,fatheduc,unem,city,exper,nwifeinc,lwage,expersq
0,1,1610,1,0,32,12,3.3540,2.65,2708,34,...,16310.0,0.7215,12,7,5.0,0,14,10.910060,1.210154,196
1,1,1656,0,2,30,12,1.3889,2.65,2310,30,...,21800.0,0.6615,7,7,11.0,1,5,19.499981,0.328512,25
2,1,1980,1,3,35,12,4.5455,4.04,3072,40,...,21040.0,0.6915,12,7,5.0,0,15,12.039910,1.514138,225
3,1,456,0,3,34,12,1.0965,3.25,1920,53,...,7300.0,0.7815,7,7,5.0,0,6,6.799996,0.092123,36
4,1,1568,1,2,31,14,4.5918,3.60,2000,32,...,27300.0,0.6215,12,14,9.5,1,7,20.100058,1.524272,49
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
748,0,0,0,2,40,13,,0.00,3020,43,...,28200.0,0.6215,10,10,9.5,1,5,28.200001,,25
749,0,0,2,3,31,12,,0.00,2056,33,...,10000.0,0.7715,12,12,7.5,0,14,10.000000,,196
750,0,0,0,0,43,12,,0.00,2383,43,...,9952.0,0.7515,10,3,7.5,0,4,9.952000,,16
751,0,0,0,0,60,12,,0.00,1705,55,...,24984.0,0.6215,12,12,14.0,1,15,24.983999,,225


## PROBIT

In [None]:
y = df['inlf']
X = df[['educ','nwifeinc','exper','expersq','age','kidslt6','kidsge6']]
X = sm.add_constant(X)

probit_mod = sm.Probit(y, X)
probit_res = probit_mod.fit()
print(probit_res.summary())

Optimization terminated successfully.
         Current function value: 0.532938
         Iterations 5
                          Probit Regression Results                           
Dep. Variable:                   inlf   No. Observations:                  753
Model:                         Probit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Sat, 16 Aug 2025   Pseudo R-squ.:                  0.2206
Time:                        14:46:51   Log-Likelihood:                -401.30
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 2.009e-45
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2701      0.509      0.531      0.595      -0.727       1.267
educ           0.1309      0.

### Marginal effects

To obtain the marginal effects we use `get_margeff`. Its `at` parameter sets the point at which the effects are evaluated:

`overall`: the average of the marginal effects at each observation.

`mean`: the marginal effects at the mean of each regressor.

`median`: the marginal effects at the median of each regressor.

`zero`: the marginal effects at zero for each regressor.

`all`: the marginal effects at each observation. If `at` is `all`, only `margeff` will be available from the returned object.

Marginal effects evaluated at the mean of the explanatory variables:

In [None]:
mfx_mem = probit_res.get_margeff(at='mean')
print(mfx_mem.summary())

       Probit Marginal Effects       
Dep. Variable:                   inlf
Method:                          dydx
At:                              mean
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
educ           0.0511      0.010      5.186      0.000       0.032       0.070
nwifeinc      -0.0047      0.002     -2.484      0.013      -0.008      -0.001
exper          0.0482      0.007      6.575      0.000       0.034       0.063
expersq       -0.0007      0.000     -3.141      0.002      -0.001      -0.000
age           -0.0206      0.003     -6.241      0.000      -0.027      -0.014
kidslt6       -0.3392      0.046     -7.316      0.000      -0.430      -0.248
kidsge6        0.0141      0.017      0.828      0.408      -0.019       0.047
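For a continuous regressor $x_j$, the probit marginal effect is $\phi(x\beta)\,\beta_j$, where $\phi$ is the standard normal density. Evaluated at the mean, every row of the table is therefore the corresponding coefficient multiplied by the same scale factor $\phi(\bar{x}\beta)$. The sketch below illustrates this using only the standard library. `probit_marginal_effect` is a hypothetical helper, not part of statsmodels, and the coefficients 0.1309 (educ) and -0.0120 (nwifeinc) are the rounded values printed in the probit summaries of this notebook, so the match is only approximate.

```python
from statistics import NormalDist

def probit_marginal_effect(beta_j, index):
    """Marginal effect of a continuous regressor: pdf(index) * beta_j."""
    return NormalDist().pdf(index) * beta_j

# Scale factor implied by the educ row: reported effect / reported coefficient.
scale = 0.0511 / 0.1309

# Applying the same factor to the nwifeinc coefficient should roughly
# reproduce the nwifeinc row of the table.
mem_nwifeinc = scale * -0.0120

# The factor can never exceed the normal density at zero (about 0.399).
max_scale = NormalDist().pdf(0.0)

print(round(scale, 3), round(mem_nwifeinc, 4), round(max_scale, 3))
```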


### Average Marginal Effects (AME): the average of the marginal effects at each observation

In [None]:
mfx_ame = probit_res.get_margeff(at='overall')
print(mfx_ame.summary())

       Probit Marginal Effects       
Dep. Variable:                   inlf
Method:                          dydx
At:                           overall
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
educ           0.0394      0.007      5.452      0.000       0.025       0.054
nwifeinc      -0.0036      0.001     -2.509      0.012      -0.006      -0.001
exper          0.0371      0.005      7.200      0.000       0.027       0.047
expersq       -0.0006      0.000     -3.205      0.001      -0.001      -0.000
age           -0.0159      0.002     -6.739      0.000      -0.021      -0.011
kidslt6       -0.2612      0.032     -8.197      0.000      -0.324      -0.199
kidsge6        0.0108      0.013      0.829      0.407      -0.015       0.036
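The two summaries differ only in where the normal density is evaluated: the effect at the mean uses $\phi(\bar{x}\beta)\,\beta_j$, while the AME averages $\phi(x_i\beta)\,\beta_j$ over the observations. A minimal sketch, assuming a handful of made-up index values (they are not taken from the MROZ data) and the rounded educ coefficient:

```python
from statistics import NormalDist, mean

pdf = NormalDist().pdf

# Hypothetical linear-index values x_i * beta for five observations.
indices = [-1.5, -0.5, 0.0, 0.8, 2.0]
beta_j = 0.1309  # rounded educ coefficient from the probit summary

# AME: average the observation-level effects.
ame = mean(pdf(z) * beta_j for z in indices)

# Effect at the mean: the index at the mean of the regressors equals
# the mean of the indices, because the index is linear in x.
mem = pdf(mean(indices)) * beta_j

print(round(ame, 4), round(mem, 4))
```

In this toy sample the AME is smaller than the effect at the mean because several observations sit in the tails, where the density is low.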


Linear index

$x\beta = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$

In [None]:
xb_probit = probit_res.predict(which='linear')

We apply the standard normal cumulative distribution function ($\Phi$) to the linear index:

$P(y = 1|x) = \Phi(x\beta)$

The result is a set of predicted probabilities between 0 and 1.

In [None]:
phat_probit = probit_res.predict()
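To make the two steps concrete, the following sketch builds a linear index by hand and passes it through the standard normal CDF. The coefficient and regressor values are hypothetical placeholders chosen for illustration, not the estimates above.

```python
from statistics import NormalDist

# Hypothetical coefficients (constant first) and one hypothetical observation.
beta = [0.3, 0.13, -0.01]
x = [1.0, 12.0, 20.0]  # the leading 1.0 multiplies the constant

# Linear index: beta_0 + beta_1 * x_1 + beta_2 * x_2
xb = sum(b * v for b, v in zip(beta, x))

# Predicted probability: standard normal CDF evaluated at the index.
p_hat = NormalDist().cdf(xb)

print(xb, p_hat)
```

With the fitted model, applying `scipy.stats.norm.cdf` to `xb_probit` should reproduce `phat_probit` up to floating-point error.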

The Wald test evaluates the joint null hypothesis

$H_0: \beta_{exper} = 0 \text{ and } \beta_{expersq} = 0$

against the alternative that at least one of the two coefficients is different from zero.

In [None]:
print(probit_res.wald_test('exper = 0, expersq = 0', scalar=True))

<Wald test (chi2): statistic=95.6709910827676, p-value=1.6799959983724357e-21, df_denom=2>
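The statistic is the quadratic form $W = \hat\beta_r' V_r^{-1} \hat\beta_r$, where $\hat\beta_r$ stacks the two restricted coefficients and $V_r$ is their estimated covariance block. Under $H_0$ it is asymptotically $\chi^2$ with 2 degrees of freedom, and for 2 degrees of freedom the $\chi^2$ survival function is simply $e^{-W/2}$, which allows a quick check of the reported p-value. In the sketch below `wald_stat_2` is a hypothetical helper and the inputs passed to it are made up; only `W` is copied from the output above.

```python
import math

def wald_stat_2(b, V):
    """Wald statistic b' V^{-1} b for two restrictions of the form beta = 0."""
    (a, c), (d, e) = V
    det = a * e - c * d
    # Explicit inverse of a 2x2 matrix.
    inv = [[e / det, -c / det], [-d / det, a / det]]
    return sum(b[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))

# Made-up inputs, only to exercise the helper.
w_demo = wald_stat_2([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])

# p-value implied by the statistic reported above (chi-squared, 2 df).
W = 95.6709910827676
p_value = math.exp(-W / 2)

print(w_demo, p_value)
```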


Probit estimation with heteroskedasticity-robust (HC1) standard errors

In [None]:
probit_rob = probit_mod.fit(cov_type='HC1')

Optimization terminated successfully.
         Current function value: 0.532938
         Iterations 5


In [None]:
print(probit_rob.summary())

                          Probit Regression Results                           
Dep. Variable:                   inlf   No. Observations:                  753
Model:                         Probit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Sat, 16 Aug 2025   Pseudo R-squ.:                  0.2206
Time:                        14:46:51   Log-Likelihood:                -401.30
converged:                       True   LL-Null:                       -514.87
Covariance Type:                  HC1   LLR p-value:                 2.009e-45
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.2701      0.505      0.535      0.593      -0.719       1.260
educ           0.1309      0.026      5.073      0.000       0.080       0.181
nwifeinc      -0.0120      0.005     -2.266      0.0

## LOGIT

For Logit, the link is the logistic function $\Lambda$ rather than the standard normal CDF $\Phi$:

$P(y = 1|x) = \Lambda(x\beta)$
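As a quick illustration, the logistic function is $\Lambda(z) = 1/(1+e^{-z})$, and its derivative $\Lambda(z)\,(1-\Lambda(z))$ plays the role that the normal density plays in probit marginal effects. A minimal sketch with the standard library; the function names are hypothetical.

```python
import math

def logistic_cdf(z):
    """Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    """Derivative of Lambda: Lambda(z) * (1 - Lambda(z))."""
    p = logistic_cdf(z)
    return p * (1.0 - p)

for z in (-2.0, 0.0, 2.0):
    print(z, logistic_cdf(z), logistic_pdf(z))
```

The logistic density at zero is 0.25, against roughly 0.399 for the standard normal, which is one reason logit coefficients tend to be larger in magnitude than probit coefficients even when the marginal effects are similar.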

In [None]:
logit_res = sm.Logit(y, X).fit()
print(logit_res.summary())

Optimization terminated successfully.
         Current function value: 0.533553
         Iterations 6
                           Logit Regression Results                           
Dep. Variable:                   inlf   No. Observations:                  753
Model:                          Logit   Df Residuals:                      745
Method:                           MLE   Df Model:                            7
Date:                Sat, 16 Aug 2025   Pseudo R-squ.:                  0.2197
Time:                        14:46:51   Log-Likelihood:                -401.77
converged:                       True   LL-Null:                       -514.87
Covariance Type:            nonrobust   LLR p-value:                 3.159e-45
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.4255      0.860      0.494      0.621      -1.261       2.112
educ           0.2212      0.

### Average Marginal Effects (AME)

In [None]:
logit_res.get_margeff(at='overall').summary()

Dep. Variable:                   inlf
Method:                          dydx
At:                           overall
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
educ           0.0395      0.007      5.414      0.000       0.025       0.054
nwifeinc      -0.0038      0.001     -2.571      0.010      -0.007      -0.001
exper          0.0368      0.005      7.139      0.000       0.027       0.047
expersq       -0.0006      0.000     -3.176      0.001      -0.001      -0.000
age           -0.0157      0.002     -6.603      0.000      -0.020      -0.011
kidslt6       -0.2578      0.032     -8.070      0.000      -0.320      -0.195
kidsge6        0.0107      0.013      0.805      0.421      -0.015       0.037


### Odds and odds ratios

Logit coefficients are interpreted in terms of log-odds. Exponentiating them converts them to odds ratios.


In [None]:
logit_res.params

const       0.425452
educ        0.221170
nwifeinc   -0.021345
exper       0.205870
expersq    -0.003154
age        -0.088024
kidslt6    -1.443354
kidsge6     0.060112


Odds ratios (exponentiated coefficients)

In [None]:
np.exp(logit_res.params)

const       1.530283
educ        1.247536
nwifeinc    0.978881
exper       1.228593
expersq     0.996851
age         0.915739
kidslt6     0.236134
kidsge6     1.061956
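An odds ratio of $e^{\beta_j}$ means that a one-unit increase in $x_j$ multiplies the odds $P/(1-P)$ by that factor, holding the other regressors fixed. The sketch below converts two of the coefficients printed above into percentage changes in the odds; `pct_change_in_odds` is a hypothetical helper.

```python
import math

def pct_change_in_odds(coef):
    """Percentage change in the odds for a one-unit increase in the regressor."""
    return (math.exp(coef) - 1.0) * 100.0

# Coefficients copied from logit_res.params above.
coefs = {"educ": 0.22117, "kidslt6": -1.443354}

for name, b in coefs.items():
    print(name, round(math.exp(b), 6), round(pct_change_in_odds(b), 1))
```

Read this way, each additional year of education raises the odds of labor force participation by roughly 25%, and each additional child under six lowers them by roughly 76%.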


95% confidence intervals for the odds ratios

In [None]:
np.exp(logit_res.conf_int())


                 0         1
const     0.283415  8.262655
educ      1.145717  1.358404
nwifeinc  0.962856  0.995172
exper     1.153775  1.308263
expersq   0.994868  0.998838
age       0.889953  0.942272
kidslt6   0.158441  0.351926
kidsge6   0.917160  1.229610
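Because the interval is built on the log-odds scale and then exponentiated, it is symmetric around the coefficient in logs but not around the odds ratio itself. A short check using the educ row, with values copied from the outputs above (the variable names are illustrative):

```python
import math

or_lo, or_hi = 1.145717, 1.358404  # educ interval for the odds ratio
coef_educ = 0.22117                # educ coefficient (log-odds)

# Back on the log scale, the midpoint of the interval is the coefficient.
log_mid = (math.log(or_lo) + math.log(or_hi)) / 2

# On the odds-ratio scale, the point estimate is the geometric mean
# of the two endpoints, so the upper arm is longer than the lower one.
geo_mean = math.sqrt(or_lo * or_hi)

print(round(log_mid, 5), round(geo_mean, 5), round(math.exp(coef_educ), 5))
```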
