# Assignment 2 Structural Econometrics: Question 2
## November 9, 2018 
## Eric Schulman 

In [2]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import math

import statsmodels.api as sm
from statsmodels.sandbox.regression.gmm import GMM
from statsmodels.base.model import GenericLikelihoodModel

from scipy.stats import norm
from scipy.stats import multivariate_normal

In [3]:
#load data into memory
data = pd.DataFrame(data = np.genfromtxt('ps2.dat', delimiter='  '), columns=['y','x1','x2','z'])

print(data.mean())

y      0.568393
x1    42.537849
x2    12.286853
z      9.029880
dtype: float64


### Part a

An economic story where $x_{2i}$ is correlated with $\epsilon_i$ involves simultaneity between a woman's choice of education level and the number of years she wants to work. Women who intend to work may select more education. If women make these decisions simultaneously, you would expect correlation between $x_{2i}$ and $\epsilon_i$ and an upward bias on $\theta_2$.

### Part b


### Part c

If $x_{2i}$ is exogenous, we have $\rho = 0$. This is because

$$E(x_{2i} \epsilon_i) = E(\eta_i \epsilon_i) = \rho$$

Since we believe $x_{2i}$ is endogenous, I would expect $\rho > 0$. One might also expect parents' education to be positively related to your own education (i.e. your parents can serve as role models).

Assuming that $z_i$ does not directly determine $y_i$ is the exclusion restriction. Without this restriction, $\theta_2$ would not be identified.

### Part d

In order to estimate the model we must derive the likelihood function.

$$p(y_i,x_{2i} | x_{1i}, z_i, \theta) = p(x_{2i} | x_{1i}, z_i, \theta) p(y_i | x_{1i},x_{2i}, z_i, \theta)  = p(x_{2i} | x_{1i}, z_i, \theta) p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta)$$

Performing a change of variables from $x_{2i}$ to $\eta_i = x_{2i} - \theta_3 - \theta_4 x_{1i} - \theta_5 z_i$ (the Jacobian is 1), we can write

$$p(x_{2i} | x_{1i}, z_i, \theta) = p(\eta_i|x_{1i},z_i,\theta)\left|\dfrac{d\eta_i}{dx_{2i}}\right| = \phi\left(\dfrac{\eta_i}{\sigma_\eta}\right)  \dfrac{1}{\sigma_\eta}$$
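As a quick sanity check on the density above, the sketch below compares $\phi(\eta_i/\sigma_\eta)/\sigma_\eta$ with the density `scipy` reports for a $N(0, \sigma_\eta^2)$ variable. The value of `sigma_eta` and the grid of `eta` values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical values, chosen only for illustration
sigma_eta = 1.9
eta = np.linspace(-5.0, 5.0, 11)

# Density implied by the change of variables: phi(eta / sigma) / sigma
density_by_hand = norm.pdf(eta / sigma_eta) / sigma_eta

# Same density using scipy's location-scale parameterisation
density_scipy = norm(loc=0.0, scale=sigma_eta).pdf(eta)

max_abs_diff = np.max(np.abs(density_by_hand - density_scipy))
print(max_abs_diff)
```

The two agree up to floating point error, which is why the code in part d can evaluate this term with `norm(0, sigma).pdf(eta)`.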

We can derive an analytic expression for $p(y_i | x_{1i},x_{2i}, z_i, \theta)$ below:

For notational convenience, let $\gamma_i = \theta_2 \eta_i + \theta_0 + \theta_2\theta_3 + (\theta_1 + \theta_2\theta_4) x_{1i} + \theta_2\theta_5 z_i$

When $y_i = 1$, we have $p(y_i | x_{1i},x_{2i}, z_i, \theta) = P(\epsilon_i + \gamma_i > 0) $

When $y_i = 0$, we have $p(y_i | x_{1i},x_{2i}, z_i, \theta) = 1 - P(\epsilon_i + \gamma_i > 0) $

So, we have 

$$p(y_i | x_{1i},x_{2i}, z_i, \theta) = y_i P(\epsilon_i + \gamma_i > 0) + (1-y_i) (1 - P(\epsilon_i + \gamma_i > 0) ) $$

Using results about the distribution of conditional normals we know

$\epsilon_i|\eta_i \sim N(\eta_i \dfrac{\rho}{\sigma_\eta^2}, 1 - \dfrac{\rho^2}{\sigma_\eta^2})$
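The conditional distribution above can be checked by simulation. The sketch below draws $(\epsilon_i, \eta_i)$ from a bivariate normal with $Var(\epsilon_i) = 1$, $Var(\eta_i) = \sigma_\eta^2$ and covariance $\rho$, then compares the slope and residual variance from projecting $\epsilon_i$ on $\eta_i$ with the formulas. The values of `rho` and `sigma_eta` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameter values, chosen only for illustration
rho, sigma_eta = 0.5, 1.9
cov = np.array([[1.0, rho],
                [rho, sigma_eta**2]])

draws = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)
eps, eta = draws[:, 0], draws[:, 1]

# Formulas for epsilon | eta
slope_formula = rho / sigma_eta**2
var_formula = 1.0 - rho**2 / sigma_eta**2

# Simulated counterparts: projection slope and residual variance
slope_sim = np.cov(eps, eta)[0, 1] / np.var(eta, ddof=1)
resid_var_sim = np.var(eps - slope_sim * eta, ddof=1)

print(slope_formula, slope_sim)
print(var_formula, resid_var_sim)
```

Note that the second argument of the normal distribution is a variance, so the standard deviation used inside $\Phi$ is its square root.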


So,

$$p(y_i | x_{1i},x_{2i}, z_i, \theta) = y_i \left(1 -\Phi\left(\dfrac{- \gamma_i - \frac{\rho}{\sigma_\eta^2}\eta_i}{\sqrt{1 - \frac{\rho^2}{\sigma_\eta^2}}}\right)\right) + (1-y_i)\Phi\left(\dfrac{- \gamma_i - \frac{\rho}{\sigma_\eta^2}\eta_i}{\sqrt{1 - \frac{\rho^2}{\sigma_\eta^2}}}\right) $$

Now we can write

$$L = \sum_i \left[ \log p(x_{2i} | x_{1i}, z_i, \theta) + \log p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta) \right]$$

In [14]:
class part_d(GenericLikelihoodModel):
    """joint likelihood of (y, x2) for question 2 part d"""
    
    def nloglikeobs(self, params):
        """evaluate the likelihood function as derived above"""
        t0,t1,t2,t3,t4,t5,rho,sigma = params

        y,x2 = self.endog.transpose()
        x1,z = self.exog.transpose()
        
        eta = x2 - t3 - t4*x1 - t5*z
        
        #mean and standard deviation of epsilon | eta
        mu_epsilon = (rho/sigma**2)*eta
        #abs() guards against a negative variance during the search
        sd_epsilon = np.sqrt( abs(1 - (rho/sigma)**2) )
        
        #pr(eta | ... )
        pr_eta = norm(0,sigma).pdf(eta)
        
        #pr(y|x2 ... )
        gamma = t0 + t2*t3 + (t1 + t2*t4)*x1 + t2*t5*z + t2*eta
        
        pr_epsilon = (y*(1 - norm(mu_epsilon,sd_epsilon).cdf(-gamma))
                      + (1-y)*norm(mu_epsilon,sd_epsilon).cdf(-gamma))
        
        likelihood = np.log( pr_epsilon*pr_eta )

        return -( likelihood.sum() ) 
    
    
    def fit(self, start_params=None, maxiter=2000, maxfun=5000, **kwds):
        """fit the likelihood function using the right start parameters"""
        # default starting values for (theta_0, ..., theta_5, rho, sigma)
        if start_params is None:
            start_params = [-.39,.01,.08,9.1, 0.0, .36, .11, 1.9]
            
        return super(part_d, self).fit(start_params=start_params,
                                       maxiter=maxiter, maxfun=maxfun, **kwds)

    
model_d = part_d(data[['y','x2']],data[['x1','z']])

result_d = model_d.fit()
print(result_d.summary(xname=['theta_0', 'theta_1', 'theta_2',
                              'theta_3','theta_4','theta_5',
                              'rho', 'sigma']))

Optimization terminated successfully.
         Current function value: 2.765785
         Iterations: 1171
         Function evaluations: 1699
                                part_d Results                                
Dep. Variable:            ['y', 'x2']   Log-Likelihood:                -2082.6
Model:                         part_d   AIC:                             4169.
Method:            Maximum Likelihood   BIC:                             4179.
Date:                Wed, 07 Nov 2018                                         
Time:                        14:46:51                                         
No. Observations:                 753                                         
Df Residuals:                     751                                         
Df Model:                           1                                         
                 coef    std err          z      P>|z|      [0.025      0.975]
--------------------------------------------------------------------

### Part e

Using results from the table above, we can see that $\rho$ is roughly .12 and its standard error is .19, so the z-statistic is roughly 0.6. As a result, we fail to reject the null hypothesis that $\rho = 0$ at any conventional significance level (the p-value is about .5). Either $x_{2i}$ is exogenous or $z_i$ is not exogenous.
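The test can be reproduced from the two rounded numbers quoted in the text. This is only a back-of-the-envelope sketch: it assumes a two-sided z-test of $\rho = 0$ and uses the rounded values, not the exact fitted ones.

```python
from scipy.stats import norm

rho_hat = 0.12   # rounded estimate quoted in the text
se_rho = 0.19    # rounded standard error quoted in the text

# Two-sided z-test of H0: rho = 0
z_stat = rho_hat / se_rho
p_value = 2.0 * (1.0 - norm.cdf(abs(z_stat)))

print(round(z_stat, 2), round(p_value, 2))
```

With the fitted model in memory, the same quantities would normally be read from the `bse` and `pvalues` attributes of the statsmodels results object.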

### Part f

$\tau_i$ is a random coefficient on $x_{2i}$ that captures a heterogeneous response in labor force participation to education. Simply put, education might make a bigger difference for some people's labor force participation than for others. If this is the case, you should include a random coefficient.


### Part g

As in part d)

$$p(y_i,x_{2i} | x_{1i}, z_i, \theta) = p(x_{2i} | x_{1i}, z_i, \theta) p(y_i | x_{1i},x_{2i}, z_i, \theta)  = p(x_{2i} | x_{1i}, z_i, \theta) p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta)$$

This holds because $p(x_{2i} | x_{1i}, z_i, \tau_i, \theta) = p(x_{2i} | x_{1i}, z_i, \theta) $.

Now, we can write 

$$p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta) = \int p(y_i | x_{1i},x_{2i}, \tau_i, z_i, \eta_i, \theta)p(\tau_i)d \tau_i$$

and simulate to get

$$p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta) = \frac{1}{S} \sum_s p(y_i | x_{1i},x_{2i}, \tau_{i,s}, z_i, \eta_i, \theta)$$

We can modify $\gamma_i(\tau_i)$ from before to include the simulated $\tau_i$. Specifically,

$$\gamma_i(\tau_i) = (\theta_2 + \sigma_\tau \tau_i) \eta_i + \theta_0 + (\theta_2+ \sigma_\tau \tau_i)\theta_3 + (\theta_1 + (\theta_2+ \sigma_\tau \tau_i)\theta_4) x_{1i} + (\theta_2+ \sigma_\tau \tau_i)\theta_5 z_i$$

Now we can write

$$L = \sum_i \left[ \log p(x_{2i} | x_{1i}, z_i, \theta) + \log p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta) \right]$$

Where

$$p(x_{2i} | x_{1i}, z_i, \theta) = p(\eta_i|x_{1i},z_i,\theta)\left|\dfrac{d\eta_i}{dx_{2i}}\right| = \phi\left(\dfrac{\eta_i}{\sigma_\eta}\right)  \dfrac{1}{\sigma_\eta}$$


$$p(y_i | x_{1i},x_{2i}, z_i, \eta_i, \theta) = \frac{1}{S} \sum_s \left[ y_i \left(1 -\Phi\left(\dfrac{- \gamma_i(\tau_{i,s}) - \frac{\rho}{\sigma_\eta^2}\eta_i}{\sqrt{1 - \frac{\rho^2}{\sigma_\eta^2}}}\right)\right) + (1-y_i)\Phi\left(\dfrac{- \gamma_i(\tau_{i,s}) - \frac{\rho}{\sigma_\eta^2}\eta_i}{\sqrt{1 - \frac{\rho^2}{\sigma_\eta^2}}}\right) \right]$$
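The simulated choice probability can be sketched in code as follows. This is a minimal illustration, not an estimated model: the function name `simulated_choice_prob`, the toy data, and the value of `sigma_tau` are all hypothetical, and $\tau_i$ is assumed to be standard normal. The first eight parameter values are the starting values used in part d.

```python
import numpy as np
from scipy.stats import norm

def simulated_choice_prob(y, x1, x2, z, params, tau_draws):
    """Average the conditional probit probability over draws of tau.

    tau_draws has shape (S, n): S simulation draws per observation.
    """
    t0, t1, t2, t3, t4, t5, rho, sigma_eta, sigma_tau = params

    eta = x2 - t3 - t4 * x1 - t5 * z
    mu_eps = (rho / sigma_eta**2) * eta
    sd_eps = np.sqrt(1.0 - rho**2 / sigma_eta**2)

    # Random coefficient on x2, one per draw and observation
    coef = t2 + sigma_tau * tau_draws
    # Because x2 = t3 + t4*x1 + t5*z + eta, gamma_i(tau) collapses to
    # t0 + t1*x1 + coef*x2
    gamma = t0 + t1 * x1 + coef * x2

    p_one = 1.0 - norm.cdf((-gamma - mu_eps) / sd_eps)
    prob = y * p_one + (1 - y) * (1.0 - p_one)
    return prob.mean(axis=0)   # average over the S draws

# Toy data (hypothetical), only to show the call
rng = np.random.default_rng(0)
n, S = 5, 200
x1 = rng.normal(40.0, 8.0, n)
z = rng.normal(9.0, 3.0, n)
x2 = rng.normal(12.0, 2.0, n)
y = rng.integers(0, 2, n)
params = (-0.39, 0.01, 0.08, 9.1, 0.0, 0.36, 0.11, 1.9, 0.05)
tau_draws = rng.standard_normal((S, n))

probs = simulated_choice_prob(y, x1, x2, z, params, tau_draws)
print(probs)
```

Holding `tau_draws` fixed across evaluations of the likelihood keeps the simulated objective a deterministic function of the parameters, which matters for the optimizer.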


### Part h


We can use the following conditions

1. $E(\eta_i) = 0$
2. $E(\eta_i x_{1i}) = 0$
3. $E(\eta_i z_i) = 0$


Now letting

$$E( g( x_{1i}, x_{2i}, z_i) ) = \frac{1}{S} \sum_s \textbf{1}( (\theta_2 + \sigma_\tau \tau_{i,s}) \eta_i + \theta_0 + (\theta_2+ \sigma_\tau \tau_{i,s})\theta_3 + (\theta_1 + (\theta_2+ \sigma_\tau \tau_{i,s})\theta_4) x_{1i} + (\theta_2+ \sigma_\tau \tau_{i,s})\theta_5 z_i  + \epsilon_{i,s} > 0)$$

where $S$ reflects simulations of both $\tau_i$ and $\epsilon_i$, using the conditional distribution of $\epsilon_i$ derived in part d), we can add

4. $E(  y_i - g(x_{1i}, x_{2i}, z_i) ) = 0$
5. $E( ( y_i - g(x_{1i}, x_{2i}, z_i)) x_{1i} ) = 0$
6. $E( ( y_i - g(x_{1i}, x_{2i}, z_i)) z_{i} ) = 0$

The following moments will help identify $\rho$ and $\sigma_\eta$

7. $E(\eta_i x_{1i}^2) = 0$
8. $E(\eta_i z_i^2) = 0$
9. $E(( y_i - g(x_{1i}, x_{2i}, z_i)) x_{1i}^2 ) = 0$
10. $E( ( y_i - g(x_{1i}, x_{2i}, z_i)) z_{i}^2 ) = 0$

From class we know that

$$m^{opt}(z_i) =  E( \dfrac{ \partial g^{-1}(x_i,y_i,\theta)}{\partial \theta} )$$

Squaring $z_i$ and $x_{1i}$ is the most 'direct' way to make inferences about $\rho$ and $\sigma_\eta$, because the squared terms interact directly with these parameters in the model.
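The sample analogues of conditions 1 through 6 can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `msm_moments`, the toy data and the value of `sigma_tau` are hypothetical, $\tau_i$ is assumed standard normal, and $\epsilon_i$ is drawn from its conditional distribution given $\eta_i$ as in part d.

```python
import numpy as np

def msm_moments(y, x1, x2, z, params, tau_draws, u_draws):
    """Sample analogues of moment conditions 1-6 with a frequency simulator.

    tau_draws and u_draws are fixed N(0,1) draws of shape (S, n).
    """
    t0, t1, t2, t3, t4, t5, rho, sigma_eta, sigma_tau = params

    eta = x2 - t3 - t4 * x1 - t5 * z

    # epsilon | eta ~ N(rho/sigma^2 * eta, 1 - rho^2/sigma^2)
    eps = ((rho / sigma_eta**2) * eta
           + np.sqrt(1.0 - rho**2 / sigma_eta**2) * u_draws)

    # Index gamma_i(tau) written as t0 + t1*x1 + (t2 + sigma_tau*tau)*x2
    gamma = t0 + t1 * x1 + (t2 + sigma_tau * tau_draws) * x2

    # Frequency simulator: share of draws with a positive latent index
    g = (gamma + eps > 0).mean(axis=0)
    resid = y - g

    instruments = np.column_stack([np.ones_like(x1), x1, z])
    first_stage = (eta[:, None] * instruments).mean(axis=0)   # conditions 1-3
    outcome = (resid[:, None] * instruments).mean(axis=0)     # conditions 4-6
    return np.concatenate([first_stage, outcome])

# Toy data (hypothetical), only to show the call
rng = np.random.default_rng(0)
n, S = 50, 100
x1 = rng.normal(40.0, 8.0, n)
z = rng.normal(9.0, 3.0, n)
x2 = rng.normal(12.0, 2.0, n)
y = rng.integers(0, 2, n).astype(float)
params = (-0.39, 0.01, 0.08, 9.1, 0.0, 0.36, 0.11, 1.9, 0.05)
tau_draws = rng.standard_normal((S, n))
u_draws = rng.standard_normal((S, n))

moments = msm_moments(y, x1, x2, z, params, tau_draws, u_draws)
print(moments)
```

Conditions 7 through 10 would follow the same pattern with `x1**2` and `z**2` appended to the instrument matrix.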

### Part i

* Compared to MSM, SML will require more simulation draws: SML is only consistent asymptotically as $S \rightarrow \infty$, whereas MSM is consistent regardless of the number of simulations. The efficiency of MSM is still related to the number of simulations. Also, technically MSM does not rely on a distributional assumption on $\eta_i$.

* On the other hand, we only need to simulate $\tau_i$ in SML. As shown in part d) we can derive an analytic expression for the distribution of $\epsilon_i$ conditional on $\eta_i$. 
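The reason SML needs $S \rightarrow \infty$ is that the log of an unbiased simulated probability is a biased estimate of the log probability (Jensen's inequality). The sketch below illustrates this with a hypothetical one-observation example in which $P(y = 1 | \tau) = \Phi(a + b\tau)$ and $\tau \sim N(0,1)$; the values of `a` and `b` are made up for illustration.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical index: P(y = 1 | tau) = Phi(a + b * tau), tau ~ N(0, 1)
a, b = 0.0, 2.0
# Closed form for E[Phi(a + b * tau)]
true_p = norm.cdf(a / np.sqrt(1.0 + b**2))

def mean_log_bias(S, reps=20_000):
    """Average of log(p_hat) - log(true_p) over many simulated p_hat."""
    tau = rng.standard_normal((reps, S))
    p_hat = norm.cdf(a + b * tau).mean(axis=1)   # unbiased for true_p
    return np.mean(np.log(p_hat)) - np.log(true_p)

bias_small_S = mean_log_bias(S=2)
bias_large_S = mean_log_bias(S=200)
print(bias_small_S, bias_large_S)
```

The bias in the log is clearly negative with few draws and shrinks toward zero as $S$ grows. The moments in MSM are linear in the simulated probability, so the simulation error averages out across observations even when $S$ is fixed.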

### Part j

No, they are not smooth in $\theta_2$. We could simulate $E( g( x_{1i}, x_{2i}, z_i) )$ using importance sampling to make the functions smooth.

This would involve letting

$$E( g( x_{1i}, x_{2i}, z_i) ) = \int \int \textbf{1}( \gamma_i(\tau_i) + \epsilon_i > 0) \, p(\epsilon_i | \eta_i ) p(\tau_i) \, d\tau_i \, d \epsilon_i $$

$$ = \int \int \textbf{1}( \gamma_i(\tau_i) + \epsilon_i > 0) \dfrac{  p(\epsilon_i | \eta_i ) p(\tau_i) h(\eta_i) } {h(\eta_i)} \, d\tau_i \, d \epsilon_i $$

where $\gamma_i(\tau_i)$ is the index defined in part g.

Let $h(\eta_i) = \dfrac{1}{p(\eta_i)}$

Since $p(\epsilon_i | \eta_i ) = \dfrac{p(\epsilon_i, \eta_i)}{p(\eta_i)}$, the integral becomes

$$ \int \int \textbf{1}( \gamma_i(\tau_i) + \epsilon_i > 0) \dfrac{p(\epsilon_i, \eta_i)}{p(\eta_i)} p(\tau_i) \, d\tau_i \, d \epsilon_i $$

Instead of simulating from $p(\epsilon_i | \eta_i )$ we can simulate from $p(\epsilon_i, \eta_i)$

$$ = \frac{1}{S} \sum_s  \textbf{1}( \gamma_i(\tau_{i,s}) + \epsilon_{i,s} > 0) \dfrac{1}{p(\eta_i)} $$
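To see why the frequency simulator from part h is not smooth in $\theta_2$, the sketch below evaluates it for a single observation on a fine grid of $\theta_2$ values while holding the draws fixed. All numbers are hypothetical, and for simplicity $\epsilon_i$ is drawn as standard normal (that is, $\rho = 0$).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical single observation and parameter values
S = 20
x1, x2 = 40.0, 12.0
t0, t1, sigma_tau = -0.39, 0.01, 0.05

# Fixed simulation draws (rho = 0 assumed, so eps is standard normal)
tau = rng.standard_normal(S)
eps = rng.standard_normal(S)

theta2_grid = np.linspace(-0.2, 0.3, 1001)

# Index gamma(tau) = t0 + t1*x1 + (theta2 + sigma_tau*tau)*x2 on the grid
gamma = t0 + t1 * x1 + (theta2_grid[:, None] + sigma_tau * tau[None, :]) * x2

# Frequency simulator: share of draws with a positive latent index
g = (gamma + eps[None, :] > 0).mean(axis=1)

n_distinct = np.unique(g).size
print(n_distinct, "distinct values on a grid of", theta2_grid.size)
```

With $S$ draws the simulator can take at most $S + 1$ distinct values, so it is a step function of $\theta_2$: its derivative is zero almost everywhere and undefined at the jumps, which is the problem for gradient-based optimizers.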

### Part k

This assumption may not be reasonable. If education and labor force participation are simultaneously determined, then you would expect the error $\eta_i$ to reflect this simultaneous decision. Additionally, an individual's $\tau_i$ would influence this simultaneous decision. As a result, you would expect $\tau_i$ and $\eta_i$ to be positively correlated and $\sigma_\tau$ to be biased downward.

### Part l

In order to produce consistent estimates, “ivprobit” requires

* The endogenous regressors are continuous.
* The error term inside the indicator function, $\epsilon_i$, is homoskedastic. If it is heteroskedastic, point estimates will be inconsistent as with most other probit models.
* ($\epsilon_i$, $\eta_i$) is i.i.d. multivariate normal $\forall i$.


This command cannot estimate the model in part f) because the random coefficient makes the composite error inside the indicator function, $\epsilon_i + \sigma_\tau \tau_i x_{2i}$, depend on $x_{2i}$. As a result, including $\tau_i$ in the model will violate our assumption about homoskedasticity.