Hint: the following packages will be useful in solving this problem set.

```julia
using Optim
using Statistics
using ForwardDiff
using Plots
using LinearAlgebra
using CSV
using DataFrames
using StatsFuns
```

# Problem Set 2
**Due: May 3, 2021** (in class; subject to change if COVID restrictions apply)

A binary response is a variable that takes on only two values, customarily
0 and 1, which can be thought of as codes for whether or not a condition
is satisfied. For example, 0 = drive to work, 1 = take the bus. Often
the observed binary variable, say $y$, is related to an unobserved
(latent) continuous variable, say $y^{*}$. We would like to know the
effect of covariates, $x$, on $y$. The model can be represented
as 
\begin{eqnarray*}
y^{*} & = & g(x)-\varepsilon\\
y & = & 1(y^{*}>0)\\
Pr(y=1) & = & F_{\varepsilon}[g(x)]\\
 & \equiv & p(x,\theta)
\end{eqnarray*}
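To see why $Pr(y=1)=F_{\varepsilon}[g(x)]$, we can simulate the latent-variable model directly. The sketch below is illustrative only: it fixes a hypothetical index value $g(x)=0.7$, draws $\varepsilon$ from the standard logistic distribution by inverse-CDF sampling (anticipating the logit case), and checks that the share of $y=1$ is close to $F_{\varepsilon}(0.7)$. The helper names `logistic_cdf` and `draw_logistic` are hypothetical.

```julia
using Random, Statistics

# Standard logistic CDF: F(z) = 1 / (1 + exp(-z))
logistic_cdf(z) = 1 / (1 + exp(-z))

# Inverse-CDF draws from the standard logistic distribution: F^{-1}(u) = log(u / (1 - u))
function draw_logistic(rng, n)
    u = rand(rng, n)
    return log.(u ./ (1 .- u))
end

rng = MersenneTwister(1)
g = 0.7                              # hypothetical value of the index g(x)
eps_draws = draw_logistic(rng, 200_000)
ystar = g .- eps_draws               # latent variable y* = g(x) - ε
y = ystar .> 0                       # observed indicator y = 1(y* > 0)

share = mean(y)                      # simulated Pr(y = 1)
target = logistic_cdf(g)             # F_ε[g(x)]
println("simulated share = ", round(share, digits = 4), ", F(g) = ", round(target, digits = 4))
```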

For the logit model, $g(x)=x^{\prime}\theta$ and $\varepsilon$ follows the standard logistic distribution, so the probability has the specific form
$$
p(x,\theta)=\frac{1}{1+\exp(-x^{\prime}\theta)}
$$
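In code, $p(x,\theta)$ is the logistic CDF evaluated at the index $x^{\prime}\theta$, applied row by row to a design matrix. A minimal sketch, assuming `X` is an $n\times k$ matrix whose rows are the $x_i^{\prime}$ (the name `logit_prob` is hypothetical):

```julia
# p(x_i, θ) = 1 / (1 + exp(-x_i'θ)) for every row x_i' of X
logit_prob(X::AbstractMatrix, theta::AbstractVector) = 1 ./ (1 .+ exp.(-X * theta))

X = [1.0  0.0;
     1.0  2.0;
     1.0 -2.0]
theta = [0.0, 1.0]
p = logit_prob(X, theta)   # index values 0, 2, -2
```

The last two rows illustrate the symmetry $p(-z)=1-p(z)$. StatsFuns, loaded above, also exports `logistic`, so `logistic.(X * theta)` computes the same probabilities.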

## Problem 1 (MLE)

We will consider maximum likelihood estimation of
the logit model for binary 0/1 dependent variables. We will use the
BFGS algorithm to find the MLE. 

The average log-likelihood function is

$$
s_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}\ln p(x_{i},\theta)+(1-y_{i})\ln\left[1-p(x_{i},\theta)\right]\right)
$$
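Since BFGS relies on the gradient, it helps to have the score in closed form. Writing $z_i=x_i^{\prime}\theta$, the logit probability gives $\ln p=z_i-\ln(1+e^{z_i})$ and $\ln(1-p)=-\ln(1+e^{z_i})$, and $\partial p/\partial z=p(1-p)$. A worked derivation of the simplified objective, score, and Hessian (standard results, stated here as a reference step):

$$
\begin{aligned}
s_{n}(\theta) &= \frac{1}{n}\sum_{i=1}^{n}\left[y_{i}x_{i}^{\prime}\theta-\ln\left(1+\exp(x_{i}^{\prime}\theta)\right)\right]\\
\frac{\partial s_{n}(\theta)}{\partial\theta} &= \frac{1}{n}\sum_{i=1}^{n}\left[y_{i}-p(x_{i},\theta)\right]x_{i}\\
\frac{\partial^{2}s_{n}(\theta)}{\partial\theta\,\partial\theta^{\prime}} &= -\frac{1}{n}\sum_{i=1}^{n}p(x_{i},\theta)\left[1-p(x_{i},\theta)\right]x_{i}x_{i}^{\prime}
\end{aligned}
$$

The Hessian is negative semidefinite, so $s_n(\theta)$ is globally concave: whenever the MLE exists (no perfect separation in the data), any stationary point BFGS converges to is the global maximum.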



The following code generates data that follow a logit model with a given $\theta$:

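```julia
# Simulate n observations from a logit model with parameter vector theta
function LogitDGP(n, theta)
    k = size(theta, 1)
    x = ones(n, 1)                      # intercept column
    if k > 1
        x = [x randn(n, k - 1)]         # k - 1 standard normal regressors
    end
    # y_i = 1 when p(x_i, θ) exceeds a Uniform(0,1) draw, so Pr(y_i = 1) = p(x_i, θ)
    y = Float64.((1.0 ./ (1.0 .+ exp.(-x * theta)) .> rand(n, 1)))
    return y, x
end
```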


Let us estimate $\hat{\theta}$ from a dataset of $n=100$ observations generated from the true value $\theta_0=[0.75,0.25]$.

```julia
n = 100
theta = [0.75, 0.25]
(y, x) = LogitDGP(n, theta)
```


**(1. a) Estimate $\hat{\theta}$.**

Hint: 

1. Refer to [Nerlove lecture notes](https://github.com/minyoungrho/Econometrics2/blob/main/lectures/Nerlove.ipynb) for an example code for mle estimation.
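2. The log-likelihood function 
$$
s_{n}(\theta)=\frac{1}{n}\sum_{i=1}^{n}\left(y_{i}\ln p(x_{i},\theta)+(1-y_{i})\ln\left[1-p(x_{i},\theta)\right]\right)
$$
is a sample average of per-observation terms, so with `p` the vector of fitted probabilities $p(x_i,\theta)$ it can be coded as `mean(y .* log.(p) .+ (1 .- y) .* log.(1 .- p))`.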
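A minimal sketch of the average log-likelihood and its BFGS maximization with Optim. It regenerates data like `LogitDGP` (with a fixed seed, so the block runs on its own), uses the simplified per-observation term $y_i x_i^{\prime}\theta-\ln(1+e^{x_i^{\prime}\theta})$ evaluated through a numerically stable softplus, and passes the analytic score to `optimize`. The names `softplus`, `logit_loglik`, and `logit_score!` are hypothetical, not taken from the lecture notes.

```julia
using Optim, Random, Statistics, LinearAlgebra

# Numerically stable softplus: log(1 + exp(z))
softplus(z) = max(z, zero(z)) + log1p(exp(-abs(z)))

# Average log-likelihood s_n(θ) = mean(y_i * x_i'θ - log(1 + exp(x_i'θ)))
# vec(y) also accepts the n×1 matrix returned by LogitDGP
function logit_loglik(theta, y, x)
    z = x * theta
    return mean(vec(y) .* z .- softplus.(z))
end

# In-place gradient of -s_n(θ), as Optim expects: G = -(1/n) Σ (y_i - p_i) x_i
function logit_score!(G, theta, y, x)
    yv = vec(y)
    p = 1 ./ (1 .+ exp.(-x * theta))
    G .= -(x' * (yv .- p)) ./ length(yv)
    return G
end

# Data as in LogitDGP with n = 100 and θ0 = [0.75, 0.25]
Random.seed!(42)
n, theta0 = 100, [0.75, 0.25]
x = [ones(n) randn(n)]
y = Float64.(1 ./ (1 .+ exp.(-x * theta0)) .> rand(n))

# Optim minimizes, so minimize -s_n(θ) starting from zero
obj(theta) = -logit_loglik(theta, y, x)
grad!(G, theta) = logit_score!(G, theta, y, x)
res = optimize(obj, grad!, zeros(2), BFGS())
thetahat = Optim.minimizer(res)
println("thetahat = ", thetahat)
```

Supplying the gradient is a design choice; Optim can also differentiate the objective itself (for example with ForwardDiff, which the notebook loads), at some extra cost.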

**(1. b) Empirically demonstrate the consistency of $\hat{\theta}$ by increasing $n$ in the DGP and re-estimating.**

Hint: Refer to [GMM lecture notes](https://github.com/minyoungrho/Econometrics2/blob/main/lectures/GMM.ipynb) for an example code for empirically proving consistency.
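A sketch of the consistency exercise, under stated assumptions: it redefines a vector-valued version of the DGP (hypothetical name `logit_dgp`) and, to stay dependency-free, maximizes the log-likelihood by Newton-Raphson rather than BFGS (hypothetical name `logit_newton`). Because the logit log-likelihood is globally concave, both methods should reach the same maximizer whenever it exists. The sample sizes and seed are arbitrary.

```julia
using LinearAlgebra, Random

# Logit MLE by Newton-Raphson (swapped in for BFGS to keep the sketch self-contained)
function logit_newton(y, x; tol = 1e-10, maxiter = 100)
    theta = zeros(size(x, 2))
    for _ in 1:maxiter
        p = 1 ./ (1 .+ exp.(-x * theta))
        score = x' * (y .- p)               # gradient of the summed log-likelihood
        H = x' * (x .* (p .* (1 .- p)))     # minus the Hessian
        step = H \ score
        theta += step
        norm(step) < tol && break
    end
    return theta
end

# Same DGP as LogitDGP, returning y as a vector
function logit_dgp(rng, n, theta)
    x = [ones(n) randn(rng, n, length(theta) - 1)]
    y = Float64.(1 ./ (1 .+ exp.(-x * theta)) .> rand(rng, n))
    return y, x
end

rng = MersenneTwister(2021)
theta0 = [0.75, 0.25]
ns = [100, 1_000, 10_000, 100_000]
errors = Float64[]
for n in ns
    y, x = logit_dgp(rng, n, theta0)
    thetahat = logit_newton(y, x)
    push!(errors, norm(thetahat - theta0))
    println("n = $n: thetahat = ", round.(thetahat, digits = 3))
end
```

The printed estimates should settle near $\theta_0=[0.75,0.25]$ as $n$ grows, with the estimation error shrinking roughly like $1/\sqrt{n}$.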

**(1. c) Empirically demonstrate the asymptotic normality of $\hat{\theta}$ by repeatedly generating data and re-estimating.**


Hint: Refer to [GMM lecture notes](https://github.com/minyoungrho/Econometrics2/blob/main/lectures/GMM.ipynb) for an example code for empirically proving asymptotic normality.
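One way to check asymptotic normality, sketched under the same assumptions as a Newton-Raphson stand-in for BFGS, with hypothetical helper names and arbitrary choices of $n$, the number of replications $R$, and the seed: simulate many datasets, standardize each estimate by its standard error from the inverse of the observed information $\sum_i p_i(1-p_i)x_ix_i^{\prime}$, and check that the standardized estimates have mean near 0 and standard deviation near 1.

```julia
using LinearAlgebra, Random, Statistics

# Logit MLE by Newton-Raphson; also returns the observed information at the estimate
function logit_newton(y, x; tol = 1e-10, maxiter = 100)
    theta = zeros(size(x, 2))
    for _ in 1:maxiter
        p = 1 ./ (1 .+ exp.(-x * theta))
        step = (x' * (x .* (p .* (1 .- p)))) \ (x' * (y .- p))
        theta += step
        norm(step) < tol && break
    end
    p = 1 ./ (1 .+ exp.(-x * theta))
    H = x' * (x .* (p .* (1 .- p)))         # observed information at θ̂
    return theta, H
end

rng = MersenneTwister(7)
theta0 = [0.75, 0.25]
n, R = 500, 1_000
z = zeros(R, 2)                              # standardized estimates (θ̂ - θ0) ./ se
for r in 1:R
    x = [ones(n) randn(rng, n)]
    y = Float64.(1 ./ (1 .+ exp.(-x * theta0)) .> rand(rng, n))
    thetahat, H = logit_newton(y, x)
    se = sqrt.(diag(inv(H)))
    z[r, :] = (thetahat .- theta0) ./ se
end
println("means: ", round.(mean(z, dims = 1), digits = 3))
println("std devs: ", round.(std(z, dims = 1), digits = 3))
```

A histogram of each column of `z`, e.g. `histogram(z[:, 1])` with Plots, should look approximately standard normal.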

## Problem 2 (GMM)
Recall from [GMM lecture notes](https://github.com/minyoungrho/Econometrics2/blob/main/lectures/GMM.ipynb):

Suppose the model
is 
$$
\begin{eqnarray*}
y_{t}^{*} & = & \alpha+\rho y_{t-1}^{*}+\beta x_{t}+\epsilon_{t}\\
y_{t} & = & y_{t}^{*}+\upsilon_{t}
\end{eqnarray*}
$$
where $\epsilon_{t}$ and $\upsilon_{t}$ are independent Gaussian
white noise errors. Suppose that $y_{t}^{*}$ is not observed, and
instead we observe $y_{t}$. If we estimate the equation 
$$
y_{t}=\alpha+\rho y_{t-1}+\beta x_{t}+\nu_{t}
$$
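by OLS, the estimator is biased and inconsistent: substituting $y_{t}^{*}=y_{t}-\upsilon_{t}$ into the first equation gives $\nu_{t}=\epsilon_{t}+\upsilon_{t}-\rho\upsilon_{t-1}$, which is correlated with the regressor $y_{t-1}=y_{t-1}^{*}+\upsilon_{t-1}$ whenever $\rho\neq0$.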
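A quick simulation makes the inconsistency concrete. The sketch below uses this problem's data-generating process with $[\alpha_0,\rho_0,\beta_0]=[0,0.9,1]$ and unit-variance errors, but with a large, arbitrarily chosen $n$. Because $x_t$ is uncorrelated with $y_{t-1}$, a short calculation gives the OLS probability limit $\rho\,\mathrm{Var}(y^{*})/[\mathrm{Var}(y^{*})+\sigma_{\upsilon}^{2}]=0.9\cdot10.53/11.53\approx0.82$, well below 0.9 however large $n$ is.

```julia
using Random, LinearAlgebra

rng = MersenneTwister(3)
n = 200_000
x = randn(rng, n)                  # exogenous regressor
e = randn(rng, n)                  # structural error ε_t
ystar = zeros(n)
for t in 2:n
    ystar[t] = 0.0 + 0.9 * ystar[t-1] + 1.0 * x[t] + e[t]
end
y = ystar .+ randn(rng, n)         # add measurement error υ_t (σ = 1)

# OLS of y_t on [1, y_{t-1}, x_t], dropping the first observation
X = [ones(n - 1) y[1:end-1] x[2:end]]
theta_ols = X \ y[2:end]           # least-squares estimate [α̂, ρ̂, β̂]
println("OLS estimates: ", round.(theta_ols, digits = 3))
```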

What about using the generalized instrumental variables (GIV)
estimator? 

Consider using as instruments $Z=\left[1\,x_{t}\,x_{t-1}\,x_{t-2}\right]$.
The lags of $x_{t}$ are correlated with $y_{t-1}$ as long as $\beta$
is different from zero, and by assumption $x_{t}$ and its lags are
uncorrelated with $\epsilon_{t}$ and $\upsilon_{t}$ (and thus they're
also uncorrelated with $\nu_{t}$). Thus, these are legitimate instruments.
As we have 4 instruments and 3 parameters, this is an overidentified
situation. 
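The relevance and exogeneity claims can be checked numerically. In the sketch below, which uses the same DGP and parameter values with a large $n$ and a hypothetical helper `simulate`, the lags $x_{t-1}$ and $x_{t-2}$ are clearly correlated with $y_{t-1}$ when $\beta=1$, $x_{t-1}$ is essentially uncorrelated with $y_{t-1}$ when $\beta=0$, and $x_{t-1}$ is essentially uncorrelated with the composite error $\nu_t=\epsilon_t+\upsilon_t-\rho\upsilon_{t-1}$.

```julia
using Random, Statistics

# Simulate the measurement-error AR(1) model; returns y_t, x_t and ν_t for t = 2..n
function simulate(rng, n; alpha = 0.0, rho = 0.9, beta = 1.0, sig = 1.0)
    x = randn(rng, n)
    e = randn(rng, n)
    ystar = zeros(n)
    for t in 2:n
        ystar[t] = alpha + rho * ystar[t-1] + beta * x[t] + e[t]
    end
    u = sig * randn(rng, n)                          # measurement error υ_t
    y = ystar .+ u
    nu = e[2:end] .+ u[2:end] .- rho .* u[1:end-1]   # ν_t, element j is t = j + 1
    return y, x, nu
end

rng = MersenneTwister(4)
n = 200_000
y1, x1, nu1 = simulate(rng, n; beta = 1.0)
y0, x0, nu0 = simulate(rng, n; beta = 0.0)

relevance_beta1 = cor(y1[1:end-1], x1[1:end-1])   # corr(y_{t-1}, x_{t-1}), β = 1
relevance_lag2  = cor(y1[2:end-1], x1[1:end-2])   # corr(y_{t-1}, x_{t-2}), β = 1
relevance_beta0 = cor(y0[1:end-1], x0[1:end-1])   # corr(y_{t-1}, x_{t-1}), β = 0
exog = cor(nu1, x1[1:end-1])                      # corr(ν_t, x_{t-1})
println((relevance_beta1, relevance_lag2, relevance_beta0, exog))
```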

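```julia
# Lag each column of a matrix by p periods; the first p rows are placeholders (ones)
function lag(x::Array{Float64,2}, p::Int64)
    n, k = size(x)
    lagged_x = [ones(p, k); x[1:n-p, :]]
end

# Lag a vector by p periods; the first p entries are placeholders (ones)
function lag(x::Array{Float64,1}, p::Int64)
    n = size(x, 1)
    lagged_x = [ones(p); x[1:n-p]]
end

# Stack lags 1..p of every column side by side: [lag(x,1) lag(x,2) ... lag(x,p)]
function lags(x::Array{Float64,2}, p)
    n, k = size(x)
    lagged_x = zeros(eltype(x), n, p*k)
    for i = 1:p
        lagged_x[:, i*k-k+1:i*k] = lag(x, i)
    end
    return lagged_x
end

# Vector version: column i holds lag(x, i)
function lags(x::Array{Float64,1}, p)
    n = size(x, 1)
    lagged_x = zeros(eltype(x), n, p)
    for i = 1:p
        lagged_x[:, i] = lag(x, i)
    end
    return lagged_x
end
```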
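A small check of what these helpers return, restating the vector methods so the block runs on its own. The first `p` entries of a lagged series are filled with ones rather than `missing`, so they are placeholders that must be dropped before estimation; the `data[2:end,:]` and `data[3:end,:]` steps in this problem do exactly that.

```julia
# Vector lag as defined above: first p entries are placeholder ones
lag(x::Vector{Float64}, p::Int) = [ones(p); x[1:end-p]]

# Column i holds lag(x, i)
function lags(x::Vector{Float64}, p::Int)
    out = zeros(length(x), p)
    for i in 1:p
        out[:, i] = lag(x, i)
    end
    return out
end

v = [10.0, 20.0, 30.0, 40.0]
l1 = lag(v, 1)     # [1.0, 10.0, 20.0, 30.0]
L2 = lags(v, 2)    # [1.0 1.0; 10.0 1.0; 20.0 10.0; 30.0 20.0]
```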


Given $[\alpha_0,\rho_0,\beta_0] = [0, 0.9, 1]$, let us generate data using the `lag` function defined above:

```julia
n = 100
sig = 1      # standard deviation of the measurement error υ_t

x = randn(n) # an exogenous regressor
e = randn(n) # the structural error ε_t
ystar = zeros(n)
# generate the latent dependent variable y*_t
for t = 2:n
    ystar[t] = 0.0 + 0.9*ystar[t-1] + 1.0*x[t] + e[t]
end
# add measurement error to obtain the observed y_t
y = ystar + sig*randn(n)
ylag = lag(y, 1)
data = [y ylag x];
data = data[2:end, :] # drop first obs, whose lag is only a placeholder
theta = [0, 0.9, 1]   # true [α0, ρ0, β0]
```


**(2. a) Given the following GIVmoments function, write down the moment condition for each data point, $m_t(\theta)$ for all $t$, where $t$ indexes the observations, as well as the average moment vector $\bar{m}_n(\theta)$.**


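```julia
# moment conditions: one row m_t(θ)' per observation
function GIVmoments(theta, data)
    data = [data lags(data, 2)]   # columns: y, ylag, x, then lags 1 and 2 of each
    data = data[3:end, :]         # drop rows whose lags are placeholders
    n = size(data, 1)
    y = data[:, 1]
    ylag = data[:, 2]
    x = data[:, 3]
    xlag = data[:, 6]             # x_{t-1}
    xlag2 = data[:, 9]            # x_{t-2}
    X = [ones(n, 1) ylag x]       # regressors [1, y_{t-1}, x_t]
    e = y - X*theta               # residual ν_t(θ)
    Z = [ones(n, 1) x xlag xlag2] # instruments [1, x_t, x_{t-1}, x_{t-2}]
    m = e .* Z                    # n × 4 matrix; row t is m_t(θ)'
    return m
end
```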
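One way to write what the function above computes: each row of the returned matrix `m` is $m_t(\theta)^{\prime}$, and (with $n$ now the number of rows kept after dropping the placeholder lags) the contributions and their average take the form

$$
m_{t}(\theta)=\begin{bmatrix}1\\ x_{t}\\ x_{t-1}\\ x_{t-2}\end{bmatrix}\left(y_{t}-\alpha-\rho y_{t-1}-\beta x_{t}\right),\qquad
\bar{m}_{n}(\theta)=\frac{1}{n}\sum_{t}m_{t}(\theta)=\frac{1}{n}Z^{\prime}\left(y-X\theta\right)
$$

where $X$ and $Z$ stack the rows $[1,\,y_{t-1},\,x_{t}]$ and $[1,\,x_{t},\,x_{t-1},\,x_{t-2}]$, matching `X` and `Z` in the code.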


**(2. b) Calculate $\hat{\theta}$ using two-step GMM.**

Compare the estimates with the true parameter values $[\alpha_0,\rho_0,\beta_0] = [0, 0.9, 1]$.
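Because these moments are linear in $\theta$, the minimizer of $\bar{m}_n(\theta)^{\prime}W\bar{m}_n(\theta)$ has a closed form, $\hat{\theta}(W)=(X^{\prime}ZWZ^{\prime}X)^{-1}X^{\prime}ZWZ^{\prime}y$, which the sketch below uses in place of numerical minimization (hypothetical helper `gmm_linear`; large $n$ and seed chosen arbitrarily). Step 1 uses $W=(Z^{\prime}Z)^{-1}$, the one-step GIV (2SLS) estimator; step 2 re-weights with the inverse sample covariance of the moment contributions. Since $\nu_t$ is MA(1) in this model, the contributions are serially correlated, so a HAC estimate of $\Omega$ (e.g. Newey-West) would be the more appropriate efficient weight; the simple covariance keeps the sketch short and still gives a consistent estimator.

```julia
using LinearAlgebra, Random, Statistics

# Closed-form minimizer of m̄(θ)' W m̄(θ) for m_t(θ) = z_t (y_t - x_t'θ)
gmm_linear(y, X, Z, W) = (X' * Z * W * Z' * X) \ (X' * Z * W * Z' * y)

# Simulate the model of this problem: α0 = 0, ρ0 = 0.9, β0 = 1, σ = 1
rng = MersenneTwister(5)
n = 50_000
x = randn(rng, n)
e = randn(rng, n)
ystar = zeros(n)
for t in 2:n
    ystar[t] = 0.9 * ystar[t-1] + x[t] + e[t]
end
y = ystar .+ randn(rng, n)

# Observations t = 3..n so that x_{t-2} is available
idx = 3:n
yy = y[idx]
X = [ones(length(idx)) y[idx .- 1] x[idx]]               # [1, y_{t-1}, x_t]
Z = [ones(length(idx)) x[idx] x[idx .- 1] x[idx .- 2]]   # [1, x_t, x_{t-1}, x_{t-2}]

# Step 1: W = inv(Z'Z)
theta1 = gmm_linear(yy, X, Z, inv(Z' * Z))

# Step 2: W = inv(Ω̂), Ω̂ = sample covariance of m_t(θ̂_1)
m = (yy .- X * theta1) .* Z
theta2 = gmm_linear(yy, X, Z, inv(cov(m)))
println("step 1: ", round.(theta1, digits = 3), "  step 2: ", round.(theta2, digits = 3))
```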

## Problem 3 (Nerlove)

Recall [the Nerlove model from the previous lecture](https://github.com/minyoungrho/Econometrics2/blob/main/lectures/Nerlove.ipynb).

We have already explored the restriction of homogeneity of degree 1 (HOD1) in input prices:
$$
\sum_{i=1}^{g}\beta_{i}=1
$$
In other words, the cost shares add up to 1. 


Now, let's explore the property of CRTS (constant returns to scale): 
$$
\gamma=\frac{1}{\beta_{q}}=1
$$
so $\beta_{q}=1.$
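Assuming the lecture's Nerlove model is the log-linear (Cobb-Douglas) cost function with regressors ordered as in the code below (intercept, output, labor, fuel, capital), CRTS can also be imposed by substitution: setting $\beta_q=1$ and moving $\ln Q$ to the left-hand side gives a regression whose OLS estimates are the restricted estimates.

$$
\begin{aligned}
\ln C &= \beta_{1}+\beta_{q}\ln Q+\beta_{L}\ln P_{L}+\beta_{F}\ln P_{F}+\beta_{K}\ln P_{K}+\epsilon\\
\ln C-\ln Q &= \beta_{1}+\beta_{L}\ln P_{L}+\beta_{F}\ln P_{F}+\beta_{K}\ln P_{K}+\epsilon\qquad\text{(imposing }\beta_{q}=1\text{)}
\end{aligned}
$$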



Let us first load the Nerlove data and take logs:

```julia
data = DataFrame(CSV.File("../data/nerlove.csv"))
data = log.(data[:, [:cost, :output, :labor, :fuel, :capital]])
first(data, 6)
```

First six rows (all columns `Float64`, in logs):

| | cost | output | labor | fuel | capital |
|---:|---:|---:|---:|---:|---:|
| 1 | -2.50104 | 0.693147 | 0.737164 | 2.8848 | 5.20949 |
| 2 | -0.414001 | 1.09861 | 0.71784 | 3.5582 | 5.15906 |
| 3 | -0.0100503 | 1.38629 | 0.71784 | 3.5582 | 5.14166 |
| 4 | -1.15518 | 1.38629 | 0.604316 | 3.47197 | 5.11199 |
| 5 | -1.62455 | 1.60944 | 0.751416 | 3.35341 | 5.45104 |
| 6 | -2.32279 | 2.19722 | 0.751416 | 3.35341 | 5.273 |


Then assign the dependent variable `y` (log cost) and the regressor matrix `x`, adding an intercept column:

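```julia
n = size(data, 1)
y = data[:, 1]                          # log cost
x = data[:, 2:end]                      # log output and log input prices
x[!, :intercept] = ones(size(data, 1))
x = x[!, [:intercept, :output, :labor, :fuel, :capital]];

y = Vector(y)    # plain Vector{Float64}
x = Matrix(x);   # Matrix(df) replaces the deprecated convert(Array, df)
```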

**(3. a) Estimate the parameters by restricted OLS under the CRTS restriction.**
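A minimal sketch of restricted least squares for $H_0: R\beta=r$, using the standard formula $\hat{\beta}_R=\hat{\beta}-(X^{\prime}X)^{-1}R^{\prime}\left[R(X^{\prime}X)^{-1}R^{\prime}\right]^{-1}(R\hat{\beta}-r)$. It assumes the column order `[intercept, output, labor, fuel, capital]` from the code above, so CRTS is $R=[0\;1\;0\;0\;0]$, $r=1$. Because the block must run on its own, it uses synthetic data with arbitrary coefficients (not the Nerlove data); the name `restricted_ols` is hypothetical.

```julia
using LinearAlgebra, Random

# Restricted least squares for H0: R*β = r; returns (restricted, unrestricted) estimates
function restricted_ols(y, x, R, r)
    XtXinv = inv(x' * x)
    b = XtXinv * (x' * y)                                  # unrestricted OLS
    adj = XtXinv * R' * ((R * XtXinv * R') \ (R * b .- r)) # correction toward R*β = r
    return b .- adj, b
end

# CRTS restriction β_q = 1 for columns [intercept, output, labor, fuel, capital]
R = [0.0 1.0 0.0 0.0 0.0]
r = [1.0]

# Synthetic stand-in data with arbitrary coefficients
rng = MersenneTwister(6)
n = 200
x = [ones(n) randn(rng, n, 4)]
y = x * [1.0, 0.8, 0.5, 0.3, 0.2] .+ randn(rng, n)
bR, bU = restricted_ols(y, x, R, r)
println("restricted: ", round.(bR, digits = 3))
```

With the notebook's data, the same call `restricted_ols(y, x, R, r)` would use the `y` and `x` built above.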

**(3. b) Calculate the Wald, LR, and LM statistics for the CRTS restriction and comment on the outcome of the hypothesis test.**
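For linear restrictions in the linear model with normal errors and $\sigma^2$ estimated by $ESS/n$, the three statistics can be written in terms of the restricted and unrestricted sums of squared residuals: $W=n(ESS_R-ESS_U)/ESS_U$, $LR=n\ln(ESS_R/ESS_U)$, and $LM=n(ESS_R-ESS_U)/ESS_R$. Each is asymptotically $\chi^2(q)$ under $H_0$, with $q=1$ for CRTS, and these forms always satisfy $W\geq LR\geq LM$. The sketch below assumes that setup and again uses synthetic data (where $\beta_q=0.8$, so the restriction is false); the name `trinity` is hypothetical.

```julia
using LinearAlgebra, Random

# Wald, LR and LM statistics for H0: R*β = r in the normal linear model
function trinity(y, x, R, r)
    n = length(y)
    XtXinv = inv(x' * x)
    bU = XtXinv * (x' * y)                                           # unrestricted OLS
    bR = bU .- XtXinv * R' * ((R * XtXinv * R') \ (R * bU .- r))     # restricted OLS
    essU = sum(abs2, y .- x * bU)
    essR = sum(abs2, y .- x * bR)
    W  = n * (essR - essU) / essU
    LR = n * log(essR / essU)
    LM = n * (essR - essU) / essR
    return W, LR, LM
end

R = [0.0 1.0 0.0 0.0 0.0]; r = [1.0]   # CRTS: β_q = 1
rng = MersenneTwister(8)
n = 200
x = [ones(n) randn(rng, n, 4)]
y = x * [1.0, 0.8, 0.5, 0.3, 0.2] .+ randn(rng, n)   # synthetic data, β_q = 0.8
W, LR, LM = trinity(y, x, R, r)
crit = 3.841   # 5% critical value of χ²(1)
println("W = $W, LR = $LR, LM = $LM; reject at 5% if above $crit")
```

With StatsFuns loaded, `chisqinvcdf(1, 0.95)` returns the same critical value.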