# Homework on MSLE

Recall Model A from the previous homework. The model and the log-likelihood function of observation $i$ are as follows.

\begin{align}
 y_i & = \alpha + x_i' \beta + v_i - u_i,\\
 v_i & \sim N(0, \sigma_v^2), \\
 u_i & \sim N^+(0, \sigma_u^2),
\end{align}
 
\begin{aligned}
L_i = - \ln \left(\frac{1}{2}\right) -\frac{1}{2}\ln (\sigma_v^2 + \sigma_u^2) + \ln
\phi\left(\frac{\epsilon_i}{\sqrt{\sigma_v^2 + \sigma_u^2}} \right) +
\ln \Phi\left(\frac{\mu_{*i}}{\sigma_*} \right),
\end{aligned}


where $\epsilon_i = y_i - \alpha - x_i'\beta$, and $\phi(z)$ and $\Phi(z)$ are the PDF and CDF of the standard normal distribution. Also,

\begin{aligned}
 \mu_{*i}  = \frac{-\sigma_u^2 \epsilon_i}{\sigma_v^2 + \sigma_u^2},\qquad
 \sigma_*^2  = \frac{\sigma_v^2  \sigma_u^2}{\sigma_v^2 + \sigma_u^2}. 
\end{aligned}
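
As a reminder of where this expression comes from (a sketch based on the model definition above, not a restatement of the lecture notes), $\exp(L_i)$ is the density of $\epsilon_i = v_i - u_i$. It is obtained by integrating the normal density of $v_i = \epsilon_i + u_i$ over the half-normal distribution of $u_i$:

\begin{aligned}
f(\epsilon_i) = \int_0^\infty \frac{1}{\sigma_v}\phi\left(\frac{\epsilon_i + u}{\sigma_v}\right) \frac{2}{\sigma_u}\phi\left(\frac{u}{\sigma_u}\right) du
= E_{u}\left[\frac{1}{\sigma_v}\phi\left(\frac{\epsilon_i + u}{\sigma_v}\right)\right].
\end{aligned}

The expectation form is what a simulation-based estimator approximates with an average over random draws of $u$.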


###### 1. Write a Julia function for the model's log-likelihood.
Write a function for Model A's log-likelihood that is suitable for estimation by the Monte Carlo simulation approach (MCSA of Section 1.4 in the lecture notes). 

  - Name the function `NHN_msle`.
  - Use the Monte Carlo method (not the Quasi Monte Carlo method) to draw samples of $u_i^s$, $s=1,\ldots,S$.
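
A minimal sketch of the simulation idea for a single observation, assuming the simulated likelihood is the average of the normal density of $\epsilon_i + u_i^s$ over $S$ half-normal draws. The names `normal_pdf` and `simulated_loglike` and the `seed` keyword are illustrative and not part of the assignment. The draws here take the absolute value of a normal variate, which is only one of several ways to obtain half-normal samples.

```julia
using Random, Statistics

# Density of N(0, σ²) at e, written out so that only the standard library is needed.
normal_pdf(e, σ) = exp(-0.5 * (e / σ)^2) / (σ * sqrt(2π))

# Simulated log-likelihood contribution of one residual ϵᵢ = yᵢ - α - xᵢ'β.
# `seed` is a hypothetical keyword; fixing it keeps the draws identical across calls.
function simulated_loglike(ϵᵢ, σᵤ, σᵥ; S = 2^10 - 1, seed = 123)
    rng = Xoshiro(seed)
    u = σᵤ .* abs.(randn(rng, S))                 # half-normal draws, uˢ ≥ 0
    return log(mean(normal_pdf.(ϵᵢ .+ u, σᵥ)))    # v = ϵ + u under the model
end

ll = simulated_loglike(-0.3, 0.5, 0.2)
println(ll)
```

With a moderately large $S$, the value should be close to the closed-form $L_i$ evaluated at the same $\epsilon_i$, $\sigma_u$, and $\sigma_v$.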


###### 2. Empirical Estimation
The attached dataset, `sampledata.csv`, contains data on agricultural production in India. The variables are as follows. They have been converted to appropriate units (using logs, etc.), so no further data processing is necessary.


| Variable | Description                  |
|----------|------------------------------|
| yvar     | rice output                  |
| Lland    | land                         |
| Plland   | irrigated land               |
| Llabor   | labor                        |
| Lbull    | bull cost                    |
| Lcost    | other costs                  |
| yr       | production year              |
| age      | age of farmers               |
| school   | farmers' years of schooling  |
| yr_1     | same as year                 |


Your task is to use `NHN_msle` to estimate the following specification.

$yvar_i = \alpha + \beta_1 Lland_i + \beta_2 Plland_i + \beta_3 Llabor_i + \beta_4 Lbull_i + \beta_5 Lcost_i + \beta_6 yr_i + v_i - u_i$.
  
You may use the following code to read in the data.

```julia
####################
using DataFrames, CSV
df = DataFrame(CSV.File("sampledata.csv"))
y = df[:, "yvar"]       # the dep var
x = Matrix(df[:, 2:7])  # the indep vars, not including a constant
####################
```

***The required result is a table with three columns: the 1st column is the coefficients, the 2nd column is the standard errors, and the 3rd column is the $t$ statistics.***

  - The table should preferably be a DataFrame. How do you convert a matrix to a DataFrame? Ask ChatGPT!


###### 2.1: The Estimation Guidelines
  
  - You may use $-0.1$ as the initial values for all parameters. Or you may use the OLS result as the initial values for the $\alpha$ and $\beta$ coefficients (`x2=hcat(ones(size(y,1), 1), x); ols=inv(x2'x2)*(x2'y)`). Or, you may choose any initial values that seem to be reasonable choices. However, you _**cannot**_ use the true answer as the initial values.

  - Set the value of $S$ (the number of random draws on $u$) to be $S=2^{10} -1$.

  - I strongly suggest that your program uses the `autodiff = :forward` option (which uses automatic differentiation) in the estimation.
  
    - The `autodiff = :forward` option imposes strict requirements on data types, which may not be easy to satisfy. I suggest that you start without the option (the program then defaults to numerical finite differences, which are more lenient about data types). After you have a working program, add the option back and see if it works. Most likely you will get error messages and will have to work out the issues. If you can't make it work, that's fine. Try your best.  
  

- Hint 1: Look at $|g(x)|$ to judge whether the estimation has converged. It should be smaller than the convergence criterion. 
  
- Hint 2: The answer from the MSLE should be close to but not exactly the same as that from the MLE.
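
The OLS starting values mentioned in the guidelines can be sketched on synthetic data as follows. The data, the seed, and the choice of $-0.1$ for the two log-variance parameters are assumptions made for illustration only. The sketch uses the backslash operator, which solves the same least-squares problem as `inv(x2'x2)*(x2'y)` without forming the inverse explicitly.

```julia
using LinearAlgebra, Random

rng = Xoshiro(1)
n = 200
x = randn(rng, n, 2)                                   # synthetic regressors, no constant
y = 1.0 .+ x * [0.5, -0.25] .+ 0.1 .* randn(rng, n)    # synthetic dependent variable

x2  = hcat(ones(n), x)      # prepend the constant
ols = x2 \ y                # least-squares estimates of α and β

# Starting vector: OLS for α and β, plus -0.1 for the two log-variance parameters.
init = vcat(ols, fill(-0.1, 2))
println(init)
```

In a frontier model the OLS intercept is shifted by the mean of $u_i$, so it serves as a starting point rather than as an estimate of $\alpha$.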


```julia
using Random

# Re-seeding inside the function makes every call return the same draws.
function test()
    Random.seed!(123)
    rand(4) |> display
end

test()
test()
```

```
4-element Vector{Float64}:
 0.521213795535383
 0.5868067574533484
 0.8908786980927811
 0.19090669902576285

4-element Vector{Float64}:
 0.521213795535383
 0.5868067574533484
 0.8908786980927811
 0.19090669902576285
```
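
Why this matters for simulated likelihood can be sketched with a toy objective. The functions `fixed_draws` and `fresh_draws` are hypothetical names used only here. When the draws are regenerated from the same seed, the simulated average is a deterministic function of the parameter, which is what an optimizer needs. With fresh draws, two evaluations at the same parameter will almost surely differ.

```julia
using Random, Statistics

# Draws are regenerated from the same seed on every call, so the simulated
# average is a deterministic function of σ.
fixed_draws(σ; S = 1000) = mean(σ .* abs.(randn(Xoshiro(123), S)))

# Draws come from the global generator, so the value changes from call to call.
fresh_draws(σ; S = 1000) = mean(σ .* abs.(randn(S)))

println(fixed_draws(1.0) == fixed_draws(1.0))   # same seed, same draws
println(fresh_draws(1.0) == fresh_draws(1.0))   # different draws on each call
```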

```julia
using Random, Statistics, Distributions, Optim, DataFrames, CSV, LinearAlgebra


#### define programs

pdf_N(e, sigma_v) = pdf(Normal(0, sigma_v), e)  # density of N(0, sigma_v^2) evaluated at e

# Negative simulated log-likelihood of Model A (normal/half-normal).
function NHN_msle(y, x, α, β, log_σᵤ², log_σᵥ²; draws=2^10-1)

    σᵤ = exp(0.5*log_σᵤ²)
    σᵥ = exp(0.5*log_σᵥ²)

    # Half-normal draws of u by inverse CDF. The seed is fixed so that every
    # evaluation of the function uses the same uniform draws.
    u_list = quantile.( truncated(Normal(0, σᵤ), lower=0.0), rand(Xoshiro(123), draws) )
  # u_list = quantile.( truncated(Normal(0, σᵤ), 0.0, Inf), rand(Xoshiro(123), draws) ) # no, doesn't work with ForwardDiff b/c of "Inf"
  # u_list = quantile.( truncated(Normal(0, σᵤ), 0.0, 999), rand(Xoshiro(123), draws) ) # good, replace "Inf" by a large number like "999"
  # u_list = quantile.(Normal(0, σᵤ), 0.5 * rand(Xoshiro(123), draws) .+ 0.5)           # better!
  # u_list = rand( truncated(Normal(0, σᵤ), lower=0.0), draws)                          # no, the `rand( truncated())` does not work with ForwardDiff
  # u_list = abs.(rand( Normal(0, σᵤ), draws))                                          # it works for this case too

    ϵ = y .- α .- x*β

    llike = Array{Real}(undef, size(y,1))  # "Real" is important to work with ForwardDiff
    for i in 1:size(y,1)
        llike[i] = log(mean(pdf_N.(ϵ[i] .+ u_list, σᵥ)))  # average the density of v = ϵ + u over the draws
    end

    return -sum(llike)  # negative, because Optim minimizes
end


##### Read in data

df = DataFrame(CSV.File("sampledata.csv"))
y = df[:, "yvar"]        # the dep var
x = Matrix(df[:, 2:7])   # not including the constant

nofx = size(x,2)    # not including the constant
nofpara = 1+nofx+2  # number of parameters; "1": constant; "2": log_σᵤ² and log_σᵥ²

#### prepare for estimation

init = -0.1*ones(nofpara)  # initial values


#******** start the estimation ************#

func = TwiceDifferentiable(vars -> NHN_msle(y, x, vars[1], vars[2:1+nofx], vars[end-1], vars[end], draws=2^10-1),
                           ones(nofpara); autodiff = :forward)

hwk14 = optimize(func, init, Newton(),
                 Optim.Options(g_tol = 1e-7,
                               iterations = 1000) )


# check the gradient norm reported in the optimization result
if isnan(Optim.g_residual(hwk14)) || (Optim.g_residual(hwk14) > 1e-7)
    println("The gradients are problematic. There is a problem in the convergence. See below.\n")
    @show hwk14
    @show Optim.minimizer(hwk14)
    error("try again")
end

_coevec = Optim.minimizer(hwk14)
hwk14_coeff = deepcopy(_coevec)                            # keep _coevec untouched
hwk14_coeff[end-1:end] = exp.(hwk14_coeff[end-1:end])      # convert the last two elements from log variances to variances

hwk14_coeff |> display
hwk14 |> display
```


```
9-element Vector{Float64}:
  1.5935055530497055
  0.29500356731181615
  0.23468433121898327
  1.1560600232811382
 -0.42852865254127254
  0.008369944027572155
  0.03346338738562464
  0.2345170099875552
  0.03705618467482772

 * Status: success (objective increased between iterations)

 * Candidate solution
    Final objective value:     9.393060e+01

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 3.96e-08 ≰ 0.0e+00
    |x - x'|/|x'|          = 1.20e-08 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.85e-13 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.97e-15 ≰ 0.0e+00
    |g(x)|                 = 1.67e-12 ≤ 1.0e-07

 * Work counters
    Seconds run:   12  (vs limit Inf)
    Iterations:    20
    f(x) calls:    86
    ∇f(x) calls:   86
    ∇²f(x) calls:  20
```

```julia
# use the Hessian matrix to obtain std. err.

hwk14_Hessian  = Optim.hessian!(func, _coevec)  # Hessian evaluated at the coeff vector

var_cov_matrix = inv(hwk14_Hessian)             # don't need the "negative" of Hessian. Why?
std_err = sqrt.(diag(var_cov_matrix))
std_err[end-1:end] = hwk14_coeff[end-1:end] .* std_err[end-1:end]  # convert the unit using the delta method
t_stats = hwk14_coeff ./ std_err
hwk14_table = hcat(hwk14_coeff, std_err, t_stats)

# convert it to a DataFrame

column_names = ["coeff", "std err", "t stat"]
res = DataFrame(hwk14_table, column_names)
row_names = ["const", "Lland", "Plland", "Llabor", "Lbull", "Lcost", "yr", "σᵤ²", "σᵥ²"]

insertcols!(res, 1, :RowName => row_names)   # put the row labels in the first column
```


| Row | RowName | coeff      | std err    | t stat   |
|-----|---------|------------|------------|----------|
| 1   | const   | 1.59351    | 0.367522   | 4.33581  |
| 2   | Lland   | 0.295004   | 0.0717048  | 4.11414  |
| 3   | Plland  | 0.234684   | 0.183638   | 1.27797  |
| 4   | Llabor  | 1.15606    | 0.0861071  | 13.4258  |
| 5   | Lbull   | -0.428529  | 0.0604068  | -7.09405 |
| 6   | Lcost   | 0.00836994 | 0.0129658  | 0.645542 |
| 7   | yr      | 0.0334634  | 0.00832977 | 4.01732  |
| 8   | σᵤ²     | 0.234517   | 0.0466006  | 5.03249  |
| 9   | σᵥ²     | 0.0370562  | 0.010178   | 3.6408   |
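
The standard errors of the last two rows rely on the delta method. As a brief sketch, write $\theta = \ln \sigma^2$ for either of the two log-variance parameters that the optimizer works with, so that $\sigma^2 = e^{\theta}$. A first-order expansion around $\hat\theta$ gives

\begin{aligned}
\mathrm{Var}(\hat\sigma^2) \approx \left(\frac{d e^{\theta}}{d\theta}\bigg|_{\hat\theta}\right)^2 \mathrm{Var}(\hat\theta) = \left(\hat\sigma^2\right)^2 \mathrm{Var}(\hat\theta),
\qquad
\mathrm{se}(\hat\sigma^2) \approx \hat\sigma^2 \, \mathrm{se}(\hat\theta),
\end{aligned}

which is the multiplication of the coefficient by the standard error applied in the code. The Hessian is not negated because `NHN_msle` returns the negative of the summed log-likelihood, so the Hessian of that objective already plays the role of the observed information matrix.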
