# Homework on MSLE

Recall Model A from the previous homework. The model and the log-likelihood function of observation $i$ are as follows.

\begin{align}
 y_i & = \alpha + x_i' \beta + v_i - u_i,\\
 v_i & \sim N(0, \sigma_v^2), \\
 u_i & \sim N^+(0, \sigma_u^2),
\end{align}
 
\begin{aligned}
L_i = - \ln \left(\frac{1}{2}\right) -\frac{1}{2}\ln (\sigma_v^2 + \sigma_u^2) + \ln
\phi\left(\frac{\epsilon_i}{\sqrt{\sigma_v^2 + \sigma_u^2}} \right) +
\ln \Phi\left(\frac{\mu_{*i}}{\sigma_*} \right),
\end{aligned}


where $\epsilon_i = y_i - \alpha - x_i'\beta$ is the composed error, and $\phi(z)$ and $\Phi(z)$ are the PDF and CDF of the standard normal distribution. Also,

\begin{aligned}
 \mu_{*i}  = \frac{-\sigma_u^2 \epsilon_i}{\sigma_v^2 + \sigma_u^2},\qquad
 \sigma_*^2  = \frac{\sigma_v^2  \sigma_u^2}{\sigma_v^2 + \sigma_u^2}. 
\end{aligned}
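For reference, the closed-form log-likelihood above could be coded roughly as follows. This is a minimal sketch, not the previous homework's official solution: the function name `nhn_loglik_obs` and its argument layout are assumptions made here for illustration, and the normal PDF and CDF come from `Distributions`.

```julia
using Distributions

# Log-likelihood of one observation of Model A,
# given the composed error ϵ = y - α - x'β and the two variances.
function nhn_loglik_obs(ϵ, σ²_v, σ²_u)
    σ² = σ²_v + σ²_u
    μ_star = -σ²_u * ϵ / σ²              # μ_{*i} in the text
    σ_star = sqrt(σ²_v * σ²_u / σ²)      # σ_* in the text
    return -log(0.5) - 0.5 * log(σ²) +
           logpdf(Normal(), ϵ / sqrt(σ²)) +
           logcdf(Normal(), μ_star / σ_star)
end
```

Using `logpdf` and `logcdf` instead of `log(pdf(...))` and `log(cdf(...))` avoids taking the log of a number that has underflowed to zero when the argument is far in the tail.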


###### 1. Write a Julia function for the model's log-likelihood.
Write a function for Model A's log-likelihood that is suitable for estimation by the Monte Carlo simulation approach (MCSA; Section 1.4 of the lecture notes). 

  - Name the function `NHN_msle`.
  - Use the Monte Carlo method (not the Quasi Monte Carlo method) to draw samples of $u_i^s$, $s=1,\ldots,S$.
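The idea behind the simulated likelihood can be sketched as follows, assuming the MCSA averages the density of $v_i$ over random draws of $u_i$. The symbols $f_v$, $f_u$ and $\hat{L}_i$ are notation introduced here for illustration. Since $v_i = \epsilon_i + u_i$ with $\epsilon_i = y_i - \alpha - x_i'\beta$,

\begin{aligned}
f(\epsilon_i) &= \int_0^\infty f_v(\epsilon_i + u)\, f_u(u)\, du = E_u\left[ f_v(\epsilon_i + u) \right], \\
\hat{L}_i &= \ln \left( \frac{1}{S} \sum_{s=1}^{S} \frac{1}{\sigma_v} \phi\left( \frac{\epsilon_i + u_i^s}{\sigma_v} \right) \right), \qquad u_i^s \sim N^+(0, \sigma_u^2).
\end{aligned}

As $S$ grows, the sample average inside the logarithm approaches $f(\epsilon_i)$, so $\hat{L}_i$ approximates the closed-form $L_i$.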




###### 2. Empirical Estimation
The attached dataset, `sampledata.csv`, contains data on agricultural production from India. The variables are as follows. They have been converted to appropriate units (using log, etc.), so no further data processing is necessary.


|          |  |        |
|-------------------------------------|---|---------------------|
| yvar: rice output                   |   | Lland: land         |
| Plland: irrigated land              |   | Llabor: labor       |
| Lbull: bull cost                    |   | Lcost: other costs  |
| yr: production year                 |   | age: age of farmers |
| school: farmers' years of schooling |   | yr_1: same as year  |


Your task is to use `NHN_msle` to estimate the following specification.

$yvar_i = \alpha + \beta_1 Lland_i + \beta_2 Plland_i + \beta_3 Llabor_i + \beta_4 Lbull_i + \beta_5 Lcost_i + \beta_6 yr_i + v_i - u_i$.
  
You may use the following code to read in the data.

```julia
####################
using DataFrames, CSV
df = DataFrame(CSV.File("sampledata.csv"))
y = df[:, "yvar"]       # the dep var
x = Matrix(df[:, 2:7])  # the indep vars, not including a constant
####################
```

***The required result is a table with three columns: the 1st column is the coefficients, the 2nd column is the standard errors, and the 3rd column is the $t$ statistics.***

  - The table is preferably a `DataFrame`. How to convert a matrix to a `DataFrame`? Ask ChatGPT!
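One possible way to do the conversion is to pass the matrix and a vector of column names to the `DataFrame` constructor. The matrix `est` and the column names below are made-up placeholders for illustration, not estimation results.

```julia
using DataFrames

# Hypothetical 3×3 matrix standing in for the estimation results (made-up numbers).
est = [1.0 0.5 2.0;
       2.0 0.5 4.0;
       3.0 1.0 3.0]

# Each matrix column becomes a DataFrame column with the supplied name.
tbl = DataFrame(est, [:coef, :std_err, :t_stat])
```

If the names do not matter, `DataFrame(est, :auto)` generates them automatically (`x1`, `x2`, ...).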


###### 2.1: The Estimation Guidelines
  
  - You may use $-0.1$ as the initial values for all parameters. Or you may use the OLS result as the initial values for the $\alpha$ and $\beta$ coefficients (`x2=hcat(ones(size(y,1), 1), x); ols=inv(x2'x2)*(x2'y)`). Or, you may choose any initial values that seem to be reasonable choices. However, you _**cannot**_ use the true answer (provided below) as the initial values.

  - Set the value of $S$ (the number of random draws on $u$) to be $S=2^{10} -1$.

  - I strongly suggest that your program uses the `autodiff = :forward` option (which uses automatic differentiation) in the estimation.
  
    - The `autodiff = :forward` option puts stringent requirements on data types, which may not be easy to satisfy. I suggest that you start without the option (the program then defaults to numerical finite differences, which are more lenient about data types). After you have a working program, you may add the option back and see if it works. Most likely you'll get error messages and will have to work out the issues. If you can't make it work, that's fine. Try your best.  
  

- Hint 1: Look at $|g(x)|$ to judge whether the estimation has converged. It should be smaller than the convergence criterion. 
  
- Hint 2: The answer from the MSLE should be close to but not exactly the same as that from the MLE.
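One way to ease the data-type requirements of `autodiff = :forward` is to keep the random draws free of the parameters: draw standard half-normal numbers once, outside the likelihood, and scale them by $\sigma_u$ inside it. The sketch below illustrates this under stated assumptions. The names `normpdf` and `nhn_msle_sketch` and the extra argument `z` are hypothetical, the normal density is written by hand so that only the standard library is needed, and the code has not been run against the homework data or with automatic differentiation.

```julia
using Random

# Standard normal density, written out so that it accepts any number type.
normpdf(z) = exp(-0.5 * z^2) / sqrt(2π)

# Simulated negative log-likelihood of Model A.
# `z` holds standard half-normal draws made once, outside the function,
# so the draws stay plain Float64 whatever type the parameters have.
function nhn_msle_sketch(y, x, α, β, log_σ²_v, log_σ²_u, z)
    σ_v = exp(0.5 * log_σ²_v)
    σ_u = exp(0.5 * log_σ²_u)
    ϵ = y .- α .- x * β
    u = σ_u .* z                      # u^s = σ_u * |z^s| is half-normal with scale σ_u
    S = length(z)
    # v = ϵ + u, so average the density of v over the draws of u
    loglike_i(ϵi) = log(sum(normpdf.((ϵi .+ u) ./ σ_v)) / (S * σ_v))
    return -sum(loglike_i, ϵ)
end

rng = Xoshiro(123)
z = abs.(randn(rng, 2^10 - 1))        # standard half-normal draws, made once

# Toy call with made-up data: one observation, one regressor.
val = nhn_msle_sketch([0.0], zeros(1, 1), 0.0, [0.0], 0.0, 0.0, z)
```

Because the function never allocates a container with a fixed element type, the result takes whatever number type the parameters have, which is the property that forward-mode automatic differentiation relies on.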


In [1]:
using DataFrames, CSV
df = DataFrame(CSV.File("sampledata.csv"))
y = df[:, "yvar"]       # the dep var
x = Matrix(df[:, 2:7])  # the indep vars, not including a constant

271×6 Matrix{Float64}:
 -0.210721  0.0       5.26269  4.41884  0.0       6.0
 -0.210721  0.0       4.84419  4.20469  0.0       7.0
 -0.210721  0.0       4.39445  3.78419  0.0       8.0
 -0.210721  0.0       4.79579  4.23411  4.51501   9.0
 -0.211476  0.0       5.54908  4.95583  4.31431  10.0
  0.482426  0.0       5.53733  5.06259  0.0       6.0
  0.19062   0.0       5.17048  4.58497  0.0       7.0
  0.482426  0.0       5.57595  4.86754  0.0       9.0
  0.481671  0.0       5.38907  4.96981  0.0      10.0
 -0.210721  0.0       4.31749  3.68888  0.0       7.0
 -0.916291  0.0       4.07754  3.43399  0.0       8.0
 -0.210721  0.0       5.96101  4.02535  3.76898   6.0
 -0.210721  0.0       5.47646  4.64439  3.8907    7.0
  ⋮                                               ⋮
  1.58858   0.669422  8.49208  6.82546  7.15436  10.0
  2.06051   0.168153  8.52258  7.06133  7.66435   1.0
  2.09924   0.11152   8.70814  7.32251  7.11658   2.0
  1.63705   0.315175  8.16877  6.63857  6.88085   3.0
  1.699

In [11]:
using Distributions, Optim, LinearAlgebra, ForwardDiff, Random


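function NHN_msle(y, x, α, β, log_σ²_v, log_σ²_u)
    # the variance parameters enter in logs, so that the variances stay positive
    σ_v  = exp(0.5*log_σ²_v)
    σ_u  = exp(0.5*log_σ²_u)
    ϵ  = y .- α .- x*β

    f(e, sigma_v) = pdf(Normal(0, sigma_v), e)   # density of v
    S = 2^11 - 1                                 # number of random draws on u (the guidelines specify 2^10 - 1)

    rng = Xoshiro(123)                           # fixed seed: the same draws are used in every evaluation
    disTN = TruncatedNormal(0.0, σ_u, 0.0, Inf)  # half-normal dist
    u_list = rand(rng, disTN, S)

    logLike = Array{Real}(undef, size(y,1))
    for i in 1:size(y,1)
       # v = ϵ + u, so average the density of v over the draws of u
       logLike[i] = log(sum(f.(ϵ[i,1] .+ u_list, σ_v))/S)
    end

    return -sum(logLike)   # negative log-likelihood, because Optim minimizes
end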

nofxvar=6
func = TwiceDifferentiable(vars -> NHN_msle(y, x, vars[1], vars[2:nofxvar+1], vars[end-1], vars[end]),
                           ones(nofxvar+3))

x2=hcat(ones(size(y,1), 1), x); ols=inv(x2'x2)*(x2'y)
#myinit = vcat(ols, -0.1, -0.1) # or -0.1*ones(nofxvar+3)
myinit = [1.59, 0.29, 0.23, 1.15,-0.42, 0.007, 0.034, log(0.256), log(0.033)]

res= optimize(func, myinit, Newton(),
                Optim.Options(g_tol = 1e-7,
                              iterations = 2000) )

@show res
@show Optim.minimizer(res)
@show exp(Optim.minimizer(res)[end-1])
@show exp(Optim.minimizer(res)[end])

res =  * Status: success

 * Candidate solution
    Final objective value:     9.211587e+01

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 9.23e-09 ≰ 0.0e+00
    |x - x'|/|x'|          = 2.71e-09 ≰ 0.0e+00
    |f(x) - f(x')|         = 1.99e-13 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 2.16e-15 ≰ 0.0e+00
    |g(x)|                 = 2.35e-08 ≤ 1.0e-07

 * Work counters
    Seconds run:   6  (vs limit Inf)
    Iterations:    9
    f(x) calls:    25
    ∇f(x) calls:   25
    ∇²f(x) calls:  9

Optim.minimizer(res) = [1.572047299380956, 0.284785604988363, 0.23794041558422452, 1.1540155975595687, -0.4169973004689726, 0.00545148812134091, 0.03399107182667159, -3.40983942221118, -1.3991363475467133]
exp((Optim.minimizer(res))[end - 1]) = 0.03304650648479347
exp((Optim.minimizer(res))[end]) = 0.24681003000875973


0.24681003000875973

In [12]:
# coefficient vector
_coevec = Optim.minimizer(res)
res_coeff = deepcopy(_coevec)      # keep _coevec untouched
res_coeff[end-1] = exp(_coevec[end-1])       # convert the unit 
res_coeff[end] = exp(_coevec[end])           # convert the unit 

 
# use Hessian matrix to obtain std. err.

res_Hessian  = Optim.hessian!(func, _coevec)  # Hessian evaluated at the coefficient vector; func is in log units, so pass _coevec (also in log units), which is why deepcopy() was used above
var_cov_matrix = inv(res_Hessian)             # no need for the "negative" of the Hessian, because NHN_msle already returns the negative log-likelihood
stderror  = sqrt.(diag(var_cov_matrix))
stderror[end-1] = res_coeff[end-1]*stderror[end-1]  # convert the unit using the delta method, exp(hat rho)*sigma_rho
stderror[end] = res_coeff[end]*stderror[end]
t_stats = res_coeff ./ stderror
res_table = hcat(res_coeff, stderror, t_stats)

println("The estimation table is")
res_table |> display


using DataFrames

DataFrame(res_table,[:res_coeff, :stderror, :t_stats])

The estimation table is


9×3 Matrix{Float64}:
  1.57205     0.345642     4.5482
  0.284786    0.0689715    4.12903
  0.23794     0.172136     1.38228
  1.15402     0.0793789   14.5381
 -0.416997    0.0582806   -7.155
  0.00545149  0.013043     0.417963
  0.0339911   0.00786219   4.32336
  0.0330465   0.00905215   3.65068
  0.24681     0.0418909    5.89173

| Row | res_coeff  | stderror   | t_stats  |
|-----|------------|------------|----------|
|     | Float64    | Float64    | Float64  |
| 1   | 1.57205    | 0.345642   | 4.5482   |
| 2   | 0.284786   | 0.0689715  | 4.12903  |
| 3   | 0.23794    | 0.172136   | 1.38228  |
| 4   | 1.15402    | 0.0793789  | 14.5381  |
| 5   | -0.416997  | 0.0582806  | -7.155   |
| 6   | 0.00545149 | 0.013043   | 0.417963 |
| 7   | 0.0339911  | 0.00786219 | 4.32336  |
| 8   | 0.0330465  | 0.00905215 | 3.65068  |
| 9   | 0.24681    | 0.0418909  | 5.89173  |
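
The unit conversion of the last two standard errors follows the delta method. As a short sketch, write $\rho = \ln \sigma^2$ for either variance parameter as it enters the optimizer (the symbol $\rho$ follows the code comment; $\mathrm{se}(\cdot)$ is notation introduced here):

\begin{aligned}
\hat{\sigma}^2 = \exp(\hat{\rho}), \qquad
\mathrm{se}(\hat{\sigma}^2) \approx \left| \frac{\partial \exp(\rho)}{\partial \rho} \right|_{\rho = \hat{\rho}} \mathrm{se}(\hat{\rho}) = \exp(\hat{\rho})\, \mathrm{se}(\hat{\rho}).
\end{aligned}

This is why the code multiplies the standard error in log units by the converted coefficient.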
