In [1]:
cd(joinpath(pwd(),".."))

using Pkg
Pkg.activate(".") ;

[32m[1m Activating[22m[39m environment at `C:\Users\paulc\.julia\dev\ECON627_2020\Project.toml`


## Add Optim and Automatic Differentiation Packages

In [2]:
#Pkg.add("Optim")
#Pkg.add("ForwardDiff")

## Load Packages

In [3]:
using Parameters, Optim, ForwardDiff, LinearAlgebra, Distributions, Random

## DGP 

Suppose the data are generated by the following nonlinear model, whose conditional mean is given by the logistic link function.

In [4]:
Random.seed!(1234);

In [5]:
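n = 1000                                      # sample size
β = [1.0; -1.0];                              # true parameter vector
m_fn = (x, b) -> (1 .+ exp.(-x * b)) .^ -1;   # logistic link function

x = 2 * rand(n, length(β))                    # regressors, Uniform(0, 2)
u = rand(n, 1)                                # errors, Uniform(0, 1)
y = m_fn(x, β) + u ;                          # outcome, an n×1 Matrix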
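
Note that `rand(n, 1)` draws from Uniform(0, 1), so the simulated errors have mean about 0.5 rather than 0. The sketch below checks this and shows `randn` as one possible mean-zero alternative. The sample size `n_check` and the 0.1 scale are arbitrary choices for illustration, not part of the original notebook.

```julia
using Random, Statistics

Random.seed!(1234)
n_check = 100_000                      # hypothetical sample size for the check

u_unif = rand(n_check, 1)              # Uniform(0, 1): mean 1/2, variance 1/12
u_norm = 0.1 .* randn(n_check, 1)      # Normal(0, 0.1^2): mean 0 (illustrative alternative)

println("mean of rand draws:  ", mean(u_unif))
println("mean of randn draws: ", mean(u_norm))
```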

Recall that the sample moment condition for nonlinear least squares requires the residuals to be orthogonal to the gradient of the regression function $f$ (`m_fn` in the code) evaluated at $\hat{\beta}$. In other words,


$$ 0 = \frac{1}{n} \sum_{i=1}^n \left(Y_i - f(X_i, \hat{\beta})\right) \cdot \frac{\partial f(X_i, \hat{\beta})}{\partial\beta}$$
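
This condition is the first-order condition of the least-squares criterion. A short derivation for the logistic link used here, where $Q_n$ and $\Lambda$ are notation introduced for this sketch:

$$
\begin{aligned}
Q_n(b) &= \frac{1}{n}\sum_{i=1}^n \left(Y_i - f(X_i, b)\right)^2, \\
\frac{\partial Q_n(b)}{\partial b} &= -\frac{2}{n}\sum_{i=1}^n \left(Y_i - f(X_i, b)\right)\frac{\partial f(X_i, b)}{\partial b}, \\
f(X_i, b) &= \Lambda(X_i' b) = \frac{1}{1 + e^{-X_i' b}}
\quad\Longrightarrow\quad
\frac{\partial f(X_i, b)}{\partial b} = \Lambda(X_i' b)\left(1 - \Lambda(X_i' b)\right) X_i .
\end{aligned}
$$

Setting the derivative to zero at $b = \hat{\beta}$ and dropping the factor $-2$ gives the moment condition.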





We can compute the gradient for observation 1, evaluated at the trial value $\hat{\beta} = [3,-2]$, with the following code.

In [6]:
ForwardDiff.gradient(z -> m_fn(x[1, :]', z), [3.0 ; -2.0])

2-element Array{Float64,1}:
 0.2900647377513582
 0.4683554496565841
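
For the logistic link, the gradient also has the closed form $f(1-f)\,x$, which gives an independent check on an automatic-differentiation result. The sketch below compares that closed form against a central finite difference, using only Base Julia. The point `xi` is a made-up observation rather than a row of the simulated `x`, and `logistic_mean`, `analytic_grad`, and `fd_grad` are hypothetical helper names.

```julia
# Scalar logistic mean for one observation xi (a vector) and parameter b
logistic_mean(xi, b) = 1 / (1 + exp(-sum(xi .* b)))

# Closed-form gradient with respect to b: f * (1 - f) * xi
function analytic_grad(xi, b)
    f = logistic_mean(xi, b)
    return f * (1 - f) .* xi
end

# Central finite-difference gradient, used here only as a numerical check
function fd_grad(xi, b; h = 1e-6)
    g = zeros(length(b))
    for j in eachindex(b)
        e = zeros(length(b))
        e[j] = h
        g[j] = (logistic_mean(xi, b .+ e) - logistic_mean(xi, b .- e)) / (2h)
    end
    return g
end

xi = [0.5, 1.5]          # hypothetical observation
b  = [3.0, -2.0]         # evaluation point used in the text

g_exact = analytic_grad(xi, b)
g_fd    = fd_grad(xi, b)
println(g_exact)
println(g_fd)
```

With ForwardDiff loaded, `ForwardDiff.gradient(z -> logistic_mean(xi, z), b)` should agree with both vectors up to floating-point error.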

Similarly, we can compute it for all observations using the `map()` function as follows.

In [7]:
v = map(i -> ForwardDiff.gradient(z -> m_fn(x[i, :]', z),[3.0 ; -2.0]), 1:n)

1000-element Array{Array{Float64,1},1}:
 [0.2900647377513582, 0.4683554496565841]
 [0.10053398134796412, 0.06628174136046258]
 [0.19399676562712462, 0.18232432397782836]
 [0.2273859726762808, 0.3677378752612256]
 [0.0262847551718364, 0.005766875473504174]
 [0.10197880038334955, 0.07277473863927474]
 [0.07437804513139447, 0.21547855428304005]
 [0.1491911672038609, 0.2307506304075694]
 [0.054308175681279804, 0.18807143725686665]
 [0.2396274604950775, 0.2678559293567952]
 [0.28238657497254493, 0.3415641492706213]
 [0.005364359455578175, 0.03956468758145818]
 [0.01698647048089965, 0.13611592691318045]
 ⋮
 [0.029070583551457176, 0.13456507829961215]
 [0.011718257290027649, 0.0024993215949869296]
 [0.08612433651200735, 0.04060249009009753]
 [0.1832384079134702, 0.17591502948557103]
 [0.05192609392556134, 0.1884843186991116]
 [0.2876529819960016, 0.42112085402348143]
 [0.005975597105910294, 0.06503971406834456]
 [0.01708144678497192, 0.0036041947956482363]
 [0.16302776484764445, 0.14175607664

Notice that the result is an Array of Arrays; the next cell converts it to an $n \times 2$ Matrix with one row per observation.

In [8]:
vcat(v'...)

1000×2 Array{Float64,2}:
 0.290065    0.468355
 0.100534    0.0662817
 0.193997    0.182324
 0.227386    0.367738
 0.0262848   0.00576688
 0.101979    0.0727747
 0.074378    0.215479
 0.149191    0.230751
 0.0543082   0.188071
 0.239627    0.267856
 0.282387    0.341564
 0.00536436  0.0395647
 0.0169865   0.136116
 ⋮           
 0.0290706   0.134565
 0.0117183   0.00249932
 0.0861243   0.0406025
 0.183238    0.175915
 0.0519261   0.188484
 0.287653    0.421121
 0.0059756   0.0650397
 0.0170814   0.00360419
 0.163028    0.141756
 0.189239    0.27794
 0.122968    0.0915069
 0.0845513   0.0275312
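
Splatting with `...` passes every element as a separate argument, which can become slow for long arrays. Two splat-free alternatives that should give the same matrix are sketched below on a small made-up array `v_demo`, a hypothetical stand-in for `v`.

```julia
# Hypothetical stand-in for the array of gradient vectors
v_demo = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

m_splat  = vcat(v_demo'...)                    # approach used in the text
m_reduce = reduce(vcat, transpose.(v_demo))    # stack the row vectors without splatting
m_hcat   = permutedims(reduce(hcat, v_demo))   # build 2×3 column-wise, then flip to 3×2

println(m_reduce)
```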

Now we can minimize the sample criterion function with ``Optim.optimize()``. It takes a differentiable objective, an initial value, an algorithm (e.g. Newton, LBFGS), and an options object; here we keep the default Optim options (tolerance level, maximum number of iterations, etc.). More details are available in the [Optim documentation](http://julianlsolvers.github.io/Optim.jl/v0.9.3/).
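
A minimal sketch of that call pattern on a test function, before applying it to the regression problem. It assumes an Optim version that accepts `autodiff = :forward`, as the notebook's own code does. The Rosenbrock function `rosen` and the starting point `b0` are illustrative choices, not part of the original notebook.

```julia
using Optim

# Rosenbrock test function, minimized at [1, 1]
rosen(b) = (1 - b[1])^2 + 100 * (b[2] - b[1]^2)^2

b0 = zeros(2)                                             # initial value

# Wrap the objective so gradient and Hessian come from forward-mode AD
td  = TwiceDifferentiable(rosen, b0; autodiff = :forward)
res = optimize(td, b0, Newton(), Optim.Options())         # default options

println(Optim.converged(res))
println(Optim.minimizer(res))
```

`Optim.converged` and `Optim.minimizer` are the same accessors the `nls` function uses to check the result and extract the estimate.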

In [9]:
function nls(f, y, x ) 

    n = size(x, 1)

    if n != length(y)
        error("Incompatible data.") 
    end

    # Objective function
    obj = b -> sum((y - f(x, b)) .^ 2);
    
    #Initial Value 
    beta0 = zeros(size(x,2))

    # We set the criterion function as an instance that we can differentiate twice
    td = TwiceDifferentiable(obj, beta0 ; autodiff = :forward)
    o = optimize(td, beta0, Newton(), Optim.Options() )

    if !Optim.converged(o)
        error("Minimization failed.")
    end

    βhat = Optim.minimizer(o)


    # Get residuals
    r_hat = y - f(x, βhat)

    # Asymptotic variance: gradient of f with respect to b, one row per observation
    v = map(i -> ForwardDiff.gradient(z -> f(x[i, :]', z), βhat), 1:n)
    md = vcat(v'...)

    # Sandwich estimator: inv(md'md) * (me'me) * inv(md'md)
    me = md .* r_hat; mmd = md' * md
    avar = mmd \ (me' * me) / mmd

    se = sqrt.(diag(avar));

    return (βhat = βhat, se = se )
end

nls (generic function with 1 method)
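
In matrix form, the standard errors computed in `nls` come from a sandwich estimator. Writing $\hat{D}$ for the $n \times k$ matrix `md` whose $i$-th row is $\hat{d}_i' = \partial f(X_i, \hat{\beta})/\partial\beta'$, and $\hat{r}_i$ for the residuals (notation introduced here), the code evaluates:

$$
\widehat{V} = (\hat{D}'\hat{D})^{-1} \left( \sum_{i=1}^n \hat{r}_i^2 \, \hat{d}_i \hat{d}_i' \right) (\hat{D}'\hat{D})^{-1},
\qquad
\mathrm{se}_j = \sqrt{\widehat{V}_{jj}} .
$$

The middle term equals `me' * me`, since row $i$ of `me` is $\hat{r}_i \hat{d}_i'$. This form of the variance allows for heteroskedastic errors.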

In [10]:
nls(m_fn,y,x)

(βhat = [8.237881250619758, 0.21694719413437033], se = [1.4995775264787221, 0.17344780809640403])
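
The estimates above differ noticeably from the β = [1, -1] used in the DGP, which is what one would expect when the errors are Uniform(0, 1) rather than mean zero. As a cross-check on the estimator itself, the sketch below fits the same logistic mean on data simulated with mean-zero errors, where the estimates should land close to the true values. It uses a hand-written Gauss-Newton loop instead of Optim so that it runs with the standard library only. The names `logit_mean` and `gauss_newton`, the `_sim` variables, and the error scale 0.1 are assumptions made for this illustration.

```julia
using Random, LinearAlgebra

Random.seed!(1234)

n_sim  = 1000
β_true = [1.0, -1.0]
logit_mean(x, b) = 1 ./ (1 .+ exp.(-x * b))       # same logistic mean as m_fn

x_sim = 2 .* rand(n_sim, 2)
u_sim = 0.1 .* randn(n_sim)                        # mean-zero errors (assumed scale 0.1)
y_sim = logit_mean(x_sim, β_true) .+ u_sim

# Gauss-Newton: repeatedly regress the residuals on the gradient matrix
function gauss_newton(y, x; iters = 50, tol = 1e-10)
    b = zeros(size(x, 2))
    for _ in 1:iters
        f = logit_mean(x, b)
        J = (f .* (1 .- f)) .* x                   # row i is f_i (1 - f_i) x_i'
        step = (J' * J) \ (J' * (y .- f))
        b += step
        norm(step) < tol && break
    end
    return b
end

b_gn = gauss_newton(y_sim, x_sim)
println(b_gn)
```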