# Maximum Likelihood Estimation: The Normal Linear Model

This tutorial introduces maximum likelihood estimation in Julia for the normal linear model.

The normal linear model (sometimes referred to as the OLS model) is the workhorse of regression modeling and is used in many fields. In this tutorial, we simulate data from a model with known parameters and show how Julia recovers those parameters by maximum likelihood.

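The first order of business is to load the `Optim` package, together with `NLSolversBase` (which provides `TwiceDifferentiable` and `hessian!`), `Random` for reproducible draws, and `diag` from `LinearAlgebra`: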

In [2]:
using Optim, NLSolversBase, Random
using LinearAlgebra: diag
Random.seed!(0);                            # Fix the random seed for reproducibility

In [3]:
nₒᵦₛ = 500 # Number of observations
nᵥₐᵣ = 1; # Number of columns of X, including the intercept

The first step is to specify the data generating process (DGP). The following code simulates data from a normal linear model:

In [4]:
β = ones(nᵥₐᵣ)*3.0                  # True coefficients
x = [ones(nₒᵦₛ) randn(nₒᵦₛ,nᵥₐᵣ-1)]  # Constant column plus nᵥₐᵣ - 1 standard-normal regressors
ε = randn(nₒᵦₛ)*0.5;                 # Errors with standard deviation 0.5 (variance 0.25)

In [5]:
y = x*β + ε;                        # Generate Data

In the above example, we have 500 observations, a design matrix that contains only the intercept (because nᵥₐᵣ = 1 counts the constant column), a true coefficient of 3.0, and errors with standard deviation 0.5 (variance 0.25); all of these can be changed by the user. Since we know the true parameter values, the maximum likelihood estimates should be close to them, up to sampling error.
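Because nᵥₐᵣ counts the constant column, the setting above produces an intercept-only design. To see the same DGP with a genuine explanatory variable, here is a minimal sketch assuming nᵥₐᵣ = 2 (one standard-normal regressor plus the constant, a value not used elsewhere in this tutorial). The helper `simulate_normal_linear` is hypothetical and uses only the standard library:

```julia
using Random

# Hypothetical helper: simulate y = Xβ + ε with an intercept column plus
# (nvar - 1) standard-normal regressors, every true coefficient equal to β_true,
# and normal errors with standard deviation σ.
function simulate_normal_linear(rng::AbstractRNG, nobs::Integer, nvar::Integer;
                                β_true::Real = 3.0, σ::Real = 0.5)
    β = fill(float(β_true), nvar)                 # True coefficient vector
    X = [ones(nobs) randn(rng, nobs, nvar - 1)]   # Constant plus regressors
    ε = σ .* randn(rng, nobs)                     # Errors: sd σ, variance σ²
    y = X * β + ε                                 # Simulated dependent variable
    return X, y, β
end

X, y, β = simulate_normal_linear(MersenneTwister(0), 500, 2)
size(X)        # (500, 2): intercept column plus one regressor
```

Passing 1 as the third argument reproduces the intercept-only design used in the rest of this tutorial, though not the same draws, since the sketch uses its own random number generator.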

The next step in our tutorial is to define a Julia function for the log-likelihood of the normal linear model. Because `Optim` minimizes, the function returns the negative of the log-likelihood:

In [6]:
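function Log_Likelihood(X, Y, β, log_σ, n)
    σ = exp(log_σ)                                # Recover σ > 0 from the unconstrained log σ
    llike = -n/2*log(2π) - n/2*log(σ^2) - sum((Y - X * β).^2) / (2σ^2)   # Normal log-likelihood
    return -llike                                 # Negative log-likelihood, because Optim minimizes
end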

Log_Likelihood (generic function with 1 method)

In [7]:
Log_Likelihood(X,Y, pars) = Log_Likelihood(X,Y, pars...)   # Splat pars = (β, log_σ, n) into the method above

Log_Likelihood (generic function with 2 methods)

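The five-argument method takes the matrix of explanatory variables (X), the dependent variable (Y), the coefficient vector (β), the log of the error standard deviation (log_σ), and the number of observations (n); the three-argument method simply splats a collection holding β, log_σ, and n into it. We optimize over log σ and exponentiate it on the first line of the function body because σ must be positive: this way the optimizer can search over all real values without ever producing an invalid standard deviation.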
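Written out, with s = log σ, the function above evaluates the negative log-likelihood below. Setting its derivatives to zero gives the familiar closed-form maximizers, which are useful for checking the numerical results:

$$
-\ell(\beta, s) = \frac{n}{2}\log(2\pi) + \frac{n}{2}\log\sigma^2 + \frac{1}{2\sigma^2}\,(y - X\beta)^\top (y - X\beta), \qquad \sigma = e^{s}
$$

$$
\hat\beta = (X^\top X)^{-1} X^\top y, \qquad \hat\sigma^2 = \frac{1}{n}\,(y - X\hat\beta)^\top (y - X\hat\beta)
$$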

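The next step in our tutorial is to optimize our function. We wrap it in `TwiceDifferentiable` with `autodiff=:forward`, so that gradients and Hessians are computed by forward-mode automatic differentiation. The Hessian is needed later to form `t-statistics`, and because it is available, `optimize` defaults to Newton's method: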

In [8]:
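optimizingFunction = TwiceDifferentiable(
        # Objective in vars = (β..., log σ); the tuple is splatted by the 3-argument method
        vars -> Log_Likelihood(x, y, (vars[1:nᵥₐᵣ], vars[nᵥₐᵣ + 1], nₒᵦₛ)),
        ones(nᵥₐᵣ+1);                # Seed vector: sets the dimension of the problem
        autodiff=:forward            # Derivatives via forward-mode automatic differentiation
    );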

In [9]:
opt = optimize(optimizingFunction, ones(nᵥₐᵣ+1))

 * Status: success

 * Candidate solution
    Final objective value:     3.767039e+02

 * Found with
    Algorithm:     Newton's Method

 * Convergence measures
    |x - x'|               = 2.15e-06 ≰ 0.0e+00
    |x - x'|/|x'|          = 7.07e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.11e-09 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.62e-11 ≰ 0.0e+00
    |g(x)|                 = 1.97e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    6
    f(x) calls:    30
    ∇f(x) calls:   30
    ∇²f(x) calls:  6
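The run stops because the gradient norm |g(x)| drops below 1e-8. As an illustration of what that criterion measures, the analytic gradient of the negative log-likelihood with respect to (β, log σ), with r = y - Xβ, has β block -X'r/σ² and log σ entry n - r'r/σ². The sketch below uses a hypothetical helper `negloglik_gradient`, only the standard library, and re-simulated intercept-only data; it checks that the gradient vanishes at the closed-form estimates but not at the starting point `ones(2)`:

```julia
using LinearAlgebra, Random

# Hypothetical helper: analytic gradient of the negative log-likelihood in θ = (β..., log σ).
# With r = y - Xβ:  ∂/∂β = -X'r / σ²,   ∂/∂log σ = n - r'r / σ².
function negloglik_gradient(θ::AbstractVector, X::AbstractMatrix, y::AbstractVector)
    k = size(X, 2)
    β, σ = θ[1:k], exp(θ[k+1])
    r = y - X * β
    return [-(X' * r) / σ^2; length(y) - sum(abs2, r) / σ^2]
end

rng = MersenneTwister(0)
n = 500
X = ones(n, 1)                               # Intercept-only design, as with nᵥₐᵣ = 1
y = X * [3.0] + 0.5 * randn(rng, n)          # True β = 3.0, error standard deviation 0.5

β̂ = X \ y                                    # Closed-form MLE of β
σ̂ = sqrt(sum(abs2, y - X * β̂) / n)           # Closed-form MLE of σ
g = negloglik_gradient([β̂; log(σ̂)], X, y)    # Essentially zero, up to rounding
g0 = negloglik_gradient([1.0, 1.0], X, y)    # At the starting point: far from zero
```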


The optimization result stores several quantities; `Optim.minimizer` returns the `maximum likelihood estimates`, with the last element on the log σ scale:

In [10]:
parameters = Optim.minimizer(opt)

2-element Vector{Float64}:
  3.043104697731327
 -0.6655307426013836
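For the normal linear model, the maximum likelihood estimates also have a closed form: β̂ is the least-squares solution and σ̂² is the residual sum of squares divided by n (not n - k). A minimal cross-check using only the standard library; the helper name `closed_form_mle` is hypothetical, and the data are re-simulated so the block runs on its own:

```julia
using LinearAlgebra, Random

# Hypothetical helper: closed-form MLE of the normal linear model y = Xβ + ε.
# β̂ is the least-squares solution; σ̂² is RSS / n (the MLE divides by n, not n - k).
function closed_form_mle(X::AbstractMatrix, y::AbstractVector)
    β̂ = X \ y                                  # Least squares
    σ̂ = sqrt(sum(abs2, y - X * β̂) / length(y))
    return β̂, σ̂, log(σ̂)                       # log σ̂ is the scale the optimizer works on
end

rng = MersenneTwister(0)                       # Separate generator, so draws differ from the tutorial's
n = 500
X = ones(n, 1)                                 # Intercept-only design, as with nᵥₐᵣ = 1
y = X * [3.0] + 0.5 * randn(rng, n)            # True β = 3.0, error standard deviation 0.5
β̂, σ̂, log_σ̂ = closed_form_mle(X, y)
```

Applied to the tutorial's own `x` and `y`, `closed_form_mle(x, y)` should agree with `Optim.minimizer(opt)` (β̂ and log σ̂) up to the optimizer's tolerance; for an intercept-only design, β̂ is simply the sample mean of y.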

In [12]:
σ̂ = exp(parameters[nᵥₐᵣ+1])    # Error standard deviation; `parameters` itself stays on the log scale

0.5140006532561675

The Hessian must be evaluated at the values that minimize the negative log-likelihood, and in the parameterization the objective actually uses, (β, log σ) rather than (β, σ); this is why `parameters` is left on the log scale above. We pass these values explicitly to `hessian!`, because the Hessian cached inside `optimizingFunction` belongs to the last point at which the optimizer evaluated it, which need not be the final minimizer:

In [14]:
numerical_hessian = hessian!(optimizingFunction, parameters)

At the maximum likelihood estimates this matrix is approximately diagonal. With the intercept-only design used here, the (β, β) entry equals n/σ̂² ≈ 1892.5, the (log σ, log σ) entry is close to 2n = 1000, and the off-diagonal entries are close to zero because the residuals sum to zero at the optimum.

Inverting the Hessian (the observed information matrix) gives the estimated variance-covariance matrix of the estimates; note that its last row and column refer to log σ, not σ:

In [15]:
var_cov_matrix = inv(numerical_hessian)

In this example, we are only interested in the statistical significance of the coefficient estimates, so we extract them:

In [16]:
β̂ = parameters[1:nᵥₐᵣ]

1-element Vector{Float64}:
 3.043104697731327

The t-statistics need the corresponding variances, which are the leading diagonal elements of the variance-covariance matrix:

In [18]:
variances = diag(var_cov_matrix)[1:nᵥₐᵣ]

The t-statistics are formed by dividing each coefficient element-by-element by its standard error, the square root of the corresponding diagonal element:

In [19]:
t_stats = β̂ ./ sqrt.(variances)

With σ̂ ≈ 0.514 and n = 500, the standard error of the intercept is σ̂/√n ≈ 0.02299, so the t-statistic is approximately 3.043/0.02299 ≈ 132.4.
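As a cross-check of the standard errors: at the maximum likelihood estimates, the Hessian of the negative log-likelihood with respect to (β, log σ) is block-diagonal, with blocks X'X/σ̂² and 2n, so the variance of β̂ is σ̂²(X'X)⁻¹. The sketch below uses only the standard library and re-simulated intercept-only data. It compares a central finite-difference Hessian (a stand-in for the automatic differentiation used by `TwiceDifferentiable`) with that closed form, then forms the t-statistic; `negloglik` and `fd_hessian` are hypothetical helpers:

```julia
using LinearAlgebra, Random

# Hypothetical helper: negative log-likelihood in θ = (β..., log σ), mirroring Log_Likelihood.
function negloglik(θ::AbstractVector, X::AbstractMatrix, y::AbstractVector)
    k = size(X, 2)
    β, σ = θ[1:k], exp(θ[k+1])
    n = length(y)
    return n/2 * log(2π) + n * log(σ) + sum(abs2, y - X * β) / (2σ^2)
end

# Hypothetical helper: central finite-difference Hessian of f at θ with step h.
function fd_hessian(f, θ::AbstractVector; h = 1e-4)
    p = length(θ)
    H = zeros(p, p)
    for i in 1:p, j in 1:p
        eᵢ = zeros(p); eᵢ[i] = h
        eⱼ = zeros(p); eⱼ[j] = h
        H[i, j] = (f(θ + eᵢ + eⱼ) - f(θ + eᵢ - eⱼ) - f(θ - eᵢ + eⱼ) + f(θ - eᵢ - eⱼ)) / (4h^2)
    end
    return H
end

rng = MersenneTwister(0)
n = 500
X = ones(n, 1)                              # Intercept-only design, as with nᵥₐᵣ = 1
y = X * [3.0] + 0.5 * randn(rng, n)
k = size(X, 2)

β̂ = X \ y                                   # Closed-form MLE of β
σ̂ = sqrt(sum(abs2, y - X * β̂) / n)          # Closed-form MLE of σ
θ̂ = [β̂; log(σ̂)]                             # Log σ scale, as the objective expects

H_fd = fd_hessian(θ -> negloglik(θ, X, y), θ̂)
H_exact = zeros(k + 1, k + 1)               # Block-diagonal Hessian at the MLE
H_exact[1:k, 1:k] = X' * X / σ̂^2
H_exact[k+1, k+1] = 2n

var_cov = inv(H_exact)                      # Estimated variance-covariance matrix
se_β = sqrt.(diag(var_cov)[1:k])            # Standard errors of β̂
t_stats = β̂ ./ se_β
```

For the intercept-only model the standard error reduces to σ̂/√n. Evaluating the Hessian at (β̂, σ̂) instead of (β̂, log σ̂) would change both diagonal entries, since the objective is parameterized in log σ.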
