## BIOSTAT 257 Homework 2

Consider a linear mixed effects model

$$Y_i = X_i\beta + Z_i\gamma_i + \epsilon_i, \quad i = 1,\ldots,n$$

where

- $Y_i \in \mathbb{R}^{n_i}$ is the response vector of the $i$-th individual,
- $X_i \in \mathbb{R}^{n_i \times p}$ is the fixed effect predictor matrix of the $i$-th individual,
- $Z_i \in \mathbb{R}^{n_i \times q}$ is the random effect predictor matrix of the $i$-th individual,
- $\epsilon_i \in \mathbb{R}^{n_i}$ is multivariate normal $N(0_{n_i}, \sigma^2 I_{n_i})$,
- $\beta \in \mathbb{R}^{p}$ are fixed effects, and
- $\gamma_i \in \mathbb{R}^{q}$ are random effects assumed to be $N(0_{q}, \Sigma_{q \times q})$ independent of $\epsilon_i$.

### Question 1: Formula

Write down the log-likelihood of the $i$-th datum $(Y_i, X_i, Z_i)$ given parameters $(\beta, \Sigma, \sigma^2)$.

Since $Y_i$ is an affine function of jointly Gaussian random vectors, it is marginally $N(X_i \beta, Z_i \Sigma Z_i^T + \sigma^2 I_{n_i})$, so the log-likelihood is

$$\ell(\beta,\Sigma,\sigma^2) = -\frac{n_i}{2}\log(2\pi) - \frac{1}{2}\log|Z_i \Sigma Z_i^T + \sigma^2 I_{n_i}| - \frac{1}{2}(Y_i - X_i \beta)^T(Z_i \Sigma Z_i^T + \sigma^2 I_{n_i})^{-1}(Y_i - X_i \beta)$$

We can use the Woodbury identity to compute the inverse of the covariance matrix as

$$(Z_i \Sigma Z_i^T + \sigma^2 I_{n_i})^{-1} = \frac{1}{\sigma^2}I_{n_i} - \frac{1}{\sigma^4}Z_i\Bigg(\Sigma^{-1} + \frac{1}{\sigma^2}Z_i^TZ_i\Bigg)^{-1}Z_i^T$$

and the matrix determinant lemma to compute its determinant as

$$\det(Z_i\Sigma Z_i^T + \sigma^2 I_{n_i}) = \sigma^{2n_i}\det(\Sigma)\det\Bigg(\Sigma^{-1} + \frac{1}{\sigma^2}Z_i^TZ_i\Bigg).$$

Both right-hand sides only require inverting or factorizing $q \times q$ matrices, never an $n_i \times n_i$ one.
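Before relying on the two identities above in code, it can help to check them numerically on small random matrices. In the sketch below, the function name `identity_errors`, the sizes, the seed and the particular positive definite $\Sigma$ are arbitrary illustrative choices. Only the `LinearAlgebra` and `Random` standard libraries are used. Everything is wrapped in a function so that no notebook globals such as `n`, `Z` or `Σ` are overwritten.

```julia
using LinearAlgebra, Random

# Compare both identities against dense n × n computations; returns the two
# absolute errors, which should be near machine precision.
function identity_errors(n, q; σ² = 0.7, rng = MersenneTwister(2024))
    Z = randn(rng, n, q)
    A = randn(rng, q, q)
    Σ = A * A' + I                       # an arbitrary positive definite Σ
    Ω = Z * Σ * Z' + σ² * I              # n × n marginal covariance
    K = inv(Σ) + Z' * Z / σ²             # q × q matrix shared by both identities
    # Woodbury: Ω⁻¹ = I/σ² - Z K⁻¹ Z' / σ⁴
    woodbury_inv = I / σ² - Z * (K \ Z') / σ²^2
    # Determinant lemma: log|Ω| = n log σ² + log|Σ| + log|K|
    lemma_logdet = n * log(σ²) + logdet(Σ) + logdet(K)
    return opnorm(woodbury_inv - inv(Ω)), abs(lemma_logdet - logdet(Ω))
end

identity_errors(8, 3)
```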

### Question 2: Start-up Code

Use the following template to define a type `LmmObs` that holds an LMM datum $(y_i,X_i,Z_i)$.

```julia
# define a type that holds LMM datum
struct LmmObs{T <: AbstractFloat}
    # data
    y :: Vector{T}
    X :: Matrix{T}
    Z :: Matrix{T}
    # working arrays
    # whatever intermediate arrays you may want to pre-allocate
    res        :: Vector{T}   # residual y - Xβ (length n)
    storage_q  :: Vector{T}   # length-q buffer
    ztz        :: Matrix{T}   # precomputed Z'Z (q × q)
    storage_qq :: Matrix{T}   # q × q buffer
end

# constructor
function LmmObs(
        y::Vector{T}, 
        X::Matrix{T}, 
        Z::Matrix{T}) where T <: AbstractFloat
    res        = similar(y)
    storage_q  = Vector{T}(undef, size(Z, 2))
    ztz        = transpose(Z) * Z
    storage_qq = similar(ztz)
    LmmObs(y, X, Z, res, storage_q, ztz, storage_qq)
end
```

Write a function with interface `logl!(obs, β, L, σ²)` that evaluates the log-likelihood of the $i$-th datum. Here `L` is the lower triangular Cholesky factor from the Cholesky decomposition `Σ=LL'`. Make your code efficient in the $n_i \gg q$ case; think of the intensive longitudinal measurement setting.
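Substituting $\Sigma = LL^T$ into the two identities of Question 1 avoids forming $\Sigma^{-1}$ altogether. The following is a derivation sketch. The shorthand $\Omega_i = Z_i\Sigma Z_i^T + \sigma^2 I_{n_i}$, $r_i = Y_i - X_i\beta$, $M_i$ and $u_i$ is introduced here for convenience. Since $\Sigma^{-1} + \sigma^{-2}Z_i^TZ_i = L^{-T}M_iL^{-1}$ with

$$M_i = I_q + \frac{1}{\sigma^2}L^TZ_i^TZ_iL,$$

the determinant lemma and the Woodbury identity become

$$\log|\Omega_i| = n_i\log\sigma^2 + \log|M_i|, \qquad r_i^T\Omega_i^{-1}r_i = \frac{r_i^Tr_i}{\sigma^2} - \frac{1}{\sigma^4}u_i^TM_i^{-1}u_i, \quad u_i = L^TZ_i^Tr_i.$$

Suppose $Z_i^TZ_i$ is precomputed (the `ztz` field). Then forming $r_i$ and $Z_i^Tr_i$ costs $O(n_i(p+q))$, and forming $M_i$ and its Cholesky factor costs $O(q^3)$. No $n_i \times n_i$ matrix is ever built.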

```julia
using BenchmarkTools, Distributions, LinearAlgebra, Random

function logl!(
        obs :: LmmObs{T}, 
        β   :: Vector{T}, 
        L   :: Matrix{T}, 
        σ²  :: T) where T <: AbstractFloat
    n, p, q = size(obs.X, 1), size(obs.X, 2), size(obs.Z, 2)
    # residual r = y - Xβ (uses the fields of `obs`, not global variables)
    res = obs.y - obs.X * β
    # constant term of the Gaussian log-density
    c = -(n / 2) * log(2 * π)
    # log|Ω| via the determinant lemma: n log σ² + log|Σ| + log|Σ⁻¹ + Z'Z/σ²|
    logdetΩ = n * log(σ²) + 2 * logdet(L) +
        logdet(inv(L * L') + (1 / σ²) * obs.ztz)
    # quadratic form r'Ω⁻¹r, still formed with an explicit n × n inverse
    quad = res' * inv(obs.Z * L * L' * obs.Z' + σ² * I) * res
    return c - logdetΩ / 2 - quad / 2
end
```

Hint: This function shouldn't be very long. Mine, obeying the 80-character rule, is 25 lines. If you find yourself writing very long code, you're on the wrong track. Think about the algorithm first, then use BLAS functions to reduce memory allocations.
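Below is a minimal sketch of an $O(n_i(p+q) + q^3)$ evaluation. It assumes that `T` is `Float32` or `Float64`, so that the `BLAS` and `LAPACK` wrappers in `LinearAlgebra` apply. It also assumes that `L` is a dense lower triangular matrix as produced by `Matrix(cholesky(Σ).L)`, and that the working arrays of `LmmObs` may be overwritten. The function name `logl_sketch!` is hypothetical, chosen so the block does not clobber your own `logl!`. The struct is restated verbatim only to make the block self-contained.

The sketch applies the Woodbury identity and determinant lemma from Question 1 with $\Sigma = LL^T$ substituted. As a result, only the $q \times q$ matrix $M = I_q + L^TZ^TZL/\sigma^2$ is ever factorized.

```julia
using LinearAlgebra

# Restated verbatim from the template so this block runs on its own.
struct LmmObs{T <: AbstractFloat}
    y          :: Vector{T}
    X          :: Matrix{T}
    Z          :: Matrix{T}
    res        :: Vector{T}
    storage_q  :: Vector{T}
    ztz        :: Matrix{T}
    storage_qq :: Matrix{T}
end

function LmmObs(y::Vector{T}, X::Matrix{T}, Z::Matrix{T}) where T <: AbstractFloat
    q = size(Z, 2)
    LmmObs(y, X, Z, similar(y), Vector{T}(undef, q),
           transpose(Z) * Z, Matrix{T}(undef, q, q))
end

# Hypothetical name; evaluates the same log-likelihood as `logl!`.
function logl_sketch!(
        obs :: LmmObs{T},
        β   :: Vector{T},
        L   :: Matrix{T},
        σ²  :: T) where T <: LinearAlgebra.BlasReal
    n, q = size(obs.Z)
    r, u, M = obs.res, obs.storage_q, obs.storage_qq
    # r = y - Xβ, overwriting the preallocated residual buffer
    copyto!(r, obs.y)
    BLAS.gemv!('N', -one(T), obs.X, β, one(T), r)
    rtr = dot(r, r)
    # u = L' Z' r  (q-vector)
    BLAS.gemv!('T', one(T), obs.Z, r, zero(T), u)
    BLAS.trmv!('L', 'T', 'N', L, u)
    # M = I + L' (Z'Z) L / σ², built from the precomputed Z'Z
    copyto!(M, obs.ztz)
    BLAS.trmm!('R', 'L', 'N', 'N', one(T), L, M)    # M ← M L
    BLAS.trmm!('L', 'L', 'T', 'N', inv(σ²), L, M)   # M ← L' M / σ²
    for j in 1:q
        M[j, j] += one(T)
    end
    # Cholesky M = U'U; LAPACK writes U into the upper triangle of M
    LAPACK.potrf!('U', M)
    logdetM = zero(T)
    for j in 1:q
        logdetM += 2 * log(M[j, j])
    end
    # u ← U' \ u, so that u' M⁻¹ u (before) equals ‖u‖² (after)
    BLAS.trsv!('U', 'T', 'N', M, u)
    # log|Ω| = n log σ² + log|M|;  r'Ω⁻¹r = r'r/σ² - ‖U'⁻¹ L'Z'r‖²/σ⁴
    quad = rtr / σ² - dot(u, u) / σ²^2
    return -(n * log(2 * T(π)) + n * log(σ²) + logdetM + quad) / 2
end
```

In the notebook, `logl_sketch!(obs, β, L, σ²) ≈ logpdf(mvn, y)` should hold. All work happens in place on the preallocated buffers, so the function should allocate little or nothing. Whether it reaches the targets of Questions 4 and 5 depends on your machine and Julia version, so measure it with `@benchmark`.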

### Question 3: Correctness

Compare your result (both accuracy and timing) to the Distributions.jl package using the following data.


```julia
Random.seed!(257)
# dimension
n, p, q = 2000, 5, 3
# predictors
X  = [ones(n) randn(n, p - 1)]
Z  = [ones(n) randn(n, q - 1)]
# parameter values
β  = [2.0; -1.0; rand(p - 2)]
σ² = 1.5
Σ  = fill(0.1, q, q) + 0.9I
# generate y
y  = X * β + Z * rand(MvNormal(Σ)) + sqrt(σ²) * randn(n)

# form an LmmObs object
obs = LmmObs(y, X, Z)
```

This is the standard way to evaluate the log-density of a multivariate normal using the Distributions.jl package. Let's evaluate the log-likelihood of this datum.

```julia
μ  = X * β
Ω  = Z * Σ * transpose(Z) +  σ² * I
mvn = MvNormal(μ, Symmetric(Ω)) # N(μ, Ω)
logpdf(mvn, y)
```

```
-3247.456858063827
```

Check that your answer matches that from Distributions.jl.

```julia
L = Matrix(cholesky(Σ).L)
logl!(obs, β, L, σ²)
```

```
-3247.456858063826
```

You will lose all 15 + 30 + 30 = 75 points if the following statement throws an `AssertionError`.

```julia
@assert logl!(obs, β, Matrix(cholesky(Σ).L), σ²) ≈ logpdf(mvn, y)
```

### Question 4: Efficiency

Benchmark your code and compare it to the Distributions.jl function `logpdf`.

```julia
# benchmark the `logpdf` function in Distributions.jl
bm1 = @benchmark logpdf($mvn, $y)
```

```
BenchmarkTools.Trial: 
  memory estimate:  30.55 MiB
  allocs estimate:  5
  --------------
  minimum time:     12.160 ms (0.00% GC)
  median time:      12.785 ms (0.00% GC)
  mean time:        15.033 ms (12.64% GC)
  maximum time:     28.803 ms (41.98% GC)
  --------------
  samples:          334
  evals/sample:     1
```

```julia
# benchmark your implementation
L = Matrix(cholesky(Σ).L)
bm2 = @benchmark logl!($obs, $β, $L, $σ²)
```

```
BenchmarkTools.Trial: 
  memory estimate:  92.69 MiB
  allocs estimate:  47
  --------------
  minimum time:     182.091 ms (0.00% GC)
  median time:      201.078 ms (3.16% GC)
  mean time:        206.315 ms (2.93% GC)
  maximum time:     251.909 ms (5.56% GC)
  --------------
  samples:          25
  evals/sample:     1
```

The number of points you will get is

$$\frac{x}{1000} \times 30,$$

where $x$ is the speedup of your program over the standard method (the ratio of the median times). The result is clamped to the range $[0, 30]$.

```julia
# this is the number of points you'll get
clamp(median(bm1).time / median(bm2).time / 1000 * 30, 0, 30)
```

```
0.0019074081911456295
```

Hint: I am using 1000 as the denominator because I expect your code to be at least 1000× faster than the standard method.
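As a worked example, take the medians reported above: 12.785 ms for `logpdf` and 201.078 ms for the current `logl!`. Then

$$x = \frac{12.785\ \text{ms}}{201.078\ \text{ms}} \approx 0.0636, \qquad \frac{0.0636}{1000} \times 30 \approx 0.0019,$$

which matches the score printed above. Earning the full 30 points requires $x \ge 1000$. On the same machine, that means a median time of at most about $12.785\ \text{ms} / 1000 \approx 12.8\ \mu\text{s}$.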

### Question 5: Memory

You want to avoid memory allocation in the "hot" function `logl!`. You will lose 1 point for each `1 KiB = 1024 bytes` of memory allocated. In other words, the points you get for this question are

```julia
clamp(30 - median(bm2).memory / 1024, 0, 30)
```

```
0.0
```

Hint: I am able to reduce the memory allocation to 0 bytes.
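One way to locate allocations is to measure a single sub-step with Julia's built-in `@allocated` macro after a warm-up call, so that compilation is not counted. In the sketch below, the helper names `resid_alloc`, `resid_inplace!` and `bytes_used` are hypothetical. The sketch contrasts the allocating expression `y - X * β` with a 5-argument `mul!` (available since Julia 1.3) that writes into a preallocated buffer, such as the `res` field of `LmmObs`.

```julia
using LinearAlgebra

# Allocating version: `X * β` and the subtraction each create a new length-n vector.
resid_alloc(y, X, β) = y - X * β

# In-place version (hypothetical helper): overwrite the preallocated buffer `r`.
function resid_inplace!(r, y, X, β)
    copyto!(r, y)
    mul!(r, X, β, -one(eltype(r)), one(eltype(r)))   # r ← -Xβ + r
    return r
end

# Bytes allocated by one call of each version, measured after a warm-up call
# so that compilation is not counted.
function bytes_used(n = 2000, p = 5)
    X, β, y = randn(n, p), randn(p), randn(n)
    r = similar(y)
    resid_alloc(y, X, β)
    resid_inplace!(r, y, X, β)
    naive   = @allocated resid_alloc(y, X, β)
    inplace = @allocated resid_inplace!(r, y, X, β)
    return naive, inplace
end

alloc_naive, alloc_inplace = bytes_used()
println("allocating: $alloc_naive bytes, in-place: $alloc_inplace bytes")
```

The in-place call should report far fewer bytes. Exact counts vary across Julia versions, so treat the printed numbers as a diagnostic rather than a guarantee.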

### Question 6: Misc.

Coding style, Git workflow, etc. For reproducibility, make sure we (TA and myself) can run your Jupyter Notebook. That is how we grade Q4 and Q5. If we cannot run it, you will get zero points.