<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc" style="margin-top: 1em;"><ul class="toc-item"><li><span><a href="#Packages" data-toc-modified-id="Packages-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Packages</a></span></li><li><span><a href="#Setup" data-toc-modified-id="Setup-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Setup</a></span></li><li><span><a href="#Data" data-toc-modified-id="Data-3"><span class="toc-item-num">3&nbsp;&nbsp;</span>Data</a></span></li><li><span><a href="#Question-1" data-toc-modified-id="Question-1-4"><span class="toc-item-num">4&nbsp;&nbsp;</span>Question 1</a></span></li><li><span><a href="#Question-2" data-toc-modified-id="Question-2-5"><span class="toc-item-num">5&nbsp;&nbsp;</span>Question 2</a></span></li></ul></div>

# Packages

In [185]:
using RCall
RCall.rcall_p(:options, rcalljl_options=Dict(:width => 800, :height => 400));

In [186]:
using Distributions, Statistics
using HTTP, CSV, DataFrames, Query  # download, parse, and filter the dataset

In [187]:
using JuMP, Ipopt, ForwardDiff

# Setup

In [203]:
R"""
require(rethinking)
require(ggplot2)
require(dplyr)
"""

│ 
│ Attaching package: 'dplyr'
│ 
│ The following objects are masked from 'package:stats':
│ 
│     filter, lag
│ 
│ The following objects are masked from 'package:base':
│ 
│     intersect, setdiff, setequal, union
│ 
└ @ RCall C:\Users\mshukri\.julia\packages\RCall\lAV2K\src\io.jl:113


RObject{LglSxp}
[1] TRUE


# Data

Get the Howell1 dataset:

In [189]:
file_url = "https://raw.githubusercontent.com/rmcelreath/rethinking/master/data/Howell1.csv"
df = HTTP.get(file_url).body |> IOBuffer |> CSV.File |> DataFrame
first(df, 5)


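| Row | height (Float64) | weight (Float64) | age (Float64) | male (Int64) |
|-----|------------------|------------------|---------------|--------------|
| 1 | 151.765 | 47.8256 | 63.0 | 1 |
| 2 | 139.7 | 36.4858 | 63.0 | 0 |
| 3 | 136.525 | 31.8648 | 65.0 | 0 |
| 4 | 156.845 | 53.0419 | 41.0 | 1 |
| 5 | 145.415 | 41.2769 | 51.0 | 0 |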


# Question 1

**Question**

<img src="https://i.ibb.co/tLyJSsz/Untitled.png" alt="Untitled" border="0">

**Solution**

Keep only the adults (age 18 or older):

In [190]:
adults = df |>
    @filter(_.age >= 18) |>
    DataFrame

first(adults, 5)

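| Row | height (Float64) | weight (Float64) | age (Float64) | male (Int64) |
|-----|------------------|------------------|---------------|--------------|
| 1 | 151.765 | 47.8256 | 63.0 | 1 |
| 2 | 139.7 | 36.4858 | 63.0 | 0 |
| 3 | 136.525 | 31.8648 | 65.0 | 0 |
| 4 | 156.845 | 53.0419 | 41.0 | 1 |
| 5 | 145.415 | 41.2769 | 51.0 | 0 |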


Build a linear regression model relating height to weight, and find the parameter values that maximize the log posterior:

In [191]:
# the data in (x, y) format
data = zip(adults.weight, adults.height)
x̄ = mean(adults.weight)

# define the priors
prior_α = Normal(178, 20)
prior_β = LogNormal(0, 1)
prior_σ = Uniform(0, 50)

# log likelihood of the data
ll_data(α, β, σ) = begin
    log_probs = map(data) do (x, y)
        μ = α + β * (x - x̄)
        d = Normal(μ, σ)
        logpdf(d, y)
    end
    log_probs |> sum
end

# log of the joint probability of the priors, assuming independence
l_joint_priors(α, β, σ) = logpdf(prior_α, α) +
    logpdf(prior_β, β) +
    logpdf(prior_σ, σ)

objective_fn(α, β, σ) = ll_data(α, β, σ) + l_joint_priors(α, β, σ)


# maximize the objective function
model = Model(Ipopt.Optimizer)

register(model, :objective_fn, 3, objective_fn, autodiff=true)

@variable(model, 98 <= α <= 258, start = rand(prior_α))
@variable(model, 0 <= β <= 10, start = rand(prior_β))
@variable(model, 0 <= σ <= 100, start = rand(prior_σ))

@NLobjective(model, Max, objective_fn(α, β, σ))

optimize!(model)

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        3
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        3
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du a
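
For reference, the cell above can be read as the following model. This is a sketch in notation introduced here ($h_i$ for height, $x_i$ for weight, $\bar{x}$ for the mean adult weight); the priors are copied from the code, where `Normal(178, 20)` takes a mean and a standard deviation:

$$
\begin{aligned}
h_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta \, (x_i - \bar{x}) \\
\alpha &\sim \mathrm{Normal}(178, 20) \\
\beta &\sim \mathrm{LogNormal}(0, 1) \\
\sigma &\sim \mathrm{Uniform}(0, 50)
\end{aligned}
$$

Under that reading, `objective_fn` is the unnormalized log posterior, so maximizing it gives the maximum a posteriori (MAP) estimate:

$$
\log p(\alpha, \beta, \sigma \mid h, x) = \sum_i \log \mathrm{Normal}(h_i \mid \mu_i, \sigma) + \log p(\alpha) + \log p(\beta) + \log p(\sigma) + \text{const}
$$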

Build a Gaussian approximation of the posterior: use the optimum as the mean, and the inverse of the negative Hessian of the log posterior at the optimum as the covariance matrix:

In [192]:
optimal_points = [α, β, σ] .|> value

f(x::Vector) = begin
    α, β, σ = x
    objective_fn(α, β, σ)
end

H(x::Vector) = ForwardDiff.hessian(f, x)

covar_mat = inv(-1 * H(optimal_points)) .|> 
    x -> round(x, digits=5)

posterior_d = MultivariateNormal(optimal_points, covar_mat)

FullNormal(
dim: 3
μ: [154.60136751206537, 0.903280886327661, 5.071880332262239]
Σ: [0.07307 -0.0 6.0e-5; -0.0 0.00176 -3.0e-5; 6.0e-5 -3.0e-5 0.03654]
)
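
The construction above appears to be the usual quadratic (Laplace) approximation. As a sketch, writing $\theta = (\alpha, \beta, \sigma)$, $\hat{\theta}$ for the optimum found by Ipopt, and $\ell(\theta)$ for `objective_fn` (all three symbols are notation introduced here):

$$
p(\theta \mid \text{data}) \approx \mathrm{Normal}\big(\hat{\theta}, \Sigma\big),
\qquad
\Sigma = \big[-\nabla^2 \ell(\hat{\theta})\big]^{-1}
$$

The marginal posterior standard deviations are the square roots of the diagonal of $\Sigma$ printed above:

$$
\mathrm{sd}(\alpha) \approx \sqrt{0.07307} \approx 0.27,
\qquad
\mathrm{sd}(\beta) \approx \sqrt{0.00176} \approx 0.042,
\qquad
\mathrm{sd}(\sigma) \approx \sqrt{0.03654} \approx 0.19
$$

The off-diagonal entries are close to zero, which is what centering the weights on $\bar{x}$ is expected to do for $\alpha$ and $\beta$.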


Make predictions for the given individuals:

In [193]:
predict_heights(α, β, σ) = begin
    sample_weights = [45, 40, 65, 31, 53]
    μs = α .+ β .* (sample_weights .- x̄)
    ds = Normal.(μs, σ)
    ys = rand.(ds)
end

summarize_predictions(preds) = vcat(mean(preds), (rcall(:PI, preds, prob=0.89) |> rcopy)...);

In [194]:
sample_params = rand(posterior_d, 1000)

results = map(eachcol(sample_params)) do (α, β, σ)
   predict_heights(α, β, σ) 
end |> 
    m -> hcat(m...)' |>
    eachcol .|>
    summarize_predictions |>
    m -> hcat(m...)' .|>
    x -> round(x, digits=1)

5×3 Array{Float64,2}:
 154.9  146.8  162.7
 150.1  142.0  158.3
 172.6  164.4  180.7
 142.0  133.9  149.9
 162.1  154.0  170.0
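
In equations, the simulation above can be sketched as follows, with $s = 1, \dots, 1000$ indexing the posterior draws and $w$ one of the five weights (both symbols are notation introduced here):

$$
\begin{aligned}
(\alpha^{(s)}, \beta^{(s)}, \sigma^{(s)}) &\sim \mathrm{Normal}\big(\hat{\theta}, \Sigma\big) \\
\tilde{h}^{(s)} &\sim \mathrm{Normal}\big(\alpha^{(s)} + \beta^{(s)} (w - \bar{x}),\ \sigma^{(s)}\big)
\end{aligned}
$$

Each row of the matrix corresponds to one weight, in the order 45, 40, 65, 31, 53. The columns are the mean of the simulated heights and the bounds of the 89% percentile interval, which `PI` takes as the 5.5% and 94.5% quantiles of the draws. As a rough check, the 94.5% quantile of a standard normal is about 1.6, so the half-width should be near

$$
1.6 \times \hat{\sigma} \approx 1.6 \times 5.07 \approx 8.1 \text{ cm},
$$

which matches the first row: $154.9 - 146.8 = 8.1$. Parameter uncertainty and simulation noise shift this slightly from row to row.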

Present the results as a table:

In [195]:
hcat([45, 40, 65, 31, 53], results) |>
    m -> DataFrame(m, [:weight, :Eh, :L89, :U89])

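| Row | weight (Float64) | Eh (Float64) | L89 (Float64) | U89 (Float64) |
|-----|------------------|--------------|---------------|---------------|
| 1 | 45.0 | 154.9 | 146.8 | 162.7 |
| 2 | 40.0 | 150.1 | 142.0 | 158.3 |
| 3 | 65.0 | 172.6 | 164.4 | 180.7 |
| 4 | 31.0 | 142.0 | 133.9 | 149.9 |
| 5 | 53.0 | 162.1 | 154.0 | 170.0 |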


# Question 2

**Question**
<img src="https://i.ibb.co/rbvHT5P/Untitled.png" alt="Untitled" border="0">

**Solution**

Build a linear model of height on the logarithm of weight, this time using the full dataset rather than adults only:

In [199]:
# the data in (x, y) format
data = zip(log.(df.weight), df.height)
x̄ = mean(log.(df.weight))

# define the priors
prior_α = Normal(178, 20)
prior_β = LogNormal(0, 1)
prior_σ = Uniform(0, 50)

# log likelihood of the data
ll_data(α, β, σ) = begin
    log_probs = map(data) do (x, y)
        μ = α + β * (x - x̄)
        d = Normal(μ, σ)
        logpdf(d, y)
    end
    log_probs |> sum
end

# log of the joint probability of the priors, assuming independence
l_joint_priors(α, β, σ) = logpdf(prior_α, α) +
    logpdf(prior_β, β) +
    logpdf(prior_σ, σ)

objective_fn(α, β, σ) = ll_data(α, β, σ) + l_joint_priors(α, β, σ)


# maximize the objective function
model = Model(Ipopt.Optimizer)

register(model, :objective_fn, 3, objective_fn, autodiff=true)

@variable(model, 98 <= α <= 258, start = rand(prior_α))
@variable(model, 0 <= β <= 100, start = rand(prior_β))
@variable(model, 0 <= σ <= 100, start = rand(prior_σ))

@NLobjective(model, Max, objective_fn(α, β, σ))

optimize!(model)

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        3
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        3
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du a

In [202]:
optimal_points = [α, β, σ] .|> value

f(x::Vector) = begin
    α, β, σ = x
    objective_fn(α, β, σ)
end

H(x::Vector) = ForwardDiff.hessian(f, x)

covar_mat = inv(-1 * H(optimal_points)) .|> 
    x -> round(x, digits=5)

posterior_d = MultivariateNormal(optimal_points, covar_mat)

FullNormal(
dim: 3
μ: [138.26841036790285, 47.07112766715351, 5.134717086698649]
Σ: [0.04846 -0.0 5.0e-5; -0.0 0.14641 -0.00014; 5.0e-5 -0.00014 0.02423]
)
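
As a sketch of what this second fit encodes, with $w_i$ for weight and $\overline{\log w}$ for the mean of the log weights (notation introduced here), the linear predictor is now

$$
\mu_i = \alpha + \beta \, \big(\log w_i - \overline{\log w}\big)
$$

with the same priors as in Question 1. Because the predictor is on the log scale, $\beta$ describes the change in expected height per unit of $\log w$, that is, per multiplication of weight by $e$. A more tangible reading is the effect of doubling weight:

$$
\beta \log 2 \approx 47.07 \times 0.693 \approx 32.6 \text{ cm}
$$

The intercept $\alpha \approx 138.3$ is the expected height at the mean log weight, not at the mean weight.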


Plot the results: