# Introduction

This notebook contains my solution to the [week 3](https://github.com/rmcelreath/statrethinking_winter2019/blob/master/homework/week03.pdf) homework.

# Setup

In [1]:
cd("../")

# Packages

In [2]:
include("./src/Utils.jl")
using .Utils

In [3]:
using RCall
RCall.rcall_p(:options, rcalljl_options=Dict(:width => 800, :height => 400))
R"""
require(rethinking)
require(ggplot2)
require(dplyr)
"""
;

│ Loading required package: rstan
│ Loading required package: StanHeaders
│ Loading required package: ggplot2
│ rstan (Version 2.19.3, GitRev: 2e1f913d3ca3)
│ For execution on a local, multicore CPU with excess RAM we recommend calling
│ options(mc.cores = parallel::detectCores()).
│ To avoid recompilation of unchanged Stan programs, we recommend calling
│ rstan_options(auto_write = TRUE)
│ Loading required package: parallel
│ Loading required package: dagitty
│ rethinking (Version 2.00)
│ 
│ Attaching package: ‘rethinking’
│ 
│ The following object is masked from ‘package:stats’:
│ 
│     rstudent
│ 
└ @ RCall /home/user/.julia/packages/RCall/paaBQ/src/io.jl:113
│ 
│ Attaching package: ‘dplyr’
│ 
│ The following objects are masked from ‘package:stats’:
│ 
│     filter, lag
│ 
│ The following objects are masked from ‘package:base’:
│ 
│     intersect, setdiff, setequal, union
│ 
└ @ RCall /home/user/.julia/packages/RCall/paaBQ/src/io.jl:113


In [4]:
using StatsBase, Statistics, DataFrames, Distributions

In [5]:
using LinearAlgebra

# Data

In [6]:
foxes = get_data("foxes")
first(foxes, 5)

Unnamed: 0_level_0,group,avgfood,groupsize,area,weight
Unnamed: 0_level_1,Int64,Float64,Int64,Float64,Float64
1,1,0.37,2,1.09,5.02
2,1,0.37,2,1.09,2.84
3,2,0.53,2,2.05,5.33
4,2,0.53,2,2.05,6.07
5,3,0.49,2,2.12,5.85


Standardize the variables:

In [7]:
columns = filter(col -> !(col == :group), names(foxes))

foxes_std = standardize(ZScoreTransform, Matrix(foxes[!, columns]), dims=1, center=true, scale=true) |>
    m -> DataFrame(m, columns)

first(foxes_std, 5)

Unnamed: 0_level_0,avgfood,groupsize,area,weight
Unnamed: 0_level_1,Float64,Float64,Float64,Float64
1,-1.92483,-1.52409,-2.2396,0.414135
2,-1.92483,-1.52409,-2.2396,-1.42705
3,-1.11804,-1.52409,-1.20551,0.675954
4,-1.11804,-1.52409,-1.20551,1.30094
5,-1.31973,-1.52409,-1.13011,1.11513
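
To make the standardization step explicit, here is a minimal hand-rolled sketch of a z-score using only the `Statistics` standard library. The helper name `zscore_col` is hypothetical, and I am assuming the sample standard deviation (n-1 denominator) is what is wanted. The input is just the five `area` values printed earlier, so the results will not match the table above, which is standardized over the full data set.

```julia
using Statistics

# Hypothetical helper: centre a column on its mean and scale by its sample std.
zscore_col(x) = (x .- mean(x)) ./ std(x)

# First five `area` values from the head of the data frame shown above.
area = [1.09, 1.09, 2.05, 2.05, 2.12]
z = zscore_col(area)

# After standardizing, the column has mean 0 and standard deviation 1.
println(round.(z, digits=3))
```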


# Questions

The DAG:

<img src="https://i.ibb.co/92hQJnh/dag.png" alt="dag" border="0">

## Question 1

**Question**

<img src="https://i.ibb.co/hFGKYxF/question1.png" alt="question1" border="0">

**Solution**

In [8]:
# variables for α, β_area, σ
vars = (VariableSpecification(-100, 100, Normal(0, 0.2)),
        VariableSpecification(-10, 10, Normal(0, 0.5)),
        VariableSpecification(0, 100, Exponential(1)),
    )
;
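
Written out, the priors above and the likelihood in the next cell correspond to the following model. The symbols $W_i$ and $A_i$ (standardized weight and area of fox $i$) are my own notation, and I am reading the first two arguments of `VariableSpecification` as box bounds for the optimizer, which is an assumption since that type lives in `Utils`.

$$
\begin{aligned}
W_i &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \alpha + \beta_\text{area} A_i \\
\alpha &\sim \text{Normal}(0, 0.2) \\
\beta_\text{area} &\sim \text{Normal}(0, 0.5) \\
\sigma &\sim \text{Exponential}(1)
\end{aligned}
$$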

In [9]:
build_ll_data(data) = (α, β_area, σ) -> begin
    logprobs = map(data) do (area, weight)
        μ = α + β_area * area
        d = Normal(μ, σ)
        logpdf(d, weight)
    end
    
    logprobs |> sum
end;

ll_data = build_ll_data(zip(foxes_std.area,
                            foxes_std.weight
        ));

In [10]:
build_l_joint_priors(priors::NTuple{3, Distribution}) = 
    (α, β_area, σ) -> logpdf.(priors, (α, β_area, σ)) |> sum

l_joint_priors = build_l_joint_priors(map(v -> v.prior, vars));

In [11]:
# the objective function
f(α, β_area, σ) = ll_data(α, β_area, σ) + l_joint_priors(α, β_area, σ);
f(xs::Vector) = f(xs...)

f (generic function with 2 methods)

In [12]:
soln_q1, covarmat_q1 = quap(f, vars);


******************************************************************************
This program contains Ipopt, a library for large-scale nonlinear optimization.
 Ipopt is released as open source code under the Eclipse Public License (EPL).
         For more information visit http://projects.coin-or.org/Ipopt
******************************************************************************

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        3
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        3
                     variables with only upper bounds:        0
Total number of equ
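
The internals of `quap` are in `Utils` and not shown here, so the following is only a sketch of the general idea behind a quadratic approximation, not that implementation. It works in one dimension, finds the posterior mode with Newton steps on finite-difference derivatives, and takes the negative inverse curvature at the mode as the variance. The function `quap_1d`, the toy data `y`, and the toy model (a Normal(0, 1) prior on a mean with known unit noise) are all made up for illustration.

```julia
# Quadratic approximation in one dimension: climb to the mode with Newton steps,
# then use the curvature of the log posterior there as the inverse variance.
function quap_1d(logpost, x0; h=1e-4, iters=20)
    d1(x) = (logpost(x + h) - logpost(x - h)) / (2 * h)                    # first derivative
    d2(x) = (logpost(x + h) - 2 * logpost(x) + logpost(x - h)) / h^2       # second derivative
    x = x0
    for _ in 1:iters
        x -= d1(x) / d2(x)
    end
    return x, -1 / d2(x)
end

# Toy log posterior (up to a constant): Normal(0, 1) prior on μ, unit-variance noise.
y = [1.0, 2.0, 3.0]
logpost(μ) = -μ^2 / 2 - sum((y .- μ) .^ 2) / 2

μ_map, μ_var = quap_1d(logpost, 0.0)
println((μ_map, μ_var))
```

For this toy model the answer is known in closed form: the posterior precision is 1 (prior) plus 3 (one per observation), so the mode should be close to 6/4 = 1.5 and the variance close to 1/4.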

Mean values of the parameters:

In [13]:
round.(soln_q1, digits=2)

3-element Array{Float64,1}:
 -0.0
  0.02
  0.99

Standard deviation of the parameters:

In [14]:
round.(covarmat_q1 |> diag .|> sqrt, digits=2)

3-element Array{Float64,1}:
 0.08
 0.09
 0.06

PI of parameters:

In [15]:
sample_params = rand(MultivariateNormal(soln_q1, covarmat_q1), 10_000)

map(eachrow(sample_params)) do (row)
    R"""
        PI($row, prob=0.89) %>%
            as.list %>%
            as_tibble
    """ |>
    rcopy
end |>
    dfs -> vcat(dfs...) .|>
    x -> round(x, digits=2)

Unnamed: 0_level_0,5%,94%
Unnamed: 0_level_1,Float64,Float64
1,-0.14,0.13
2,-0.13,0.16
3,0.89,1.09
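
The interval above is computed by calling `PI` from R. As far as I understand it, a percentile interval with `prob=0.89` just cuts equal mass from both tails, that is, the 5.5% and 94.5% quantiles (which the table labels as 5% and 94%). A minimal pure-Julia sketch of the same idea, with the hypothetical helper name `percentile_interval`:

```julia
using Statistics

# Percentile interval with equal tail mass, e.g. prob = 0.89 -> 5.5% and 94.5%.
function percentile_interval(samples; prob=0.89)
    tail = (1 - prob) / 2
    return quantile(samples, [tail, 1 - tail])
end

# Toy input: the numbers 0, 1, ..., 100, so quantiles are easy to read off.
lo, hi = percentile_interval(collect(0.0:100.0))
println((lo, hi))
```

Applied to the samples above, something like `map(percentile_interval, eachrow(sample_params))` should give one interval per parameter, though I have not checked it against the R output digit for digit.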


Territory size (`area`) seems to have no total causal influence on weight, at least not in this sample: the posterior mean of $\beta_\text{area}$ is 0.02 and its 89% interval (-0.13 to 0.16) straddles zero.

## Question 2

**Question**

<img src="https://i.ibb.co/sg7yZMN/question2.png" alt="question2" border="0">

**Solution**

In [16]:
# variables for α, β_food, σ
vars = (VariableSpecification(-100, 100, Normal(0, 0.2)),
        VariableSpecification(-10, 10, Normal(0, 0.5)),
        VariableSpecification(0, 100, Exponential(1)),
    )
;

In [17]:
build_ll_data(data) = (α, β_food, σ) -> begin
    logprobs = map(data) do (food, weight)
        μ = α + β_food * food
        d = Normal(μ, σ)
        logpdf(d, weight)
    end
    
    logprobs |> sum
end;

ll_data = build_ll_data(zip(foxes_std.avgfood,
                            foxes_std.weight
        ));

In [18]:
build_l_joint_priors(priors::NTuple{3, Distribution}) = 
    (α, β_food, σ) -> logpdf.(priors, (α, β_food, σ)) |> sum

l_joint_priors = build_l_joint_priors(map(v -> v.prior, vars));

In [19]:
# the objective function
f(α, β_food, σ) = ll_data(α, β_food, σ) + l_joint_priors(α, β_food, σ);
f(xs::Vector) = f(xs...)

f (generic function with 2 methods)

In [20]:
soln_q2, covarmat_q2 = quap(f, vars);

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        3
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        3
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 

In [21]:
round.(soln_q2, digits=2)

3-element Array{Float64,1}:
 -0.0
 -0.02
  0.99

In [22]:
round.(covarmat_q2 |> diag .|> sqrt, digits=2)

3-element Array{Float64,1}:
 0.08
 0.09
 0.06

In [23]:
sample_params = rand(MultivariateNormal(soln_q2, covarmat_q2), 10_000)

map(eachrow(sample_params)) do (row)
    R"""
        PI($row, prob=0.89) %>%
            as.list %>%
            as_tibble
    """ |>
    rcopy
end |>
    dfs -> vcat(dfs...) .|>
    x -> round(x, digits=2)

Unnamed: 0_level_0,5%,94%
Unnamed: 0_level_1,Float64,Float64
1,-0.13,0.13
2,-0.17,0.12
3,0.89,1.09


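Again nothing: the posterior mean of $\beta_\text{food}$ is -0.02 with an 89% interval of -0.17 to 0.12, so adding food does not appear to change weight in total.

If the DAG is correct, this shouldn't be surprising: `area` is upstream of `avgfood` and influences `weight` only through it, so a total effect of `avgfood` would have shown up as a total effect of `area` in Question 1.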

## Question 3

**Question**

<img src="https://i.ibb.co/qmgZhnF/question3.png" alt="question3" border="0">

**Solution**

In [24]:
first(foxes, 5)

Unnamed: 0_level_0,group,avgfood,groupsize,area,weight
Unnamed: 0_level_1,Int64,Float64,Int64,Float64,Float64
1,1,0.37,2,1.09,5.02
2,1,0.37,2,1.09,2.84
3,2,0.53,2,2.05,5.33
4,2,0.53,2,2.05,6.07
5,3,0.49,2,2.12,5.85


Paths from `groupsize` to `weight`:

1. `groupsize` -> `weight` : direct
2. `groupsize` <- `avgfood` -> `weight` : backdoor (fork)

By the [backdoor path criterion](https://medium.com/towards-artificial-intelligence/3-ways-linear-models-can-lead-to-erroneous-conclusions-842637fe122b), we need to condition on `avgfood` (that is, include it as a predictor) to close this path.
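
The path listing above can also be checked mechanically. The sketch below enumerates every simple path between `groupsize` and `weight`, ignoring arrow direction, and flags the ones whose first edge points into `groupsize`. The edge list is an assumption I reconstructed from the text of this notebook (the DAG itself is only available as an image), and the helper names are hypothetical.

```julia
# Edges of the DAG as (parent, child) pairs. ASSUMPTION: reconstructed from the text.
edges = [(:area, :avgfood), (:avgfood, :groupsize),
         (:avgfood, :weight), (:groupsize, :weight)]

# All neighbours of a node, ignoring arrow direction.
neighbours(node) = unique(vcat([c for (p, c) in edges if p == node],
                               [p for (p, c) in edges if c == node]))

# Depth-first enumeration of simple paths from `from` to `to`.
function all_paths(from, to, visited=[from])
    from == to && return [copy(visited)]
    paths = Vector{Vector{Symbol}}()
    for nb in neighbours(from)
        nb in visited && continue
        append!(paths, all_paths(nb, to, vcat(visited, nb)))
    end
    return paths
end

# A path is a backdoor path if its first edge points INTO the exposure.
is_backdoor(path) = (path[2], path[1]) in edges

paths = all_paths(:groupsize, :weight)
backdoor = filter(is_backdoor, paths)
println(paths)
println(backdoor)
```

Under that edge list, the only backdoor path runs through `avgfood`, which is a fork on that path, so conditioning on it closes the path.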

In [25]:
# variables for α, β_food, β_size, σ
vars = (VariableSpecification(-100, 100, Normal(0, 0.2)),
        VariableSpecification(-10, 10, Normal(0, 0.5)),
        VariableSpecification(-10, 10, Normal(0, 0.5)),
        VariableSpecification(0, 100, Exponential(1)),
    )
;

In [26]:
build_ll_data(data) = (α, β_food, β_size, σ) -> begin
    logprobs = map(data) do (food, size, weight)
        μ = α + β_food * food + β_size * size
        d = Normal(μ, σ)
        logpdf(d, weight)
    end
    
    logprobs |> sum
end;

ll_data = build_ll_data(zip(foxes_std.avgfood,
                            foxes_std.groupsize,
                            foxes_std.weight
        ));

In [27]:
build_l_joint_priors(priors::NTuple{4, Distribution}) = 
    (α, β_food, β_size, σ) -> logpdf.(priors, (α, β_food, β_size, σ)) |> sum

l_joint_priors = build_l_joint_priors(map(v -> v.prior, vars));

In [28]:
f(α, β_food, β_size, σ) = ll_data(α, β_food, β_size, σ) + l_joint_priors(α, β_food, β_size, σ);
f(xs::Vector) = f(xs...)

f (generic function with 3 methods)

In [29]:
soln_q3, covarmat_q3 = quap(f, vars);

This is Ipopt version 3.12.10, running with linear solver mumps.
NOTE: Other linear solvers might be more efficient (see Ipopt documentation).

Number of nonzeros in equality constraint Jacobian...:        0
Number of nonzeros in inequality constraint Jacobian.:        0
Number of nonzeros in Lagrangian Hessian.............:        0

Total number of variables............................:        4
                     variables with only lower bounds:        0
                variables with lower and upper bounds:        4
                     variables with only upper bounds:        0
Total number of equality constraints.................:        0
Total number of inequality constraints...............:        0
        inequality constraints with only lower bounds:        0
   inequality constraints with lower and upper bounds:        0
        inequality constraints with only upper bounds:        0

iter    objective    inf_pr   inf_du lg(mu)  ||d||  lg(rg) alpha_du alpha_pr  ls
   0 

In [30]:
round.(soln_q3, digits=2)

4-element Array{Float64,1}:
 -0.0
  0.48
 -0.57
  0.94

In [31]:
round.(covarmat_q3 |> diag .|> sqrt, digits=2)

4-element Array{Float64,1}:
 0.08
 0.18
 0.18
 0.06

In [32]:
sample_params = rand(MultivariateNormal(soln_q3, covarmat_q3), 10_000)

map(eachrow(sample_params)) do (row)
    R"""
        PI($row, prob=0.89) %>%
            as.list %>%
            as_tibble
    """ |>
    rcopy
end |>
    dfs -> vcat(dfs...) .|>
    x -> round(x, digits=2)

Unnamed: 0_level_0,5%,94%
Unnamed: 0_level_1,Float64,Float64
1,-0.13,0.13
2,0.18,0.76
3,-0.86,-0.28
4,0.85,1.04



It looks like `groupsize` is negatively associated with `weight`, controlling for `avgfood`. Similarly, `avgfood` is positively associated with `weight`, controlling for `groupsize`.

So the causal influence of `groupsize` is to reduce `weight` - less food for each fox. And the *direct causal influence* of `avgfood` is positive, of course. But the *total causal* influence of `avgfood` is still nothing, since it causes larger groups.

This is a masking effect, like in the milk energy example. But the causal explanation here is that more foxes move into a territory until the food available to each is no better than the food in a neighboring territory. Every territory ends up equally good/bad on average. This is known in behavioral ecology as an "ideal free distribution".
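
To see how such masking can arise, here is a small simulation sketch. It uses synthetic data, not the foxes data, and the coefficients are made up so that the direct effect of food (+1) is exactly cancelled by the path through group size (food raises group size by 1, and group size lowers weight by 1). Variable names are hypothetical and the regressions are plain least squares rather than `quap`.

```julia
using Random, Statistics, LinearAlgebra

rng = MersenneTwister(42)
n = 100_000

# Synthetic data following the assumed causal structure.
food = randn(rng, n)
groupsize = food .+ 0.5 .* randn(rng, n)            # more food -> larger groups
weight = food .- groupsize .+ randn(rng, n)         # direct +1 from food, -1 from group size

# Total effect of food: regress weight on food alone.
β_total = hcat(ones(n), food) \ weight

# Direct effects: regress weight on food and group size together.
β_direct = hcat(ones(n), food, groupsize) \ weight

println(round.(β_total, digits=2))
println(round.(β_direct, digits=2))
```

With this setup the slope on `food` in the first regression should come out near zero, while the second regression should recover coefficients near +1 and -1, which is the same qualitative pattern as in Questions 2 and 3.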