In [27]:
using Pkg
#Pkg.add("NBInclude")
#Pkg.add("ProgressMeter")

 Resolving package versions...
  Updating `/opt/julia/environments/v1.1/Project.toml`
  [92933f4c] + ProgressMeter v1.1.0
  Updating `/opt/julia/environments/v1.1/Manifest.toml`
 [no changes]


Latex Macros:
$$
\newcommand{\E}{\text{E}}
\newcommand{\mbf}{\mathbf}
\newcommand{\bs}{\boldsymbol}
\newcommand{\Cov}{\text{Cov}}
\newcommand{\Var}{\text{Var}}
\newcommand{\A}[1]{\mathbf{A}_{#1}}
\newcommand{\Ai}[1]{\mathbf{A}^{#1}}
$$

In [1]:
macro javascript_str(s) display("text/javascript", s); end
javascript"""
    MathJax.Hub.Config({
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
    MathJax.Hub.Queue( 
        ["resetEquationNumbers",MathJax.InputJax.TeX], 
        ["PreProcess",MathJax.Hub], 
        ["Reprocess",MathJax.Hub] 
    );
"""

# Gibbs Sampling of Fixed and Random Effects

### Extension for SNP Effects


We will initially consider a univariate mixed linear model of the form:

\begin{equation}
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Zu} + \mathbf{W}\boldsymbol{\alpha} + \mathbf{e},
\label{eq:model}
\end{equation}

where $\boldsymbol{\beta}$ is a vector of fixed effects, $\mathbf{X}$ is an observed matrix that relates $\boldsymbol{\beta}$ to $\mathbf{y}$, $\mathbf{u}$ is a vector of multivariate normal random effects with null means and covariance matrix $\mathbf{G}\sigma^2_u$, $\mathbf{Z}$ is an observed matrix that relates $\mathbf{u}$ to $\mathbf{y}$, $\boldsymbol{\alpha}$ is a vector of random SNP effects with null means and covariance matrix $\mathbf{I}\sigma^2_{\alpha}$, $\mathbf{W}$ is an observed matrix of SNP covariates, and $\mathbf{e}$ is a vector of multivariate normal residuals with null means and covariance matrix $\mathbf{R}\sigma^2_e$. The fixed effects are usually assigned a flat prior distribution, and the matrices $\mathbf{G}$ and $\mathbf{R}$ are assumed to be known. It is assumed that vectors $\mbf{u}$, $\bs{\alpha}$ and $\mbf{e}$ are mutually independent. The variance components $\sigma^2_u$, $\sigma^2_{\alpha}$ and $\sigma^2_e$ are assigned scaled inverted chi-square prior distributions.

The MME for this model are:

\begin{equation}
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} &
\mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \frac{\sigma_{e}^{2}}{\sigma_u^2}\mathbf{G}^{-1}&
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{W} \\
\mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{W}'\mathbf{R}^{-1}\mathbf{Z} &
\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \frac{\sigma_{e}^{2}}{\sigma_{\alpha}^2}\mathbf{I}       
\end{bmatrix}
\begin{bmatrix}
\hat{\bs{\beta}}\\
\hat{\mbf{u}} \\
\hat{\bs{\alpha}}      
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\\
\mathbf{W}'\mathbf{R}^{-1}\mathbf{y}        
\end{bmatrix}.
\label{eq:MME}
\end{equation}

The rows and columns in (\ref{eq:MME}) corresponding to $\hat{\bs{\alpha}}$ are dense. Thus, an alternative scheme is described below for sampling the location parameters in (\ref{eq:model}) without constructing the MME in (\ref{eq:MME}).

In iteration $i$ of the Gibbs sampler, conditional on the sampled values of $\bs{\alpha} = \bs{\alpha}^{(i-1)}$ in the previous iteration, the model given by (\ref{eq:model}) can be written as:

\begin{align}
\mathbf{y} - \mathbf{W}\boldsymbol{\alpha}^{(i-1)} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Zu} + \mathbf{e},
\label{eq:model2}
\end{align}

where the distributions of $\mbf{u}$ and $\mbf{e}$ in (\ref{eq:model2}) are identical to their distributions in (\ref{eq:model}), because $\bs{\alpha}$ is independent of $\mbf{u}$ and $\mbf{e}$. The MME for (\ref{eq:model2}) will be sparse when $\mathbf{X}$, $\mathbf{Z}$ and $\mathbf{G}^{-1}$ are sparse, as in the usual non-genomic models. Thus, the function `sampleLoc!` can be applied to the sparse MME for (\ref{eq:model2}) to obtain single-site Gibbs samples of the elements in $\bs{\beta}$ and $\mbf{u}$. Recall that this function can be used for both single-trait and multi-trait models.

Similarly, for a single-trait model, to apply single-site Gibbs sampling to element $j$ of $\bs{\alpha}$, the model in (\ref{eq:model}) can be written as
\begin{align}
\mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
        - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \sum_{l >j}\mbf{w}_l\alpha_l^{(i-1)}
                           &=  \mbf{w}_j\alpha_j+ \mathbf{e}\\
\mbf{y}_{\text{adj-j}}^{(j)} &= \mbf{w}_j\alpha_j + \mathbf{e},
\label{eq:model3}
\end{align}

where $\mbf{w}_j$ is column $j$ of $\mbf{W}$. In this model, $\alpha_j$ is the only unknown, and the MME corresponding to (\ref{eq:model3}) is:

\begin{equation}
(\mbf{w}_j'\mbf{w}_j + \frac{\sigma^2_e}{\sigma^2_{\alpha}})\hat{\alpha}_j = \mbf{w}_j'\mbf{y}_{\text{adj-j}}^{(j)}.
\label{eq:MMEAlpha}
\end{equation}
Thus, as we have seen previously, the full-conditional distribution of $\alpha_j$ can be shown to have a normal distribution with mean $\hat{\alpha}_j$ and variance $\frac{\sigma^2_e}{\mbf{w}_j'\mbf{w}_j + \frac{\sigma^2_e}{\sigma^2_{\alpha}}}$.
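
As an illustration of this full conditional, here is a minimal Julia sketch of a single draw of $\alpha_j$. It assumes $\mbf{R} = \mbf{I}$, as in (\ref{eq:MMEAlpha}). The function name `sampleAlphaJ` and the argument names `wj`, `yAdjJ`, `varE` and `varAlpha` are hypothetical and are not part of the code included in this notebook.

```julia
using LinearAlgebra, Random

# Draw alpha_j from its full conditional N(alphaHat, varE/lhs),
# where lhs = w_j'w_j + varE/varAlpha (assumes R = I).
# yAdjJ is y adjusted for every effect in the model except alpha_j.
function sampleAlphaJ(wj::AbstractVector, yAdjJ::AbstractVector, varE, varAlpha)
    lhs      = dot(wj, wj) + varE / varAlpha   # coefficient of the one-equation MME
    alphaHat = dot(wj, yAdjJ) / lhs            # solution of the MME = conditional mean
    condVar  = varE / lhs                      # conditional variance
    draw     = alphaHat + randn() * sqrt(condVar)
    return draw, alphaHat, condVar
end
```

The conditional mean and variance are returned alongside the draw only so that they can be inspected; a sampler would normally keep just the draw.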

To compute the right-hand side (RHS) of (\ref{eq:MMEAlpha}) efficiently, before sampling $\alpha_j$, let 

$$
\mbf{y}_{\text{adj}}^{(j-1)} = \mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
                               - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \mbf{w}_j\alpha_j^{(i-1)} - \sum_{l > j}\mbf{w}_l\alpha_l^{(i-1)},
$$

where $\mbf{y}$ has been adjusted for all the effects in the model with their sampled values, including $\alpha_j = \alpha_j^{(i-1)}$. Then, the RHS of (\ref{eq:MMEAlpha}) can be written as

$$
\mbf{w}_j'\mbf{y}_{\text{adj-j}}^{(j)} = \mbf{w}_j'\mbf{y}_{\text{adj}}^{(j-1)} + \mbf{w}_j'\mbf{w}_j\alpha_j^{(i-1)}.
$$

After sampling $\alpha_j$, $\mbf{y}_{\text{adj}}^{(j)}$ is obtained efficiently as

\begin{align}
\mbf{y}_{\text{adj}}^{(j)} &= 
   \mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
           - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \mbf{w}_j\alpha_j^{(i)} - \sum_{l > j}\mbf{w}_l\alpha_l^{(i-1)}\\
   &= \mbf{y}_{\text{adj}}^{(j-1)} +  \mbf{w}_j\alpha_j^{(i-1)} - \mbf{w}_j\alpha_j^{(i)} \\
   &= \mbf{y}_{\text{adj}}^{(j-1)} +  \mbf{w}_j(\alpha_j^{(i-1)} - \alpha_j^{(i)}).
\end{align}

After all $k$ elements of $\bs{\alpha}$ have been sampled, adding $\mbf{X}\bs{\beta}^{(i)} + \mbf{Z}\mbf{u}^{(i)}$ to $\mbf{y}_{\text{adj}}^{(k)}$ gives:

$$
\mathbf{y} - \mathbf{W}\boldsymbol{\alpha}^{(i)} = \mbf{y}_{\text{adj}}^{(k)} + \mbf{X}\bs{\beta}^{(i)} + \mbf{Z}\mbf{u}^{(i)},
$$

which will be used in the next round of the Gibbs sampler for obtaining samples of $\bs{\beta}$ and $\mbf{u}$.
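
The updating scheme above can be sketched in Julia as one sweep over the SNP effects. This is an illustrative sketch only, assuming $\mbf{R} = \mbf{I}$ and plain dense arrays. The names `sampleAlpha!`, `yAdj`, `yCorr`, `varE` and `varAlpha` are hypothetical, and the toy data are simulated purely to exercise the code.

```julia
using LinearAlgebra, Random

# One Gibbs sweep over the SNP effects, assuming R = I.
# yAdj  : y adjusted for ALL effects at their current values (updated in place)
# W     : n x k matrix of SNP covariates
# alpha : current SNP effects (updated in place)
function sampleAlpha!(yAdj, W, alpha, varE, varAlpha)
    lambda = varE / varAlpha
    for j in 1:size(W, 2)
        wj  = view(W, :, j)
        wtw = dot(wj, wj)
        lhs = wtw + lambda
        rhs = dot(wj, yAdj) + wtw * alpha[j]      # w_j'y_adj-j without rebuilding y_adj-j
        old = alpha[j]
        alpha[j] = rhs / lhs + randn() * sqrt(varE / lhs)
        yAdj .+= wj .* (old - alpha[j])           # y_adj^(j) from y_adj^(j-1)
    end
    return alpha
end

# Toy data, simulated here purely for illustration
Random.seed!(1)
n, k  = 20, 5
W     = randn(n, k)
yCorr = randn(n)               # stands in for y - X*beta - Z*u
alpha = zeros(k)
yAdj  = yCorr - W * alpha
sampleAlpha!(yAdj, W, alpha, 1.0, 0.5)
```

After the sweep, `yAdj` equals `yCorr - W * alpha` up to rounding, so each SNP costs two dot products and one vector update rather than a full recomputation of the adjusted phenotypes.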

Consider now a two-trait model, where, to simplify the notation, we will denote $\mbf{y}_{\text{adj-j}}^{(j)}$ by $\mbf{y}_1$ for the first trait and by $\mbf{y}_2$ for the second trait. Then, the model for sampling the effects for locus $j$ can be written as:

\begin{align}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix}
&=
\begin{bmatrix}
\mbf{w}_j & \mbf{0}\\
\mbf{0}   & \mbf{w}_j
\end{bmatrix}
\begin{bmatrix}
\alpha_{j1}\\
\alpha_{j2}
\end{bmatrix}
+
\begin{bmatrix}
\mbf{e}_1\\
\mbf{e}_2
\end{bmatrix}\\
&= (\mbf{I}_2\otimes \mbf{w}_j)\bs{\alpha}_j 
+ 
\mbf{e},
\label{eq:model2Trait}
\end{align}

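where $\bs{\alpha}_j \sim \text{N}(\mbf{0},\bs{\Sigma})$ is the vector of effects of locus $j$ on the two traits, with $2\times 2$ covariance matrix $\bs{\Sigma}$, and $\mbf{e} \sim \text{N}(\mbf{0},\mbf{R}_0\otimes\mbf{I}_n)$, with $\mbf{R}_0$ the $2\times 2$ residual covariance matrix and $n$ the number of observations per trait. Writing $\mbf{y}$ for the stacked vector of $\mbf{y}_1$ and $\mbf{y}_2$, and $r_0^{st}$ for the elements of $\mbf{R}_0^{-1}$, the MME for this model are: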

\begin{align}
[(\mbf{I}_2\otimes \mbf{w}'_j)(\mbf{R}_0^{-1}\otimes\mbf{I}_n)(\mbf{I}_2\otimes \mbf{w}_j) + \bs{\Sigma}^{-1}]\hat{\bs{\alpha}}_j &=  
(\mbf{I}_2\otimes \mbf{w}'_j)(\mbf{R}_0^{-1}\otimes\mbf{I}_n)\mbf{y} \\ 
(\begin{bmatrix}
\mbf{w}'_j & \mbf{0}\\
\mbf{0}    & \mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
r_0^{11}\mbf{I} & r_0^{12}\mbf{I}\\
r_0^{21}\mbf{I} & r_0^{22}\mbf{I}
\end{bmatrix}
\begin{bmatrix}
\mbf{w}_j & \mbf{0}\\
\mbf{0}   & \mbf{w}_j
\end{bmatrix}
+ 
\bs{\Sigma}^{-1})
\hat{\bs{\alpha}}_j
&=
\begin{bmatrix}
\mbf{w}'_j & \mbf{0}\\
\mbf{0}    & \mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
r_0^{11}\mbf{I} & r_0^{12}\mbf{I}\\
r_0^{21}\mbf{I} & r_0^{22}\mbf{I}
\end{bmatrix}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix} \\
(\mbf{w}'_j\mbf{w}_j\mbf{R}_0^{-1} + \bs{\Sigma}^{-1})\hat{\bs{\alpha}}_j 
&= 
\begin{bmatrix}
r_0^{11}\mbf{w}'_j & r_0^{12}\mbf{w}'_j\\
r_0^{21}\mbf{w}'_j & r_0^{22}\mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix} \\
(\mbf{w}'_j\mbf{w}_j\mbf{R}_0^{-1} + \bs{\Sigma}^{-1})\hat{\bs{\alpha}}_j 
&= 
\mbf{R}_0^{-1}
\begin{bmatrix}
\mbf{w}'_j\mbf{y}_1\\
\mbf{w}'_j\mbf{y}_2
\end{bmatrix} 
\label{eq:MME2Trait}
\end{align}

When the multi-trait model is for $m$ traits, the coefficient matrix of the MME in (\ref{eq:MME2Trait}) expands to an $m\times m$ matrix and the RHS to an $m\times 1$ vector. Then, the `sampleLoc!` function can be applied to this MME to obtain single-site Gibbs samples of the elements in $\bs{\alpha}_j$.  
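
The reduction from the Kronecker form to the small $m\times m$ system can be checked numerically. The sketch below builds the reduced MME for one locus. The function name `locusMME` and the layout of `Y` (an $n\times m$ matrix with one column of adjusted phenotypes per trait) are assumptions made for illustration.

```julia
using LinearAlgebra, Random

# Reduced m x m MME for the effects of locus j on m traits.
# wj    : column j of W (length n)
# Y     : n x m matrix, column t holds the adjusted phenotypes of trait t
# R0    : m x m residual covariance matrix
# Sigma : m x m covariance matrix of the locus effects
function locusMME(wj::AbstractVector, Y::AbstractMatrix,
                  R0::AbstractMatrix, Sigma::AbstractMatrix)
    Ri  = inv(R0)
    lhs = dot(wj, wj) * Ri + inv(Sigma)   # w_j'w_j * R0^{-1} + Sigma^{-1}
    rhs = Ri * (Y' * wj)                  # R0^{-1} * [w_j'y_1; ...; w_j'y_m]
    return lhs, rhs
end

# Toy two-trait example, simulated purely for illustration
Random.seed!(2)
n     = 6
wj    = randn(n)
Y     = randn(n, 2)
R0    = [2.0 0.5; 0.5 1.0]
Sigma = [1.0 0.2; 0.2 0.5]
lhs, rhs = locusMME(wj, Y, R0, Sigma)

# The same system built directly from the Kronecker form
X    = kron(Matrix(1.0I, 2, 2), reshape(wj, n, 1))   # I_2 (x) w_j, size 2n x 2
Rinv = kron(inv(R0), Matrix(1.0I, n, n))             # R0^{-1} (x) I_n
lhsFull = X' * Rinv * X + inv(Sigma)
rhsFull = X' * Rinv * vec(Y)
```

Here `lhs` and `rhs` should agree with `lhsFull` and `rhsFull` up to rounding, while the reduced form never creates a matrix with $n$ rows or columns.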

In [3]:
using NBInclude
@nbinclude("../MME/3.7.2.BuildMME.ipynb"; regex=r"#\s*EXECUTE")

updateLhsRhs! (generic function with 1 method)

In [4]:
pedigree = get_pedigree("simData.ped",separator=",",header=false);

The delimiter in simData.ped is ','.


coding pedigree... 100%|████████████████████████████████| Time: 0:00:00
calculating inbreeding... 100%|█████████████████████████| Time: 0:00:00


Finished!


In [5]:
data = CSV.read("data.phen");

In [6]:
using Statistics

In [7]:
var(data[:,3])

27.49455811886987

In [8]:
varGen = 10.0
varRes = 10.0
mme = initMME("y = intercept + Ind",varRes);
setRandom!(mme,"Ind",varGen,pedigree,estimate=true);

In [12]:
lhs,rhs,names = getLhsRhs!(mme,data);

In [9]:
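function sampleLoc!(mme)
    A = mme.mmeLhs   # MME coefficient matrix
    b = mme.mmeRhs   # MME right-hand side
    x = mme.mmeSpl   # current sample of the location parameters (updated in place)
    n = size(x,1)
    for i=1:n
        # reciprocal of the diagonal element, used below as the conditional variance of x[i]
        cVarInv = 1.0/A[i,i]
        # conditional mean: (b[i] - sum over j != i of A[j,i]*x[j]) / A[i,i]
        cMean   = cVarInv*(b[i] - A[:,i]'x) + x[i]
        x[i]    = randn()*sqrt(cVarInv) + cMean
    end
end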

sampleLoc! (generic function with 1 method)
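
To see what this single-site update does, the following sketch repeats it on a small hand-made system. It is a stand-alone illustration: `gibbsPass!` and `chainMean` are hypothetical helpers that work on plain arrays instead of an `mme` object, and the sketch assumes the variances are already absorbed into `A` and `b`, so that the target distribution has mean `A \ b` and covariance `inv(A)`.

```julia
using LinearAlgebra, Random

# Same single-site update as sampleLoc!, written for plain arrays.
function gibbsPass!(A, b, x)
    for i in eachindex(x)
        cVar  = 1.0 / A[i, i]
        cMean = cVar * (b[i] - dot(view(A, :, i), x)) + x[i]
        x[i]  = cMean + randn() * sqrt(cVar)
    end
    return x
end

# Run the chain for nIter sweeps and return the running mean of the samples.
function chainMean(A, b, nIter)
    x     = zeros(length(b))
    meanX = zeros(length(b))
    for iter in 1:nIter
        gibbsPass!(A, b, x)
        meanX .+= (x .- meanX) ./ iter
    end
    return meanX
end

Random.seed!(123)
A = [4.0 1.0 0.5; 1.0 3.0 0.2; 0.5 0.2 2.0]   # small symmetric positive definite system
b = [1.0, 2.0, 3.0]
meanX = chainMean(A, b, 20_000)
```

With enough sweeps, `meanX` should be close to the direct solution `A \ b`, which is a convenient sanity check for a sampler of this kind.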