In [1]:
using Pkg
#Pkg.add("NBInclude")
#Pkg.add("ProgressMeter")

Latex Macros:
$$
\newcommand{\E}{\text{E}}
\newcommand{\mbf}{\mathbf}
\newcommand{\bs}{\boldsymbol}
\newcommand{\Cov}{\text{Cov}}
\newcommand{\Var}{\text{Var}}
\newcommand{\A}[1]{\mathbf{A}_{#1}}
\newcommand{\Ai}[1]{\mathbf{A}^{#1}}
$$

In [2]:
macro javascript_str(s) display("text/javascript", s); end
javascript"""
    MathJax.Hub.Config({
      TeX: { equationNumbers: { autoNumber: "AMS" } }
    });
    MathJax.Hub.Queue( 
        ["resetEquationNumbers",MathJax.InputJax.TeX], 
        ["PreProcess",MathJax.Hub], 
        ["Reprocess",MathJax.Hub] 
    );
"""

# Gibbs Sampling of Fixed and Random Effects

### Extension for SNP Effects


We will initially consider a univariate mixed linear model of the form:

\begin{equation}
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Zu} + \mathbf{W}\boldsymbol{\alpha} + \mathbf{e},
\label{eq:model}
\end{equation}

where $\boldsymbol{\beta}$ is a vector of fixed effects, $\mathbf{X}$ is an observed matrix that relates $\boldsymbol{\beta}$ to $\mathbf{y}$, $\mathbf{u}$ is a vector of multivariate normal random effects with null means and covariance matrix $\mathbf{G}\sigma^2_u$, $\mathbf{Z}$ is an observed matrix that relates $\mathbf{u}$ to $\mathbf{y}$, $\boldsymbol{\alpha}$ is a vector of random SNP effects with null means and covariance matrix $\mathbf{I}\sigma^2_{\alpha}$, $\mathbf{W}$ is an observed matrix of SNP covariates, and $\mathbf{e}$ is a vector of multivariate normal residuals with null means and covariance matrix $\mathbf{R}\sigma^2_e$. The fixed effects are usually assigned a flat prior distribution, and the matrices $\mathbf{G}$ and $\mathbf{R}$ are assumed to be known. It is assumed that vectors $\mbf{u}$, $\bs{\alpha}$ and $\mbf{e}$ are mutually independent. The variance components $\sigma^2_u$, $\sigma^2_{\alpha}$ and $\sigma^2_e$ are assigned scaled inverted chi-square prior distributions.

The MME for this model are:

\begin{equation}
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{X}'\mathbf{R}^{-1}\mathbf{Z} &
\mathbf{X}'\mathbf{R}^{-1}\mathbf{W} \\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{Z} + \frac{\sigma_{e}^{2}}{\sigma_u^2}\mathbf{G}^{-1}&
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{W} \\
\mathbf{W}'\mathbf{R}^{-1}\mathbf{X} & 
\mathbf{W}'\mathbf{R}^{-1}\mathbf{Z} &
\mathbf{W}'\mathbf{R}^{-1}\mathbf{W} + \frac{\sigma_{e}^{2}}{\sigma_{\alpha}^2}\mathbf{I}       
\end{bmatrix}
\begin{bmatrix}
\hat{\bs{\beta}}\\
\hat{\mbf{u}} \\
\hat{\bs{\alpha}}      
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{X}'\mathbf{R}^{-1}\mathbf{y}\\
\mathbf{Z}'\mathbf{R}^{-1}\mathbf{y}\\
\mathbf{W}'\mathbf{R}^{-1}\mathbf{y}        
\end{bmatrix}.
\label{eq:MME}
\end{equation}

The rows and columns in (\ref{eq:MME}) corresponding to $\hat{\bs{\alpha}}$ are dense. Thus, an alternative scheme is described below that samples the location parameters in (\ref{eq:model}) without constructing the MME in (\ref{eq:MME}).

In iteration $i$ of the Gibbs sampler, conditional on the sampled values of $\bs{\alpha} = \bs{\alpha}^{(i-1)}$ in the previous iteration, the model given by (\ref{eq:model}) can be written as:

\begin{align}
\mathbf{y} - \mathbf{W}\boldsymbol{\alpha}^{(i-1)} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Zu} + \mathbf{e},
\label{eq:model2}
\end{align}

where the distributions of $\mbf{u}$ and $\mbf{e}$ in (\ref{eq:model2}) are identical to their distributions in (\ref{eq:model}), because $\bs{\alpha}$ is independent of $\mbf{u}$ and $\mbf{e}$. The MME for (\ref{eq:model2}) will be sparse when $\mathbf{X}$, $\mathbf{Z}$ and $\mathbf{G}^{-1}$ are sparse, as in the usual non-genomic models. Thus, the function `sampleLoc!` can be applied to the sparse MME for (\ref{eq:model2}) to obtain single-site Gibbs samples of the elements in $\bs{\beta}$ and $\mbf{u}$. Recall that this function can be used for single-trait and multi-trait models.

Similarly, for a single-trait model, to apply single-site Gibbs sampling to element $j$ of $\bs{\alpha}$, the model in (\ref{eq:model}) can be written as
\begin{align}
\mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
        - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \sum_{l >j}\mbf{w}_l\alpha_l^{(i-1)}
                           &=  \mbf{w}_j\alpha_j+ \mathbf{e}\\
\mbf{y}_{\text{adj-j}}^{(j)} &= \mbf{w}_j\alpha_j + \mathbf{e},
\label{eq:model3}
\end{align}

where $\mbf{w}_j$ is column $j$ of $\mbf{W}$. In this model, $\alpha_j$ is the only unknown, and the MME corresponding to (\ref{eq:model3}) is:

\begin{equation}
(\mbf{w}_j'\mbf{w}_j + \frac{\sigma^2_e}{\sigma^2_{\alpha}})\hat{\alpha}_j = \mbf{w}_j'\mbf{y}_{\text{adj-j}}^{(j)}.
\label{eq:MMEAlpha}
\end{equation}
Thus, as we have seen previously, the full-conditional distribution of $\alpha_j$ can be shown to have a normal distribution with mean $\hat{\alpha}_j$ and variance $\frac{\sigma^2_e}{\mbf{w}_j'\mbf{w}_j + \frac{\sigma^2_e}{\sigma^2_{\alpha}}}$.
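
As a minimal sketch of this full-conditional draw, the function below samples one $\alpha_j$ given $\mbf{w}_j$ and $\mbf{y}_{\text{adj-j}}^{(j)}$. The name `sampleAlphaJ` and the toy numbers are hypothetical and are not part of the notebook's code; as in (\ref{eq:MMEAlpha}), $\mbf{R} = \mbf{I}$ is assumed.

```julia
using LinearAlgebra, Random

# Hypothetical helper (not from the notebook): draw α_j from its full conditional,
# N(α̂_j, σ²e / (w_j'w_j + σ²e/σ²α)), assuming R = I.
# yAdjJ is y adjusted for every effect in the model except α_j.
function sampleAlphaJ(rng, w, yAdjJ, varE, varAlpha)
    lhs   = dot(w, w) + varE / varAlpha   # coefficient of the one-equation MME
    rhs   = dot(w, yAdjJ)                 # right-hand side w_j' y_adj-j
    cMean = rhs / lhs                     # α̂_j, the conditional mean
    cVar  = varE / lhs                    # conditional variance
    return cMean + sqrt(cVar) * randn(rng), cMean, cVar
end

# Toy values, chosen only for illustration
rng   = MersenneTwister(1)
w     = [1.0, 0.0, 2.0, 1.0]
yAdjJ = [0.5, -0.2, 1.1, 0.3]
draw, condMean, condVar = sampleAlphaJ(rng, w, yAdjJ, 10.0, 0.5)
```

With these toy values, $\mbf{w}_j'\mbf{w}_j = 6$ and $\sigma^2_e/\sigma^2_{\alpha} = 20$, so the conditional mean is $3/26$ and the conditional variance is $10/26$.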

To compute the right-hand side (RHS) of (\ref{eq:MMEAlpha}) efficiently, before sampling $\alpha_j$, let

$$
\mbf{y}_{\text{adj}}^{(j-1)} = \mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
                               - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \mbf{w}_j\alpha_j^{(i-1)} - \sum_{l > j}\mbf{w}_l\alpha_l^{(i-1)},
$$

where $\mbf{y}$ has been adjusted for all the effects in the model with their sampled values, including $\alpha_j = \alpha_j^{(i-1)}$. Then, the RHS of (\ref{eq:MMEAlpha}) can be written as

$$
\mbf{w}_j'\mbf{y}_{\text{adj-j}}^{(j)} = \mbf{w}_j'\mbf{y}_{\text{adj}}^{(j-1)} + \mbf{w}_j'\mbf{w}_j\alpha_j^{(i-1)}.
$$

After sampling $\alpha_j$, $\mbf{y}_{\text{adj}}^{(j)}$ is obtained efficiently as

\begin{align}
\mbf{y}_{\text{adj}}^{(j)} &= 
   \mbf{y} - \mbf{X}\bs{\beta}^{(i)} - \mbf{Z}\mbf{u}^{(i)} 
           - \sum_{l < j}\mbf{w}_l\alpha_l^{(i)} - \mbf{w}_j\alpha_j^{(i)} - \sum_{l > j}\mbf{w}_l\alpha_l^{(i-1)}\\
   &= \mbf{y}_{\text{adj}}^{(j-1)} +  \mbf{w}_j\alpha_j^{(i-1)} - \mbf{w}_j\alpha_j^{(i)} \\
   &= \mbf{y}_{\text{adj}}^{(j-1)} +  \mbf{w}_j(\alpha_j^{(i-1)} - \alpha_j^{(i)}).
\end{align}

After all $k$ elements of $\bs{\alpha}$ have been sampled, adding $\mbf{X}\bs{\beta}^{(i)} + \mbf{Z}\mbf{u}^{(i)}$ to $\mbf{y}_{\text{adj}}^{(k)}$ gives:

$$
\mathbf{y} - \mathbf{W}\boldsymbol{\alpha}^{(i)} = \mbf{y}_{\text{adj}}^{(k)} + \mbf{X}\bs{\beta}^{(i)} + \mbf{Z}\mbf{u}^{(i)},
$$

which will be used in the next round of the Gibbs sampler for obtaining samples of $\bs{\beta}$ and $\mbf{u}$.
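
The bookkeeping above can be sketched as a single-trait sweep over all SNP effects. This is an illustration under stated assumptions, not the notebook's `sampleGenEffects!`: the name `sweepAlpha!`, the vector `fixed` (standing in for $\mbf{X}\bs{\beta}^{(i)} + \mbf{Z}\mbf{u}^{(i)}$) and the simulated data are hypothetical, and $\mbf{R} = \mbf{I}$ is assumed.

```julia
using LinearAlgebra, Random

# Hypothetical single-trait sweep over all SNP effects (assumes R = I).
# On entry yAdj holds y - Xβ - Zu - W*α(old); on exit it holds y - Xβ - Zu - W*α(new).
function sweepAlpha!(rng, yAdj, W, α, varE, varAlpha)
    λ = varE / varAlpha
    for j in 1:size(W, 2)
        w    = @view W[:, j]
        wpw  = dot(w, w)
        lhs  = wpw + λ
        rhs  = dot(w, yAdj) + wpw * α[j]   # w_j' y_adj-j without rebuilding y_adj-j
        old  = α[j]
        α[j] = rhs / lhs + sqrt(varE / lhs) * randn(rng)
        yAdj .+= w .* (old - α[j])         # y_adj^(j) = y_adj^(j-1) + w_j (α_old - α_new)
    end
    return α
end

# Simulated toy data, for illustration only
rng   = MersenneTwister(2024)
n, k  = 6, 4
W     = float.(rand(rng, 0:2, n, k))
y     = randn(rng, n) .+ 9.0
fixed = fill(9.0, n)                  # stands in for Xβ + Zu
α     = zeros(k)
yAdj  = y - fixed - W * α
sweepAlpha!(rng, yAdj, W, α, 10.0, 0.5)

# The incrementally updated yAdj should match a full recomputation
maxErr = maximum(abs.(yAdj - (y - fixed - W * α)))
```

Each update costs one pass over a single column of $\mbf{W}$, so a full sweep is linear in the number of genotype entries and no product $\mbf{W}'\mbf{W}$ is ever formed.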

Consider now a two-trait model, where, to simplify the notation, we will denote $\mbf{y}_{\text{adj-j}}^{(j)}$ by $\mbf{y}_1$ for the first trait and by $\mbf{y}_2$ for the second trait. Then, the model for sampling the effects for locus $j$ can be written as:

\begin{align}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix}
&=
\begin{bmatrix}
\mbf{w}_j & \mbf{0}\\
\mbf{0}   & \mbf{w}_j
\end{bmatrix}
\begin{bmatrix}
\alpha_{j1}\\
\alpha_{j2}
\end{bmatrix}
+
\begin{bmatrix}
\mbf{e}_1\\
\mbf{e}_2
\end{bmatrix}\\
&= (\mbf{I}_2\otimes \mbf{w}_j)\bs{\alpha}_j 
+ 
\mbf{e},
\label{eq:MME2Trait}
\end{align}

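where $\bs{\alpha}_j \sim \text{N}(\mbf{0},\bs{\Sigma})$, and $\mbf{e} \sim \text{N}(\mbf{0},\mbf{R}_0\otimes\mbf{I}_n)$. With $\mbf{y}' = [\mbf{y}_1' \; \mbf{y}_2']$ and $r_0^{ij}$ denoting the elements of $\mbf{R}_0^{-1}$, the MME for this model are: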

\begin{align}
[(\mbf{I}_2\otimes \mbf{w}'_j)(\mbf{R}_0^{-1}\otimes\mbf{I}_n)(\mbf{I}_2\otimes \mbf{w}_j) + \bs{\Sigma}^{-1}]\hat{\bs{\alpha}}_j &=  
(\mbf{I}_2\otimes \mbf{w}'_j)(\mbf{R}_0^{-1}\otimes\mbf{I}_n)\mbf{y} \\ 
(\begin{bmatrix}
\mbf{w}'_j & \mbf{0}\\
\mbf{0}    & \mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
r_0^{11}\mbf{I} & r_0^{12}\mbf{I}\\
r_0^{21}\mbf{I} & r_0^{22}\mbf{I}
\end{bmatrix}
\begin{bmatrix}
\mbf{w}_j & \mbf{0}\\
\mbf{0}   & \mbf{w}_j
\end{bmatrix}
+ 
\bs{\Sigma}^{-1})
\hat{\bs{\alpha}}_j
&=
\begin{bmatrix}
\mbf{w}'_j & \mbf{0}\\
\mbf{0}    & \mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
r_0^{11}\mbf{I} & r_0^{12}\mbf{I}\\
r_0^{21}\mbf{I} & r_0^{22}\mbf{I}
\end{bmatrix}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix} \\
(\mbf{w}'_j\mbf{w}_j\mbf{R}_0^{-1} + \bs{\Sigma}^{-1})\hat{\bs{\alpha}}_j 
&= 
\begin{bmatrix}
r_0^{11}\mbf{w}'_j & r_0^{12}\mbf{w}'_j\\
r_0^{21}\mbf{w}'_j & r_0^{22}\mbf{w}'_j
\end{bmatrix}
\begin{bmatrix}
\mbf{y}_1\\
\mbf{y}_2
\end{bmatrix} \\
(\mbf{w}'_j\mbf{w}_j\mbf{R}_0^{-1} + \bs{\Sigma}^{-1})\hat{\bs{\alpha}}_j 
&= 
\mbf{R}_0^{-1}
\begin{bmatrix}
\mbf{w}'_j\mbf{y}_1\\
\mbf{w}'_j\mbf{y}_2
\end{bmatrix} 
%\label{eq:MME2Trait}
\end{align}

When the multi-trait model is for $m$ traits, the coefficient matrix of the MME in (\ref{eq:MME2Trait}) expands to an $m\times m$ matrix and the RHS to an $m\times 1$ vector. Then, the `sampleLoc!` function can be applied to this MME to obtain single-site Gibbs samples of the elements in $\bs{\alpha}_j$.  
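
As a numerical check of the reduction above, the sketch below builds the two-trait MME for one locus both in the full Kronecker form and in the reduced $2\times 2$ form. All values (`w`, `y1`, `y2`, `R0`, `Σ`) are made-up toy numbers, assumed only for illustration.

```julia
using LinearAlgebra

# Toy inputs, chosen only for illustration
n  = 5
w  = [1.0, 0.0, 2.0, 1.0, 0.0]     # genotype covariate for locus j
y1 = [0.4, -0.3, 0.8, 0.1, -0.5]   # adjusted phenotypes, trait 1
y2 = [0.2, 0.6, -0.1, 0.3, 0.0]    # adjusted phenotypes, trait 2
R0 = [10.0 2.0; 2.0 8.0]           # assumed residual covariance matrix
Σ  = [0.5 0.1; 0.1 0.4]            # assumed covariance matrix of SNP effects
Ri = inv(R0)

# Full form: (I₂ ⊗ w')(R₀⁻¹ ⊗ Iₙ)(I₂ ⊗ w) + Σ⁻¹
Wj      = kron(Matrix(1.0I, 2, 2), reshape(w, n, 1))   # 2n × 2
RiFull  = kron(Ri, Matrix(1.0I, n, n))                 # 2n × 2n
lhsFull = Wj' * RiFull * Wj + inv(Σ)
rhsFull = Wj' * RiFull * vcat(y1, y2)

# Reduced form: w'w R₀⁻¹ + Σ⁻¹ and R₀⁻¹ [w'y₁; w'y₂]
lhs = dot(w, w) * Ri + inv(Σ)
rhs = Ri * [dot(w, y1), dot(w, y2)]

αhat = lhs \ rhs   # solution of the 2 × 2 MME for locus j
```

The reduced form needs only the scalar $\mbf{w}_j'\mbf{w}_j$ and the two inner products $\mbf{w}_j'\mbf{y}_1$ and $\mbf{w}_j'\mbf{y}_2$, which is why the sampler never has to form the $2n \times 2n$ matrix.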

In [3]:
using NBInclude
@nbinclude("../MME/3.7.3.BuildMME.ipynb"; regex=r"#\s*EXECUTE")

updateLhsRhs! (generic function with 1 method)

In [4]:
pedigree = get_pedigree("../MME/pedFile",separator=",",header=false);

The delimiter in pedFile is ','.


coding pedigree... 100%|████████████████████████████████| Time: 0:00:00
calculating inbreeding... 100%|█████████████████████████| Time: 0:00:00


Finished!


In [5]:
#JWAS.PedModule.getinfo(pedigree)

In [6]:
data = CSV.read("../MME/data.phen")

 Row  Ind    Mat    y1       y2       x
      Int64  Int64  Float64  Float64  Float64
 1    3      2      8.9      9.2      11.9
 2    4      2      9.7      5.7      10.8
 3    5      4      8.8      8.5      11.7


In [7]:
using Statistics, Distributions

In [8]:
M = float.(rand(Binomial(2,0.5),5,10))

5×10 Array{Float64,2}:
 1.0  1.0  1.0  0.0  2.0  2.0  1.0  2.0  1.0  1.0
 0.0  0.0  2.0  0.0  2.0  0.0  1.0  0.0  2.0  2.0
 0.0  0.0  1.0  0.0  2.0  1.0  2.0  1.0  1.0  0.0
 0.0  1.0  0.0  0.0  1.0  2.0  1.0  0.0  1.0  1.0
 2.0  1.0  0.0  0.0  2.0  1.0  0.0  2.0  2.0  1.0

In [9]:
var(data[:,3])

0.24333333333333262

In [10]:
function sampleLoc!(mme,iter=0)
    A = mme.mmeLhs
    b = mme.mmeRhs
    x = mme.mmeSpl
    n = size(x,1)
    sampleLoc!(A,b,x,n)
    if iter > 0
        mme.meanEffects += (x - mme.meanEffects)/iter
    end
end

sampleLoc! (generic function with 2 methods)

In [11]:
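function sampleLoc!(A,b,x,n)
    # Single-site Gibbs sampler for the solutions x of the MME A*x = b
    for i=1:n
        cVarInv = 1.0/A[i,i]  # conditional variance of x[i] (despite the name)
        # conditional mean; adding x[i] back removes the diagonal term from A[:,i]'x
        cMean   = cVarInv*(b[i] - A[:,i]'x) + x[i]
        x[i]    = randn()*sqrt(cVarInv) + cMean
    end
end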

sampleLoc! (generic function with 3 methods)
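
The conditional mean computed in `sampleLoc!` can be read off one row of the MME. As a sketch, assuming $\mbf{A}$ is symmetric and that any residual-variance scaling is already absorbed into $\mbf{A}$ and $\mbf{b}$ (so that the conditional variance is $1/a_{ii}$, as in the code), row $i$ of $\mbf{A}\mbf{x} = \mbf{b}$ gives:

$$
\begin{aligned}
\E(x_i \mid \text{rest}) &= \frac{b_i - \sum_{j \ne i} a_{ij}x_j}{a_{ii}}
 = \frac{b_i - \sum_{j} a_{ji}x_j}{a_{ii}} + x_i, \\
\Var(x_i \mid \text{rest}) &= \frac{1}{a_{ii}}.
\end{aligned}
$$

The second form of the mean is what `cVarInv*(b[i] - A[:,i]'x) + x[i]` evaluates: the product `A[:,i]'x` includes the diagonal term $a_{ii}x_i$, and adding `x[i]` back cancels it. Reading column $i$ rather than row $i$ relies on the symmetry of $\mbf{A}$.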

In [12]:
using Printf  # provides @printf, used below to write samples
function sampleVar!(mme,iter=0)
    for randomEffect in mme.randomEffectsVec
        if randomEffect.estimate == false continue end
        modelTerm1 = randomEffect.modelTermVec[1]      
        k = modelTerm1.endPos - modelTerm1.startPos + 1  # number of levels of the random effect
        m = size(randomEffect.modelTermVec,1)
        
        S = zeros(m,m)
        for i=1:m
            modelTermi = randomEffect.modelTermVec[i]
            starti = modelTermi.startPos
            endi = modelTermi.endPos
            ui = @view mme.mmeSpl[starti:endi]
            for j=i:m
                modelTermj = randomEffect.modelTermVec[j]
                startj = modelTermj.startPos
                endj = modelTermj.endPos
                uj = @view mme.mmeSpl[startj:endj]
                S[i,j] = ui'randomEffect.Ai*uj
                S[j,i] = S[i,j]
            end
        end
        Vpo = randomEffect.Spr + S
        νpo = randomEffect.νpr + k 
        V = rand(InverseWishart(νpo,Vpo))
        randomEffect.Vi = inv(V)
        if iter>0
            randomEffect.meanV += (V - randomEffect.meanV)/iter
        end
        if randomEffect.outSamples==true && iter%randomEffect.outFreq==0
            for i=1:m,j=i:m
                if i==j==1
                    @printf(randomEffect.outStream,"%10.5e", V[i,j])
                else
                    @printf(randomEffect.outStream," %10.5e", V[i,j])    
                end
            end
            @printf(randomEffect.outStream,"\n")
        end
            
    end
end

sampleVar! (generic function with 2 methods)

In [13]:
function sampleVarRes!(mme,iter=0)
    m = size(mme.varRes,1)
    n = Int(size(mme.y,1)/m)
    S = zeros(m,m)
    e = mme.y - mme.X*mme.mmeSpl
    for i=1:m
        starti = (i - 1)*n + 1
        endi = starti + n - 1
        ei = @view e[starti:endi]
        for j=i:m
            startj = (j - 1)*n + 1
            endj = startj + n - 1
            ej = @view e[startj:endj]
            S[i,j] = ei'ej
            S[j,i] = S[i,j]
        end
    end
    νpo = n + mme.νRes
    Spo = S + mme.SRes
    mme.varRes = rand(InverseWishart(νpo,Spo))
    if iter>0
        mme.meanVarRes += (mme.varRes - mme.meanVarRes)/iter
    end  
end

sampleVarRes! (generic function with 2 methods)

In [14]:
function sampleGenEffects!(mme,iter=0)
    # y has already been corrected for SNP effects from previous iteration
    mme.yAdj[:] -= mme.X*mme.mmeSpl # adjust for effects in sparse MME
    nMarkers = size(mme.genotypes.M,2)
    Ri = inv(mme.varRes)
    nModels = size(mme.varRes,1)
    rhs = zeros(nModels)
    mmeGen = mme.genotypes
    for i=1:nMarkers
        lhs = mmeGen.MPMArray[i]*Ri + mmeGen.Vi
        for j=1:nModels
            rhs[j] = mmeGen.MArray[i]'mmeGen.yAdjArray[j] + mmeGen.MPMArray[i]*mmeGen.α[i,j]
        end
        rhs = Ri*rhs
        x = @view mmeGen.α[i,:]
        old = copy(x)
        sampleLoc!(lhs,rhs,x,nModels)
        for j=1:nModels
            mmeGen.yAdjArray[j][:] += mmeGen.MArray[i]*(old[j] - x[j])
        end
    end
    mme.yAdj[:] += mme.X*mme.mmeSpl # unadjust for effects in sparse MME
    if iter > 0
        mmeGen.αMean += (mmeGen.α - mmeGen.αMean)/iter
    end
    nothing
end

sampleGenEffects! (generic function with 2 methods)

In [15]:
using ProgressMeter
function runMCMC!(mme,nIter,burnIn)
    @showprogress "MCMC sampling" for iter = 1:nIter
        sampleLoc!(mme,iter-burnIn)
        if mme.genotypes !=false sampleGenEffects!(mme,iter-burnIn) end
        #sampleVar!(mme,iter-burnIn)
        #sampleVarRes!(mme,iter-burnIn)
        updateLhsRhs!(mme)
    end
    for randomEffect in mme.randomEffectsVec
        if randomEffect.outStream !== nothing close(randomEffect.outStream) end
    end
    if mme.genotypes != false && mme.genotypes.outStream !== nothing close(mme.genotypes.outStream) end
end

runMCMC! (generic function with 1 method)

In [16]:
varGen = 5.0
varRes = 10.0
mme = initMME("y1 = intercept",varRes);
#setRandom!(mme,"Ind",varGen,pedigree,estimate=false);
#setRandom!(mme,"Ind",varGen,estimate=false);

In [17]:
idGeno = string.(1:5);
varSNPEffect = varGen/size(M,2)
addGenotypes!(mme,data,"Ind",M,idGeno,varSNPEffect,estimate=false,outSamples=false)

In [18]:
lhs,rhs,names = getLhsRhs!(mme,data);

In [19]:
nIter  = 100_000
burnIn = 10_000
runMCMC!(mme,nIter,burnIn)

MCMC sampling100%|██████████████████████████████████████| Time: 0:00:13


In [20]:
for i=1:size(M,2)
    colNum  = 5+i
    colName = Symbol("m$i")
    insertcols!(data,colNum,colName => mme.genotypes.M[:,i])
end

In [21]:
data

 Row  Ind    Mat    y1       y2       x        m1       m2       m3       m4       m5
      Int64  Int64  Float64  Float64  Float64  Float64  Float64  Float64  Float64  Float64
 1    3      2      8.9      9.2      11.9     -0.6     -0.6     0.2      0.0      0.2
 2    4      2      9.7      5.7      10.8     -0.6     0.4      -0.8     0.0      -0.8
 3    5      4      8.8      8.5      11.7     1.4      0.4      -0.8     0.0      0.2


In [22]:
modelEqn = "y1 = intercept"
for i=1:size(M,2)
    modelEqn = modelEqn *  " + m$i"
end

In [23]:
modelEqn

"y1 = intercept + m1 + m2 + m3 + m4 + m5 + m6 + m7 + m8 + m9 + m10"

In [24]:
mme1 = initMME(modelEqn,varRes);
#setRandom!(mme1,"Ind",varGen,pedigree,estimate=false);
#setRandom!(mme1,"Ind",varGen,estimate=false);

In [25]:
varSNPEffect

0.5

In [26]:
mme1.covVec = []
for i=1:size(M,2)
    randomTrm = "m$i"
    push!(mme1.covVec,randomTrm)
    setRandom!(mme1,randomTrm,varSNPEffect)
end

In [27]:
nIter  = 100_000
burnIn = 1_000
lhs,rhs,names = getLhsRhs!(mme1,data)
runMCMC!(mme1,nIter,burnIn)

MCMC sampling100%|██████████████████████████████████████| Time: 0:00:05


In [28]:
names

11-element Array{String,1}:
 "1:intercept"
 "1:m1: m1"   
 "1:m2: m2"   
 "1:m3: m3"   
 "1:m4: m4"   
 "1:m5: m5"   
 "1:m6: m6"   
 "1:m7: m7"   
 "1:m8: m8"   
 "1:m9: m9"   
 "1:m10: m10" 

In [29]:
[names[1] mme1.meanEffects[1] mme.meanEffects]

1×3 Array{Any,2}:
 "1:intercept"  9.13158  9.12796

In [30]:
[mme1.meanEffects[2:11] mme.genotypes.αMean]

10×2 Array{Float64,2}:
 -0.0274914   -0.0246814  
  0.00936227   0.0102446  
 -0.00985962  -0.0128209  
  0.00381277  -0.00354914 
 -0.0224857   -0.0261506  
  0.0223116    0.0234557  
  0.00152241   0.000699012
 -0.0375149   -0.0357298  
 -0.0118666   -0.0147853  
  0.0131752    0.0123046  