# Linear Models for Genetic Prediction (1)

<h2 class="pm-node nj-subtitle">Code it from scratch or Run it in JWAS software</h2>

<p class="pm-node nj-authors">Hao Cheng, Debbie Chapman, Zigui Wang, Tianjing Zhao, Jiayi Qu</p>

A collection of linear models for genetic prediction are presented in this note. Three sections are included for each model: 1) Model description; 2) Code it from scratch; 3) Run the analysis in a genomic analysis package "[JWAS](https://github.com/reworkhow/JWAS.jl)".  Datasets and examples from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) " are used in this note. This note is written by the [QTL.ROCKS](http://qtl.rocks), quantitative genetics lab at UC Davis. 

# Genetic Covariance Between Relatives

# Numerator Relationship Matrix

## Do It From Scratch

## JWAS

# Animal Model

The following example is from Chapter 3.3 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) ". 

## Data and Model

Consider the following data set for the pre-weaning gain (WWG) of beef calves (calves assumed to be reared under the same management conditions). The objective is to estimate the effects of sex and predict the breeding values for all animals.

Assume $\sigma^2_a = 20$ and $\sigma^2_e = 40$, therefore $\lambda = \frac{40}{20} = 2$.

In [1]:
using JWAS, DataFrames, SparseArrays, LinearAlgebra, Statistics

data = DataFrame(ID  = [4,5,6,7,8], 
  							 sex = ["male","female","female","male","male"], 
                 sire= [1,3,1,4,3], 
                 dam = [missing,2,2,5,6], 
                 sireofsire = [missing,missing,missing,1,missing], 
                 damofsire  = [missing,missing,missing,missing,missing], 
                 WWG = [4.5,2.9,3.9,3.5,5.0])

The **model** to describe the observations is:

$$

y_{ij} = p_i + a_j +e_{ij}
$$
where:

$y_{ij}$ is the WWG of the $j$th calf of the $i$th sex

 $p_i$ is  the fixed effect of the $i$th sex

 $a_j$ is the random effect of the $j$th calf

$e_{ij}$ is the random error effect

**In matrix terms**, the model can be written as

 $y = Xb + Za + e$

or

$\begin{bmatrix}
y_{11}\\
y_{12}\\
y_{13}\\
y_{24}\\
y_{25}
\end{bmatrix} = 
\begin{bmatrix}
1 & 0\\
1 & 0\\
1 & 0\\
0 & 1\\
0 & 1
\end{bmatrix}
\begin{bmatrix}
b_1\\
b_2
\end{bmatrix} +
\begin{bmatrix}
1 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\begin{bmatrix}
a_1\\
a_2\\
a_3\\
a_4\\
a_5
\end{bmatrix} +
\begin{bmatrix}
e_{11}\\
e_{12}\\
e_{13}\\
e_{24}\\
e_{25}
\end{bmatrix}$

where:

$y$  is an $n \times 1$ vector of recorded pre-weaning weight measurements for $n$ animals

$b$ is a $p \times 1$ vector of fixed effects, which are the male and female sex effect.

$a$ is a $q \times 1$ vector of random effects for each animal, which are individual breeding values; $q$ is the *number* of animals to estimate the random effect for; $q$ may include founder animals with no recorded measurement

$e$ is an $n \times 1$ vector of the random residual effects for $n$ animals

$X$ is an $n \times p$ incidence matrix to relate recorded animals to their fixed effects

$Z$ is an $n \times q$ incidence matrix to relate recorded animals to their random animal effects

**Mixed Model Equations**

Because our model includes both fixed effects (sex) and random effects (breeding values), we use **mixed** model equations. In the equation below, we know all variables except for the estimate of fixed effects ($\hat{b}$) and the prediction of random effects ($\hat{a}$). Therefore, we will utilize this mixed model to solve for those effects.

$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y\end{bmatrix}$

Notice the $A^{-1} \frac{\sigma^2_e}{\sigma^2_a}$ component. $A^{-1}$ is the inverse of $A$, which is known as the numerator relationship matrix. The numerator relationship matrix is derived from the pedigree information, and accounts for the additive genetic covariances between related animals.

Because $Z'Z$ is the product of matrices that contain categorical information, $Z'Z$ is not a full rank matrix; the addition of $A^{-1} \frac{\sigma^2_e}{\sigma^2_a}$ adds random effects to the mixed model equation, and allows all parts of the matrix to be full rank.

## Do It From Scratch

[pedigree_3_3.csv](https://nextjournal.com/data/QmYdveacvrLidj1iwroCG21AC8LdkYr4tVzkRnqutcBrTB?content-type=application/vnd.ms-excel&node-id=5cc85bba-6cd8-40fe-a951-c6af5e738038&filename=pedigree_3_3.csv&node-kind=file)


In [1]:
#inverse of numerator relationship matrix
Ainv = [1.833 0.500 0.000 -.667 0.000 -1.00 0.000 0.000
				0.500 2.000 0.500 0.000 -1.00 -1.00 0.000 0.000
				0.000 0.500 2.000 0.000 -1.00 0.500 0.000 -1.00
				-.667 0.000 0.000 1.833 0.500 0.000 -1.00 0.000
				0.000 -1.00 -1.00 0.500 2.500 0.000 -1.00 0.000
				-1.00 -1.00 0.500 0.000 0.000 2.500 0.000 -1.00
				0.000 0.000 0.000 -1.00 -1.00 0.000 2.000 0.000
				0.000 0.000 -1.00 0.000 0.000 -1.00 0.000 2.000];

In [1]:
y = [4.5, 2.9, 3.9, 3.5, 5.0]

X = [1 0; 0 1; 0 1; 1 0; 1 0]
b = ["male", "female"]

Z = [0 0 0 1 0 0 0 0
     0 0 0 0 1 0 0 0
     0 0 0 0 0 1 0 0
     0 0 0 0 0 0 1 0
     0 0 0 0 0 0 0 1] #row:records  column:all animal
u = ["BV1", "BV2", "BV3", "BV4", "BV5", "BV6", "BV7", "BV8"];
σe2 = 40
σa2 = 20
λ = σe2/σa2

2.0

In [1]:
X

5×2 Array{Int64,2}:
 1  0
 0  1
 0  1
 1  0
 1  0

In [1]:
X'X  # number of animals in each level

2×2 Array{Int64,2}:
 3  0
 0  2

In [1]:
X'Z

2×8 Array{Int64,2}:
 0  0  0  1  0  0  1  1
 0  0  0  0  1  1  0  0

In [1]:
Z'Z # number of records each animal have

8×8 Array{Int64,2}:
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0
 0  0  0  1  0  0  0  0
 0  0  0  0  1  0  0  0
 0  0  0  0  0  1  0  0
 0  0  0  0  0  0  1  0
 0  0  0  0  0  0  0  1

In [1]:
X'y  #sum of y for each level of X

2-element Array{Float64,1}:
 13.0
  6.8

In [1]:
Z'y # sum of y for each level of Z

8-element Array{Float64,1}:
 0.0
 0.0
 0.0
 4.5
 2.9
 3.9
 3.5
 5.0

In [1]:
lhs=[X'X X'Z
     Z'X Z'Z+Ainv*λ]
rhs=[X'y
     Z'y]
lhs\rhs

10-element Array{Float64,1}:
  4.35848   
  3.40442   
  0.098486  
 -0.0187685 
 -0.0410794 
 -0.00862435
 -0.185727  
  0.176893  
 -0.249436  
  0.18263   

## JWAS

In [1]:
using JWAS
pedigree = get_pedigree(_, seperator = ",", header = false);
model_equation ="WWG = sex + ID";

In [1]:
model=build_model(model_equation,σe2)

In [1]:
set_random(model,"ID",pedigree,σa2)

In [1]:
solve(model,data,solver="Jacobi") #direct solver does not work

7×2 Array{Any,2}:
 "1:sex : male"     4.33208  
 "1:sex : female"   3.39902  
 "1:ID : 4"         0.0562777
 "1:ID : 5"        -0.1661   
 "1:ID : 6"         0.167233 
 "1:ID : 7"        -0.277056 
 "1:ID : 8"         0.222944 

# Sire Model

The following example is from Chapter 3.4 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) ".

## Data and Model

Consider the same data set used for the *Animal Model.* The objective is to estimate sex effects and predict breeding values for sires only (1, 3 and 4). The data set is obtained from the example 3.3

Assume $\sigma^2_a = 20$, $\sigma^2_e = 55$, $\sigma^2_s = 0.25(\sigma^2_a)$, and $\lambda = \frac{\sigma^2_e}{\sigma^2_s}$

The model to describe the observations is:

$$
y_{ij} = p_i + a_j +e_{ij}
$$
where:

$y_{ij}$ is the WWG of the $j$th sire's progeny for the $i$th sex

$p_i$ is the fixed effect of the $i$th sex of the sire's progeny

$a_j$ is the random effect of the $j$th sire

$e_{ij}$ is the random error effect

**In matrix terms**, the model can be written as

$y = Xb + Zs + e$

This matrix equation is similar to the animal model, with two exceptions:

$s$ is a $q \times 1$ vector of random effects for sires; $q$ is the number of  male animals with at least one progeny record

$Z$ is an $n \times q$ incidence that relates progeny records to their sire

The reason $Xb$ is the same as before is because we want to keep recorded data from all progeny, including female progeny, to estimate the fixed effect.

**Mixed Model Equations (MME)**

To estimate the fixed effects (sex) and predict the random effects (sire breeding values), we will again utilize a mixed model equation.

$\begin{bmatrix} X'X & X'Z \\ Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_s} \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{s} \end{bmatrix}
=
\begin{bmatrix} X'y \\ Z'y\end{bmatrix}$

Notice in this case that the $\hat{s}$ vector is smaller than the previous $\hat{a}$ vector, which results in a smaller $Z$ and $A$ matrix. The $A$ matrix now requires less processing time to invert, and the total MME is smaller and easier to process. This is the advantage of only predicting sire breeding values.

## Do It From Scratch

[pedigree_3_4.csv](https://nextjournal.com/data/QmYdveacvrLidj1iwroCG21AC8LdkYr4tVzkRnqutcBrTB?content-type=application/vnd.ms-excel&node-id=019b3654-e1c4-41b6-a16f-7e052182d28b&filename=pedigree_3_4.csv&node-kind=file)


In [1]:
# inverse numerator relationship for 3 sires
Ainv = [1.3333 0.0 -0.667
				0.0000 1.0 0.0000
				-0.667 0.0 1.3333];

In [1]:
y = [4.5, 2.9, 3.9, 3.5, 5.0]
X = [1 0; 0 1; 0 1; 1 0; 1 0]
b = ["male", "female"];

The incidence matrix for fixed effects (sex for sire's progeny) is the same as the random animal effects model. However, in the sire model, since we consider the random effects of the sire, the Z matrix will be of dimensions(5,3)

In [1]:
Z = [1 0 0 
     0 1 0 
     1 0 0 
     0 0 1 
     0 1 0]
s = ["sBV1", "sBV3", "sBV4"];

In [1]:
xpx =  X'X
xpz = X'Z
zpx = Z'X
zpz = Z'Z
xpy = X'y
zpy = Z'y
σe2 = 55
σa2 = 20
σs2 = σa2*0.25 # note that on average, a quarter of the total breeding value variance is attributable to the variance of the sire's breeding value
λ   = σe2/σs2;

In [1]:
lhs = [xpx xpz
       zpx zpz + Ainv*λ]
rhs = [xpy
       zpy]
lhs\rhs

5-element Array{Float64,1}:
  4.33333  
  3.36528  
  0.053953 
  0.0154915
 -0.0694444

## JWAS

In [1]:
using JWAS
pedigree = get_pedigree(_, seperator = ",", header = false)
model_equation = "WWG = sex + sire";
model = build_model(model_equation,σe2)
set_random(model,"sire", pedigree, σs2);

In [1]:
solve(model,data,solver="Jacobi")

5×2 Array{Any,2}:
 "1:sex : male"     4.33298  
 "1:sex : female"   3.36485  
 "1:sire : 1"       0.0541139
 "1:sire : 3"       0.0156524
 "1:sire : 4"      -0.0693642

# Reduced Animal Model

The following example is from Chapter 3.5 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) ".

## Data and Model

Consider the same data set used for the *Animal Model.* The objective is to estimate the effects of sex and breeding values for all animals. In the standard animal model, an equation is required for each evaluated animal. Because the breeding values of progeny can be obtained by back-solving from the predicted parental breeding values, setting equations only for parents can also achieve our goal while it is much more computationally efficient. Quaas and Pollak developed such a method, which is the Reduced Animal Model (RAM).

In [1]:
invA = [
     1.833  0.500  0.000 -0.667  0.000 -1.000  
     0.500  2.000  0.500  0.000 -1.000 -1.000 
     0.000  0.500  1.500  0.000 -1.000  0.000  
    -0.667  0.000  0.000  1.333  0.000  0.000 
     0.000 -1.000 -1.000  0.000  2.000  0.000 
    -1.000 -1.000  0.000  0.000  0.000  2.000 
];

The **model** to describe the observations is:

$y_{p,i} = b_i + \frac{1}{2}\alpha_{s} + \frac{1}{2}\alpha_{d} + e_{i}\\
y_{np,i} = b_i + \frac{1}{2}\alpha_{s} + \frac{1}{2}\alpha_{d} + e_{i}^*$

where $e_{i}^{*} = m_i + e_{i}$ , $\alpha_s$ and $\alpha_d$ are the breeding values for sire and dam, and $m_i$ is the mendelian sampling. 

**In matrix terms**, the model can be written as

$\begin{bmatrix}
y_p\\
y_{np}
\end{bmatrix}
=
\begin{bmatrix}
X_p\\
X_{np}
\end{bmatrix}
b + 
\begin{bmatrix}
Z_p\\
Z_{np}
\end{bmatrix}
\alpha_{p}
+
\begin{bmatrix}
e_p\\
e_{np}
\end{bmatrix}$

where:

$y_p$ is the WWG of the parent calf;

$y_{np}$ is the WWG of the non-parent calfis the fixed effect of the sex;

$\alpha_p$ is the random effect of th parent calf;

$e_p$ and $e_{np}$ is the residual vector for parent and non-parent calf.

For simplicity, we assume:

$y = \begin{bmatrix}
y_p\\
y_{np}
\end{bmatrix}$**,** $W = \begin{bmatrix}
Z_p\\
Z_{np}
\end{bmatrix}$ **,** $R = \begin{bmatrix}
R_p & 0\\
0 & R_{np}
\end{bmatrix}
=
\begin{bmatrix}
I \sigma_{e}^2 & 0\\
0 & I \sigma_{e}^{2*}
\end{bmatrix}
=
\begin{bmatrix}
I & 0\\
0 & I + D \frac{\sigma_{\alpha}^2}{\sigma_{e}^2}
\end{bmatrix}
\sigma_{e}^2$ **,**

where D is an diagonal matrix whose elements  $D_{ii}$ are defined as following

 $D_{ii} = d_{i}(1 - F_{i})$,

where $d_{i}$ equals 0.5, 0.75 or 1 if both, one or no parents of animal i are known.$F_{i}$  is the average inbreeding for both parents. If only one parent is known, $F_{i}$ is the inbreeding coefficient of the known parent. $F_{i}$ equals to 0 if no parent is known.

$$**Mixed Model Equations**

$\begin{bmatrix}
X'R^{-1}X & X'R^{-1}W\\
W'R^{-1}X & W'R^{-1}W +(A\sigma_a^2)^{-1}
\end{bmatrix} 
\begin{bmatrix}
\hat{b}\\
\hat{\alpha_p}
\end{bmatrix}
=\begin{bmatrix}
X'R^{-1}y\\
W'R^{-1}y
\end{bmatrix}$

## $$Do It From Scratch

**Step 1: Constructing the R**

In our case, since calves 4, 5 and 6 are parents, the diagonal elements corresponding to these three are equal to the $\sigma^2_e$, which is 40.

Since calves 7 and 8 are non parents. The diagonal elements corresponding to these two are equal to the $\sigma^2_e + d_i \sigma^2_a$. For each calf, $d_i$ equal to $\frac{1}{2}$ since both their parents are known and inbreeding of the parents is ignored. Therefore, $r_{77} = r_{88} = \sigma^2_e + d\sigma^2_a = 40 + \frac{1}{2}*20 = 50 $.

In [1]:
R = [40 0 0 0 0
     0 40 0 0 0
     0 0 40 0 0
     0 0 0 50 0
     0 0 0 0 50]

5×5 Array{Int64,2}:
 40   0   0   0   0
  0  40   0   0   0
  0   0  40   0   0
  0   0   0  50   0
  0   0   0   0  50

**Step 2: Constructing the X**

X is the same as in section 3.3, which represents the sex effects.

In [1]:
X = [1 0; 0 1; 0 1; 1 0; 1 0]

5×2 Array{Int64,2}:
 1  0
 0  1
 0  1
 1  0
 1  0

In [1]:
X'*inv(R)*X

2×2 Array{Float64,2}:
 0.065  0.0 
 0.0    0.05

**Step 3: Constructing the W**

we know $W = 

\begin{pmatrix}

Z\\

Z_1\\

\end{pmatrix}$, and $Z$ corresponds to the breeding values of the parents animal, which is animal 4,5,6. $Z_1$ corresponds to the breeding value of the non-parents animal, which is animal 7,8.

In [1]:
#Construct the incidence matrix for parent animal 1~6 
Zp = [0 0 0 1 0 0
     0 0 0 0 1 0
     0 0 0 0 0 1]

3×6 Array{Int64,2}:
 0  0  0  1  0  0
 0  0  0  0  1  0
 0  0  0  0  0  1

In [1]:
#Construct the incidene matrix for non-parent animal 7,8
Znp = [0 0 0 0.5 0.5 0 
      0 0 0.5 0 0 0.5]

2×6 Array{Float64,2}:
 0.0  0.0  0.0  0.5  0.5  0.0
 0.0  0.0  0.5  0.0  0.0  0.5

In [1]:
W = [Zp
     Znp]

5×6 Array{Float64,2}:
 0.0  0.0  0.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0  0.0  1.0
 0.0  0.0  0.0  0.5  0.5  0.0
 0.0  0.0  0.5  0.0  0.0  0.5

In [1]:
W'* inv(R) * W

6×6 Array{Float64,2}:
 0.0  0.0  0.0    0.0    0.0    0.0  
 0.0  0.0  0.0    0.0    0.0    0.0  
 0.0  0.0  0.005  0.0    0.0    0.005
 0.0  0.0  0.0    0.03   0.005  0.0  
 0.0  0.0  0.0    0.005  0.03   0.0  
 0.0  0.0  0.005  0.0    0.0    0.03 

**Step 4: Constructing the MME**

In [1]:
LHS = [X'*inv(R)*X X'*inv(R)*W
       W'*inv(R)*X W'*inv(R)*W + invA/σa2]

8×8 Array{Float64,2}:
 0.065  0.0     0.0       0.0     0.01    0.035     0.01    0.01 
 0.0    0.05    0.0       0.0     0.0     0.0       0.025   0.025
 0.0    0.0     0.09165   0.025   0.0    -0.03335   0.0    -0.05 
 0.0    0.0     0.025     0.1     0.025   0.0      -0.05   -0.05 
 0.01   0.0     0.0       0.025   0.08    0.0      -0.05    0.005
 0.035  0.0    -0.03335   0.0     0.0     0.09665   0.005   0.0  
 0.01   0.025   0.0      -0.05   -0.05    0.005     0.13    0.0  
 0.01   0.025  -0.05     -0.05    0.005   0.0       0.0     0.13 

In [1]:
y = data[:WWG]

5-element Array{Float64,1}:
 4.5
 2.9
 3.9
 3.5
 5.0

In [1]:
RHS = [X'*inv(R)*y
       W'*inv(R)*y]

8-element Array{Float64,1}:
 0.2825
 0.17  
 0.0   
 0.0   
 0.05  
 0.1475
 0.1075
 0.1475

In [1]:
sol = LHS\RHS

8-element Array{Float64,1}:
  4.35848   
  3.40442   
  0.098486  
 -0.0187685 
 -0.0410794 
 -0.00862435
 -0.185727  
  0.176893  

**Step 5: Solve for non-parents animals**

Remember in section 3.3, we have proved that $a_i = 0.5(a_s + a_d) + m_i$. We can utilize this formula to get the breeding value of non parent animals. Also, for the k term, since the $R^{-1}$ has not been factored out of the MME, $k=\frac{r^{11}}{r^{11} + d^{-1}_i g^{-1}}$.

In [1]:
k = 0.025/(0.025 + 2*0.05)

0.2

In [1]:
a7 = 0.5*(-0.009 - 0.186) + 0.2*(3.5 - 4.358 - 0.5*(-0.009 - 0.186))

-0.2496

In [1]:
a8 = 0.5*(-0.041 + 0.177) + 0.2*(5.0 - 4.358 - 0.5*(-0.041+0.177))

0.1828

## JWAS

In [1]:
model_equation = "WWG = sire + Dam + sex";
R = 40
model = build_model(model_equation,R);
G = 20
set_random(model,"sire Dam",G);
out = runMCMC(model, data, chain_length=5000,output_samples_frequency=100);

In [1]:
out["Posterior mean of location parameters"]

In [1]:
jwas_estimate = out["Posterior mean of location parameters"][:Estimate]
jwas_sol =[jwas_estimate[8],jwas_estimate[9],jwas_estimate[1],jwas_estimate[5],jwas_estimate[2],jwas_estimate[3],
jwas_estimate[6],jwas_estimate[7]]

8-element Array{Float64,1}:
  4.39432  
  3.29816  
  0.212434 
  0.0556443
 -0.0949591
 -0.159821 
 -0.405382 
  0.351931 

In [1]:
cor(sol,jwas_sol)

0.997122

Is everything good now?

## JWAS approach 2

In [1]:
data = DataFrame(ID = [4,5,6,7,8],
    sex=["male","female","female","male","male"],
    ID2 = [4,5,6,missing,missing],
    sire=[1,3,1,4,3], 
    Dam=[missing,2,2,5,6],
    WWG=[4.5,2.9,3.9,3.5,5.0])

In [1]:
data_parents = data[1:3,:]
par_pedfile = DataFrame(ID = data_parents[:ID] , sire = data_parents[:sire], dam = data_parents[:Dam])
par_pedfile[1,3] = 0
CSV.write("par_pedigree.txt",par_pedfile)
par_pedigree   = get_pedigree("par_pedigree.txt",separator=",",header=true)

Pedigree(7, Dict{AbstractString,JWAS.PedModule.PedNode}("4"=>PedNode(2, "1", "0", 0.0),"1"=>PedNode(1, "0", "0", 0.0),"5"=>PedNode(5, "3", "2", 0.0),"2"=>PedNode(4, "0", "0", 0.0),"6"=>PedNode(6, "1", "2", 0.0),"3"=>PedNode(3, "0", "0", 0.0)), Dict(7=>0.0,9=>0.0), Set(Any[]), Set(Any[]), Set(Any[]), Set(Any[]), ["1", "4", "3", "2", "5", "6"])

[par_pedigree.txt](https://nextjournal.com/data/QmUZvKEsktqApKSsGNBatDZU3PZgv7o7XDeurRNJy1NJq4?content-type=text/plain&node-id=be2d518e-a576-4535-b3df-5b24c85fc103&filename=par_pedigree.txt&node-kind=file)


In [1]:
model_equation2 = "WWG = ID2 + sex";
R = 40
model2 = build_model(model_equation2,R);
G2 = 20
set_random(model2,"ID2",par_pedigree,G2)
out2 = runMCMC(model2, data, chain_length=5000,output_samples_frequency=100);

# Animal Model with Groups

The following example is from Chapter 3.6 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) ".

## Data and Model

Consider the data set used in the previous models, founders were assumed to originate from a single population with null mean and variance  $\sigma_a^2$. However, when various sub-population structures (i.e. various genetic means) need to be accounted for in the prediction model to avoid biased estimation of breeding values, a proper grouping of founders given available information should be considered. The objective is to estimate sex effects and predict breeding values for animals and phantom parents (groups).

With groups, the **model** is 

 $y_{ij}=p_j+a_i+\sum_{k=1}^n t_{ik}g_k+e_{ij}$

where

 $p_j$ is the fixed effect of the *j*th sex

 $a_i$ is the random effect of animal *i*

 $g_k$ is the fixed group effect containing the *k*th ancestor 

 $t_{ik}$ is the additive genetic relationship between the *k*th and *i*th animals 

n is the number of ancestors of animal *i*

 $e_{ij}$ is the random environmental effect for animal *i*

**In matrix terms**, the model can be written as

 $\mathbf{y = Xb + ZQg + Za + e}$

where

 $\mathbf{Q = TQ^*}$ ( $\mathbf{Q^*}$ assigns unidentified ancestors to groups;  $\mathbf{T}$ is obtained from  $\mathbf{A = TDT'}$ )

**Mixed Model Equations**

 $\begin{bmatrix} X'X & X'Z & X'ZQ \\ Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} & Z'ZQ \\
Q'Z'X & Q'Z'Z & Q'Z'ZQ \end{bmatrix}
\begin{bmatrix} \hat{b} \\ \hat{a} \\ \hat{g} \end{bmatrix}
= \begin{bmatrix} X'y \\ Z'y \\ Q'Z'y \end{bmatrix}$

## Do It From Scratch

In [1]:
using JWAS, DataFrames, LinearAlgebra, CSV, Statistics

data = DataFrame(ID=["4","5","6","7","8"], 
  sex=["male","female","female","male","male"], 
  sire=["1","3","1","4","3"], 
  dam=["0","2","2","5","6"], 
  WWG=[4.5,2.9,3.9,3.5,5.0])

Assuming that males are of different genetic merit compared to females ,  such that 2 group effects are considered in the model.

In [1]:
y = data[:WWG]
X = [1 0; 0 1; 0 1; 1 0; 1 0]
X = X[:,1]
Z = [0 0 0 1 0 0 0 0
     0 0 0 0 1 0 0 0
     0 0 0 0 0 1 0 0
     0 0 0 0 0 0 1 0
     0 0 0 0 0 0 0 1]
λ = 2;

In [1]:
# Construct the original relationship matrix A
ped=[
1 0 0
2 0 0
3 0 0
4 1 0
5 3 2
6 1 2
7 4 5
8 3 6
];
s=ped[:,2]
d=ped[:,3]
n = length(s)
s=(s .== 0)*n .+s
d=(d .== 0)*n .+d
A = zeros(n,n);
for i in 1:n
    A[i,i] = 1 + A[s[i], d[i]]/2
    for j in (i+1):n    
        A[i,j] = ( A[i, s[j]] + A[i, d[j]] ) / 2  
        A[j,i] = A[i,j] 
    end           
end
A

8×8 Array{Float64,2}:
 1.0   0.0   0.0   0.5     0.0    0.5   0.25    0.25 
 0.0   1.0   0.0   0.0     0.5    0.5   0.25    0.25 
 0.0   0.0   1.0   0.0     0.5    0.0   0.25    0.5  
 0.5   0.0   0.0   1.125   0.0    0.25  0.5625  0.125
 0.0   0.5   0.5   0.0     1.0    0.25  0.5     0.375
 0.5   0.5   0.0   0.25    0.25   1.0   0.25    0.5  
 0.25  0.25  0.25  0.5625  0.5    0.25  1.0     0.25 
 0.25  0.25  0.5   0.125   0.375  0.5   0.25    1.0  

In [1]:
# Construct the matrix Q (the additive genetic relationship between the kth group and ith animals)
ped=[
1 0 0
2 0 0
3 1 2
4 1 2
5 1 2
6 3 2
7 5 4
8 3 4
9 6 7
10 5 8
]; ## animal 1 and 2 are the group effects (phantom parents)
s=ped[:,2]
d=ped[:,3]
n = length(s)
s=(s .== 0)*n .+s
d=(d .== 0)*n .+d
Agroup = zeros(n,n);
for i in 1:n
    Agroup[i,i] = 1 + Agroup[s[i], d[i]]/2
    for j in (i+1):n    
        Agroup[i,j] = ( Agroup[i, s[j]] + Agroup[i, d[j]] ) / 2  
        Agroup[j,i] = Agroup[i,j] 
    end           
end
Agroup
Q = Agroup[3:end, 1:2]

8×2 Array{Float64,2}:
 0.5    0.5  
 0.5    0.5  
 0.5    0.5  
 0.25   0.75 
 0.5    0.5  
 0.5    0.5  
 0.375  0.625
 0.5    0.5  

In [1]:
LHS = [X'X X'Z X'Z*Q
       Z'X Z'Z+inv(A)*λ Z'Z*Q
       Q'Z'X Q'Z'Z Q'Z'Z*Q]
RHS = [X'y
       Z'y
       Q'Z'y]
Sol = inv(LHS)*RHS

11-element Array{Float64,1}:
  1.14104  
  0.117536 
 -0.0390521
 -0.0784834
  0.0587678
 -0.215355 
  0.156777 
 -0.224739 
  0.117251 
  4.31232  
  2.54626  

In [1]:
LHS

12×12 Array{Float64,2}:
 3.0    0.0   0.0       0.0   0.0  …   1.0       1.0  1.125     1.875  
 0.0    2.0   0.0       0.0   0.0      0.0       0.0  1.0       1.0    
 0.0    0.0   3.57143   1.0   0.0      0.0       0.0  0.0       0.0    
 0.0    0.0   1.0       4.0   1.0      0.0       0.0  0.0       0.0    
 0.0    0.0   0.0       1.0   4.0      0.0      -2.0  0.0       0.0    
 1.0    0.0  -1.14286   0.0   0.0  …  -2.13333   0.0  0.25      0.75   
 0.0    1.0   0.0      -2.0  -2.0     -2.13333   0.0  0.5       0.5    
 0.0    1.0  -2.0      -2.0   1.0      0.0      -2.0  0.5       0.5    
 1.0    0.0   0.0       0.0   0.0      5.26667   0.0  0.375     0.625  
 1.0    0.0   0.0       0.0  -2.0      0.0       5.0  0.5       0.5    
 1.125  1.0   0.0       0.0   0.0  …   0.375     0.5  0.953125  1.17188
 1.875  1.0   0.0       0.0   0.0      0.625     0.5  1.17188   1.70313

## JWAS

# Repeatability Model 

When an animal has multiple records on the same trait,  e.g. daily milk yield, there is additional resemblance among those records of an animal due to the permanent environmental effect on this animal. Thus, the difference between two animals are due to both genetic and permanent environmental effect.

## Data and Model

The following example is from Chapter 4.2 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition) ".

For illustrative purpose, assume a single dairy herd with the following data structure for five cows:

In [1]:
using DataFrames
data = DataFrame(Cow    = [4,4,5,5,6,6,7,7,8,8], 
                 Sire   = [1,1,3,3,1,1,3,3,1,1], 
                 Dam    = [2,2,2,2,5,5,4,4,7,7], 
                 Parity = [1,2,1,2,1,2,1,2,1,2], 
                 HYS    = [1,3,1,4,2,3,1,3,2,4], 
                 Fat_yield = [201,280,150,200,160,190,180,250,285,300])
data[:Pe] = data[:Cow]
data

The **model** to describe the observations is:

$$
y = X * b + Z * a + W * pe + e
$$
Where

* $y$ is the vector of observations
* $b$ is the vector of fixed effects
* $a$ is the vector of additive random animal effects
* $pe$ is the  vector of random permanent environmental effects
*  $e$ is the vector of random residual effect
* $X,Z,W$are incidence matrices  

Assumptions:

* the permanent environmental and residual effects are independent.

Variance components:

*  $var(pe) = I\sigma^2_{pe}$
* $var(e) = I \sigma^2_e =R$
*  $var(a) = A \sigma^2_a$
*  $var(y) = ZAZ'\sigma^2_a + WI\sigma^2_{pe}W'+ R$

**Mixed Model Equations**

$$
\begin{bmatrix}
X'X & X'Z & X'W \\
Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} & Z'W \\
W'X & W'Z & W'W+I\frac{\sigma^2_e}{\sigma^2_{pe}}
\end{bmatrix} = \begin{bmatrix}
X'y \\
Z'y \\
W'y
\end{bmatrix}
$$
## Do It From Scratch

[pedigree_4_2.csv](https://nextjournal.com/data/QmcLJMoP47GeU1vbmw6rNdJbuBj9W4uSn5t14vS82NsgTq?content-type=text/csv&node-id=10492400-b68e-4c12-8722-a08be48ca095&filename=pedigree_4_2.csv&node-kind=file)


In [1]:
y = [201,280,150,200,160,190,180,250,285,300]

X=[0 1 0 0 0 1
   1 0 0 1 0 0
   0 1 0 0 0 1
   1 0 1 0 0 0
   0 1 0 0 1 0
   1 0 0 1 0 0
   0 1 0 0 0 1
   1 0 0 1 0 0
   0 1 0 0 1 0
   1 0 1 0 0 0]
b = ["parity2", "parity1", "HYS4", "HYS3", "HYS2", "HYS1"]

Z=  [0 0 0 0 1 0 0 0 
     0 0 0 0 1 0 0 0
     0 0 0 1 0 0 0 0
     0 0 0 1 0 0 0 0
     0 0 1 0 0 0 0 0
     0 0 1 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0]
u = ["BV8", "BV7", "BV6", "BV5", "BV4", "BV3", "BV2", "BV1"];

W = Z[:,1:5]
pe= ["pe8","pe7","pe6","pe5","pe4"] 

σe2  = 28
σa2  = 20
σpe2 = 12
λ1   = σe2/σa2
λ2   = σe2/σpe2

A_inv =[2.5 0.5 0.0 -1.0 0.5 -1.0 0.5 -1.0
        0.5 1.5 0.0 -1.0 0.0 0.0  0.0  0.0
        0.0 0.0 1.83 0.5 -0.67 0.0  -1.0 0.0
        -1.0 -1.0 0.5 2.5 0.0 0.0 -1.0 0.0
        0.5 0.0 -0.67 0.0 1.83 -1.0 0.0  0.0
        -1.0 0.0 0.0 0.0 -1.0 2.0 0.0  0.0
        0.5 0.0 -1.0 -1.0 0.0 0.0 2.5 -1.0
        -1.0 0.0 0.0 0.0 0.0 0.0 -1.0 2.0];

In [1]:
using LinearAlgebra
lhs    =[X'X X'Z X'W
         Z'X Z'Z+A_inv*λ1 Z'W
         W'X W'Z W'W+I*λ2]

19×19 Array{Float64,2}:
 5.0  0.0  2.0  3.0  0.0  0.0   1.0  …  1.0      1.0      1.0      1.0    
 0.0  5.0  0.0  0.0  2.0  3.0   1.0     1.0      1.0      1.0      1.0    
 2.0  0.0  2.0  0.0  0.0  0.0   1.0     0.0      0.0      1.0      0.0    
 3.0  0.0  0.0  3.0  0.0  0.0   0.0     1.0      1.0      0.0      1.0    
 0.0  2.0  0.0  0.0  2.0  0.0   1.0     0.0      1.0      0.0      0.0    
 0.0  3.0  0.0  0.0  0.0  3.0   0.0  …  1.0      0.0      1.0      1.0    
 1.0  1.0  1.0  0.0  1.0  0.0   5.5     0.0      0.0      0.0      0.0    
 1.0  1.0  0.0  1.0  0.0  1.0   0.7     2.0      0.0      0.0      0.0    
 1.0  1.0  0.0  1.0  1.0  0.0   0.0     0.0      2.0      0.0      0.0    
 1.0  1.0  1.0  0.0  0.0  1.0  -1.4     0.0      0.0      2.0      0.0    
 1.0  1.0  0.0  1.0  0.0  1.0   0.7  …  0.0      0.0      0.0      2.0    
 0.0  0.0  0.0  0.0  0.0  0.0  -1.4     0.0      0.0      0.0      0.0    
 0.0  0.0  0.0  0.0  0.0  0.0   0.7     0.0      0.0      0.0      0.0    
 

In [1]:
rhs=[X'y
     Z'y
     W'y];

Note that *HYS1* and *HYS2* are nested in *Parity1*, and *HYS3* and *HYS4* are nested in *Parity2,*  the equations for *HYS1 and HYS3* are set to zeros to avoid collinearity. 

In [1]:
# method1: delete the corresponding equation in MME
pickme = deleteat!(collect(1:size(lhs,1)),[4,6])
lhs    = lhs[pickme,pickme]
rhs    = rhs[pickme];
lhs\rhs

17-element Array{Float64,1}:
 250.516  
 180.37   
 -10.3859 
  42.9799 
  19.8546 
  -6.75542
 -22.0055 
  -5.06146
   2.15832
  11.0065 
 -13.5335 
   3.16054
  18.8794 
   2.91337
 -18.4281 
 -13.9332 
  10.5686 

In [1]:
# method2: delete the corresponding equation in X
pickme = deleteat!(collect(1:size(X,2)),[4,6])
X      = X[:,pickme]
lhs    =[X'X X'Z X'W
         Z'X Z'Z+A_inv*λ1 Z'W
         W'X W'Z W'W+I*λ2]
rhs=[X'y
     Z'y
     W'y]
lhs\rhs

17-element Array{Float64,1}:
 250.516  
 180.37   
 -10.3859 
  42.9799 
  19.8546 
  -6.75542
 -22.0055 
  -5.06146
   2.15832
  11.0065 
 -13.5335 
   3.16054
  18.8794 
   2.91337
 -18.4281 
 -13.9332 
  10.5686 

In [1]:
# para=["parity2","parity1","HYS4","HYS3","HYS2","HYS1",
#     "animal8","animal7","animal6","animal5","animal4","animal3","animal2","animal1",
#     "pe8","pe7","pe6","pe5","pe4"];
res_book=[241.893, 175.472, 0.013, 44.065,  24.194, 9.328, -18.387, -18.207, 13.581, -7.063, -3.084, 10.148, 
          17.347, -1.390, -17.229, -7.146, 8.417];
cor(res_book,lhs\rhs)

0.992894

## JWAS

In [1]:
using JWAS
pedigree = get_pedigree("/.nextjournal/data-named/QmcLJMoP47GeU1vbmw6rNdJbuBj9W4uSn5t14vS82NsgTq/pedigree_4_2.csv",separator=",",header=false);

In [1]:
data

In [1]:
model_equation = "Fat_yield = Parity + HYS + Cow + Pe";
model = build_model(model_equation,σe2)
set_random(model,"Cow",pedigree,σa2)
set_random(model,"Pe",σpe2)

In [1]:
sol=solve(model,data,solver="Jacobi") 
#the Jacobi solver is not able to calculate the variance component

19×2 Array{Any,2}:
 "1:Parity : 1"   96.5468 
 "1:Parity : 2"  120.947  
 "1:HYS : 1"      78.9206 
 "1:HYS : 3"     120.941  
 "1:HYS : 4"     120.955  
 "1:HYS : 2"     122.986  
 "1:Cow : 1"      10.1508 
 "1:Cow : 3"      -7.06079
 "1:Cow : 2"      -3.08125
 "1:Cow : 4"      13.5851 
 "1:Cow : 7"       9.33297
 "1:Cow : 8"      24.1981 
 "1:Cow : 5"     -18.203  
 "1:Cow : 6"     -18.3825 
 "1:Pe : 4"        8.41715
 "1:Pe : 5"       -7.14523
 "1:Pe : 6"      -17.2283 
 "1:Pe : 7"       -1.38957
 "1:Pe : 8"       17.3468 

In [1]:
# Jacobi solver in JWAS use "\"
Matrix(model.mmeLhs)\vec(Matrix(model.mmeRhs))

19-element Array{Float64,1}:
  22.033  
  27.721  
 153.439  
 214.172  
 214.185  
 197.504  
  10.1476 
  -7.06342
  -3.08415
  13.5807 
   9.32843
  24.1936 
 -18.207  
 -18.3868 
   8.41698
  -7.14558
 -17.2285 
  -1.38965
  17.3467 

In [1]:
jwas_para=["parity1","parity2",
           "HYS1","HYS3","HYS4","HYS2",
           "BV1","BV3","BV2","BV4","BV7","BV8","BV5","BV6",
           "pe4","pe5","pe6","pe7","pe8"];
jwas_res=sol[:,2];
jwas_sol=[jwas_para jwas_res]

19×2 Array{Any,2}:
 "parity1"   96.5468 
 "parity2"  120.947  
 "HYS1"      78.9206 
 "HYS3"     120.941  
 "HYS4"     120.955  
 "HYS2"     122.986  
 "BV1"       10.1508 
 "BV3"       -7.06079
 "BV2"       -3.08125
 "BV4"       13.5851 
 "BV7"        9.33297
 "BV8"       24.1981 
 "BV5"      -18.203  
 "BV6"      -18.3825 
 "pe4"        8.41715
 "pe5"       -7.14523
 "pe6"      -17.2283 
 "pe7"       -1.38957
 "pe8"       17.3468 

In [1]:
jwas_sol_a = jwas_sol[deleteat!(collect(1:size(jwas_sol,1)),[3,4]),:][5:end,:]
res_book2=[10.148,  -7.063, -3.084, 13.581,  9.328, 24.194,-18.207, -18.387,
           8.417,-7.146,-17.229,-1.390,17.347];
[jwas_sol_a res_book2]

13×3 Array{Any,2}:
 "BV1"   10.1508    10.148
 "BV3"   -7.06079   -7.063
 "BV2"   -3.08125   -3.084
 "BV4"   13.5851    13.581
 "BV7"    9.33297    9.328
 "BV8"   24.1981    24.194
 "BV5"  -18.203    -18.207
 "BV6"  -18.3825   -18.387
 "pe4"    8.41715    8.417
 "pe5"   -7.14523   -7.146
 "pe6"  -17.2283   -17.229
 "pe7"   -1.38957   -1.39 
 "pe8"   17.3468    17.347

JWAS has almost the same EBV as book.

# Model with Common Environmental Effects

Similar to the repeatability model, records of animals sharing the same environment, e.g. pigs reared by the same mother, have additional resemblance due to the common environment. Thus, the variance between two animals reared by different mothers is due to both genetic and common environmental factors.

The following example is from Chapter 4.3 from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition)".

Consider the following data set on the weaning weight of piglets, which are progeny of three sows mated to two boars:

## Data and Model

In [1]:
using DataFrames, LinearAlgebra, Statistics

data = DataFrame(Piglet=["a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"],
                 Sire=["a1","a1","a1","a3","a3","a3","a3","a1","a1","a1"],
                 Dam=["a2","a2","a2","a4","a4","a4","a4","a5","a5","a5"],
                 Sex=["Male","Female","Female","Female","Male","Female","Female","Male","Female","Male"],
                 Weight=[90.0,70,65,98,106,60,80,100,85,68])
data

The **model** to describe the observations is:

$y = Xb + Za + Wc + e$

Where  

* $y$ is the vector of observations
* $b$ is the vector of fixed effects
* $a$ is the vector of addtive random animal effects
* $c$ is the vector of commom environmental effects
* $e$ is the vector of random residual effect
* $X,Z,W$ are the incidence matrix  

Assumptions:

* the common environmental and residual effects are independent.

Variance components:

* $var(c) = I\sigma^2_{c}$
* $var(e) = I \sigma^2_e =R$
* $var(a) = A \sigma^2_a$
* $var(y) = ZAZ'\sigma^2_a + WI\sigma^2_{c}W'+ R$

**Mixed Model Equations**

$$
\begin{bmatrix}
X'X & X'Z & X'W \\
Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} & Z'W \\
W'X & W'Z & W'W+I\frac{\sigma^2_e}{\sigma^2_{c}}
\end{bmatrix} = \begin{bmatrix}
X'y \\
Z'y \\
W'y
\end{bmatrix}
$$
## Do it from Scratch

In [1]:
# Calculate A
ped=[
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 1 2
7 1 2
8 1 2
9 3 4
10 3 4
11 3 4
12 3 4
13 1 5
14 1 5
15 1 5
];
s=ped[:,2]
d=ped[:,3]
n = length(s)
s=(s .== 0)*n .+s
d=(d .== 0)*n .+d;
A = zeros(n,n);
for i in 1:n
    A[i,i] = 1 + A[s[i], d[i]]/2
    for j in (i+1):n    
        A[i,j] = ( A[i, s[j]] + A[i, d[j]] ) / 2  
        A[j,i] = A[i,j] 
    end           
end

In [1]:
y=data[:Weight]

X=[0 1
   1 0
   1 0
   1 0
   0 1
   1 0
   1 0
   0 1
   1 0
   0 1]
b=["female","male"]

W=[0 0 1
   0 0 1
   0 0 1
   0 1 0
   0 1 0
   0 1 0
   0 1 0
   1 0 0
   1 0 0
   1 0 0]
c=["ce5","ce4","ce2"]

Z=[zeros(10,5) I]
u=["BV1","BV2","BV3","BV4","BV5","BV6","BV7","BV8",
    "BV9","BV10","BV11","BV12","BV13","BV14","BV15"]

σa2=20
σe2=65
σc2=15
σy2=σa2+σe2+σc2
λ1=σe2/σa2
λ2=σe2/σc2

A_inv=inv(A)

15×15 Array{Float64,2}:
  4.0   1.5  -0.0   0.0   1.5  -1.0  …   0.0  -0.0   0.0  -1.0  -1.0  -1.0
  1.5   2.5  -0.0   0.0   0.0  -1.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0   3.0   2.0   0.0   0.0     -1.0  -1.0  -1.0  -0.0   0.0  -0.0
  0.0   0.0   2.0   3.0   0.0   0.0     -1.0  -1.0  -1.0  -0.0   0.0  -0.0
  1.5   0.0   0.0   0.0   2.5   0.0      0.0  -0.0   0.0  -1.0  -1.0  -1.0
 -1.0  -1.0   0.0   0.0   0.0   2.0  …   0.0  -0.0   0.0  -0.0   0.0  -0.0
 -1.0  -1.0   0.0   0.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
 -1.0  -1.0   0.0   0.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      2.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0  …   0.0   2.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      0.0   0.0   2.0  -0.0   0.0  -0.0
 -1.0   0.0   0.0   0.0  -1.0   0.0      0.0   0.0   0.0   2.0  -0.0  -0.0
 

In [1]:
lhs=[X'X X'Z X'W
     Z'X Z'Z+inv(A)*λ1 Z'W
     W'X W'Z W'W+I*λ2]

20×20 Array{Float64,2}:
 6.0  0.0   0.0     0.0     0.0    0.0   …   0.0   1.0      3.0      2.0    
 0.0  4.0   0.0     0.0     0.0    0.0       1.0   2.0      1.0      1.0    
 0.0  0.0  13.0     4.875   0.0    0.0      -3.25  0.0      0.0      0.0    
 0.0  0.0   4.875   8.125   0.0    0.0       0.0   0.0      0.0      0.0    
 0.0  0.0   0.0     0.0     9.75   6.5       0.0   0.0      0.0      0.0    
 0.0  0.0   0.0     0.0     6.5    9.75  …   0.0   0.0      0.0      0.0    
 0.0  0.0   4.875   0.0     0.0    0.0      -3.25  0.0      0.0      0.0    
 0.0  1.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0   0.0     0.0    -3.25  -3.25  …   0.0   0.0      1.0      0.0    
 0.0  1.0   0.0     0.0    -3.25  -3.25      0.0   0.0      1.0      0.0    
 1.0  0.0   0.0     0.0    -3.25  -3.25      0.0   0

In [1]:
rhs=[X'y
     Z'y
     W'y];

In [1]:
mme_res = lhs\rhs

20-element Array{Float64,1}:
 75.7644  
 91.4931  
 -1.44077 
 -1.17488 
  1.44077 
  1.44077 
 -0.265894
 -1.09756 
 -1.66707 
 -2.33373 
  3.92526 
  2.89476 
 -1.14141 
  1.52526 
  0.447871
  0.545031
 -3.8188  
 -0.398841
  2.16116 
 -1.76232 

In [1]:
#correlation between EBV from book and mme 
res_book=[75.764, 91.493,
          -1.441, -1.175, 1.441, 1.441, -0.266, -1.098, -1.667, -2.334, 3.925, 2.895,
          -1.141, 1.525, 0.448, 0.545, -3.819,
          -0.399,2.161,-1.762];
cor(res_book,mme_res)

1.0

## JWAS

In [1]:
using JWAS

[pedigree_4_3.txt](https://nextjournal.com/data/QmZaAf7ZatZGTwrpRt2EbVUaYm7vvzFuMdgZk9BSM5aEig?content-type=text/plain&node-id=0ab28489-f27c-4a06-b57a-bf4fa46e0691&filename=pedigree_4_3.txt&node-kind=file)


In [1]:
pedigree   = get_pedigree("/.nextjournal/data-named/QmZaAf7ZatZGTwrpRt2EbVUaYm7vvzFuMdgZk9BSM5aEig/pedigree_4_3.txt",separator=" ",header=true);

In [1]:
data

In [1]:
model_equation = "Weight = Sex + Piglet + Dam";
model = build_model(model_equation,σe2);
set_random(model,"Dam",σc2);
set_random(model,"Piglet",pedigree,σa2);

In [1]:
out=runMCMC(model,data,chain_length=100_000,output_samples_frequency=1000, burnin = 10_000);

In [1]:
jwas_res_a=out["EBV_Weight"]

In [1]:
#correlation bewteen EBV from book and JWAS
book_res_a=[0.448, -1.175, -1.441, -3.819, -1.667, 1.525,-0.266,1.441,0.545,1.441,-1.098,2.895,-1.141,-2.334,3.925]
cor(jwas_res_a[:,2],book_res_a)

0.995932