# Linear Models for Genetic Prediction (2)

<h2 class="pm-node nj-subtitle">Code it from scratch or Run it in JWAS software</h2>

<p class="pm-node nj-authors">Hao Cheng, Debbie Chapman, Zigui Wang, Tianjing Zhao, Jiayi Qu</p>

This note presents a collection of linear models for genetic prediction. Each model has three sections: 1) model description; 2) code it from scratch; 3) run the analysis in the genomic analysis package "[JWAS](https://github.com/reworkhow/JWAS.jl)". Datasets and examples are taken from "Linear Models for the Prediction of Animal Breeding Values (3rd Edition)". This note was written by [QTL.ROCKS](http://qtl.rocks/), the quantitative genetics lab at UC Davis.

# Repeatability Model

When an animal has multiple records of the same trait, e.g. daily milk yield, its records resemble each other more than records of different animals do. This is because they share not only the animal's genes but also a permanent environmental effect specific to that animal. Thus, differences between animals reflect both genetic and permanent environmental effects.

## Data and Model

The following example is from Section 4.2 of "Linear Models for the Prediction of Animal Breeding Values (3rd Edition)".

For illustrative purposes, assume a single dairy herd with the following fat yield records (two parities each) for five cows:

In [1]:
using DataFrames
data = DataFrame(Cow    = [4,4,5,5,6,6,7,7,8,8], 
                 Sire   = [1,1,3,3,1,1,3,3,1,1], 
                 Dam    = [2,2,2,2,5,5,4,4,7,7], 
                 Parity = [1,2,1,2,1,2,1,2,1,2], 
                 HYS    = [1,3,1,4,2,3,1,3,2,4], 
                 Fat_yield = [201,280,150,200,160,190,180,250,285,300])
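# Pe duplicates Cow so that the same cows can also be fitted as an
# uncorrelated (non-pedigree) permanent environmental effect
data.Pe = copy(data.Cow)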
data

The **model** to describe the observations is:

$$
y = Xb + Za + Wpe + e
$$
Where

* $y$ is the vector of observations
* $b$ is the vector of fixed effects
* $a$ is the vector of random additive genetic (animal) effects
* $pe$ is the vector of random permanent environmental effects
* $e$ is the vector of random residual effects
* $X$, $Z$, $W$ are the incidence matrices relating the records to $b$, $a$ and $pe$

Assumptions:

* the additive genetic, permanent environmental and residual effects are mutually independent.

Variance components:

* $var(pe) = I\sigma^2_{pe}$
* $var(e) = I \sigma^2_e =R$
* $var(a) = A \sigma^2_a$
* $var(y) = ZAZ'\sigma^2_a + WI\sigma^2_{pe}W'+ R$
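The covariance structure above can be checked numerically. Below is a minimal sketch for a hypothetical toy layout (two unrelated, non-inbred cows with two records each, not the book's data), using this example's variance components σ²a = 20, σ²pe = 12 and σ²e = 28. Two records of the same cow covary by σ²a + σ²pe = 32 out of a phenotypic variance of 60, so the repeatability is 32/60 ≈ 0.53.

```julia
using LinearAlgebra

σa2, σpe2, σe2 = 20.0, 12.0, 28.0

# Hypothetical toy layout: records 1-2 belong to cow 1, records 3-4 to cow 2
Z = [1 0; 1 0; 0 1; 0 1]   # records -> animals (additive genetic effects)
W = Z                      # records -> cows (permanent environment), same layout here
A = Matrix(1.0I, 2, 2)     # the two cows are assumed unrelated and non-inbred

# var(y) = Z A Z' σa² + W W' σpe² + I σe²
V = Z * A * Z' * σa2 + W * W' * σpe2 + I * σe2

# covariance of two records of the same cow divided by the phenotypic variance
repeatability = V[1, 2] / V[1, 1]
println(round(repeatability; digits = 3))   # 0.533
```

Records of different, unrelated cows (e.g. `V[1, 3]`) have zero covariance in this toy layout.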

**Mixed Model Equations**

$$
\begin{bmatrix}
X'X & X'Z & X'W \\
Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} & Z'W \\
W'X & W'Z & W'W+I\frac{\sigma^2_e}{\sigma^2_{pe}}
\end{bmatrix}
\begin{bmatrix}
\hat{b} \\
\hat{a} \\
\widehat{pe}
\end{bmatrix} = \begin{bmatrix}
X'y \\
Z'y \\
W'y
\end{bmatrix}
$$

## Do It From Scratch

In [1]:
y = [201,280,150,200,160,190,180,250,285,300]

X=[0 1 0 0 0 1
   1 0 0 1 0 0
   0 1 0 0 0 1
   1 0 1 0 0 0
   0 1 0 0 1 0
   1 0 0 1 0 0
   0 1 0 0 0 1
   1 0 0 1 0 0
   0 1 0 0 1 0
   1 0 1 0 0 0]
b=["parity2","parity1","HYS4","HYS3","HYS2","HYS1"]

Z = [0 0 0 0 1 0 0 0 
     0 0 0 0 1 0 0 0
     0 0 0 1 0 0 0 0
     0 0 0 1 0 0 0 0
     0 0 1 0 0 0 0 0
     0 0 1 0 0 0 0 0
     0 1 0 0 0 0 0 0
     0 1 0 0 0 0 0 0
     1 0 0 0 0 0 0 0
     1 0 0 0 0 0 0 0]
u=["BV8","BV7","BV6","BV5","BV4","BV3","BV2","BV1"]

W=Z[:,1:5]
pe=["pe8","pe7","pe6","pe5","pe4"]

σe2  = 28
σa2  = 20
σpe2 = 12
λ1   = σe2/σa2
λ2   = σe2/σpe2

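# NOTE: the rows/columns of this hard-coded A⁻¹ follow animal order 1, 2, ..., 8,
# while the columns of Z and W (u, pe) run from animal 8 down to animal 1
A_inv =[2.5 0.5 0.0 -1.0 0.5 -1.0 0.5 -1.0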
        0.5 1.5 0.0 -1.0 0.0 0.0  0.0  0.0
        0.0 0.0 1.83 0.5 -0.67 0.0  -1.0 0.0
        -1.0 -1.0 0.5 2.5 0.0 0.0 -1.0 0.0
        0.5 0.0 -0.67 0.0 1.83 -1.0 0.0  0.0
        -1.0 0.0 0.0 0.0 -1.0 2.0 0.0  0.0
        0.5 0.0 -1.0 -1.0 0.0 0.0 2.5 -1.0
        -1.0 0.0 0.0 0.0 0.0 0.0 -1.0 2.0];
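The columns of Z and W run from animal 8 down to animal 1 (BV8, ..., BV1), so A⁻¹ must use that order too. The hard-coded `A_inv` above is instead in the natural order 1, ..., 8 (its last row belongs to animal 8: 2.0 on the diagonal and -1.0 for its parents 1 and 7). Its 1.83 and -0.67 entries are also what Henderson's rules give when animal 5 has only one known parent, whereas `data` lists sire 3 and dam 2 for cow 5. Below is a minimal sketch that builds A⁻¹ from the Sire and Dam columns of `data` with Henderson's rules and then permutes it into Z's order. The helper `ainv_henderson` is hypothetical. Ignoring parental inbreeding is exact here only because no parent in this pedigree is inbred.

```julia
using LinearAlgebra

# Hypothetical helper: A⁻¹ by Henderson's rules, ignoring inbreeding of parents
# (exact for this pedigree, where no parent is inbred).
# ped rows: animal, sire, dam (0 = unknown); animals are numbered 1..n.
function ainv_henderson(ped::Matrix{Int}, n::Int)
    Ainv = zeros(n, n)
    for row in eachrow(ped)
        i, s, d = row
        known = [p for p in (s, d) if p > 0]
        # α = 2 (both parents known), 4/3 (one known), 1 (none known)
        α = length(known) == 2 ? 2.0 : length(known) == 1 ? 4 / 3 : 1.0
        Ainv[i, i] += α
        for p in known
            Ainv[i, p] -= α / 2
            Ainv[p, i] -= α / 2
            for q in known
                Ainv[p, q] += α / 4
            end
        end
    end
    return Ainv
end

# Pedigree from the Sire and Dam columns of `data`, with founders 1-3 added
ped = [1 0 0
       2 0 0
       3 0 0
       4 1 2
       5 3 2
       6 1 5
       7 3 4
       8 1 7]

Ainv_natural = ainv_henderson(ped, 8)   # rows/cols ordered 1, 2, ..., 8
zorder = 8:-1:1                          # column order of Z: BV8, ..., BV1
A_inv_z = Ainv_natural[zorder, zorder]
```

Using `A_inv_z` in place of `A_inv` when building `lhs` makes the MME consistent with the Z and W defined above.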

In [1]:
using LinearAlgebra
lhs = [X'X X'Z X'W
       Z'X Z'Z+A_inv*λ1 Z'W
       W'X W'Z W'W+I*λ2]

19×19 Array{Float64,2}:
 5.0  0.0  2.0  3.0  0.0  0.0   1.0  …  1.0      1.0      1.0      1.0    
 0.0  5.0  0.0  0.0  2.0  3.0   1.0     1.0      1.0      1.0      1.0    
 2.0  0.0  2.0  0.0  0.0  0.0   1.0     0.0      0.0      1.0      0.0    
 3.0  0.0  0.0  3.0  0.0  0.0   0.0     1.0      1.0      0.0      1.0    
 0.0  2.0  0.0  0.0  2.0  0.0   1.0     0.0      1.0      0.0      0.0    
 0.0  3.0  0.0  0.0  0.0  3.0   0.0  …  1.0      0.0      1.0      1.0    
 1.0  1.0  1.0  0.0  1.0  0.0   5.5     0.0      0.0      0.0      0.0    
 1.0  1.0  0.0  1.0  0.0  1.0   0.7     2.0      0.0      0.0      0.0    
 1.0  1.0  0.0  1.0  1.0  0.0   0.0     0.0      2.0      0.0      0.0    
 1.0  1.0  1.0  0.0  0.0  1.0  -1.4     0.0      0.0      2.0      0.0    
 1.0  1.0  0.0  1.0  0.0  1.0   0.7  …  0.0      0.0      0.0      2.0    
 0.0  0.0  0.0  0.0  0.0  0.0  -1.4     0.0      0.0      0.0      0.0    
 0.0  0.0  0.0  0.0  0.0  0.0   0.7     0.0      0.0      0.0      0.0    
 

In [1]:
rhs=[X'y
     Z'y
     W'y];

In [1]:
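# Method 1: X has rank 4, not 6 (each parity column is the sum of two HYS columns),
# so delete the equations for HYS3 (4th) and HYS1 (6th) from the MME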
pickme = deleteat!(collect(1:size(lhs,1)),[4,6])
lhs= lhs[pickme,pickme]
rhs=rhs[pickme]
lhs\rhs

17-element Array{Float64,1}:
 250.516  
 180.37   
 -10.3859 
  42.9799 
  19.8546 
  -6.75542
 -22.0055 
  -5.06146
   2.15832
  11.0065 
 -13.5335 
   3.16054
  18.8794 
   2.91337
 -18.4281 
 -13.9332 
  10.5686 
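Columns 4 and 6 (HYS3 and HYS1) can be dropped because X does not have full column rank. In these data every parity-2 record falls in HYS 3 or 4 and every parity-1 record in HYS 1 or 2, so each parity column is the sum of two HYS columns and X has rank 4 rather than 6. The minimal sketch below rebuilds X from the Parity and HYS columns and checks the rank before and after deleting those two columns. The `incidence` helper is hypothetical, and its column order is chosen to match the hand-typed X.

```julia
using LinearAlgebra

# Hypothetical helper: one 0/1 column per level, in the order given by `levels`
incidence(v, levels) = [x == l ? 1 : 0 for x in v, l in levels]

parity = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
hys    = [1, 3, 1, 4, 2, 3, 1, 3, 2, 4]

# Columns: parity2, parity1, HYS4, HYS3, HYS2, HYS1 (same order as X above)
X = [incidence(parity, [2, 1]) incidence(hys, [4, 3, 2, 1])]

println(rank(X))                    # 4: two linear dependencies among six columns
keep = deleteat!(collect(1:6), [4, 6])
println(rank(X[:, keep]))           # 4: the reduced X has full column rank
```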

In [1]:
# Method 2: delete the same two columns (HYS3, HYS1) from X and rebuild the MME
pickme = deleteat!(collect(1:size(X,2)),[4,6])
X = X[:,pickme]
lhs = [X'X X'Z X'W
     Z'X Z'Z+A_inv*λ1 Z'W
     W'X W'Z W'W+I*λ2]
rhs=[X'y
     Z'y
     W'y];
lhs\rhs

17-element Array{Float64,1}:
 250.516  
 180.37   
 -10.3859 
  42.9799 
  19.8546 
  -6.75542
 -22.0055 
  -5.06146
   2.15832
  11.0065 
 -13.5335 
   3.16054
  18.8794 
   2.91337
 -18.4281 
 -13.9332 
  10.5686 

In [1]:
using Statistics
# Book solutions in the order of the reduced MME: parity2, parity1, HYS4, HYS2,
# animal8, ..., animal1, pe8, ..., pe4 (17 values)
res_book=[241.893, 175.472, 0.013, 44.065,  24.194, 9.328, -18.387, -18.207, 13.581, -7.063, -3.084, 10.148, 
          17.347, -1.390, -17.229, -7.146, 8.417];
cor(res_book,lhs\rhs)

0.992894

## JWAS

[pedigree_4_2.csv](https://nextjournal.com/data/QmcLJMoP47GeU1vbmw6rNdJbuBj9W4uSn5t14vS82NsgTq?content-type=application/vnd.ms-excel&node-id=d1a58db9-b91b-4bf1-b814-f0dcef8e5853&filename=pedigree_4_2.csv&node-kind=file)


In [1]:
using JWAS
pedigree = get_pedigree("/.nextjournal/data-named/QmcLJMoP47GeU1vbmw6rNdJbuBj9W4uSn5t14vS82NsgTq/pedigree_4_2.csv", separator=",", header=false);

In [1]:
data

In [1]:
model_equation  ="Fat_yield = Parity + HYS + Cow + Pe";
model = build_model(model_equation, σe2)
set_random(model, "Cow", pedigree, σa2)
set_random(model,"Pe",σpe2)

In [1]:
sol=solve(model,data,solver="Jacobi")
# solve() returns solutions for the given variance components; the Jacobi solver does not estimate them

19×2 Array{Any,2}:
 "1:Parity : 1"   96.5468 
 "1:Parity : 2"  120.947  
 "1:HYS : 1"      78.9206 
 "1:HYS : 3"     120.941  
 "1:HYS : 4"     120.955  
 "1:HYS : 2"     122.986  
 "1:Cow : 1"      10.1508 
 "1:Cow : 3"      -7.06079
 "1:Cow : 2"      -3.08125
 "1:Cow : 4"      13.5851 
 "1:Cow : 7"       9.33297
 "1:Cow : 8"      24.1981 
 "1:Cow : 5"     -18.203  
 "1:Cow : 6"     -18.3825 
 "1:Pe : 4"        8.41715
 "1:Pe : 5"       -7.14523
 "1:Pe : 6"      -17.2283 
 "1:Pe : 7"       -1.38957
 "1:Pe : 8"       17.3468 
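The Jacobi method solves Cx = r by repeatedly updating every unknown from the current values of all the others, x_i ← (r_i - Σ_{j≠i} C_ij x_j) / C_ii. Below is a minimal textbook sketch with a hypothetical function name and stopping rule; it is not JWAS's implementation. Plain Jacobi is guaranteed to converge only for some classes of matrices (for example strictly diagonally dominant ones), and stopping at a finite tolerance is one plausible reason the Jacobi solutions above differ slightly from the book's.

```julia
using LinearAlgebra

# Hypothetical textbook Jacobi solver for C x = r (not JWAS's code)
function jacobi(C::AbstractMatrix, r::AbstractVector; tol = 1e-10, maxiter = 10_000)
    D = diag(C)
    x = zeros(length(r))
    for iter in 1:maxiter
        # x_new = D⁻¹ (r - (C - D) x), written as x + D⁻¹ (r - C x)
        x_new = x .+ (r .- C * x) ./ D
        if norm(x_new - x) < tol * max(norm(x_new), 1.0)
            return x_new, iter
        end
        x = x_new
    end
    return x, maxiter
end

# Small strictly diagonally dominant example (made-up numbers)
C = [4.0 1.0 0.0
     1.0 5.0 2.0
     0.0 2.0 6.0]
r = [1.0, 2.0, 3.0]
x, iters = jacobi(C, r)
println(x ≈ C \ r)   # true once the iteration has converged
```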

In [1]:
# For comparison, solve the MME built by JWAS (model.mmeLhs, model.mmeRhs) directly with "\"
Matrix(model.mmeLhs)\vec(Matrix(model.mmeRhs))

19-element Array{Float64,1}:
  22.033  
  27.721  
 153.439  
 214.172  
 214.185  
 197.504  
  10.1476 
  -7.06342
  -3.08415
  13.5807 
   9.32843
  24.1936 
 -18.207  
 -18.3868 
   8.41698
  -7.14558
 -17.2285 
  -1.38965
  17.3467 
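The fixed-effect solutions here (22.033 for Parity 1) differ from the Jacobi ones (96.5468) because the LHS is singular. The fixed effects therefore have infinitely many solutions, and each solver returns a different one. Estimable functions agree: Parity 1 + HYS 1 is 22.033 + 153.439 ≈ 175.47 here and 96.5468 + 78.9206 ≈ 175.47 from the Jacobi solver. The toy sketch below (made-up records, not the book's data) shows the same effect for a one-way model y = μ + sex + e: two different solutions of the singular normal equations give identical fitted values.

```julia
using LinearAlgebra

# Toy model y = μ + sex + e with made-up records; columns: μ, female, male
X = [1.0 1.0 0.0
     1.0 1.0 0.0
     1.0 0.0 1.0]
y = [10.0, 12.0, 20.0]
XtX, Xty = X' * X, X' * y          # XtX is singular (column 1 = column 2 + column 3)

b1 = pinv(XtX) * Xty               # minimum-norm solution
b2 = [0.0; X[:, 2:3] \ y]          # solution under the constraint μ = 0

println(b1 ≈ b2)                   # false: the solutions themselves differ
println(X * b1 ≈ X * b2)           # true: fitted values (estimable functions) agree
```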

In [1]:
jwas_para=["parity1","parity2",
           "HYS1","HYS3","HYS4","HYS2",
           "BV1","BV3","BV2","BV4","BV7","BV8","BV5","BV6",
           "pe4","pe5","pe6","pe7","pe8"];
jwas_res=sol[:,2];
jwas_sol=[jwas_para jwas_res]

19×2 Array{Any,2}:
 "parity1"   96.5468 
 "parity2"  120.947  
 "HYS1"      78.9206 
 "HYS3"     120.941  
 "HYS4"     120.955  
 "HYS2"     122.986  
 "BV1"       10.1508 
 "BV3"       -7.06079
 "BV2"       -3.08125
 "BV4"       13.5851 
 "BV7"        9.33297
 "BV8"       24.1981 
 "BV5"      -18.203  
 "BV6"      -18.3825 
 "pe4"        8.41715
 "pe5"       -7.14523
 "pe6"      -17.2283 
 "pe7"       -1.38957
 "pe8"       17.3468 

In [1]:
jwas_sol_a = jwas_sol[7:end, :]   # keep the animal (BV) and pe solutions only
res_book2=[10.148,  -7.063, -3.084, 13.581,  9.328, 24.194,-18.207, -18.387,
           8.417,-7.146,-17.229,-1.390,17.347];
[jwas_sol_a res_book2]

13×3 Array{Any,2}:
 "BV1"   10.1508    10.148
 "BV3"   -7.06079   -7.063
 "BV2"   -3.08125   -3.084
 "BV4"   13.5851    13.581
 "BV7"    9.33297    9.328
 "BV8"   24.1981    24.194
 "BV5"  -18.203    -18.207
 "BV6"  -18.3825   -18.387
 "pe4"    8.41715    8.417
 "pe5"   -7.14523   -7.146
 "pe6"  -17.2283   -17.229
 "pe7"   -1.38957   -1.39 
 "pe8"   17.3468    17.347

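The Jacobi solutions from JWAS agree with the book's values to within about 0.005, and solving JWAS's MME directly with `\` reproduces the book's animal and pe solutions to within rounding.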

# Model with Common Environmental Effect

Similar to the repeatability model, records of animals that share an environment, e.g. piglets reared by the same sow, resemble each other more because of this common environment. Thus, differences between animals reared by different dams reflect both genetic and common environmental effects.

## Data and Model

The following example is from Section 4.3 of "Linear Models for the Prediction of Animal Breeding Values (3rd Edition)".

Consider the following data set on the weaning weight of piglets, which are the progeny of three sows mated to two boars:

In [1]:
using DataFrames, LinearAlgebra, Statistics

data = DataFrame(Piglet=["a6","a7","a8","a9","a10","a11","a12","a13","a14","a15"],
                 Sire=["a1","a1","a1","a3","a3","a3","a3","a1","a1","a1"],
                 Dam=["a2","a2","a2","a4","a4","a4","a4","a5","a5","a5"],
                 Sex=["Male","Female","Female","Female","Male","Female","Female","Male","Female","Male"],
                 Weight=[90.0,70,65,98,106,60,80,100,85,68])
data

The **model** to describe the observations is:

$$
y=Xb+Za+Wc+e
$$
Where

* $y$ is the vector of observations
* $b$ is the vector of fixed effects
* $a$ is the vector of random additive genetic (animal) effects
* $c$ is the vector of random common environmental (litter) effects
* $e$ is the vector of random residual effects
* $X$, $Z$, $W$ are the incidence matrices relating the records to $b$, $a$ and $c$

Assumptions:

* the additive genetic, common environmental and residual effects are mutually independent.

Variance components:

* $var(c) = I\sigma^2_{c}$
* $var(e) = I \sigma^2_e =R$
* $var(a) = A \sigma^2_a$
* $var(y) = ZAZ'\sigma^2_a + WI\sigma^2_{c}W'+ R$
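Under this model, piglets from the same litter share both additive genetic and common environmental effects. Below is a minimal sketch using the example's variance components (σ²a = 20, σ²c = 15, σ²e = 65) and three piglets from the data: a6 and a7 are full sibs from dam a2, and a13 is their paternal half sib from dam a5. All their parents are founders, so none is inbred. Full sibs in the same litter covary by ½σ²a + σ²c = 25, while half sibs in different litters covary by only ¼σ²a = 5.

```julia
using LinearAlgebra

σa2, σc2, σe2 = 20.0, 15.0, 65.0

# Additive relationships among piglets a6, a7 (full sibs) and a13 (paternal half sib)
A = [1.0  0.5  0.25
     0.5  1.0  0.25
     0.25 0.25 1.0]
W = [0 1      # columns: litter of dam a5, litter of dam a2
     0 1
     1 0]
Z = Matrix(1.0I, 3, 3)   # one record per piglet

# var(y) = Z A Z' σa² + W W' σc² + I σe²
V = Z * A * Z' * σa2 + W * W' * σc2 + I * σe2
println(V[1, 2], " ", V[1, 3])   # 25.0 5.0
```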

**Mixed Model Equations**

$$
\begin{bmatrix}
X'X & X'Z & X'W \\
Z'X & Z'Z + A^{-1}\frac{\sigma^2_e}{\sigma^2_a} & Z'W \\
W'X & W'Z & W'W+I\frac{\sigma^2_e}{\sigma^2_{c}}
\end{bmatrix}
\begin{bmatrix}
\hat{b} \\
\hat{a} \\
\hat{c}
\end{bmatrix} = \begin{bmatrix}
X'y \\
Z'y \\
W'y
\end{bmatrix}
$$

## Do it from Scratch

In [1]:
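# Build the numerator relationship matrix A with the tabular method.
# Columns of ped: animal, sire, dam (0 = unknown parent)
ped=[
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 1 2
7 1 2
8 1 2
9 3 4
10 3 4
11 3 4
12 3 4
13 1 5
14 1 5
15 1 5
];
n = size(ped, 1)
# map unknown parents (0) to a dummy index n+1 whose row and column stay zero
s = [p == 0 ? n + 1 : p for p in ped[:, 2]]
d = [p == 0 ? n + 1 : p for p in ped[:, 3]]
A = zeros(n + 1, n + 1)
for i in 1:n
    A[i, i] = 1 + A[s[i], d[i]] / 2      # 1 + inbreeding coefficient of animal i
    for j in (i+1):n
        A[i, j] = (A[i, s[j]] + A[i, d[j]]) / 2
        A[j, i] = A[i, j]
    end
end
A = A[1:n, 1:n];   # drop the dummy row and column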

In [1]:
y = data.Weight

X=[0 1
   1 0
   1 0
   1 0
   0 1
   1 0
   1 0
   0 1
   1 0
   0 1]
b=["female","male"]

W=[0 0 1
   0 0 1
   0 0 1
   0 1 0
   0 1 0
   0 1 0
   0 1 0
   1 0 0
   1 0 0
   1 0 0]
c=["ce5","ce4","ce2"]

Z=[zeros(10,5) I]
u=["BV1","BV2","BV3","BV4","BV5","BV6","BV7","BV8",
    "BV9","BV10","BV11","BV12","BV13","BV14","BV15"]



σa2=20
σe2=65
σc2=15
σy2=σa2+σe2+σc2
λ1=σe2/σa2
λ2=σe2/σc2

A_inv=inv(A)

15×15 Array{Float64,2}:
  4.0   1.5  -0.0   0.0   1.5  -1.0  …   0.0  -0.0   0.0  -1.0  -1.0  -1.0
  1.5   2.5  -0.0   0.0   0.0  -1.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0   3.0   2.0   0.0   0.0     -1.0  -1.0  -1.0  -0.0   0.0  -0.0
  0.0   0.0   2.0   3.0   0.0   0.0     -1.0  -1.0  -1.0  -0.0   0.0  -0.0
  1.5   0.0   0.0   0.0   2.5   0.0      0.0  -0.0   0.0  -1.0  -1.0  -1.0
 -1.0  -1.0   0.0   0.0   0.0   2.0  …   0.0  -0.0   0.0  -0.0   0.0  -0.0
 -1.0  -1.0   0.0   0.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
 -1.0  -1.0   0.0   0.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      0.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      2.0  -0.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0  …   0.0   2.0   0.0  -0.0   0.0  -0.0
  0.0   0.0  -1.0  -1.0   0.0   0.0      0.0   0.0   2.0  -0.0   0.0  -0.0
 -1.0   0.0   0.0   0.0  -1.0   0.0      0.0   0.0   0.0   2.0  -0.0  -0.0
 

In [1]:
lhs=[X'X X'Z X'W
     Z'X Z'Z+A_inv*λ1 Z'W
     W'X W'Z W'W+I*λ2]

20×20 Array{Float64,2}:
 6.0  0.0   0.0     0.0     0.0    0.0   …   0.0   1.0      3.0      2.0    
 0.0  4.0   0.0     0.0     0.0    0.0       1.0   2.0      1.0      1.0    
 0.0  0.0  13.0     4.875   0.0    0.0      -3.25  0.0      0.0      0.0    
 0.0  0.0   4.875   8.125   0.0    0.0       0.0   0.0      0.0      0.0    
 0.0  0.0   0.0     0.0     9.75   6.5       0.0   0.0      0.0      0.0    
 0.0  0.0   0.0     0.0     6.5    9.75  …   0.0   0.0      0.0      0.0    
 0.0  0.0   4.875   0.0     0.0    0.0      -3.25  0.0      0.0      0.0    
 0.0  1.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0  -3.25   -3.25    0.0    0.0       0.0   0.0      0.0      1.0    
 1.0  0.0   0.0     0.0    -3.25  -3.25  …   0.0   0.0      1.0      0.0    
 0.0  1.0   0.0     0.0    -3.25  -3.25      0.0   0.0      1.0      0.0    
 1.0  0.0   0.0     0.0    -3.25  -3.25      0.0   0

In [1]:
rhs=[X'y
     Z'y
     W'y];

In [1]:
mme_res = lhs\rhs

20-element Array{Float64,1}:
 75.7644  
 91.4931  
 -1.44077 
 -1.17488 
  1.44077 
  1.44077 
 -0.265894
 -1.09756 
 -1.66707 
 -2.33373 
  3.92526 
  2.89476 
 -1.14141 
  1.52526 
  0.447871
  0.545031
 -3.8188  
 -0.398841
  2.16116 
 -1.76232 

In [1]:
# correlation between the book's solutions and the MME solutions
res_book=[75.764, 91.493,
          -1.441, -1.175, 1.441, 1.441, -0.266, -1.098, -1.667, -2.334, 3.925, 2.895,
          -1.141, 1.525, 0.448, 0.545, -3.819,
          -0.399,2.161,-1.762];
cor(res_book,mme_res)

1.0

## JWAS

In [1]:
using JWAS

[pedigree_4_3.txt](https://nextjournal.com/data/QmZ8Sup62oJtLFgHkGwXf5MVRPHFTNNUeXBnBfKi1rwbSn?content-type=text/plain&node-id=652af624-93ea-43b7-8708-6e3e0293e04b&filename=pedigree_4_3.txt&node-kind=file)


In [1]:
pedigree = get_pedigree("/.nextjournal/data-named/QmZ8Sup62oJtLFgHkGwXf5MVRPHFTNNUeXBnBfKi1rwbSn/pedigree_4_3.txt",separator=" ",header=true);

In [1]:
data

In [1]:
model_equation = "Weight = Sex + Piglet + Dam";
model = build_model(model_equation,σe2);
set_random(model,"Dam",σc2);
set_random(model,"Piglet",pedigree,σa2);

In [1]:
out=runMCMC(model,data,chain_length=100_000,output_samples_frequency=1000, burnin = 10_000)

Dict{Any,Any} with 4 entries:
  "Posterior mean of polyg… => [28.3262]
  "EBV_Weight"              => 15×2 DataFrame…
  "Posterior mean of resid… => 176.26
  "Posterior mean of locat… => 20×4 DataFrame…

In [1]:
jwas_res_a=out["EBV_Weight"]

In [1]:
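# correlation between EBV from the book and JWAS; runMCMC above also sampled the
# variance components (posterior means differ from σa2 and σe2), so the EBVs are
# not expected to match the book's values exactly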
book_res_a = [0.448, -1.175, -1.441, -3.819, -1.667, 1.525, -0.266, 1.441, 0.545, 1.441, -1.098, 2.895, -1.141, -2.334, 3.925]
cor(jwas_res_a[:,2], book_res_a)

0.994873