# Introduction to JWAS

* JWAS is a package for Bayesian mixed linear model analyses

* Can accommodate: 

    * breeding value models (animal models)
    * maternal effects
    * multiple traits
    * SNP effects
    
* Non-SNP random effects are assumed to be normally distributed
* SNP effects can be normal (RR-BLUP), t-distributed (BayesA), mixture-distributed (BayesB and BayesC), or Laplace (Bayesian LASSO)
* Inference is based on MCMC samples


## Non-SNP part of the model: examples 

### Single-trait models 

### Data

In [1]:
using CSV, DataFrames, Statistics
data1 = CSV.read("singleTraitEx.phen")

| Row | Line | Age | Height |
|-----|------|-----|--------|
|     | Int64⍰ | Int64⍰ | Float64⍰ |
| 1 | 1 | 20 | 20.0 |
| 2 | 1 | 19 | 21.1 |
| 3 | 2 | 21 | 15.9 |
| 4 | 2 | 20 | 13.7 |
| 5 | 3 | 18 | 18.4 |
| 6 | 3 | 20 | 22.0 |


### One-way model 

In [2]:
using JWAS, JWAS.Datasets

In [3]:
varRes = 10.0
model1 = build_model("Height = intercept + Line", varRes);

In [4]:
?runMCMC

search: runMCMC



```
runMCMC(mme,df;Pi=0.0,estimatePi=false,chain_length=1000,burnin = 0,starting_value=false,printout_frequency=100,
missing_phenotypes=false,constraint=false,methods="conventional (no markers)",output_samples_frequency::Int64 = 0,
printout_model_info=true,outputEBV=false)
```

Run MCMC (marker information included or not) with sampling of variance components.

  * available **methods** include "conventional (no markers)", "RR-BLUP", "BayesB", "BayesC".
  * save MCMC samples every **output_samples_frequency** iterations to files **output_file** defaulting to `MCMC_samples`.
  * the first **burnin** iterations are discarded at the beginning of an MCMC run
  * **Pi** for single-trait analyses is a number; **Pi** for multi-trait analyses is a dictionary such as `Pi=Dict([1.0; 1.0]=>0.7,[1.0; 0.0]=>0.1,[0.0; 1.0]=>0.1,[0.0; 0.0]=>0.1)`,

      * if Pi (Π) is not provided in multi-trait analysis, it will be generated assuming all markers have effects on all traits.
  * **starting_value** can be provided as a vector for all location parameters except marker effects.
  * print out the Monte Carlo mean in the REPL every **printout_frequency** iterations
  * set **constraint**=true to constrain residual covariances between traits to be zero.
  * Individual EBVs are returned if **outputEBV**=true.
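
The multi-trait **Pi** dictionary above can be sanity-checked before it is passed to `runMCMC`. The following is a minimal sketch in plain Julia (no JWAS calls); it assumes that each key is an indicator vector with one entry per trait, that every combination should be listed, and that the values are probabilities summing to one. These requirements are assumptions read off the docstring example, not a documented JWAS check.

```julia
# Two-trait Pi copied from the docstring example above.
# Assumption: each key is an indicator vector with one entry per trait.
Pi = Dict([1.0; 1.0] => 0.7,
          [1.0; 0.0] => 0.1,
          [0.0; 1.0] => 0.1,
          [0.0; 0.0] => 0.1)

ntraits = 2

# Assumption: all 2^ntraits indicator combinations are listed ...
@assert length(Pi) == 2^ntraits
# ... and the probabilities sum to one (up to floating-point error).
@assert isapprox(sum(values(Pi)), 1.0)

println("Pi covers ", length(Pi), " marker classes")
```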


In [5]:
runMCMC(model1,data1,chain_length=500,output_samples_frequency=100);

A Linear Mixed Model was build using model equations:

Height = intercept + Line

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Line            factor       fixed                3

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                        100
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:                          10.000

Degree of freedom for hyper-parameters:

residual variances:                           4.000
iid random effect variances:                  4.000



The file M

running MCMC for conventional (no markers)...100%|██████| Time: 0:00:01


In [6]:
model1.X

6×4 SparseArrays.SparseMatrixCSC{Float64,Int64} with 12 stored entries:
  [1, 1]  =  1.0
  [2, 1]  =  1.0
  [3, 1]  =  1.0
  [4, 1]  =  1.0
  [5, 1]  =  1.0
  [6, 1]  =  1.0
  [1, 2]  =  1.0
  [2, 2]  =  1.0
  [3, 3]  =  1.0
  [4, 3]  =  1.0
  [5, 4]  =  1.0
  [6, 4]  =  1.0

In [7]:
Matrix(model1.X)

6×4 Array{Float64,2}:
 1.0  1.0  0.0  0.0
 1.0  1.0  0.0  0.0
 1.0  0.0  1.0  0.0
 1.0  0.0  1.0  0.0
 1.0  0.0  0.0  1.0
 1.0  0.0  0.0  1.0
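
The incidence matrix shown above can be reproduced by hand, which makes its structure easier to see. This is a minimal sketch in plain Julia that does not use JWAS; the variable names (`line`, `lev`, `Xline`) are illustrative only. It also shows that the matrix is not of full column rank, because the intercept column equals the sum of the three `Line` columns.

```julia
using LinearAlgebra  # for rank

line = [1, 1, 2, 2, 3, 3]        # the Line column of data1
lev  = sort(unique(line))        # factor levels: 1, 2, 3

# One indicator column per level of Line (rows follow `line`, columns follow `lev`).
Xline = [Float64(l == k) for l in line, k in lev]    # 6×3

# Prepend a column of ones for the intercept.
X = hcat(ones(length(line)), Xline)                  # 6×4

println(size(X))
println(rank(X))   # one less than the number of columns
```

Dropping the intercept, as in the next model, leaves only the three indicator columns, which are linearly independent.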

In [8]:
JWAS.getNames(model1)

4-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Line : 1"             
 "1:Line : 2"             
 "1:Line : 3"             

### One-way model without intercept

In [9]:
varRes = 10.0
model1 = build_model("Height = Line", varRes);

In [10]:
runMCMC(model1,data1,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

Height = Line

Model Information:

Term            C/F          F/R            nLevels
Line            factor       fixed                3

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:                          10.000

Degree of freedom for hyper-parameters:

residual variances:                           4.000
iid random effect variances:                  4.000





In [11]:
JWAS.getNames(model1)

3-element Array{AbstractString,1}:
 "1:Line : 1"
 "1:Line : 2"
 "1:Line : 3"

In [12]:
Matrix(model1.X)

6×3 Array{Float64,2}:
 1.0  0.0  0.0
 1.0  0.0  0.0
 0.0  1.0  0.0
 0.0  1.0  0.0
 0.0  0.0  1.0
 0.0  0.0  1.0

### Model with covariate

In [13]:
varRes = 10.0
model1 = build_model("Height = intercept + Line + Age", varRes)
set_covariate(model1,"Age");

In [14]:
runMCMC(model1,data1,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

Height = intercept + Line + Age

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Line            factor       fixed                3
Age             covariate    fixed                1

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:                          10.000

Degree of freedom for hyper-parameters:

residual variances:                           4.000
iid random e

#### This model has a common slope

In [15]:
JWAS.getNames(model1)

5-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Line : 1"             
 "1:Line : 2"             
 "1:Line : 3"             
 "1:Age : Age"            

In [16]:
Matrix(model1.X)

6×5 Array{Float64,2}:
 1.0  1.0  0.0  0.0  20.0
 1.0  1.0  0.0  0.0  19.0
 1.0  0.0  1.0  0.0  21.0
 1.0  0.0  1.0  0.0  20.0
 1.0  0.0  0.0  1.0  18.0
 1.0  0.0  0.0  1.0  20.0

### Model with Age by Line

In [17]:
varRes = 10.0
model1 = build_model("Height = intercept + Line + Line*Age", varRes)
set_covariate(model1,"Age");

In [18]:
runMCMC(model1,data1,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

Height = intercept + Line + Line*Age

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Line            factor       fixed                3
Line*Age        interaction  fixed                3

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:                          10.000

Degree of freedom for hyper-parameters:

residual variances:                           4.000
iid ran

#### This model has line-specific slopes

In [19]:
JWAS.getNames(model1)

7-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Line : 1"             
 "1:Line : 2"             
 "1:Line : 3"             
 "1:Line*Age : 1 * Age"   
 "1:Line*Age : 2 * Age"   
 "1:Line*Age : 3 * Age"   

In [20]:
Matrix(model1.X)

6×7 Array{Float64,2}:
 1.0  1.0  0.0  0.0  20.0   0.0   0.0
 1.0  1.0  0.0  0.0  19.0   0.0   0.0
 1.0  0.0  1.0  0.0   0.0  21.0   0.0
 1.0  0.0  1.0  0.0   0.0  20.0   0.0
 1.0  0.0  0.0  1.0   0.0   0.0  18.0
 1.0  0.0  0.0  1.0   0.0   0.0  20.0
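
The last three columns above are the `Line` indicator columns multiplied row by row by `Age`. A minimal sketch in plain Julia (no JWAS; names are illustrative) that rebuilds the same matrix:

```julia
line = [1, 1, 2, 2, 3, 3]                        # Line column of data1
age  = [20.0, 19.0, 21.0, 20.0, 18.0, 20.0]      # Age column of data1

# Indicator columns for the three levels of Line.
Xline = [Float64(l == k) for l in line, k in sort(unique(line))]   # 6×3

# Broadcasting scales each row of Xline by that record's age,
# giving one slope column per line.
Xslope = Xline .* age                                              # 6×3

# intercept + Line + Line*Age
X = hcat(ones(length(line)), Xline, Xslope)                        # 6×7

println(size(X))
```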

### Model with Age by Line and common intercept

In [21]:
varRes = 10.0
model1 = build_model("Height = intercept + Line*Age", varRes)
set_covariate(model1,"Age");

In [22]:
runMCMC(model1,data1,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

Height = intercept + Line*Age

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Line*Age        interaction  fixed                3

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:                          10.000

Degree of freedom for hyper-parameters:

residual variances:                           4.000
iid random effect variances:                  4.000





#### This model has common intercept and line-specific slopes

In [23]:
JWAS.getNames(model1)

4-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Line*Age : 1 * Age"   
 "1:Line*Age : 2 * Age"   
 "1:Line*Age : 3 * Age"   

In [24]:
Matrix(model1.X)

6×4 Array{Float64,2}:
 1.0  20.0   0.0   0.0
 1.0  19.0   0.0   0.0
 1.0   0.0  21.0   0.0
 1.0   0.0  20.0   0.0
 1.0   0.0   0.0  18.0
 1.0   0.0   0.0  20.0

## Two-trait models

### Data

In [25]:
data2 = CSV.read("twoTraitEx.phen")

| Row | Line | Age | y1 | y2 |
|-----|------|-----|----|----|
|     | Int64⍰ | Int64⍰ | Float64⍰ | Float64⍰ |
| 1 | 1 | 20 | 20.0 | 6.2 |
| 2 | 1 | 19 | 21.1 | 5.9 |
| 3 | 2 | 21 | 15.9 | 10.0 |
| 4 | 2 | 20 | 13.7 | 8.2 |
| 5 | 3 | 18 | 18.4 | 9.6 |
| 6 | 3 | 20 | 22.0 | 11.0 |


In [26]:
data2[:, 3:4]

| Row | y1 | y2 |
|-----|----|----|
|     | Float64⍰ | Float64⍰ |
| 1 | 20.0 | 6.2 |
| 2 | 21.1 | 5.9 |
| 3 | 15.9 | 10.0 |
| 4 | 13.7 | 8.2 |
| 5 | 18.4 | 9.6 |
| 6 | 22.0 | 11.0 |


In [27]:
Matrix(data2[:, 3:4])

6×2 Array{Union{Missing, Float64},2}:
 20.0   6.2
 21.1   5.9
 15.9  10.0
 13.7   8.2
 18.4   9.6
 22.0  11.0

In [28]:
cov(Matrix(data2[:, 3:4]))

2×2 Array{Float64,2}:
 10.2137    -0.805667
 -0.805667   4.36967 

### Two-trait model Age by Line for trait 1

In [29]:
modelEquation = "y1 = intercept + Line + Line*Age
                 y2 = intercept + Line"
varRes = [10.0 -1.0
     -1.0  5.0]
model2 = build_model(modelEquation,varRes)
set_covariate(model2,"Age");

In [30]:
out = runMCMC(model2,data2,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

y1 = intercept + Line + Line*Age
y2 = intercept + Line

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Line            factor       fixed                3
Line*Age        interaction  fixed                3

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:           
 10.0  -1.0
 -1.0   5.0

Degree of freedom for hyper-parameters:

residual variances:

running MCMC for conventional (no markers)...100%|██████| Time: 0:00:01


In [31]:
JWAS.getNames(model2)

11-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Line : 1"             
 "1:Line : 2"             
 "1:Line : 3"             
 "1:Line*Age : 1 * Age"   
 "1:Line*Age : 2 * Age"   
 "1:Line*Age : 3 * Age"   
 "2:intercept : intercept"
 "2:Line : 1"             
 "2:Line : 2"             
 "2:Line : 3"             

In [32]:
Matrix(model2.X)

12×11 Array{Float64,2}:
 1.0  1.0  0.0  0.0  20.0   0.0   0.0  0.0  0.0  0.0  0.0
 1.0  1.0  0.0  0.0  19.0   0.0   0.0  0.0  0.0  0.0  0.0
 1.0  0.0  1.0  0.0   0.0  21.0   0.0  0.0  0.0  0.0  0.0
 1.0  0.0  1.0  0.0   0.0  20.0   0.0  0.0  0.0  0.0  0.0
 1.0  0.0  0.0  1.0   0.0   0.0  18.0  0.0  0.0  0.0  0.0
 1.0  0.0  0.0  1.0   0.0   0.0  20.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  1.0  0.0  0.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  0.0  1.0  0.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  0.0  0.0  1.0
 0.0  0.0  0.0  0.0   0.0   0.0   0.0  1.0  0.0  0.0  1.0
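
The two-trait matrix above is block-diagonal: the upper-left block is the design matrix for `y1`, the lower-right block is the design matrix for `y2`, and the remaining entries are zero. A minimal sketch in plain Julia (no JWAS; names are illustrative) that reproduces this layout:

```julia
line = [1, 1, 2, 2, 3, 3]                        # Line column of data2
age  = [20.0, 19.0, 21.0, 20.0, 18.0, 20.0]      # Age column of data2

Xline = [Float64(l == k) for l in line, k in sort(unique(line))]   # 6×3
ones6 = ones(length(line))

X1 = hcat(ones6, Xline, Xline .* age)   # y1 = intercept + Line + Line*Age  (6×7)
X2 = hcat(ones6, Xline)                 # y2 = intercept + Line             (6×4)

# Concatenating along both dimensions places X1 and X2 on the diagonal
# and fills the off-diagonal blocks with zeros.
X = cat(X1, X2; dims=(1, 2))

println(size(X))
```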

### Breeding value models

In [33]:
data3 = CSV.read("twoTraitMaternal.phen")

| Row | Ind | Mat | y1 | y2 |
|-----|-----|-----|----|----|
|     | Int64⍰ | Int64⍰ | Float64⍰ | Float64⍰ |
| 1 | 1 | 0 | 10.0 | 11.0 |
| 2 | 2 | 0 | 9.7 | 12.0 |
| 3 | 3 | 2 | 8.9 | 11.9 |
| 4 | 4 | 2 | 9.7 | 10.8 |
| 5 | 5 | 4 | 8.8 | 11.9 |


In [34]:
pedigree   = get_pedigree("pedFile",separator=",",header=false);

Finished!

calculating inbreeding... 100%|█████████████████████████| Time: 0:00:00


In [35]:
# Caution: columns 2:3 of data3 are Mat and y1; the two traits (y1, y2) are columns 3:4.
varRes = cov(Matrix(data3[:, 2:3]))

2×2 Array{Float64,2}:
  2.8   -0.74 
 -0.74   0.287

In [36]:
varGen = [1.0 0.0 0.0
          0.0 0.1 0.0
          0.0 0.0 0.2];

In [37]:
modelEq3 = "y1 = intercept + Ind + Mat
            y2 = intercept + Ind"
model3   = build_model(modelEq3,varRes)
set_random(model3,"Ind Mat",pedigree,varGen)

Mat is not found in model equation 2.


The output below shows how the "Ind" and "Mat" terms are ordered:

In [38]:
model3.pedTrmVec

3-element Array{Any,1}:
 "1:Ind"
 "2:Ind"
 "1:Mat"

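
The ordering matters when reading `varGen`. The sketch below pairs each term with the corresponding diagonal entry, assuming that the rows and columns of `varGen` follow the order reported by `model3.pedTrmVec`; that correspondence is an assumption suggested by the output above, not something this notebook verifies. It uses plain Julia only, with the term names typed in by hand.

```julia
using LinearAlgebra  # for diag and isposdef

# Term order as printed by model3.pedTrmVec above (typed in by hand here).
pedTrmVec = ["1:Ind", "2:Ind", "1:Mat"]

varGen = [1.0 0.0 0.0
          0.0 0.1 0.0
          0.0 0.0 0.2]

# A covariance matrix should be square, match the number of terms,
# and be positive definite.
@assert size(varGen) == (length(pedTrmVec), length(pedTrmVec))
@assert isposdef(varGen)

# Assumption: the i-th row/column of varGen belongs to the i-th term.
prior_var = Dict(zip(pedTrmVec, diag(varGen)))

for t in pedTrmVec
    println(t, " => ", prior_var[t])
end
```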
In [39]:
 out3 = runMCMC(model3,data3,chain_length=500,output_samples_frequency=0);

A Linear Mixed Model was build using model equations:

y1 = intercept + Ind + Mat
y2 = intercept + Ind

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
Ind             factor       random               5
Mat             factor       random               5

MCMC Information:

methods                        conventional (no markers)
chain_length                                    500
burnin                                            0
starting_value                                false
printout_frequency                              501
output_samples_frequency                          0
constraint                                    false
missing_phenotypes                            false
update_priors_frequency                           0

Hyper-parameters Information: 

residual variances:           
  2.8   -0.74 
 -0.74   0.287
genetic variances (polygenic):
 1.0  0.0  0.0
 0.0  0.1  0.0
 0.0  0.0  0.2

Degr

running MCMC for conventional (no markers)...100%|██████| Time: 0:00:00


In [40]:
JWAS.getNames(model3)

17-element Array{AbstractString,1}:
 "1:intercept : intercept"
 "1:Ind : 1"              
 "1:Ind : 2"              
 "1:Ind : 4"              
 "1:Ind : 3"              
 "1:Ind : 5"              
 "1:Mat : 1"              
 "1:Mat : 2"              
 "1:Mat : 4"              
 "1:Mat : 3"              
 "1:Mat : 5"              
 "2:intercept : intercept"
 "2:Ind : 1"              
 "2:Ind : 2"              
 "2:Ind : 4"              
 "2:Ind : 3"              
 "2:Ind : 5"              

In [41]:
keys(out3)   # `res` is not defined in this notebook; list the available result keys instead

## Univariate linear mixed model (genomic data)

In [42]:
phenofile  = Datasets.dataset("example","phenotypes.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true)
pedigree   = get_pedigree(pedfile,separator=",",header=true);

Finished!


In [43]:
first(phenotypes, 5)


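| Row | ID | y1 | y2 | y3 | x1 | x2 | x3 | dam |
|-----|----|----|----|----|----|----|----|-----|
|     | String⍰ | Float64⍰ | Float64⍰ | Float64⍰ | Float64⍰ | Int64⍰ | String⍰ | String⍰ |
| 1 | a1 | -0.06 | 3.58 | -1.18 | 0.9 | 2 | m | 0 |
| 2 | a3 | -2.07 | 3.19 | 0.73 | 0.7 | 2 | f | 0 |
| 3 | a4 | -2.63 | 6.97 | -0.83 | 0.6 | 1 | m | a2 |
| 4 | a5 | 2.31 | 3.5 | -1.52 | 0.4 | 2 | m | a2 |
| 5 | a6 | 0.93 | 4.87 | -0.01 | 5.0 | 2 | f | a3 |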


In [44]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [45]:
R      = 1.0
model1 = build_model(model_equation1,R);

In [46]:
set_covariate(model1,"x1");

In [47]:
G1 = 1.0
G2 = [1.0 0.5
      0.5 1.0]
set_random(model1,"x2",G1);
set_random(model1,"ID dam",pedigree,G2);

#### Adding SNP effects to the model

In [48]:
G3 =1.0
add_genotypes(model1,genofile,G3,separator=',');


5 markers on 7 individuals were added.


MCMC samples of non-SNP effects can be requested:

In [49]:
outputMCMCsamples(model1,"x2")

In [50]:
out1=runMCMC(model1,phenotypes,methods="BayesC",estimatePi=true,chain_length=5000,output_samples_frequency=100);


The prior for marker effects variance is calculated from the genetic variance and π. The mean of the prior for the marker effects variance is: 0.492462


A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12

MCMC Information:

methods                                      BayesC
chain_length                                   5000
burnin                                            0
estimatePi                                     true
starting_value                                false
printout_frequency                             5001
output_samples_frequency

running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:01


#### Check results

In [51]:
keys(out1)

Base.KeySet for a Dict{Any,Any} with 7 entries. Keys:
  "Posterior mean of polygenic effects covariance matrix"
  "EBV_y1"
  "Posterior mean of marker effects"
  "Posterior mean of residual variance"
  "Posterior mean of marker effects variance"
  "Posterior mean of location parameters"
  "Posterior mean of Pi"

In [52]:
out1["Posterior mean of Pi"]

0.47192420369315996