
<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

<button type="button" class="btn btn-lg btn-primary">Step 1: Load Packages</button> 

In [50]:
using JWAS, JWAS.Datasets, DataFrames, CSV, LinearAlgebra

<button type="button" class="btn btn-lg btn-primary">Step 2: Read data</button> 

In [51]:
phenofile  = Datasets.dataset("example","phenotypes_ssbr.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true)
pedigree   = get_pedigree(pedfile,separator=",",header=true);

The delimiter in pedigree.txt is ','.
Finished!


In [52]:
first(phenotypes,5)

Row  ID      y1       y2       y3       x1       x2     x3      dam
     String  Float64  Float64  Float64  Float64  Int64  String  String
1    a1      -0.06    3.58     -1.18    0.9      2      m       0
2    a2      -0.6     4.9      0.88     0.3      1      f       0
3    a3      -2.07    3.19     0.73     0.7      2      f       0
4    a4      -2.63    6.97     -0.83    0.6      1      m       a2
5    a5      2.31     3.5      -1.52    0.4      2      m       a2


<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Single-trait Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

<button type="button" class="btn btn-lg btn-primary">Step 3: Build Model Equations</button> 

In [54]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [55]:
R      = 1.0
model1 = build_model(model_equation1,R);

<button type="button" class="btn btn-lg btn-primary">Step 4: Set Factors or Covariates</button> 

In [56]:
set_covariate(model1,"x1");

<button type="button" class="btn btn-lg btn-primary">Step 5: Set Random or Fixed Effects</button> 

In [57]:
G1 = 1.0
G2 = [1.0 0.5
      0.5 1.0]
set_random(model1,"x2",G1);
set_random(model1,"ID dam",pedigree,G2);
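
The values given to `set_random` serve as prior variances and covariances, so each matrix should be a valid covariance matrix (symmetric and positive definite). A minimal sketch of a sanity check that uses only the `LinearAlgebra` standard library; `is_valid_covariance` is a hypothetical helper name, not a JWAS function:

```julia
using LinearAlgebra

# Hypothetical helper (not part of JWAS): a covariance matrix must be
# symmetric and positive definite.
is_valid_covariance(G::AbstractMatrix) = issymmetric(G) && isposdef(G)

G2  = [1.0 0.5
       0.5 1.0]   # ID and dam: variances 1.0, covariance 0.5
bad = [1.0 1.5
       1.5 1.0]   # covariance exceeds both variances

g2_ok  = is_valid_covariance(G2)
bad_ok = is_valid_covariance(bad)
println("G2 valid: ", g2_ok, ", bad valid: ", bad_ok)
```

Running a check like this before `set_random` may catch typing mistakes in larger matrices early.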

<button type="button" class="btn btn-lg btn-primary">Step 6: Use Genomic Information</button> 

In [58]:
G3 = 1.0
add_genotypes(model1,genofile,G3,separator=',');

The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.
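
Only 7 individuals are genotyped, so the single-step analysis has to impute marker covariates for the other pedigree members. The sketch below illustrates the usual pedigree-based imputation, assuming the textbook form M̂ₙ = A_ng A_gg⁻¹ M_g and its equivalent written with blocks of A⁻¹. It is not taken from the JWAS source; the three-animal pedigree (two unrelated parents and their offspring) and the genotype matrix `Mg` are invented for illustration.

```julia
using LinearAlgebra

# Toy relationship matrix: animals 1 and 2 are unrelated parents of animal 3.
A = [1.0 0.0 0.5
     0.0 1.0 0.5
     0.5 0.5 1.0]

g = [1, 2]   # genotyped animals (assumption for this toy example)
n = [3]      # non-genotyped animal

# Invented genotypes: 2 genotyped animals x 3 markers, coded 0/1/2.
Mg = [0.0 2.0 1.0
      2.0 0.0 1.0]

Ai = inv(A)

W1 = A[n, g] / A[g, g]        # A_ng * inv(A_gg)
W2 = -(Ai[n, n] \ Ai[n, g])   # same weights from blocks of inv(A)

Mn_hat = W1 * Mg              # imputed genotypes: the parent average
println(Mn_hat)
```

Both expressions give weights of 0.5 on each parent here. The second form avoids inverting A_gg, which is one reason an implementation might compute A⁻¹ first.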


<button type="button" class="btn btn-lg btn-primary">Step 7: Run Analysis</button> 

In [59]:
outputEBV(model1,["a1","a2","a3"]); # without this line, EBVs for all genotyped individuals are returned by default
out1=runMCMC(model1,phenotypes,methods="RR-BLUP",single_step_analysis=true,
    pedigree=pedigree,chain_length=5000,output_samples_frequency=100);

Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
calculating A inverse
  0.000232 seconds (203 allocations: 16.063 KiB)
imputing missing genotypes
  0.239723 seconds (68 allocations: 7.586 KiB, 99.88% gc time)
completed imputing genotypes

The prior for marker effects variance is calculated from the genetic variance and π.
The mean of the prior for the marker effects variance is: 0.492462

A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
ϵ               fac

running MCMC for RR-BLUP...100%|████████████████████████| Time: 0:00:00

The version of Julia and Platform in use:

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.

<button type="button" class="btn btn-lg btn-primary">Check Results</button> 

In [60]:
keys(out1)

Base.KeySet for a Dict{Any,Any} with 6 entries. Keys:
  "marker effects"
  "EBV_y1"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "marker effects variance"

In [61]:
out1["EBV_y1"]

Row  ID   EBV      PEV
     Any  Any      Any
1    a1   2.24064  189.065
2    a2   1.51196  60.7762
3    a3   -2.2499  61.7912
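
A sketch of how the two numeric columns could be summarised from saved MCMC samples, assuming `EBV` is the posterior mean and `PEV` the posterior variance of an individual's sampled breeding values. That reading of the column names is an assumption, and the sample values below are invented, not taken from this run.

```julia
using Statistics

# Invented MCMC samples of one individual's breeding value.
samples = [1.0, 2.0, 3.0, 2.0]

ebv = mean(samples)   # posterior mean
pev = var(samples)    # posterior (sample) variance, n - 1 denominator

println("EBV = ", ebv, ", PEV = ", pev)
```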


In [62]:
out1["marker effects"]

Row  Trait  Marker_ID  Estimate   Std_Error  Model_Frequency
     Any    Any        Any        Any        Any
1    y1     m1         -0.054418  0.765134   1.0
2    y1     m2         -0.142892  0.681995   1.0
3    y1     m3         0.260093   0.656865   1.0
4    y1     m4         -0.10688   0.553649   1.0
5    y1     m5         0.0157241  0.563027   1.0
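
One way to read this table is to rank markers by the absolute size of their estimated effect. A minimal sketch using the estimates printed above, copied into plain vectors so that it runs without the analysis output:

```julia
# Marker IDs and estimates copied from the table above.
marker_ids = ["m1", "m2", "m3", "m4", "m5"]
estimates  = [-0.054418, -0.142892, 0.260093, -0.10688, 0.0157241]

# Indices of the markers ordered by decreasing absolute effect.
order  = sortperm(abs.(estimates), rev=true)
ranked = marker_ids[order]

println(ranked)
```

With standard errors this large relative to the estimates, the ranking is only descriptive.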


<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Multi-trait Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

<button type="button" class="btn btn-lg btn-primary">Step 3: Build Model Equations</button> 

In [63]:
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";

In [64]:
R      = [1.0 0.5 0.5
          0.5 1.0 0.5
          0.5 0.5 1.0]
model2 = build_model(model_equation2,R);
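
The sizes of the covariance matrices passed to `set_random` in Step 5 follow from these equations: the matrix for a random term appears to need one row per equation that contains the term. The sketch below counts those equations with plain string handling. `count_equations` is a hypothetical helper, and the counting rule is inferred from the matrix sizes and messages in this notebook rather than from the JWAS documentation.

```julia
model_equation2 = "y1 = intercept + x1 + x3 + ID + dam
                   y2 = intercept + x1 + x2 + x3 + ID
                   y3 = intercept + x1 + x1*x3 + x2 + ID"

# Right-hand-side terms of each equation, with surrounding blanks removed.
rhs_terms = [strip.(split(split(eq, "=")[2], "+"))
             for eq in split(model_equation2, "\n")]

# Hypothetical helper: number of equations in which a term appears.
count_equations(term) = count(terms -> term in terms, rhs_terms)

n_x2  = count_equations("x2")    # order of the matrix for "x2"
n_ID  = count_equations("ID")
n_dam = count_equations("dam")

println("x2: ", n_x2, "  ID + dam: ", n_ID + n_dam)
```

The counts match the 2 x 2 matrix used for `x2` and the 4 x 4 matrix used for `ID dam` in Step 5.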

<button type="button" class="btn btn-lg btn-primary">Step 4: Set Factors or Covariates</button> 

In [65]:
set_covariate(model2,"x1");

<button type="button" class="btn btn-lg btn-primary">Step 5: Set Random or Fixed Effects</button> 

In [66]:
G1 = [1.0 0.5
      0.5 1.0]
G2 = [1.0 0.5 0.5 0.5
      0.5 1.0 0.5 0.5
      0.5 0.5 1.0 0.5
      0.5 0.5 0.5 1.0]
set_random(model2,"x2",G1);
set_random(model2,"ID dam",pedigree,G2);

x2 is not found in model equation 1.
dam is not found in model equation 2.
dam is not found in model equation 3.


<button type="button" class="btn btn-lg btn-primary">Step 6: Use Genomic Information</button> 

In [67]:
G3 = [1.0 0.5 0.5
      0.5 1.0 0.5
      0.5 0.5 1.0]
add_genotypes(model2,genofile,G3,separator=',');

The delimiter in genotypes.txt is ','.
The header (marker IDs) is provided in genotypes.txt.
5 markers on 7 individuals were added.
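
When the analysis runs, the log reports the mean of the prior for the marker effects (co)variance, derived from `G3`. A common scaling, assumed here rather than read from the JWAS source, divides the genetic (co)variance by (1 − π) Σ 2pⱼ(1 − pⱼ), where pⱼ is the allele frequency of marker j. Under that assumption, the single-trait value 0.492462 (with `G3 = 1.0` and π = 0) fixes the marker-dependent sum, and the same sum applied to the 3 x 3 `G3` reproduces the matrix printed in the multi-trait log.

```julia
# Single-trait inputs; 0.492462 is the prior mean printed in the single-trait log.
G3_single         = 1.0
prior_mean_single = 0.492462
pi_value          = 0.0   # assumption: every marker has an effect

# Back-calculated sum of 2p(1-p) over markers (not read from the genotype file).
sum2pq = G3_single / ((1 - pi_value) * prior_mean_single)

G3 = [1.0 0.5 0.5
      0.5 1.0 0.5
      0.5 0.5 1.0]

# Same scaling applied to the multi-trait genetic covariance matrix.
prior_mean_multi = G3 ./ ((1 - pi_value) * sum2pq)
println(round.(prior_mean_multi, digits=6))
```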


<button type="button" class="btn btn-lg btn-primary">Step 7: Run Analysis</button> 

In [68]:
outputEBV(model2,["a1","a2","a3"]); # without this line, EBVs for all genotyped individuals are returned by default
out2=runMCMC(model2,phenotypes,methods="BayesC",estimatePi=true,single_step_analysis=true,
    pedigree=pedigree,chain_length=5000,output_samples_frequency=100);

Checking phenotypes...
Individual IDs (strings) are provided in the first column of the phenotypic data.
calculating A inverse
  0.000048 seconds (203 allocations: 16.063 KiB)
imputing missing genotypes
  0.152350 seconds (68 allocations: 7.586 KiB, 99.93% gc time)
completed imputing genotypes

Pi (Π) is not provided.
Pi (Π) is generated assuming all markers have effects on all traits.

The prior for marker effects covariance matrix is calculated from genetic covariance matrix and Π.
The mean of the prior for the marker effects covariance matrix is:
 0.492462  0.246231  0.246231
 0.246231  0.492462  0.246231
 0.246231  0.246231  0.492462

A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1      

running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:03

The version of Julia and Platform in use:

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.6.0)
  CPU: Intel(R) Core(TM) i7-8559U CPU @ 2.70GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

The analysis has finished. Results are saved in the returned variable and text files. MCMC samples are saved in text files.

<button type="button" class="btn btn-lg btn-primary">Check Results</button> 

In [69]:
keys(out2)

Base.KeySet for a Dict{Any,Any} with 9 entries. Keys:
  "marker effects"
  "EBV_y2"
  "EBV_y1"
  "Pi"
  "location parameters"
  "residual variance"
  "polygenic effects covariance matrix"
  "EBV_y3"
  "marker effects variance"

In [70]:
out1["location parameters"]

Row  Trait  Effect     Level      Estimate    Std_Error
     Any    Any        Any        Any         Any
1    y1     intercept  intercept  -7.72138    13.4495
2    y1     x1*x3      x1 * m     0.395969    8.11195
3    y1     x1*x3      x1 * f     0.509506    0.922909
4    y1     x2         2          0.039693    1.15179
5    y1     x2         1          -0.0514321  0.905896
6    y1     x3         m          5.55972     15.6429
7    y1     x3         f          6.72726     11.5743
8    y1     ID         a2         0.954273    2.28606
9    y1     ID         a12        2.32418     3.58933
10   y1     ID         a10        1.76746     3.90921
