
<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

In [1]:
include("/home/ubuntu/work/Github/JWAS.jl/src/JWAS.jl")

JWAS

<button type="button" class="btn btn-lg btn-primary">Step 1: Load Packages</button> 

In [2]:
using JWAS,JWAS.Datasets,DataFrames,CSV

<button type="button" class="btn btn-lg btn-primary">Step 2: Read data</button> 

In [3]:
phenofile  = Datasets.dataset("example","phenotypes_ssbr.txt")
pedfile    = Datasets.dataset("example","pedigree.txt")
genofile   = Datasets.dataset("example","genotypes.txt")

phenotypes = CSV.read(phenofile,delim = ',',header=true)
pedigree   = get_pedigree(pedfile,separator=",",header=true);

coding pedigree... 100%|████████████████████████████████| Time: 0:00:00

Finished!


In [4]:
head(phenotypes)

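| Row | ID | y1 | y2 | y3 | x1 | x2 | x3 | dam |
|---|---|---|---|---|---|---|---|---|
| 1 | a1 | -0.06 | 3.58 | -1.18 | 0.9 | 2 | m | 0 |
| 2 | a2 | -0.6 | 4.9 | 0.88 | 0.3 | 1 | f | 0 |
| 3 | a3 | -2.07 | 3.19 | 0.73 | 0.7 | 2 | f | 0 |
| 4 | a4 | -2.63 | 6.97 | -0.83 | 0.6 | 1 | m | a2 |
| 5 | a5 | 2.31 | 3.5 | -1.52 | 0.4 | 2 | m | a2 |
| 6 | a6 | 0.93 | 4.87 | -0.01 | 5.0 | 2 | f | a3 |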


<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Single-trait Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

<button type="button" class="btn btn-lg btn-primary">Step 3: Build Model Equations</button> 

In [5]:
model_equation1  ="y1 = intercept + x1*x3 + x2 + x3 + ID + dam";

In [6]:
R      = 1.0
model1 = build_model(model_equation1,R);

<button type="button" class="btn btn-lg btn-primary">Step 4: Set Factors or Covariates</button> 

In [7]:
set_covariate(model1,"x1");

<button type="button" class="btn btn-lg btn-primary">Step 5: Set Random or Fixed Effects</button> 

In [8]:
G1 = 1.0
G2 = eye(2)
set_random(model1,"x2",G1);
set_random(model1,"ID dam",pedigree,G2);
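
`eye` was deprecated in Julia 0.7 and removed in 1.0, so the cell above only runs on older releases. On a newer Julia, one possible replacement builds the same identity matrix from `LinearAlgebra.I`. This is a sketch for readers on a newer setup, not something this notebook was run with:

```julia
using LinearAlgebra  # standard library that provides the identity object `I` (Julia 0.7 and later)

# Dense 2×2 identity matrix, equivalent to the old eye(2)
G2 = Matrix{Float64}(I, 2, 2)
```

The same pattern covers the larger matrices used later, for example `Matrix{Float64}(I, 3, 3)` in place of `eye(3)`.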

<button type="button" class="btn btn-lg btn-primary">Step 6: Use Genomic Information</button> 

In [9]:
G3 =1.0
add_genotypes(model1,genofile,G3,separator=',');

5 markers on 7 individuals were added.


In [10]:
JWAS.outputEBV(model1,["a1","a2","a3"]);

Estimated breeding values and prediction error variances will be included in the output.


<button type="button" class="btn btn-lg btn-primary">Step 7: Run Analysis</button> 

In [11]:
outputMCMCsamples(model1,"x2")
out1=runMCMC(model1,phenotypes,methods="BayesC",estimatePi=true,single_step_analysis=true,pedigree=pedigree,chain_length=5000,output_samples_frequency=100);


The prior for marker effects variance is calculated from 
the genetic variance and π. The prior for the marker effects variance 
is: 0.492462
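
A common way to derive such a prior in BayesC-type models is to divide the genetic variance by the expected total of the non-zero marker contributions, σ²_α = σ²_g / ((1 − π) Σ 2p(1 − p)). The sketch below illustrates that formula only: the function name `marker_variance_prior` and the allele frequencies are hypothetical, and whether JWAS evaluates exactly this expression is an assumption.

```julia
# Hypothetical helper: genetic variance scaled by (1 - π) * Σ 2p(1 - p)
function marker_variance_prior(genetic_variance, pi_zero, allele_freqs)
    # Σ 2p(1 - p): summed expected genotype variance over all markers
    sum2pq = sum(2 .* allele_freqs .* (1 .- allele_freqs))
    return genetic_variance / ((1 - pi_zero) * sum2pq)
end

p     = [0.5, 0.5, 0.5, 0.5]               # made-up allele frequencies, not the example data
prior = marker_variance_prior(1.0, 0.0, p) # 0.5 for these made-up inputs
```

With a larger π, fewer markers are expected to carry an effect, so each one is given a larger prior variance.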



A Linear Mixed Model was build using model equations:

y1 = intercept + x1*x3 + x2 + x3 + ID + dam

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1*x3           interaction  fixed                2
x2              factor       random               2
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                                      BayesC
chain_length                                   5000
burnin                                            0
estimatePi                                     true
starting_value                        

running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:01


<button type="button" class="btn btn-lg btn-primary">Check Results</button> 

In [12]:
out1["Posterior mean of Pi"]

0.48397016611142374

In [36]:
res=out1["Posterior mean of location parameters"]

37×2 Array{Any,2}:
 "1:intercept : intercept"  -12.8387   
 "1:x1*x3 : x1 * m"          -3.79852  
 "1:x1*x3 : x1 * f"           0.801257 
 "1:x2 : 2"                   0.226994 
 "1:x2 : 1"                  -0.255599 
 "1:x3 : m"                  20.0851   
 "1:x3 : f"                  15.308    
 "1:ID : a12"                 0.254401 
 "1:ID : a10"                -0.125304 
 "1:ID : a11"                -0.059143 
 "1:ID : a2"                 -0.0848801
 "1:ID : a9"                  0.0997279
 "1:ID : a6"                  0.318544 
 ⋮                                     
 "1:dam : a7"                -0.944983 
 "1:dam : a3"                -0.649777 
 "1:dam : a8"                -0.269806 
 "1:dam : a1"                -0.165287 
 "1:dam : a5"                 0.161601 
 "1:dam : a4"                -0.148299 
 "1:ϵ : a12"                  0.0899715
 "1:ϵ : a10"                 -0.392067 
 "1:ϵ : a11"                 -0.119267 
 "1:ϵ : a2"                  -0.0337624
 "1:ϵ : a9"          

In [38]:
convert(DataFrame,res)

| Row | x1 | x2 |
|---|---|---|
| 1 | `1:intercept : intercept` | -12.8387 |
| 2 | `1:x1*x3 : x1 * m` | -3.79852 |
| 3 | `1:x1*x3 : x1 * f` | 0.801257 |
| 4 | `1:x2 : 2` | 0.226994 |
| 5 | `1:x2 : 1` | -0.255599 |
| 6 | `1:x3 : m` | 20.0851 |
| 7 | `1:x3 : f` | 15.308 |
| 8 | `1:ID : a12` | 0.254401 |
| 9 | `1:ID : a10` | -0.125304 |
| 10 | `1:ID : a11` | -0.059143 |


<div class="span5 alert alert-success">
 <font size="5" face="Georgia">Multi-trait Single-step Bayesian Regression (Incomplete Genomic Data)</font> 
</div>

<button type="button" class="btn btn-lg btn-primary">Step 3: Build Model Equations</button> 

In [15]:
model_equation2 ="y1 = intercept + x1 + x3 + ID + dam
                  y2 = intercept + x1 + x2 + x3 + ID
                  y3 = intercept + x1 + x1*x3 + x2 + ID";

In [16]:
R      = eye(3)
model2 = build_model(model_equation2,R);

<button type="button" class="btn btn-lg btn-primary">Step 4: Set Factors or Covariates</button> 

In [17]:
set_covariate(model2,"x1");

<button type="button" class="btn btn-lg btn-primary">Step 5: Set Random or Fixed Effects</button> 

In [18]:
G1 = eye(2)
G2 = eye(4)
set_random(model2,"x2",G1);
set_random(model2,"ID dam",pedigree,G2);

[1m[36mINFO: [39m[22m[36mx2 is not found in model equation 1.
[39m[1m[36mINFO: [39m[22m[36mdam is not found in model equation 2.
[39m[1m[36mINFO: [39m[22m[36mdam is not found in model equation 3.
[39m

<button type="button" class="btn btn-lg btn-primary">Step 6: Use Genomic Information</button> 

In [19]:
G3 = eye(3)
add_genotypes(model2,genofile,G3,separator=',');

5 markers on 7 individuals were added.


<button type="button" class="btn btn-lg btn-primary">Step 7: Run Analysis</button> 

In [20]:
outputMCMCsamples(model2,"x2")
out2=runMCMC(model2,phenotypes,methods="BayesC",estimatePi=true,single_step_analysis=true,pedigree=pedigree,chain_length=5000,output_samples_frequency=100);

[1m[36mINFO: [39m[22m[36mPi (Π) is not provided.
[39m[1m[36mINFO: [39m[22m[36mPi (Π) is generated assuming all markers have effects on all traits.
[39m


The prior for marker effects covariance matrix is calculated from 
genetic covariance matrix and Π. The prior for the marker effects 
covariance matrix is: 

 0.492462  0.0       0.0     
 0.0       0.492462  0.0     
 0.0       0.0       0.492462


A Linear Mixed Model was build using model equations:

y1 = intercept + x1 + x3 + ID + dam
y2 = intercept + x1 + x2 + x3 + ID
y3 = intercept + x1 + x1*x3 + x2 + ID

Model Information:

Term            C/F          F/R            nLevels
intercept       factor       fixed                1
x1              covariate    fixed                1
x3              factor       fixed                2
ID              factor       random              12
dam             factor       random              12
x2              factor       random               2
x1*x3           interaction  fixed                2
ϵ               factor       random               5
J               covariate    fixed                1

MCMC Information:

methods                 

running MCMC for BayesC...100%|█████████████████████████| Time: 0:00:05


<button type="button" class="btn btn-lg btn-primary">Check Results</button> 

In [21]:
keys(out2)

Base.KeyIterator for a Dict{Any,Any} with 10 entries. Keys:
  "Posterior mean of polygenic effects covariance matrix"
  "Model frequency"
  "Posterior mean of residual covariance matrix"
  "Posterior mean of marker effects"
  "Posterior mean of marker effects covariance matrix"
  "EBV_y1"
  "EBV_y2"
  "EBV_y3"
  "Posterior mean of location parameters"
  "Posterior mean of Pi"

In [22]:
out2["Posterior mean of Pi"]

Dict{Array{Float64,1},Float64} with 8 entries:
  [1.0, 0.0, 1.0] => 0.123901
  [0.0, 0.0, 1.0] => 0.128251
  [0.0, 1.0, 1.0] => 0.119375
  [1.0, 1.0, 0.0] => 0.131305
  [0.0, 0.0, 0.0] => 0.125698
  [0.0, 1.0, 0.0] => 0.121291
  [1.0, 0.0, 0.0] => 0.129121
  [1.0, 1.0, 1.0] => 0.121058
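
Assuming each key marks, trait by trait, whether a marker has a non-zero effect (1.0) or not (0.0), the eight posterior probabilities cover all 2³ patterns and can be collapsed into a per-trait inclusion probability. The sketch below copies the values printed above; the helper name `inclusion_probability` is hypothetical, and the reading of the keys is an assumption.

```julia
# Posterior means copied from the output above
Pi = Dict(
    [1.0, 0.0, 1.0] => 0.123901,
    [0.0, 0.0, 1.0] => 0.128251,
    [0.0, 1.0, 1.0] => 0.119375,
    [1.0, 1.0, 0.0] => 0.131305,
    [0.0, 0.0, 0.0] => 0.125698,
    [0.0, 1.0, 0.0] => 0.121291,
    [1.0, 0.0, 0.0] => 0.129121,
    [1.0, 1.0, 1.0] => 0.121058,
)

# Hypothetical helper: total probability of the patterns in which `trait` is switched on
inclusion_probability(Pi, trait) = sum(p for (pattern, p) in Pi if pattern[trait] == 1.0)

total = sum(values(Pi))                                   # probabilities over all patterns
incl  = [inclusion_probability(Pi, t) for t in 1:3]       # one value per trait
```

With only 5 markers and 7 genotyped individuals, all patterns stay close to 1/8, so the data say little about which traits each marker affects.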

In [23]:
a=randn(10)

10-element Array{Float64,1}:
  0.732419 
 -0.205542 
  0.424181 
 -0.0536685
  0.385955 
  1.7495   
 -1.56113  
 -0.0282286
  1.65375  
 -0.544697 

In [24]:
b=a

10-element Array{Float64,1}:
  0.732419 
 -0.205542 
  0.424181 
 -0.0536685
  0.385955 
  1.7495   
 -1.56113  
 -0.0282286
  1.65375  
 -0.544697 

In [143]:
map(Float64,res[:,2])

82-element Array{Float64,1}:
  13.4964   
   0.712683 
  -8.0825   
 -10.0853   
   0.221651 
  -0.037391 
   0.0561244
  -0.0985782
   0.0754944
   0.180056 
  -0.0176215
   0.0858479
  -0.0476729
   ⋮        
   0.317737 
  -0.0382785
   0.230667 
  -0.335866 
   0.0101171
   0.208554 
   0.0624572
   0.0964736
  -0.0533304
   6.35545  
  -0.830931 
   1.05116  

In [97]:
res=out2["Posterior mean of location parameters"]

82×2 Array{Any,2}:
 "1:intercept : intercept"   13.4964   
 "1:x1 : x1"                  0.712683 
 "1:x3 : m"                  -8.0825   
 "1:x3 : f"                 -10.0853   
 "1:ID : a12"                 0.221651 
 "1:ID : a10"                -0.037391 
 "1:ID : a11"                 0.0561244
 "1:ID : a2"                 -0.0985782
 "1:ID : a9"                  0.0754944
 "1:ID : a6"                  0.180056 
 "1:ID : a7"                 -0.0176215
 "1:ID : a3"                  0.0858479
 "1:ID : a8"                 -0.0476729
 ⋮                                     
 "2:ϵ : a10"                  0.317737 
 "2:ϵ : a11"                 -0.0382785
 "2:ϵ : a2"                   0.230667 
 "2:ϵ : a9"                  -0.335866 
 "3:ϵ : a12"                  0.0101171
 "3:ϵ : a10"                  0.208554 
 "3:ϵ : a11"                  0.0624572
 "3:ϵ : a2"                   0.0964736
 "3:ϵ : a9"                  -0.0533304
 "1:J : J"                    6.35545  
 "2:J : J"           

In [141]:
?DataFrame

search: [1mD[22m[1ma[22m[1mt[22m[1ma[22m[1mF[22m[1mr[22m[1ma[22m[1mm[22m[1me[22m [1mD[22m[1ma[22m[1mt[22m[1ma[22m[1mF[22m[1mr[22m[1ma[22m[1mm[22m[1me[22ms [1mD[22m[1ma[22m[1mt[22m[1ma[22m[1mF[22m[1mr[22m[1ma[22m[1mm[22m[1me[22mRow Sub[1mD[22m[1ma[22m[1mt[22m[1ma[22m[1mF[22m[1mr[22m[1ma[22m[1mm[22m[1me[22m Groupe[1md[22mD[1ma[22m[1mt[22m[1ma[22m[1mF[22m[1mr[22m[1ma[22m[1mm[22m[1me[22m



An AbstractDataFrame that stores a set of named columns

The columns are normally AbstractVectors stored in memory, particularly a Vector or CategoricalVector.

**Constructors**

```julia
DataFrame(columns::Vector, names::Vector{Symbol}; makeunique::Bool=false)
DataFrame(columns::Matrix, names::Vector{Symbol}; makeunique::Bool=false)
DataFrame(kwargs...)
DataFrame(pairs::Pair{Symbol}...; makeunique::Bool=false)
DataFrame() # an empty DataFrame
DataFrame(t::Type, nrows::Integer, ncols::Integer) # an empty DataFrame of arbitrary size
DataFrame(column_eltypes::Vector, names::Vector, nrows::Integer; makeunique::Bool=false)
DataFrame(column_eltypes::Vector, cnames::Vector, categorical::Vector, nrows::Integer;
          makeunique::Bool=false)
DataFrame(ds::AbstractDict)
```

**Arguments**

  * `columns` : a Vector with each column as contents or a Matrix
  * `names` : the column names
  * `makeunique` : if `false` (the default), an error will be raised if duplicates in `names` are found; if `true`, duplicate names will be suffixed with `_i` (`i` starting at 1 for the first duplicate).
  * `kwargs` : the key gives the column names, and the value is the column contents
  * `t` : elemental type of all columns
  * `nrows`, `ncols` : number of rows and columns
  * `column_eltypes` : elemental type of each column
  * `categorical` : `Vector{Bool}` indicating which columns should be converted to                 `CategoricalVector`
  * `ds` : `AbstractDict` of columns

Each column in `columns` should be the same length.

**Notes**

A `DataFrame` is a lightweight object. As long as columns are not manipulated, creation of a `DataFrame` from existing AbstractVectors is inexpensive. For example, indexing on columns is inexpensive, but indexing by rows is expensive because copies are made of each column.

If a column is passed to a `DataFrame` constructor or is assigned as a whole using `setindex!` then its reference is stored in the `DataFrame`. An exception to this rule is assignment of an `AbstractRange` as a column, in which case the range is collected to a `Vector`.

Because column types can vary, a `DataFrame` is not type stable. For performance-critical code, do not index into a `DataFrame` inside of loops.

**Examples**

```julia
df = DataFrame()
v = ["x","y","z"][rand(1:3, 10)]
df1 = DataFrame(Any[collect(1:10), v, rand(10)], [:A, :B, :C])
df2 = DataFrame(A = 1:10, B = v, C = rand(10))
dump(df1)
dump(df2)
describe(df2)
head(df1)
df1[:A] + df2[:C]
df1[1:4, 1:2]
df1[[:A,:C]]
df1[1:2, [:A,:C]]
df1[:, [:A,:C]]
df1[:, [1,3]]
df1[1:4, :]
df1[1:4, :C]
df1[1:4, :C] = 40. * df1[1:4, :C]
[df1; df2]  # vcat
[df1  df2]  # hcat
size(df1)
```


In [138]:
# Parenthesize each comparison and combine the two masks element-wise with .&
out[(out[:Trait] .== "1") .& (out[:Effect] .== "ϵ"), :]
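
When two element-wise comparisons are combined into one row mask, each comparison needs its own parentheses: `&` binds more tightly than `.==`, so without them Julia tries to evaluate `"1" & column` first. A minimal sketch on plain vectors, where `trait` and `effect` are made-up stand-ins for the `Trait` and `Effect` columns:

```julia
trait  = ["1", "1", "2", "1"]     # made-up stand-in for out[:Trait]
effect = ["ϵ", "ID", "ϵ", "ϵ"]    # made-up stand-in for out[:Effect]

# Each comparison yields a Boolean mask; .& combines the two masks element by element
mask = (trait .== "1") .& (effect .== "ϵ")

# Indexing with the mask keeps only the rows where both conditions hold
selected = effect[mask]
```

The resulting mask can be used as the row index of a matrix or data frame in the same way.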

In [108]:
DataFrame(out)

Unnamed: 0,x1,x2,x3,x4
1,Trait,Effect,Level,Estimate
2,1,intercept,intercept,13.4964
3,1,x1,x1,0.712683
4,1,x3,m,-8.0825
5,1,x3,f,-10.0853
6,1,ID,a12,0.221651
7,1,ID,a10,-0.037391
8,1,ID,a11,0.0561244
9,1,ID,a2,-0.0985782
10,1,ID,a9,0.0754944


In [95]:
?permutedims

search: [1mp[22m[1me[22m[1mr[22m[1mm[22m[1mu[22m[1mt[22m[1me[22m[1md[22m[1mi[22m[1mm[22m[1ms[22m [1mp[22m[1me[22m[1mr[22m[1mm[22m[1mu[22m[1mt[22m[1me[22m[1md[22m[1mi[22m[1mm[22m[1ms[22m! i[1mp[22m[1me[22m[1mr[22m[1mm[22m[1mu[22m[1mt[22m[1me[22m[1md[22m[1mi[22m[1mm[22m[1ms[22m [1mP[22m[1me[22m[1mr[22m[1mm[22m[1mu[22m[1mt[22m[1me[22m[1md[22mD[1mi[22m[1mm[22m[1ms[22mArray



```
permutedims(A, perm)
```

Permute the dimensions of array `A`. `perm` is a vector specifying a permutation of length `ndims(A)`. This is a generalization of transpose for multi-dimensional arrays. Transpose is equivalent to `permutedims(A, [2,1])`.

See also: [`PermutedDimsArray`](@ref).

# Example

```jldoctest
julia> A = reshape(collect(1:8), (2,2,2))
2×2×2 Array{Int64,3}:
[:, :, 1] =
 1  3
 2  4

[:, :, 2] =
 5  7
 6  8

julia> permutedims(A, [3, 2, 1])
2×2×2 Array{Int64,3}:
[:, :, 1] =
 1  3
 5  7

[:, :, 2] =
 2  4
 6  8
```


In [139]:
describe(out)

Trait
Summary Stats:
Length:         82
Type:           Any
Number Unique:  3
Number Missing: 0
% Missing:      0.000000

Effect
Summary Stats:
Length:         82
Type:           Any
Number Unique:  9
Number Missing: 0
% Missing:      0.000000

Level
Summary Stats:
Length:         82
Type:           Any
Number Unique:  21
Number Missing: 0
% Missing:      0.000000

Estimate
Summary Stats:
Length:         82
Type:           Any
Number Unique:  82
Number Missing: 0
% Missing:      0.000000



In [82]:
[out
[strip(i) for i in split(res[3,1],':',keep=false)]]

LoadError: [91mDimensionMismatch("mismatch in dimension 2 (expected 3 got 1)")[39m

In [88]:
JWAS.transubstrarr(out)

1×249 Array{String,2}:
 "trait"  "effect"  "level"  "1"  …  "m"  "1"  "x3"  "m"  "1"  "x3"  "m"