# Simulation Of Marker Data For Hypothesis Testing Using XSim

In `hypothesisTestBayesABC`, different methods are compared for inferring the difference between the least favorable and the most favorable genotype. For more background and explanation, please refer to that notebook directly.

The approach taken here is based on the `XSim` notebooks used on day 2. Building on the content of the notebook `dataSimulation`, we can set up the following strategy.

## Initialize sampler

The last dataset used in the comparison of the methods has $n=500$ animals and $p = 1000$ markers. Hence we aim for a population of that size in our simulation.

In [1]:
using XSim
chrLength = 1.0                                      # chromosome length: 1 Morgan
numChr    = 1                                        # number of chromosomes
numLoci   = 1000                                     # number of loci in the genome
mutRate   = 0.0                                      # mutation rate
locusInt  = chrLength/numLoci                        # interval between adjacent loci
mapPos    = collect(0:locusInt:(chrLength-0.0001))   # evenly spaced map positions of the loci
geneFreq  = fill(0.5,numLoci)                        # gene frequency, 0.5 at every locus
XSim.build_genome(numChr,chrLength,numLoci,geneFreq,mapPos,mutRate)
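
As a quick sanity check on the map positions, the following minimal sketch (plain Julia, no `XSim` required) confirms that the range used above yields one position per locus. The name `mapPosAlt` and the `range`-based construction are illustrative assumptions, not part of the original notebook.

```julia
chrLength = 1.0
numLoci   = 1000
locusInt  = chrLength / numLoci

# Construction used in the notebook: stop slightly below chrLength
mapPos = collect(0:locusInt:(chrLength - 0.0001))

# Alternative (assumption): state the number of positions explicitly,
# which avoids relying on the 0.0001 offset
mapPosAlt = collect(range(0.0, step=locusInt, length=numLoci))

println(length(mapPos), " positions, all below chrLength: ", all(mapPos .< chrLength))
```

Stating `length=numLoci` directly keeps the number of positions tied to the number of loci if `chrLength` or `numLoci` is changed later.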

## Sampling Founders

We fix the number of founders at 500 sires and 500 dams.

In [8]:
nNrFounder = 500;
sires = sampleFounders(nNrFounder);
dams  = sampleFounders(nNrFounder);

Sampling 500 animals into base population.
Sampling 500 animals into base population.


## Random Matings
Over `ngen` generations, the population is mated at random, with `popSize` animals sampled in each generation.

In [9]:
ngen,popSize = 20,500
sires1,dams1,gen1 = sampleRan(popSize, ngen, sires, dams);

Generation     2: sampling   250 males and   250 females
Generation     3: sampling   250 males and   250 females
Generation     4: sampling   250 males and   250 females
Generation     5: sampling   250 males and   250 females
Generation     6: sampling   250 males and   250 females
Generation     7: sampling   250 males and   250 females
Generation     8: sampling   250 males and   250 females
Generation     9: sampling   250 males and   250 females
Generation    10: sampling   250 males and   250 females
Generation    11: sampling   250 males and   250 females
Generation    12: sampling   250 males and   250 females
Generation    13: sampling   250 males and   250 females
Generation    14: sampling   250 males and   250 females
Generation    15: sampling   250 males and   250 females
Generation    16: sampling   250 males and   250 females
Generation    17: sampling   250 males and   250 females
Generation    18: sampling   250 males and   250 females
Generation    19: sampling   25

## Get Genotypes
To get all genotypes, we first combine the male and female cohorts into a single cohort.

In [10]:
animals = concatCohorts(sires1,dams1)
M = getOurGenotypes(animals)

500×1000 Array{Int64,2}:
 0  1  0  1  2  2  2  1  1  1  0  1  1  …  2  2  0  2  0  2  2  0  0  2  2  0
 2  2  0  0  2  2  0  0  2  2  2  0  0     2  2  1  0  1  2  1  0  0  2  0  1
 1  1  1  1  1  1  0  1  2  0  1  0  2     1  1  1  2  1  1  0  1  2  1  2  0
 1  1  1  1  1  1  2  2  1  1  0  0  0     0  1  2  0  2  1  1  2  2  0  2  0
 0  1  0  2  2  2  2  2  2  1  0  1  0     0  2  2  1  0  1  0  1  0  2  1  1
 2  1  1  1  2  1  1  2  0  2  1  1  0  …  2  1  1  2  1  1  1  1  1  1  0  1
 1  1  0  1  0  2  1  1  1  1  2  1  1     1  0  1  2  0  2  1  2  1  1  2  1
 0  2  0  1  1  0  0  2  1  1  2  1  1     2  1  1  2  1  0  2  1  1  2  1  1
 0  2  1  2  1  0  1  1  1  1  2  1  1     0  1  1  0  2  1  2  2  2  1  1  1
 1  1  1  2  0  1  0  1  1  0  2  0  0     2  1  1  1  0  2  1  2  2  2  2  1
 1  1  0  1  2  1  2  2  1  1  0  0  1  …  2  2  0  1  1  2  2  0  2  0  2  0
 0  0  0  1  2  1  0  1  0  1  0  2  0     1  1  2  0  1  1  1  1  1  2  2  0
 2  1  1  1  0  0  1  1  1  1  1  0  1 

## Gene Frequencies

In [12]:
using Statistics
freq=Statistics.mean(M,dims=1)/2

1×1000 Array{Float64,2}:
 0.479  0.517  0.326  0.549  0.556  …  0.508  0.564  0.493  0.555  0.466
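
To make the frequency calculation concrete, here is a small sketch on a toy genotype matrix. The data and the names `Mtoy`, `freqToy` and `fixedToy` are hypothetical and only serve as illustration. The sketch also flags fixed markers, because a column without variation gives undefined (`NaN`) correlations in the LD step.

```julia
using Statistics

# Toy genotype matrix (hypothetical data): 4 animals x 3 markers, coded 0/1/2
Mtoy = [0 1 2;
        2 1 2;
        1 0 2;
        1 2 2]

# A genotype counts the copies of one allele out of 2 per animal,
# so the column mean divided by 2 is the frequency of that allele
freqToy = vec(mean(Mtoy, dims=1)) ./ 2

# Markers with frequency 0 or 1 are fixed (no variation left)
fixedToy = findall(f -> f == 0.0 || f == 1.0, freqToy)

println(freqToy)
println(fixedToy)
```

Applied to the simulated data, the same check would be `findall(f -> f == 0.0 || f == 1.0, vec(freq))`.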

## Getting Correlations and LD

In [13]:
M = float(M)

500×1000 Array{Float64,2}:
 0.0  1.0  0.0  1.0  2.0  2.0  2.0  1.0  …  2.0  2.0  0.0  0.0  2.0  2.0  0.0
 2.0  2.0  0.0  0.0  2.0  2.0  0.0  0.0     2.0  1.0  0.0  0.0  2.0  0.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  0.0  1.0     1.0  0.0  1.0  2.0  1.0  2.0  0.0
 1.0  1.0  1.0  1.0  1.0  1.0  2.0  2.0     1.0  1.0  2.0  2.0  0.0  2.0  0.0
 0.0  1.0  0.0  2.0  2.0  2.0  2.0  2.0     1.0  0.0  1.0  0.0  2.0  1.0  1.0
 2.0  1.0  1.0  1.0  2.0  1.0  1.0  2.0  …  1.0  1.0  1.0  1.0  1.0  0.0  1.0
 1.0  1.0  0.0  1.0  0.0  2.0  1.0  1.0     2.0  1.0  2.0  1.0  1.0  2.0  1.0
 0.0  2.0  0.0  1.0  1.0  0.0  0.0  2.0     0.0  2.0  1.0  1.0  2.0  1.0  1.0
 0.0  2.0  1.0  2.0  1.0  0.0  1.0  1.0     1.0  2.0  2.0  2.0  1.0  1.0  1.0
 1.0  1.0  1.0  2.0  0.0  1.0  0.0  1.0     2.0  1.0  2.0  2.0  2.0  2.0  1.0
 1.0  1.0  0.0  1.0  2.0  1.0  2.0  2.0  …  2.0  2.0  0.0  2.0  0.0  2.0  0.0
 0.0  0.0  0.0  1.0  2.0  1.0  0.0  1.0     1.0  1.0  1.0  1.0  2.0  2.0  0.0
 2.0  1.0  1.0  1.0  0.0  0.0  1.0  1

In [14]:
corMat = cor(M)

1000×1000 Array{Float64,2}:
  1.0          -0.0717905    0.0820936    …   0.0465538    0.073344  
 -0.0717905     1.0          0.182229        -0.00755517  -0.0354066 
  0.0820936     0.182229     1.0             -0.0551715    0.0219699 
 -0.164574      0.042311    -0.0649889        0.0655094   -0.0377836 
 -0.00615178    0.0761537   -0.315534         0.0608489   -0.0806681 
 -0.167526      0.0211645   -0.113425     …  -0.0398169    0.0752997 
 -0.231685     -0.145405    -0.0350174        0.028283    -0.00389995
 -0.0446621    -0.324354    -0.296228         0.145731     0.012219  
  0.229736      0.00786328   0.0113362       -0.0159151   -0.0502303 
 -0.177658     -0.140454    -0.0555866       -0.0681038   -0.0942289 
 -0.111436      0.101058    -0.0978386    …  -0.0119036   -0.021501  
  0.14857       0.0172316    0.0908203        0.0380715   -0.0534654 
  0.109331     -0.0299815   -0.116013         0.087588    -0.0265401 
  ⋮                                       ⋱                   

In [17]:
LDMat = zeros(800,200)
for i = 1:800
    LDMat[i,:] = corMat[i,(i+1):(i+200)].^2
end

In [18]:
y = Statistics.mean(LDMat,dims=1)

1×200 Array{Float64,2}:
 0.0216882  0.0201184  0.0218095  …  0.00476824  0.00484815  0.00441463
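
The loop above can be wrapped in a small helper so that the number of anchor markers follows from the matrix size and the maximum lag. This is a sketch under assumptions: the function name `meanLDByLag` and the toy data are hypothetical, and it mirrors the notebook's approach of using the first `p - maxLag` markers as anchors.

```julia
using Statistics, Random

# Mean squared correlation (r^2) between markers that are `lag` positions apart,
# for lag = 1..maxLag, using the first (p - maxLag) markers as anchors
function meanLDByLag(corMat::AbstractMatrix, maxLag::Integer)
    p = size(corMat, 1)
    nAnchor = p - maxLag
    ld = zeros(nAnchor, maxLag)
    for i in 1:nAnchor
        ld[i, :] = corMat[i, (i+1):(i+maxLag)] .^ 2
    end
    return vec(mean(ld, dims=1))
end

# Hypothetical random genotypes: 50 animals x 30 markers, coded 0/1/2
Random.seed!(1)
Mtoy  = float(rand(0:2, 50, 30))
ldToy = meanLDByLag(cor(Mtoy), 5)
println(ldToy)
```

With the simulated data, `meanLDByLag(corMat, 200)` would reproduce the calculation above. Since the loci are evenly spaced with `locusInt = 0.001`, a lag of `k` markers corresponds to a map distance of `k * 0.001` Morgan.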