## Unrolling Function `build_genome`
The basic version of the function `build_genome` is found on GitHub in XSim.jl/src/XSim.jl. Here, we unroll the function body step by step to see what happens inside the function.

In [1]:
using XSim, DelimitedFiles, Distributions

### Initialisation of Function Arguments
The first step is to initialize the function arguments.

In [2]:
### # number of chromosomes
numChr     = 2
### # number of loci
nLoci      = 6096
### # genetic length of each chromosome
chrLength  = [1.62;1.41]
### # number of loci on each chromosome
numLoci    = [3339;2757]
### # mutation rate
mutationRate = 0.0
### # read the genetic loci from a file
myData = readdlm("markerCatalogue4JuliaChrom1-2", ' ', Any, '\n', header=false)
### # convert read positions to floating point numbers
mp1 = Float64.(myData[1,1:numLoci[1]])
mp2 = Float64.(myData[2,1:numLoci[2]])
### # combining map positions into an array of arrays
mapPos = [mp1, mp2]
mapPos

2-element Array{Array{Float64,1},1}:
 [4.83e-6, 1.329e-5, 2.863e-5, 4.975e-5, 5.266e-5, 0.0001074, 0.0001404, 0.0001939, 0.00022, 0.0002716  …  1.607, 1.608, 1.608, 1.608, 1.609, 1.609, 1.61, 1.61, 1.61, 1.611]    
 [4.032e-5, 4.066e-5, 5.46e-5, 5.816e-5, 6.075e-5, 6.748e-5, 0.0001557, 0.0001713, 0.0002177, 0.0002378  …  1.403, 1.403, 1.404, 1.404, 1.405, 1.406, 1.406, 1.406, 1.406, 1.407]
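
Before passing the map positions on, it can help to check them against the chromosome lengths, because `build_genome()` raises an error when a position is not smaller than the chromosome length. The following is a minimal sketch on toy data; the helper name `check_map_positions` is hypothetical (not part of XSim), and the sortedness check is an extra assumption suggested by the output above.

```julia
# Hypothetical helper, not part of XSim: checks the map positions of one chromosome.
function check_map_positions(positions::Vector{Float64}, chr_length::Float64)
    # Assumption: positions are expected in ascending order, as in the output above
    issorted(positions) || return false
    # Every position must lie on the chromosome (position < chromosome length)
    return all(p -> 0.0 <= p < chr_length, positions)
end

toy_positions = [0.01, 0.25, 0.90]
println(check_map_positions(toy_positions, 1.0))   # true
println(check_map_positions(toy_positions, 0.5))   # false, 0.90 lies beyond 0.5
```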

Gene frequencies are assigned here so that they can be passed to the `build_genome()` function, which requires them as an array of arrays. They will not be used, however, because the genetic loci information is based on the data read from the file.

In [3]:
### # the following genefrequencies are not used, because real
### #  data is used, but build_genome() requires an array of arrays
genefreq1   = fill(0.5,numLoci[1])
genefreq2 = fill(0.5,numLoci[2])
geneFreq = [genefreq1, genefreq2];

From all loci on each chromosome, a random subset is declared as QTL.

In [4]:
### # indices of qtl positions
idx = rand(numLoci[1]).>0.995  # you want 0.5% to be QTL, i.e. about 17 QTL
qtlIndex1 = collect(1:numLoci[1])[idx]
idx = rand(numLoci[2]).>0.995  # you want 0.5% to be QTL, i.e. about 14 QTL
qtlIndex2 = collect(1:numLoci[2])[idx]
qtlIndex = [qtlIndex1, qtlIndex2]

2-element Array{Array{Int64,1},1}:
 [364, 610, 785, 824, 960, 1159, 1721, 1905, 1987, 2631, 2633, 2829, 2941, 3245]
 [128, 228, 253, 264, 497, 672, 1420, 1509, 1938, 1996, 2154, 2317, 2385]       
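
Because each locus is selected independently with probability 0.005, the number of QTL varies from run to run, which is why the output shows 14 and 13 QTL rather than the expected 17 and 14. The sketch below repeats the draw with a fixed seed and contrasts it with an alternative that fixes the count. The seed value and the fixed-count variant are assumptions for illustration; the notebook itself does not do this.

```julia
using Random

Random.seed!(1234)              # arbitrary seed, only to make the draw repeatable
n_loci_toy = 3339

# Independent draw per locus, as in the cell above: the QTL count is random
bernoulli_idx = findall(rand(n_loci_toy) .> 0.995)

# Alternative (an assumption, not what the notebook does): exactly 17 QTL
fixed_idx = sort(randperm(n_loci_toy)[1:17])

println(length(bernoulli_idx), " QTL from the independent draw, ",
        length(fixed_idx), " QTL from the fixed-count draw")
```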

The QTL effects are drawn from a standard normal distribution and then divided by `sqrt(0.5*numQTL)`.

In [5]:
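line = 0
numQTL = 0   # total number of QTL over all chromosomes

for i in qtlIndex
    line +=1
    println("Number of QTL on chromosome $line: ", length(i))
    numQTL += length(i)   # add the QTL count of this chromosome, not just 1
end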

qtlEffect1 = randn(length(qtlIndex1))/sqrt(0.5*numQTL)
qtlEffect2 = randn(length(qtlIndex2))/sqrt(0.5*numQTL)
qtlEffect = [qtlEffect1, qtlEffect2];

Number of QTL on chromosome 1: 14
Number of QTL on chromosome 2: 13
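
One plausible reading of the divisor `sqrt(0.5*numQTL)` is that it scales the effects so that the expected total additive variance is 1, provided `numQTL` is the total number of QTL. The sketch below checks this arithmetic. It assumes that every QTL has allele frequency 0.5, that effects act additively, and that one QTL with effect a contributes a variance of 2p(1-p)a²; none of these assumptions is stated in the notebook.

```julia
# Assumptions (not from the notebook): allele frequency 0.5 at every QTL,
# additive effects, variance per QTL equal to 2p(1-p) times the squared effect.
p = 0.5
n_qtl_total = 27                                  # 14 + 13, from the output above

sd_effect   = 1 / sqrt(0.5 * n_qtl_total)         # standard deviation of a scaled effect
var_per_qtl = 2 * p * (1 - p) * sd_effect^2       # expected variance from one QTL
total_var   = n_qtl_total * var_per_qtl           # expected total additive variance

println("Expected total additive variance: ", total_var)
```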



Now all components are ready and we can step into the function body.

In [6]:
#build_genome(numChr,chrLength,numLoci,geneFreq, mapPos, qtlIndex, qtlEffect, mutationRate)
nChromosome = numChr
chromosome_length = chrLength
nLoci = numLoci
gene_frequency = geneFreq
map_position = mapPos
qtl_index = qtlIndex
qtl_effect = qtlEffect
mutation_rate = mutationRate
genotypeErrorRate=0.0;


The following statements are copied from the function body. We first have to define the required types.

In [9]:
mutable struct LocusInfo
    map_pos::Float64
    allele_freq::Array
    QTL::Bool
    QTL_effect::Float64
end

mutable struct ChromosomeInfo
    chrLength::Float64
    numLoci::Int64
    mapPos::Array{Float64,1}
    loci::Array{LocusInfo,1}
end

In [10]:
### # initialize
QTL_index  = Array{Int64}(undef, 0)  #for whole genome
QTL_effect = Array{Float64}(undef, 0)#for whole genome
chr        = Array{ChromosomeInfo}(undef, 0)#for whole genome

startlocus= 0; #locus index on whole genome

In [12]:
### # unroll loop over chromosomes
#for j in 1:nChromosome
j=1
locus_array = Array{LocusInfo}(undef, nLoci[j]);

#end

In [14]:
### # unroll loop over loci
# for i in 1:nLoci[j]
i=1
if map_position[j][i]>=chromosome_length[j]
    error("Map position is not on the chromosome (map position >= chromosome length)")
end
pos = map_position[j][i]
locus_array[i] = LocusInfo(pos,[gene_frequency[j][i],1-gene_frequency[j][i]],false,0.0)
#end

LocusInfo(4.83e-6, [0.5, 0.5], false, 0.0)

In [15]:
i=2
if map_position[j][i]>=chromosome_length[j]
    error("Map position is not on the chromosome (map position >= chromosome length)")
end
pos = map_position[j][i]
locus_array[i] = LocusInfo(pos,[gene_frequency[j][i],1-gene_frequency[j][i]],false,0.0)

LocusInfo(1.329e-5, [0.5, 0.5], false, 0.0)

These assignments are done for all loci on the current chromosome.
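
Rolled back up, the two unrolled steps above correspond to a loop over all loci of the chromosome. The following is a minimal sketch on toy data; the type name `ToyLocus` and all `toy_` variables are hypothetical, with the field layout taken from the `LocusInfo` definition above.

```julia
# Same fields as LocusInfo above; renamed (hypothetical) so that the sketch
# does not clash with the type already defined in the notebook.
mutable struct ToyLocus
    map_pos::Float64
    allele_freq::Vector{Float64}
    QTL::Bool
    QTL_effect::Float64
end

toy_length = 1.0               # chromosome length
toy_map    = [0.1, 0.4, 0.8]   # map positions of three loci
toy_freq   = fill(0.5, 3)      # gene frequencies

toy_loci = Vector{ToyLocus}(undef, length(toy_map))
for i in eachindex(toy_map)
    # Reject positions that are not on the chromosome
    toy_map[i] >= toy_length && error("Map position is not on the chromosome")
    # Every locus starts as a non-QTL with effect 0.0
    toy_loci[i] = ToyLocus(toy_map[i], [toy_freq[i], 1 - toy_freq[i]], false, 0.0)
end

println(length(toy_loci), " loci initialized")
```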

For every locus that is a QTL, the corresponding LocusInfo object is updated: the `QTL` flag is set and the QTL effect is stored. Then all LocusInfo objects of the chromosome are collected into a ChromosomeInfo object.
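
The QTL step and the collection into a chromosome object can be sketched as follows. This is an illustration on toy data, not the code from `build_genome()`; the type names `DemoLocus` and `DemoChromosome` and all `demo_` variables are hypothetical, with fields mirroring the struct definitions above.

```julia
# Hypothetical stand-ins for LocusInfo and ChromosomeInfo (same field names)
mutable struct DemoLocus
    map_pos::Float64
    allele_freq::Vector{Float64}
    QTL::Bool
    QTL_effect::Float64
end

struct DemoChromosome
    chrLength::Float64
    numLoci::Int
    mapPos::Vector{Float64}
    loci::Vector{DemoLocus}
end

demo_map     = [0.1, 0.4, 0.8]
demo_loci    = [DemoLocus(pos, [0.5, 0.5], false, 0.0) for pos in demo_map]
demo_qtl_idx = [2]      # locus 2 is declared a QTL
demo_qtl_eff = [0.3]    # its effect

# Flag each QTL locus and store its effect
for (k, locus) in enumerate(demo_qtl_idx)
    demo_loci[locus].QTL = true
    demo_loci[locus].QTL_effect = demo_qtl_eff[k]
end

# Collect all loci of the chromosome into one object
demo_chr = DemoChromosome(1.0, length(demo_loci), demo_map, demo_loci)
println(demo_chr.loci[2])
```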

At the end, the following items are collected into a GenomeInfo object.

* chr: array of ChromosomeInfo
* nChromosome: number of chromosomes
* mutation_rate: mutation rate
* genotypeErrorRate: error rate
* QTL_index: array of indices indicating which loci are QTL
* QTL_effect: array of QTL effects
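
Since `QTL_index` holds positions on the whole genome while `qtlIndex` is given per chromosome, the per-chromosome indices presumably have to be shifted by the number of loci on all preceding chromosomes, which is what the `startlocus` counter initialized above suggests. A minimal sketch of that idea, with a hypothetical helper name and toy data:

```julia
# Hypothetical helper, not part of XSim: turns per-chromosome QTL indices
# into indices on the whole genome.
function genome_wide_index(qtl_index::Vector{Vector{Int}}, n_loci::Vector{Int})
    result = Int[]
    startlocus = 0                         # loci counted on earlier chromosomes
    for j in eachindex(n_loci)
        append!(result, startlocus .+ qtl_index[j])
        startlocus += n_loci[j]
    end
    return result
end

# Two chromosomes with 5 and 4 loci
println(genome_wide_index([[2, 5], [1, 3]], [5, 4]))   # [2, 5, 6, 8]
```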


At the end of the function `build_genome()`, the GenomeInfo object is assigned to `myCommon.G` and an empty array of founder animals is assigned to `myCommon.founders`. This array will most likely be filled later by one of the founder sampling methods.