A genetic algorithm to find the region of 2-locus epistatic parameter space that maximises the maintenance of additive genetic variance under selection. This programme was used for analysis in the following article:
Hemani G, Knott S, Haley C. An evolutionary perspective on epistasis and the missing heritability. PLoS Genetics (in press).
For detailed information on the background and interpretation of epiSpaces please refer to this article.
(The name is sort of a pun/portmanteau of epistasis and parameter space...)
Epistasis may play an important role in the genetic variation of complex traits, as a consequence of being able to mask additive genetic variation from selection. This programme was written to find genotype-phenotype maps that maximise additive genetic variation over many generations, where the causal variants act directly on fitness.
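To make the masking effect concrete, here is a minimal sketch (not taken from the epiSpaces source) that computes total genetic variance (Vg) and additive variance (Va) for a 3x3 genotype-phenotype map. It assumes Hardy-Weinberg proportions and linkage equilibrium, and the function names `genotype_freqs` and `variance_components` are hypothetical. The additive x additive pattern used is the one given as the INITIAL example in the parameter list, read here as three rows of three values, which is also an assumption.

```python
def genotype_freqs(p):
    """Hardy-Weinberg frequencies for 0, 1 and 2 copies of an allele at frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def variance_components(gp_map, p, q):
    """Return (Vg, Va) for a 3x3 genotype-phenotype map.

    gp_map[i][j] is the genotype value for i copies of the counted allele
    at locus A and j copies at locus B. Assumes Hardy-Weinberg proportions
    and linkage equilibrium (an assumption of this sketch).
    """
    fa, fb = genotype_freqs(p), genotype_freqs(q)
    cells = [(fa[i] * fb[j], gp_map[i][j]) for i in range(3) for j in range(3)]
    mean = sum(f * g for f, g in cells)
    vg = sum(f * (g - mean) ** 2 for f, g in cells)

    # Marginal genotype means at each locus, averaged over the other locus.
    marg_a = [sum(fb[j] * gp_map[i][j] for j in range(3)) for i in range(3)]
    marg_b = [sum(fa[i] * gp_map[i][j] for i in range(3)) for j in range(3)]

    va = 0.0
    for freq, geno_f, marg in ((p, fa, marg_a), (q, fb, marg_b)):
        var_count = 2 * freq * (1 - freq)  # variance of the allele count
        if var_count == 0:
            continue  # a fixed locus contributes no additive variance
        # Covariance between allele count and genotype value at this locus.
        cov = sum(geno_f[k] * (k - 2 * freq) * marg[k] for k in range(3))
        va += cov ** 2 / var_count  # variance explained by the linear regression
    return vg, va


# Additive x additive pattern from the INITIAL example, as 3 rows of 3.
AXA = [[-1, 0, 1], [0, 0, 0], [1, 0, -1]]

vg_mid, va_mid = variance_components(AXA, 0.5, 0.5)
vg_off, va_off = variance_components(AXA, 0.5, 0.1)
print(vg_mid, va_mid)
print(vg_off, va_off)
```

With both loci at frequency 0.5 the pattern has genetic variance but no additive variance, so selection has nothing additive to act on. Once one locus moves away from 0.5, part of the variance becomes additive.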
A brief description of what the programme does follows:
We are trying to create two-locus bi-allelic epistatic patterns that maximise additive variance and survival time under selection.
- Generate UNMOD random patterns of epistasis (aka genotype-phenotype maps, or models), with fitness values spanning a maximum range of URANGE.
- For each model, create a population of individuals whose fitness is governed by that model, for each combination of starting frequencies at the two loci (UNFREQ1 x UNFREQ2 combinations). For example, if UNFREQ1 = UNFREQ2 = 5 then there will be 25 initial populations for each model, and each population will have the mutations at a different starting frequency.
- We calculate the theoretical allele frequency trajectory for each of the UNFREQ1 x UNFREQ2 populations, and for each model.
- Additive variance is summed from all the runs of eligible patterns, starting from generation USTARTVA.
- If at least UGTHRESH populations out of all the initial populations survive for at least UMAXG generations then that model is eligible to continue.
- Each surviving model is mutated a number of different times (defined in UNEXTMODS), by sampling from the values in the UPERT array. The original model is also kept so that fitness cannot regress.
- If none survive then choose a new random set of models.
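The mutation step in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not the programme's actual code: every cell of a 9-value pattern receives a perturbation drawn from the UPERT values, and the mutant is then linearly rescaled so that its maximum minus its minimum equals URANGE. The rescaling rule and the names `rescale`, `mutate` and `next_generation` are hypothetical.

```python
import random


def rescale(pattern, target_range):
    """Linearly rescale a pattern so that max - min equals target_range.

    The exact scaling rule used by epiSpaces is not described in this
    README; this is an assumed stand-in.
    """
    lo, hi = min(pattern), max(pattern)
    if hi == lo:
        return list(pattern)  # flat pattern: nothing to scale
    factor = target_range / (hi - lo)
    return [value * factor for value in pattern]


def mutate(pattern, perturbations, rng):
    """Add a randomly sampled perturbation to every genotype class."""
    return [value + rng.choice(perturbations) for value in pattern]


def next_generation(survivors, n_mutants, perturbations, target_range, rng):
    """Build the next set of models from the surviving patterns.

    survivors: list of 9-value patterns that passed the threshold.
    n_mutants: number of mutants to make from each survivor.
    """
    models = []
    for pattern, count in zip(survivors, n_mutants):
        models.append(list(pattern))  # keep the parent so fitness cannot regress
        for _ in range(count):
            mutant = mutate(pattern, perturbations, rng)
            models.append(rescale(mutant, target_range))
    return models


# Example perturbation values and range taken from the parameter list.
UPERT = [-0.04, -0.02, -0.01, 0, 0, 0, 0, 0.01, 0.02, 0.04]
URANGE = 6
parent = [-3, 0, 3, 0, 0, 0, 3, 0, -3]

rng = random.Random(1234)
models = next_generation([parent], [5], UPERT, URANGE, rng)
print(len(models))
```

Repeating 0 several times in the perturbation array, as in the example values, makes it more likely that a given genotype class is left unchanged by a mutation.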
Requires GCC. To install (on Mac or Linux) simply clone the repo and run
This will create an executable called
How to run
epiSpaces takes only one argument: the filename of a parameter file. An example of a parameter file is available in this repo, example_parameter.txt. This algorithm isn't necessarily designed to identify a global solution; rather, it explores the parameter space of the genotype-phenotype map. To this end, genetic algorithms can be tuned and perturbed to deliver varying results, so the inputs have to be chosen carefully to access this flexibility. The input file consists of 19 lines, where each line must be the value for a specific parameter, as listed in order below:
UNID: Number of individuals in the population (e.g. 1000)
UMAXG: Maximum number of generations to simulate each population (e.g. 100)
UMAXRUNS: Maximum number of generations for the genetic algorithm to run (e.g. 5000)
UGTHRESH: The number of runs (out of all initial frequencies) that must survive to be allowed to continue (e.g. 20 when there are 25 starting frequencies)
USTARTVA: The generation at which to start summing additive genetic variance (e.g. 20)
URANGE: Exact range between maximum and minimum genotype class means after scaling (e.g. 6)
UNFREQ1: Number of different starting frequencies for locus 1 (e.g. 5)
UNFREQ1 UNFREQ1 ...: Starting frequencies for locus 1 (e.g. 0.1 0.3 0.5 0.7 0.91)
UNFREQ2: Number of different starting frequencies for locus 2 (e.g. 5)
UNFREQ2 UNFREQ2 ...: Starting frequencies for locus 2 (e.g. 0.1 0.3 0.5 0.7 0.91)
UNMOD: Number of random genotype-phenotype maps to generate (e.g. 40)
UNBESTMODS: Number of models to take forward to the next generation (e.g. 4)
UNEXTMODS UNEXTMODS ... UNEXTMODS[UNBESTMODS-1]: How many random variations to make of each model carried forward, and how many new models to make (e.g. 5 5 5 5 20 if 5 mutations of each of the 4 best models are to be made, plus 20 completely new models)
UNPERT: Number of perturbation values to be used for mutating models (e.g. 10)
UPERT UPERT ...: Perturbation values from which to sample mutations to the models (e.g. -0.04 -0.02 -0.01 0 0 0 0 0.01 0.02 0.04)
USEED: Seed for random number generator (e.g. 1234)
UFILENAME: Rootname for output files (e.g. ga_out)
INITIAL: Initial pattern (e.g. -1 0 1 0 0 0 1 0 -1 would be an example of additive x additive epistasis)
OVERRIDE: Overriding pattern for any sampling problems (e.g. -1 -1 -1 -1 -1 -1 -1 -1 -1)
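As a sanity check before running the programme, a parameter file can be validated with a short script. The sketch below is not part of epiSpaces. It assumes one parameter per line with whitespace-separated values, in the order listed above, and builds its example text from the "e.g." values in that list. The function name `parse_parameters` and the lowercase dictionary keys (`freqs1`, `freqs2`, `next_mods`, `perturbations`, `initial`, `override`) are hypothetical.

```python
EXAMPLE = """\
1000
100
5000
20
20
6
5
0.1 0.3 0.5 0.7 0.91
5
0.1 0.3 0.5 0.7 0.91
40
4
5 5 5 5 20
10
-0.04 -0.02 -0.01 0 0 0 0 0.01 0.02 0.04
1234
ga_out
-1 0 1 0 0 0 1 0 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
"""


def floats(line):
    """Split a line on whitespace and convert each field to float."""
    return [float(field) for field in line.split()]


def parse_parameters(text):
    """Parse the 19-line parameter layout into a dictionary and check counts."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != 19:
        raise ValueError(f"expected 19 parameter lines, found {len(lines)}")
    params = {
        "UNID": int(lines[0]),
        "UMAXG": int(lines[1]),
        "UMAXRUNS": int(lines[2]),
        "UGTHRESH": int(lines[3]),
        "USTARTVA": int(lines[4]),
        "URANGE": float(lines[5]),
        "UNFREQ1": int(lines[6]),
        "freqs1": floats(lines[7]),
        "UNFREQ2": int(lines[8]),
        "freqs2": floats(lines[9]),
        "UNMOD": int(lines[10]),
        "UNBESTMODS": int(lines[11]),
        "next_mods": [int(field) for field in lines[12].split()],
        "UNPERT": int(lines[13]),
        "perturbations": floats(lines[14]),
        "USEED": int(lines[15]),
        "UFILENAME": lines[16],
        "initial": floats(lines[17]),
        "override": floats(lines[18]),
    }
    # Each count parameter must match the length of the line that follows it.
    checks = [
        ("freqs1", params["UNFREQ1"]),
        ("freqs2", params["UNFREQ2"]),
        ("perturbations", params["UNPERT"]),
        ("initial", 9),   # 3 x 3 genotype classes
        ("override", 9),
    ]
    for key, expected in checks:
        if len(params[key]) != expected:
            raise ValueError(f"{key}: expected {expected} values, found {len(params[key])}")
    return params


params = parse_parameters(EXAMPLE)
print(params["UNID"], params["UFILENAME"], len(params["freqs1"]))
```

The length of the UNEXTMODS line is deliberately not checked here, since the example above lists one more value than UNBESTMODS (the extra value being the number of completely new models).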
Once the input file is created (e.g. example_parameters.txt), the following command runs the programme:
There are two output files; with the examples given they will be named ga_out and ga_outpats.
ga_out is a table that tracks all the models that passed the threshold UGTHRESH at each generation. It has the following columns:
- Model number
- Allele frequency A
- Allele frequency B
- Average start Vg
- Average end Vg
- Sum Va over population history
The ga_outpats file details the genotype-phenotype map for every model that appears in ga_out, along with its total Va. The patterns that maximise additive variance under the conditions of the genetic algorithm are at the end of the files.
epiSpaces is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
epiSpaces is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.