Skip to content

Commit

Permalink
Merge pull request #13 from rscherrer/develop
Browse files Browse the repository at this point in the history
Develop
  • Loading branch information
rscherrer committed Apr 25, 2021
2 parents b152c77 + f85351f commit be37fad
Show file tree
Hide file tree
Showing 6 changed files with 63 additions and 106 deletions.
9 changes: 4 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,6 @@ General simulation parameters:
* `tburnin` (0) is the duration of the burn-in period, in generations
* `tend` (10) is the number of generations to simulate (after burn-in)
* `tsave` (10) is the frequency at which to record the data
* `tfreeze` (100) is the frequency at which to save the whole genomes of all individuals (separate because it takes a lot of space)
* `talkative` (1) is either 0 or 1 and sets whether the simulation should print status information to the prompt
* `choosewhattosave` (0) is either 0 or 1 and sets whether the variables to save are specified in a separate file, the order file (see below). If 0 all of the output variables are saved every `tsave` generations except for whole genomes
* `datsave` (1) sets whether to save the recorded variables to files
Expand Down Expand Up @@ -167,13 +166,13 @@ By default the program will save all these variables. To save only some of them,

## Saving whole individual genomes

Saving the whole genomes of all individuals through time takes a lot of space, for this reason this output is controlled separately from the other output variables. If you set `gensave` to 1 two things will be saved every `tfreeze` generations: (1) the whole genomes of all individuals in a _freezer file_ determined by the `freezerfile` parameter, and (2) the genetic values at every locus for every individual in a _locus file_ determined by the `locifile` parameter. Both files are binary data files (`.dat`).
Saving the whole genomes of all individuals through time takes a lot of space, for this reason this output is controlled separately from the other output variables. If you set `gensave 1` in addition to `datsave 1` two things will be saved every `tsave` generations: (1) the whole genomes of all individuals in a _freezer file_ determined by the `freezerfile` parameter, and (2) the genetic values at every locus for every individual in a _locus file_ determined by the `locifile` parameter. Both files are binary data files (`.dat`).

Whole genomes are encoded in the freezer file in a different way from other variables. To save space, we use the fact that alleles are binary (0 or 1). Each value in a full genome is an allele at a specific position along one of the two haplotypes of an individual. Therefore, a genome contains twice as many values as there are loci, because the organisms are diploid. Each value is either 0 or 1 (the two possible alleles). Haplotypes are saved in turns, such that the first N values are all alleles of the first haplotype and the next N values are all alleles of the second haplotype, if N is the total number of loci. This does not mean that each saved individual genome is exactly 2N values long, though. In order to save space for this large amount of data, individual genomes are first split into blocks of 64 bits, and each block is converted into a 64bit integer, which is then saved to the `freezerfile` as binary. Therefore, the `freezerfile` output file must be interpreted on a bit-wise basis in order to retrieve the actual alleles of the individual (i.e. reading it as 64bit integers will show integer equivalents of chunks of 64 alleles). This also means that for each individual, a multiple of 64 bits will be written to the file, even if 2N alleles is not necessarily a multiple of 64. In other words, for each individual 2N bits will be written to file, and the remaining part of the last 64bit-chunk will be filled with zeros.
1. Whole genomes are encoded in the freezer file in a different way from other variables. To save space, we use the fact that alleles are binary (0 or 1). Each value in a full genome is an allele at a specific position along one of the two haplotypes of an individual. Therefore, a genome contains twice as many values as there are loci, because the organisms are diploid. Each value is either 0 or 1 (the two possible alleles). Haplotypes are saved in turns, such that the first N values are all alleles of the first haplotype and the next N values are all alleles of the second haplotype, if N is the total number of loci. This does not mean that each saved individual genome is exactly 2N values long, though. In order to save space for this large amount of data, individual genomes are first split into blocks of 64 bits, and each block is converted into a 64bit integer, which is then saved to the `freezerfile` as binary. Therefore, the `freezerfile` output file must be interpreted on a bit-wise basis in order to retrieve the actual alleles of the individual (i.e. reading it as 64bit integers will show integer equivalents of chunks of 64 alleles). This also means that for each individual, a multiple of 64 bits will be written to the file, even if 2N alleles is not necessarily a multiple of 64. In other words, for each individual 2N bits will be written to file, and the remaining part of the last 64bit-chunk will be filled with zeros.

Genetic values across the genome for each individual are stored in the locus file as floating point numbers encoded into a binary file, just as the other output variables, but with one value per locus per individual, per time point.
2. Genetic values across the genome for each individual are stored in the `locifile` as floating point numbers encoded into a binary file, just as the other output variables, but with one value per locus per individual, per time point.

**Important:** the functions in the R package `speciomer` that deal with individual whole genomes are expecting to know how many individuals lived at each time point where whole genomes were saved (to be able to assign time points to individuals), using `time.dat` and `population_size.dat`. This means that the two latter files must be saved at the same time points as the `freezerfile` and the `locifile`. So, it is not possible to save general data (i.e. not individual genomes) at different time intervals than whole individual genomes. That is, when `gensave 1`, `datsave` must be one, `time.dat` and `population_size.dat` must be saved and `tfreeze` and `tsave` must be the same. Individual genomes take up a lot of space, so we advise to not save them too frequently. To be able to save general data regularly but individual genomes less regularly, we advise the user to run two separate simulations with the same `seed`, one saving regular data often, and one focusing on less frequent whole genome saving. When using a random seed this can be done by setting `parsave 1` in the first simulation and using the recorded `parfile`, which includes the recorded seed, as a parameter file for the second simulation.
**Important:** whole individual genomes take a lot of space. For this reason we advise against saving them too often. Unfortunately it is not possible to save some variables at a certain frequency and individual genomes less frequently in one simulation, as there is only one `tsave`. To collect data on different timelines, however, it is possible to run multiple simulations with the same `seed`. This can be done either by choosing a seed beforehand, or by saving the random seed of the first simulation (it will be saved in the `parfile` if `parsave 1`) and use it as `seed` to parametrize the next simulations.

## Which variables to save

Expand Down
5 changes: 0 additions & 5 deletions src/Param.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@ Param::Param() :
tburnin(0),
tend(10),
tsave(10),
tfreeze(100),
talkative(true),
datsave(true),
choosewhattosave(false),
Expand Down Expand Up @@ -138,7 +137,6 @@ void Param::import(std::ifstream &file)
else if (input == "tburnin") file >> tburnin;
else if (input == "tend") file >> tend;
else if (input == "tsave") file >> tsave;
else if (input == "tfreeze") file >> tfreeze;
else if (input == "talkative") file >> talkative;
else if (input == "datsave") file >> datsave;
else if (input == "choosewhattosave") file >> choosewhattosave;
Expand Down Expand Up @@ -264,8 +262,6 @@ void Param::check() const
msg = "End time should be positive";
if (tsave <= 0)
msg = "Save time should be positive";
if (tfreeze <= 0)
msg = "Freezing time should be positive";
if (ntrials == 0u)
msg = "Number of mating trials should be at least one";

Expand Down Expand Up @@ -334,7 +330,6 @@ void Param::write(std::ofstream &file) const
file << "tburnin " << tburnin << '\n';
file << "tend " << tend << '\n';
file << "tsave " << tsave << '\n';
file << "tfreeze " << tfreeze << '\n';
file << "talkative " << talkative << '\n';
file << "datsave " << datsave << '\n';
file << "choosewhattosave " << choosewhattosave << '\n';
Expand Down
1 change: 0 additions & 1 deletion src/Param.h
Original file line number Diff line number Diff line change
Expand Up @@ -70,7 +70,6 @@ struct Param {
int tburnin;
int tend;
int tsave;
int tfreeze;
bool talkative;
bool datsave;
bool choosewhattosave;
Expand Down
21 changes: 5 additions & 16 deletions src/Simul.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -5,16 +5,6 @@ bool timetosave(const int &t, const int &tsave)
return t >= 0 && t % tsave == 0;
}

bool timetofreeze(const int &t, const int &tfreeze)
{
if (tfreeze == 0 && t == 0) return true;
return t > 0 && t % tfreeze == 0;

// Note: the modulo of zero by some number is always zero, so make sure
// to set t > 0 as a condition otherwise time point zero will always be
// considered freezing time
}

int simulate(const std::vector<std::string> &args)
{

Expand Down Expand Up @@ -66,7 +56,7 @@ int simulate(const std::vector<std::string> &args)
std::cout << "Simulation started.\n";

// Loop through time
for (int t = -pars.tburnin; t < pars.tend; ++t) {
for (int t = -pars.tburnin; t <= pars.tend; ++t) {

if (t == 0) metapop.exitburnin();

Expand All @@ -86,11 +76,10 @@ int simulate(const std::vector<std::string> &args)
const size_t tu = static_cast<size_t>(t);
printer.print(tu, collector, metapop);

}
// Save whole genomes if needed (space-consuming)
if (pars.gensave) freezer.freeze(metapop, pars.nloci);

// Save whole genomes if needed (space-consuming)
if (pars.gensave && timetofreeze(t, pars.tfreeze))
freezer.freeze(metapop, pars.nloci);
}

metapop.reproduce(pars, arch);
metapop.survive(pars);
Expand All @@ -103,7 +92,7 @@ int simulate(const std::vector<std::string> &args)
}

std::cout << "Simulation ended.\n";
std::fclose(stdout);
if (pars.logsave) std::fclose(stdout);

return 0;
}
Expand Down
117 changes: 54 additions & 63 deletions tests/SimulTests.cpp
Original file line number Diff line number Diff line change
@@ -1,63 +1,54 @@
#define BOOST_TEST_DYN_LINK
#define BOOST_TEST_MAIN

#include "src/Simul.h"
#include "tests/TestUtilities.h"
#include <boost/test/unit_test.hpp>
#include <iostream>

// Black box testing of the proper run of the main function

// Check that the program can run without arguments
BOOST_AUTO_TEST_CASE(testUseNoArgs)
{
// std::clog << "Testing run without arguments...\n";
BOOST_CHECK_EQUAL(simulate({ "EGS_test" }), 0);
}

// Check that the program cannot run with more than one argument
BOOST_AUTO_TEST_CASE(testAbuseTooManyArgs)
{
// std::clog << "Testing run with too many arguments...\n";
BOOST_CHECK_EQUAL(simulate({ "EGS_test", "arg1", "arg2" }), 1);
}

BOOST_AUTO_TEST_CASE(testAbuseInvalidFilename)
{
// std::clog << "Testing run with invalid parameter file...\n";
BOOST_CHECK_EQUAL(simulate({ "EGS_test", "nonsense.txt" }), 1);
}


BOOST_AUTO_TEST_CASE(testUseValidFilename)
{
// std::clog << "Testing run with valid parameter file...\n";
tst::makeValidParamFile();
BOOST_CHECK_EQUAL(simulate({ "EGS_test", "validparamfile.txt" }), 0);
}


BOOST_AUTO_TEST_CASE(testAbuseInvalidParamName)
{
// std::clog << "Testing run with invalid parameter names...\n";
tst::makeInvalidParamName();
BOOST_CHECK_EQUAL(simulate({ "EGS_test", "invalidparamname.txt" }), 1);
}

BOOST_AUTO_TEST_CASE(testAbuseInvalidParamValue)
{
// std::clog << "Testing run with invalid parameter values...\n";
tst::makeInvalidParamValue();
BOOST_CHECK_EQUAL(simulate({"EGS_test", "invalidparamvalue.txt"}), 1);
tst::makeInvalidParamValue2();
BOOST_CHECK_EQUAL(simulate({"EGS_test", "invalidparamvalue2.txt"}), 1);
}

BOOST_AUTO_TEST_CASE(testUseEmptyFreezer)
{

tst::makeParamFileEmptyFreezer();
simulate({"EGS_test", "paramfileemptyfreezer.txt"});
BOOST_CHECK(tst::isFileEmpty("empty_freezer.dat"));

}
#define BOOST_TEST_DYN_LINK
#define BOOST_TEST_MAIN

#include "src/Simul.h"
#include "tests/TestUtilities.h"
#include <boost/test/unit_test.hpp>
#include <iostream>

// Black box testing of the proper run of the main function

// Check that the program can run without arguments
BOOST_AUTO_TEST_CASE(testUseNoArgs)
{
// std::clog << "Testing run without arguments...\n";
BOOST_CHECK_EQUAL(simulate({ "speciome_test" }), 0);
}

// Check that the program cannot run with more than one argument
BOOST_AUTO_TEST_CASE(testAbuseTooManyArgs)
{
// std::clog << "Testing run with too many arguments...\n";
BOOST_CHECK_EQUAL(simulate({ "speciome_test", "arg1", "arg2" }), 1);
}

BOOST_AUTO_TEST_CASE(testAbuseInvalidFilename)
{
// std::clog << "Testing run with invalid parameter file...\n";
BOOST_CHECK_EQUAL(simulate({ "speciome_test", "nonsense.txt" }), 1);
}


BOOST_AUTO_TEST_CASE(testUseValidFilename)
{
// std::clog << "Testing run with valid parameter file...\n";
tst::makeValidParamFile();
BOOST_CHECK_EQUAL(simulate({ "speciome_test", "validparamfile.txt" }), 0);
}


BOOST_AUTO_TEST_CASE(testAbuseInvalidParamName)
{
// std::clog << "Testing run with invalid parameter names...\n";
tst::makeInvalidParamName();
BOOST_CHECK_EQUAL(simulate({ "speciome_test", "invalidparamname.txt" }), 1);
}

BOOST_AUTO_TEST_CASE(testAbuseInvalidParamValue)
{
// std::clog << "Testing run with invalid parameter values...\n";
tst::makeInvalidParamValue();
BOOST_CHECK_EQUAL(simulate({"speciome_test", "invalidparamvalue.txt"}), 1);
tst::makeInvalidParamValue2();
BOOST_CHECK_EQUAL(simulate({"speciome_test", "invalidparamvalue2.txt"}), 1);
}
16 changes: 0 additions & 16 deletions tests/TestUtilities.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -178,22 +178,6 @@ void tst::makeInvalidParamValue2()
file.close();
}

void tst::makeParamFileEmptyFreezer()
{
std::ofstream file;
file.open("paramfileemptyfreezer.txt");
if (!file.is_open())
std::cout << "Unable to open empty freezer parameter file.\n";

file << "freezerfile" << '\t' << "empty_freezer.dat" << '\n'
<< "gensave" << '\t' << 1u << '\n'
<< "tend" << '\t' << 10u << '\n'
<< "tsave" << '\t' << 10u << '\n'
<< "tfreeze" << '\t' << 100u << '\n';

file.close();
}

size_t tst::getFileSize(const std::string &filename) {

std::ifstream input(filename, std::ios::binary);
Expand Down

0 comments on commit be37fad

Please sign in to comment.