Implement a sequence generator for the tests and benchmarks of the library such that it is easier to handle the generation of sequences between different calls. #12
Comments
Since this ticket only describes the functional requirements, I thought up the following (iterative) design: After some thought I think the base class for sequence generation should be called 1) Implement call
|
Some thoughts before I forget about them.
1) I very much agree with the general style proposed in 1, with some additions:
seqan3::test::sequence_generator<seqan3::dna4> generate_sequence{/*.seed = */42};
auto sequence = generate_sequence(100, 10);
To make it invocable without arguments you could also do:
seqan3::test::sequence_generator<seqan3::dna4> generate_sequence{/*.seed = */42};
size_t average_size = 100; // As an example for possible run time parameter.
size_t size_variance = 10;
std::vector<sequence_t> sequences(10);
std::ranges::generate(sequences, [&] () { return generate_sequence(average_size, size_variance); });
The rationale is that it is natural to change the size or variance based on some run-time parameters. In the proposed approach, either a new generator has to be constructed or the generator has to offer setter interfaces for the respective parameters.
2) I would not add the range interface, but would rather have a generic generator view that takes an invocable and generates an infinite range by calling it repeatedly. (Note: one could play around with coroutines here as well.)
seqan3::test::sequence_generator<seqan3::dna4> generate_sequence{/*.seed = */42};
for (auto && sequence : seqan3::views::generate([&] () { return generate_sequence(100); }) | seqan3::views::take(1000))
{ ... }
The other points are quite far in the future, so I haven't put more thought into them.
Implementation details:
Since the purpose is mainly for seqan3, I propose to use the standard seqan3 alphabet interfaces to generate a sequence. For the SeqAn2 alphabet (seqan::SimpleType) we should add overloads such that they can be found by the CPO.
Generate a single letter:
// Both bounds of std::uniform_int_distribution are inclusive, so the largest valid rank is alphabet_size - 1.
std::uniform_int_distribution<> letter_rank_distribution(0ull, seqan3::alphabet_size_v<alphabet_t> - 1);
/// ...
constexpr alphabet_t generate_letter()
{
return seqan3::assign_rank(alphabet_t{}, this->letter_rank_distribution(this->random_number_generator));
}
Generate a sequence:
constexpr std::vector<alphabet_t> operator()(size_t sequence_size, size_t size_variance = 0)
{
std::uniform_int_distribution<> size_distribution(sequence_size - size_variance, sequence_size + size_variance);
std::vector<alphabet_t> sequence(size_distribution(this->random_number_generator));
std::ranges::generate(sequence, [this] () { return generate_letter(); }); // a member function must be wrapped to be invocable
return sequence;
}
We can then either convert the vector to a seqan::String at the call site:
auto sequence = generate_sequence(100);
seqan::String<seqan::Dna4> seqan_string(sequence.size());
std::ranges::move(sequence, seqan_string.begin());
or provide a convenient traits object to the sequence_generator:
seqan3::test::sequence_generator<seqan::Dna4, seqan::String /*defaults to std::vector*/> ...;
Verbose version 2 (allows more control over the sequence type specification):
template <typename alphabet_t>
struct seqan_traits
{
using alphabet_type = alphabet_t;
using sequence_type = seqan::String<alphabet_type, seqan::Alloc<>>;
};
seqan3::test::sequence_generator<seqan::Dna4, seqan_traits> ...; |
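To make the two implementation steps above concrete, here is a minimal sketch that uses only the standard library. It is not the seqan3 implementation: the `letters` table stands in for an alphabet, and the names `generate_letter` and `generate_sequence` as free functions are assumptions made for illustration. It shows the inclusive bounds of the rank distribution, the size variance, and a sequence type chosen by the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical stand-in for an alphabet: ranks 0..3 map to these letters.
inline constexpr std::string_view letters{"ACGT"};

// Draws one letter. The distribution bounds are inclusive, hence size() - 1.
template <typename engine_t>
char generate_letter(engine_t & engine)
{
    std::uniform_int_distribution<std::size_t> rank_distribution{0, letters.size() - 1};
    return letters[rank_distribution(engine)];
}

// Generates a sequence whose length lies in [size - size_variance, size + size_variance].
// sequence_t is assumed to be constructible from (count, value), e.g. std::string or std::vector<char>.
template <typename sequence_t = std::string, typename engine_t>
sequence_t generate_sequence(engine_t & engine, std::size_t size, std::size_t size_variance = 0)
{
    assert(size_variance <= size); // guards against unsigned underflow in the lower bound
    std::uniform_int_distribution<std::size_t> size_distribution{size - size_variance, size + size_variance};
    sequence_t sequence(size_distribution(engine), 'A');
    std::generate(sequence.begin(), sequence.end(), [&engine] () { return generate_letter(engine); });
    return sequence;
}
```

Calling `generate_sequence<std::vector<char>>(engine, 50)` selects a different container without touching the generation logic, which is roughly what the traits object would achieve for `seqan::String`.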
Resolution: 11.08.2020
This would be an example for the look and feel of the
template <std::uniform_random_bit_generator generator_t>
auto fn(generator_t && generator /*could be std::mt19937_64*/)
{
using sequence_t = std::vector<seqan3::dna4>;
std::vector<sequence_t> sequences(100);
seqan3::test::random_sequence_generator sequence_gen{.size = 100, .size_variance = 10};
std::ranges::generate(sequences, [&] () { return sequence_gen<sequence_t>(generator); });
return sequences;
}
... we also considered a view-based interface, but we decided on the above solution.
template <std::uniform_random_bit_generator generator_t>
auto fn(generator_t && generator /*could be std::mt19937_64*/)
{
using sequence_t = std::vector<seqan3::dna4>;
std::vector<sequence_t> sequences;
seqan3::test::random_sequence_generator sequence_gen{.size = 100, .size_variance = 10};
std::ranges::move(ranges::view::generate_n([&] () { return sequence_gen<sequence_t>(generator); }, 100),
std::back_inserter(sequences));
return sequences;
}
EDIT (14.08.2020)
This would be the scaffold of the random_sequence_generator.
class random_sequence_generator
{
public:
// all constructors
explicit random_sequence_generator(size_t size, size_t size_variance = 0) : size{size}, size_variance{size_variance}
{}
template <seqan3::sequence sequence_t, std::uniform_random_bit_generator generator_t>
requires std::ranges::output_range<sequence_t, std::ranges::range_value_t<sequence_t>> // possibly a specific interface that allows creating a std::back_inserter from the sequence type.
sequence_t operator()(generator_t && random_generator) const
{
sequence_t random_sequence{};
// implement sequence generation using random_generator
return random_sequence;
}
size_t size{};
size_t size_variance{};
};
User calling code:
int main()
{
std::mt19937_64 random_engine{42};
seqan3::test::random_sequence_generator random_sequence_gen{100};
auto seq1 = random_sequence_gen<std::vector<seqan3::dna4>>(random_engine);
auto seq2 = random_sequence_gen<seqan::String<seqan::Dna>>(random_engine);
} |
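A compilable approximation of this scaffold, assuming plain `char` letters instead of a seqan3 alphabet, could look as follows. Everything here is a sketch: the hard-coded "ACGT" table and the helper `make_sequences` are hypothetical. Note that standard C++ only accepts explicit template arguments for a call operator through the `operator()` spelling, so the sketch calls `generator.operator()<std::string>(engine)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <vector>

// Hypothetical stand-in for the scaffold; the generator stores only the size parameters.
class random_sequence_generator
{
public:
    explicit random_sequence_generator(std::size_t size, std::size_t size_variance = 0) :
        size{size}, size_variance{size_variance}
    {
        assert(size_variance <= size); // avoids unsigned underflow in the lower size bound
    }

    // The engine is passed by reference so that its state advances across calls.
    template <typename sequence_t, typename engine_t>
    sequence_t operator()(engine_t & engine) const
    {
        std::uniform_int_distribution<std::size_t> size_distribution{size - size_variance, size + size_variance};
        std::uniform_int_distribution<std::size_t> rank_distribution{0, 3}; // inclusive bounds
        sequence_t sequence{};
        std::size_t const length = size_distribution(engine);
        for (std::size_t i = 0; i < length; ++i)
            sequence.push_back("ACGT"[rank_distribution(engine)]);
        return sequence;
    }

    std::size_t size{};
    std::size_t size_variance{};
};

// Hypothetical helper: one engine, one generator, many sequences.
inline std::vector<std::string> make_sequences(std::size_t count, std::uint64_t seed)
{
    std::mt19937_64 engine{seed};
    random_sequence_generator generator{100, 10};
    std::vector<std::string> sequences;
    for (std::size_t i = 0; i < count; ++i)
        sequences.push_back(generator.operator()<std::string>(engine));
    return sequences;
}
```

Keeping the engine outside the generator means the caller decides whether sequences share one random stream or each get their own.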
Fixed by seqan/seqan3#1985 |
Description
Currently we use free functions to generate the sequences, but we have to pass the seed for every invocation. This makes it difficult to generate different sequences across test instances, or to generate multiple "random" sequences without resetting the seed each time. Accordingly, there should be a generator class that holds the random number generator as state and initialises it during construction. This makes it much easier to use either a global sequence generator or a test-case-local one, e.g. a member of the test fixture, that generates different random sequences within a single test but the same sequences for every test case.
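As an illustration of the requested behaviour, the following sketch shows a generator that owns its random engine and is used as a fixture member. It relies only on the standard library; the class name `sequence_generator`, the fixed "ACGT" letters, and the `alignment_fixture` struct are assumptions for this example, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>

// Hypothetical generator that owns its engine; the seed is set once at construction.
class sequence_generator
{
public:
    explicit sequence_generator(std::uint64_t seed) : engine{seed}
    {}

    // Each call advances the engine, so consecutive sequences come from different engine states.
    std::string operator()(std::size_t length)
    {
        std::string sequence(length, 'A');
        for (char & letter : sequence)
            letter = "ACGT"[rank_distribution(engine)];
        return sequence;
    }

private:
    std::mt19937_64 engine;
    std::uniform_int_distribution<std::size_t> rank_distribution{0, 3}; // inclusive bounds
};

// Hypothetical fixture: every test case gets a fresh instance that restarts from the same seed.
struct alignment_fixture
{
    sequence_generator generate_sequence{42};
};
```

Two instances of `alignment_fixture` yield the same first sequence, while repeated calls on one instance keep drawing from the advancing engine state, so no seed has to be passed per invocation.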
Acceptance criteria
Tasks
Definition of Done