Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 639 and 646 #647

Merged
merged 4 commits into from Feb 4, 2021
Merged

Fix 639 and 646 #647

merged 4 commits into from Feb 4, 2021

Conversation

molpopgen
Copy link
Owner

@molpopgen molpopgen commented Jan 23, 2021

WIP to fix #639 and #646

  • Decide on input parameter validation

To follow up on in a later PR:

  1. Neutral mutations w/frequency tracking should error out if not simplifying every generation.

@molpopgen molpopgen changed the base branch from main to dev January 23, 2021 23:25
@molpopgen molpopgen linked an issue Jan 23, 2021 that may be closed by this pull request
@molpopgen molpopgen added this to the 0.13.0 milestone Jan 23, 2021
@molpopgen molpopgen force-pushed the fix_639_646 branch 4 times, most recently from f182ad6 to a7abc47 Compare January 24, 2021 19:13
@molpopgen molpopgen marked this pull request as draft January 24, 2021 23:13
@molpopgen
Copy link
Owner Author

The implementation of remove_extinct_mutations is part of the final simulation cleanup, and is where the error is raised. Should perhaps implement this to work from the tables rather than the "count vectors".

But, because we want the final population to have all of its variants fully counted, we need a check to make sure counting happens if there are ancient samples or if there are neutral mutations.

@molpopgen
Copy link
Owner Author

The implementation of remove_extinct_mutations is part of the final simulation cleanup, and is where the error is raised. Should perhaps implement this to work from the tables rather than the "count vectors".

But, because we want the final population to have all of its variants fully counted, we need a check to make sure counting happens if there are ancient samples or if there are neutral mutations.

Implementing both of these is not going to be the full solution; if we do the second, and add a check that all mutations in the table must have counts > 0, the check can fail. The failed check suggests that there's some logic missing earlier, likely during post-simplification actions.

@molpopgen
Copy link
Owner Author

So, 378f13e causes the test for #646 to pass. But, I am thinking that that change is a band-aid and not the real cause, as the commit is forcing mutation counting.

After counting, if suppress_table_indexing = False, then recycling happens from pop.mcounts and pop.mcounts_from_preserved_nodes. Ultimately, this is the problem. What should happen is that we use the return value from simplification instead. We should probably do this no matter what? It is possibly a bit slower, which is why the current behavior is in place, but probably not slow enough to really matter?

@molpopgen
Copy link
Owner Author

molpopgen commented Jan 25, 2021

annoyingly, tests are passing but the example that originally triggered this whole business can still fail. This block of code seems to be to blame:

    if (pop.generation == last_preserved_generation)
        {
            std::transform(begin(pop.mcounts_from_preserved_nodes),
                           end(pop.mcounts_from_preserved_nodes),
                           begin(last_preserved_generation_counts),
                           begin(pop.mcounts_from_preserved_nodes),
                           std::minus<std::uint32_t>());
        }

Since all tests are passing, we clearly don't have the right test in place to get whatever's happening here. This is probably a new issue/bug?

Update: I will guess that track_ancestral_count is to blame here...

@molpopgen
Copy link
Owner Author

molpopgen commented Jan 25, 2021

Update: I will guess that track_ancestral_count is to blame here...

The test TestSimplificationInterval is likely helpful here. Adding some debug code to this function causes this test to raise an error. Need to come back to this and make sure that the debug code is actually doing the right thing.

The debug code was indeed wrong.

@molpopgen
Copy link
Owner Author

For the existing test:

Changing the "simlen" for the second params fixture values results affects if error being triggered or not.

Likewise, in the original test script, adjusting simlen and/or simplification interval affects if it happens or not.

@molpopgen
Copy link
Owner Author

A recent force-push undid some candidate fixes. I did this because it forced the tests to fail in a binary way based on if neutral variants are simulated or not. For posterity, the removed changes were:

In simplify_tables.hpp:

  • Force mutation counting if restting ancient samples after simplification.
    This changed required a new argument being passed in, too.
    This is for 639.
  • Related to 646, we also forced counting from tables if any neutral variants are present.

In index_and_count_mutations.cc:

  • Check if any neutral variants are present.
  • If so, do not return early!
  • If so, count mutations from tables no matter what.
#include <numeric>
#include <stdexcept>
#include <fwdpy11/types/DiploidPopulation.hpp>
#include <fwdpp/ts/count_mutations.hpp>
#include <fwdpp/internal/sample_diploid_helpers.hpp>
#include "index_and_count_mutations.hpp"

void
index_and_count_mutations(const bool suppress_edge_table_indexing,
                          fwdpy11::DiploidPopulation& pop)
{
    pop.mcounts_from_preserved_nodes.resize(pop.mutations.size(), 0);
    bool neutral_variants_present = false;
    for (auto& m : pop.tables->mutations)
        {
            if (m.neutral)
                {
                    neutral_variants_present = true;
                    break;
                }
        }

    // FIXME: there must be a way to remove this awkward check?
    if (!suppress_edge_table_indexing && !neutral_variants_present)
        {
            return;
        }
    if (pop.tables->preserved_nodes.empty() || neutral_variants_present)
        {
            pop.tables->build_indexes();
            pop.fill_alive_nodes();
            if (pop.tables->preserved_nodes.empty())
                {
                    fwdpp::ts::count_mutations(*pop.tables, pop.mutations,
                                               pop.alive_nodes, pop.mcounts);
                }
            else
                {
                    fwdpp::ts::count_mutations(*pop.tables, pop.mutations,
                                               pop.alive_nodes, pop.mcounts,
                                               pop.mcounts_from_preserved_nodes);
                }
            for (auto& mr : pop.tables->mutations)
                {
                    if (pop.mcounts[mr.key] + pop.mcounts_from_preserved_nodes[mr.key]
                        == 0)
                        {
                            throw std::runtime_error(
                                "mutation in table has count of zero.");
                        }
                }
        }
    else
        {
            fwdpp::fwdpp_internal::process_haploid_genomes(pop.haploid_genomes,
                                                           pop.mutations, pop.mcounts);
        }
}

@molpopgen
Copy link
Owner Author

A check used in this PR is to assert that mutation table rows have nonzero counts. See 80689f1, for example. This check is not correct when resetting--the deletion of ancient sample records can cause a mutation to be in the tables but have no descendants. Thus, index_and_count_mutations correctly assigns it a count of zero, but then our check triggers an error. A little erase/remove will handle this, but only if we are resetting! This check could be a useful way to catch other legitimate errors.

@molpopgen
Copy link
Owner Author

This works to make all tests pass:

  • index_and_count_mutations always uses tables
  • disable the final "tuning" of mcounts that happens as a result of track_ancestral_counts

Although this is a good clue, it is likely not the solution we need. We cannot disable track_ancestral_counts with just these two changes.

@molpopgen
Copy link
Owner Author

We cannot disable track_ancestral_counts with just these two changes.

The reason why is that we'd need to count mutations from tables after simplification. Currently, we don't do that if there are ancient samples. The track_ancestral_counts does the work in place of the tables.

So, getting closer. Need to enumerate the various behavior combos and make sure we do the right thing. Will probably need more tests.

Fixes #646

Add extensive tests of ancient samples w/ and w/o neutral mutations.

Change test to trigger more failures.

Add run time checks of mutation table consistency.

For final mutation counting, always use tables when simulating neutral
mutations.

After simplification, build recycling queue from simplification results
if simulating neutral mutations.

index_and_count_mutations removes mutations from tables whose counts
are adjusted to zero if reset_treeseqs_to_alive_nodes_after_simplification is true.

Only adjust mcounts_from_preserved_nodes using
last_preserved_generation_counts if
reset_treeseqs_to_alive_nodes_after_simplification is false.

Only adjust mutation counts by last_preserved_generation_counts if
not simulating neutral mutations.
if any. This can arise when preserving nodes very often.
preserve_selected_fixations is false, remove fixations from haploid
genomes.
@molpopgen
Copy link
Owner Author

06a8a41 needs a test, which can be pulled from the example I'm working on for the freq tracking.

final simplification post-simulation. See 06a8a41.
@molpopgen molpopgen marked this pull request as ready for review February 4, 2021 01:28
@molpopgen molpopgen merged commit 4220750 into dev Feb 4, 2021
@molpopgen molpopgen deleted the fix_639_646 branch February 4, 2021 19:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ancient sample recording with neutral mutations. An experimental feature needs better testing
1 participant