Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple improvements to augur clades #1199

Merged
merged 8 commits into from May 4, 2023
Merged

Conversation

jameshadfield
Copy link
Member

@jameshadfield jameshadfield commented Apr 11, 2023

Contains a bunch of improvements to augur clades which should improve usability quite a bit. See commit messages
for details.

Commit 22e2444 was motivated by discussion with @rneher and @huddlej in #728 regarding labelling clades at the root of the tree. Clades at the root are correctly annotated if either (a) the root reference sequence is supplied in the node-data JSON or (b) the root node has the clade-defining mutations annotated. We do not use mutations observed elsewhere in the tree to infer the root sequence, but we can do this if it would be useful? #1028 is also relevant here.

Checklist

  • Add a message in CHANGES.md summarizing the changes in this PR that are end user focused. Keep headers and formatting consistent with the rest of the file.

Addresses 4/5 of the remaining tasks in #735
Closes #965
Closes #1153
Closes #1154

Our current implementation of read_node_data requires that every
node in the tree is specified in the (merged) node_data files. For
mutations this is overkill -- many nodes don't have mutations and it's
overkill to require node_data JSONs to specify things like
`"node_name": {"muts": []}`.

This may well be the general behaviour we want, but i didn't want to
modify the read_node_data function which sees extensive use.

A welcome side effect of these changes is that we no longer have to
supply both nuc and aa_muts.
See comments in tests/functional/clades.t

Also adds / updates comments and docstrings which were noticed as I
worked through the code relating to these tests.
Workflows may be using this so I elected to hide it rather than remove
it (and warn people it's a no-op if they do happen to be using it)
This function had a few subtle bugs in it which are fixed here, as well
as improving the warning message to explain how this may affect clade
inference.

Note that the presence of sequences on nodes other than the root is
not considered by augur clades.
We could check all of these up-front instead of exiting upon the first
error, and such a check should be part of validation within augur
clades, but this commit is a simple solution to fix a reported bug.

Closes #965
A fatal error is raised if no clades are defined, but if a clade is not
found on the tree it's only a warning.
Suggested in #735
Multiple mutations at the same position on a single branch are now a
fatal error. Previous behaviour was to overwrite such mutations when
parsing. Suggested by #735.
@codecov
Copy link

codecov bot commented Apr 11, 2023

Codecov Report

Patch coverage: 81.94% and project coverage change: +0.15 🎉

Comparison is base (007cb47) 68.66% compared to head (e5cfc3a) 68.81%.

Additional details and impacted files
@@                Coverage Diff                @@
##           branch-labels    #1199      +/-   ##
=================================================
+ Coverage          68.66%   68.81%   +0.15%     
=================================================
  Files                 64       64              
  Lines               6852     6898      +46     
  Branches            1677     1693      +16     
=================================================
+ Hits                4705     4747      +42     
- Misses              1840     1844       +4     
  Partials             307      307              
Impacted Files Coverage Δ
augur/clades.py 91.01% <81.94%> (+0.10%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@jameshadfield jameshadfield requested a review from a team April 11, 2023 23:28
@jameshadfield jameshadfield merged commit dd318ba into branch-labels May 4, 2023
15 checks passed
@jameshadfield jameshadfield deleted the clade-fixes branch May 4, 2023 03:47
victorlin added a commit that referenced this pull request May 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

1 participant