-
Notifications
You must be signed in to change notification settings - Fork 129
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix titer substitution model tree annotations #1555
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Fixes a bug in the `augur titers sub` tree annotations in the JSON output where antigenic weights were assigned per branch for the opposite substitution values than expected. Specifically, the bug caused each branch to get the antigenic weight associated with substitutions in the form of "<derived allele><position><ancestral allele>" instead of "<ancestral allele><position><derived allele>". For example, when the substitution model found antigenic weights for HA1 substitutions N193S and S193N, the tree annotations would have assigned the N193S weight to branches with the S193N substitution. This commit expands an existing functional test to check for a data-specific antigenic weight. After confirming this test failed, I fixed the bug and confirmed that the test passed. Note that this bug dates back to December 2018 when I first added the code to annotate the tree with the substitution model weights [1]. This bug had no impact on the accuracy of the underlying model itself, so we would never have noticed it without manually comparing the titer drops in the tree to the "substitution" array in the JSON output. [1] 6721b7d#diff-96cfc90794aa092cbb7b8577ca92d5757a777325d78078da3761ea26ee4c5956R67
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #1555 +/- ##
=======================================
Coverage 70.22% 70.22%
=======================================
Files 74 74
Lines 7952 7952
Branches 1945 1945
=======================================
Hits 5584 5584
Misses 2082 2082
Partials 286 286 ☔ View full report in Codecov by Sentry. |
genehack
approved these changes
Jul 23, 2024
tsibley
reviewed
Jul 23, 2024
Co-authored-by: Thomas Sibley <tsibley@fredhutch.org>
rneher
reviewed
Jul 24, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, John! Good catch.
rneher
approved these changes
Jul 24, 2024
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Description of proposed changes
Fixes a bug in the
augur titers sub
tree annotations in the JSON output where antigenic weights were assigned per branch for the opposite substitution values than expected. Specifically, the bug caused each branch to get the antigenic weight associated with substitutions in the form of<derived allele><position><ancestral allele>
instead of<ancestral allele><position><derived allele>
. For example, when the substitution model found antigenic weights for HA1 substitutions N193S and S193N, the tree annotations would have assigned the N193S weight to branches with the S193N substitution.This commit expands an existing functional test to check for a data-specific antigenic weight. After confirming this test failed, I fixed the bug and confirmed that the test passed.
Note that this bug dates back to December 2018 when I first added the code to annotate the tree with the substitution model weights. This bug had no impact on the accuracy of the underlying model itself, so we would never have noticed it without manually comparing the titer drops in the tree to the "substitution" array in the JSON output.
I found this issue today when I noticed:
The following screenshot shows what the cumulative antigenic advance from the substitution model looks like for the H1 HA dataset mentioned above before the bug is fixed (note small advance for nodes with HA1:155E):
This screenshot shows what the advance looks like after the bug is fixed (note larger advance for 155E nodes):
As another point of comparison, here is the scatterplot of the antigenic advance from the tree model on the x-axis and the substitution model on the y-axis with the bug. Note the low correlation value between the data.
Here is the updated scatterplot view after fixing the bug with a much higher correlation between the models.
Checklist