Fix titer substitution model tree annotations #1555

Merged: huddlej merged 3 commits into master from fix-titer-sub-tree-annotations on Jul 24, 2024

@huddlej huddlej commented Jul 23, 2024

Description of proposed changes

Fixes a bug in the `augur titers sub` tree annotations in the JSON output where each branch was assigned the antigenic weight of the reverse substitution. Specifically, the bug caused each branch to get the antigenic weight associated with the substitution written as `<derived allele><position><ancestral allele>` instead of `<ancestral allele><position><derived allele>`. For example, when the substitution model found antigenic weights for HA1 substitutions N193S and S193N, the tree annotations assigned the N193S weight to branches with the S193N substitution.
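To make the swapped orientation concrete, here is a minimal sketch of the two lookups. It is not the augur implementation: the helper names `correct_key` and `buggy_key` and the flat `weights` dictionary are assumptions made for illustration. The two weights are the HA1:G155E and HA1:E155G values from the H1N1pdm dataset reported in this description.

```python
# Illustrative sketch only; names and data layout are assumptions, not augur's code.

def correct_key(ancestral: str, position: int, derived: str) -> str:
    """Build the substitution name as <ancestral allele><position><derived allele>."""
    return f"{ancestral}{position}{derived}"


def buggy_key(ancestral: str, position: int, derived: str) -> str:
    """Build the name with the alleles swapped, which is the orientation the bug used."""
    return f"{derived}{position}{ancestral}"


# Antigenic weights per substitution, as reported for the H1N1pdm HA dataset.
weights = {"G155E": 2.93, "E155G": 0.59}

# A branch where position 155 changes from G (ancestral) to E (derived).
branch = ("G", 155, "E")

buggy_weight = weights[buggy_key(*branch)]    # looks up "E155G"
fixed_weight = weights[correct_key(*branch)]  # looks up "G155E"

print(f"buggy: {buggy_weight}, fixed: {fixed_weight}")
```

Because both orientations are valid keys whenever the model estimates a weight for each direction, the swapped lookup returns a plausible number instead of failing, which is consistent with the bug going unnoticed for so long.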

This commit expands an existing functional test to check for a data-specific antigenic weight. After confirming this test failed, I fixed the bug and confirmed that the test passed.

Note that this bug dates back to December 2018 when I first added the code to annotate the tree with the substitution model weights. This bug had no impact on the accuracy of the underlying model itself, so we would never have noticed it without manually comparing the titer drops in the tree to the "substitution" array in the JSON output.

I found this issue today when I noticed:

  1. an antigenic weight of 2.93 associated with HA1:G155E in an H1N1pdm HA dataset
  2. a weight of 0.59 associated with the opposite mutation HA1:E155G
  3. nodes in the tree with HA1:G155E annotated with the smaller weight
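The manual comparison above could be automated along these lines. This is a hedged sketch, not part of augur: the function names, the `(substitution, annotated weight)` pair format, and the tolerance are all assumptions, and a real check would first need to extract these values from the tree and the substitution model output.

```python
import math
import re

# One-letter ancestral allele, numeric position, one-letter derived allele.
MUTATION = re.compile(r"^([A-Z*-])(\d+)([A-Z*-])$")


def reverse_substitution(name: str) -> str:
    """Swap ancestral and derived alleles, e.g. 'HA1:G155E' -> 'HA1:E155G'."""
    gene, sep, mutation = name.rpartition(":")  # gene prefix is optional
    match = MUTATION.match(mutation)
    if match is None:
        raise ValueError(f"unrecognized substitution: {name!r}")
    ancestral, position, derived = match.groups()
    return f"{gene}{sep}{derived}{position}{ancestral}"


def reversed_annotations(weights, annotations, tol=1e-6):
    """Return substitutions whose annotated weight matches the reverse substitution.

    `weights` maps substitution names to model weights (hypothetical layout).
    `annotations` is a list of (substitution, annotated weight) pairs.
    """
    flagged = []
    for name, annotated in annotations:
        expected = weights.get(name)
        swapped = weights.get(reverse_substitution(name))
        if expected is None or swapped is None:
            continue  # cannot compare without weights for both directions
        matches_expected = math.isclose(annotated, expected, abs_tol=tol)
        matches_swapped = math.isclose(annotated, swapped, abs_tol=tol)
        if matches_swapped and not matches_expected:
            flagged.append(name)
    return flagged


# Values from the observations listed above.
weights = {"HA1:G155E": 2.93, "HA1:E155G": 0.59}
annotations = [("HA1:G155E", 0.59)]

flagged = reversed_annotations(weights, annotations)
print(flagged)
```

Substitutions whose two directions have equal weights are never flagged, since the annotation cannot distinguish the orientations in that case.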

The following screenshot shows what the cumulative antigenic advance from the substitution model looks like for the H1 HA dataset mentioned above before the bug is fixed (note small advance for nodes with HA1:155E):

image 2

This screenshot shows what the advance looks like after the bug is fixed (note larger advance for 155E nodes):

image

As another point of comparison, here is the scatterplot of the antigenic advance from the tree model on the x-axis and the substitution model on the y-axis with the bug. Note the low correlation between the two models.

image 3

Here is the updated scatterplot after fixing the bug, with a much higher correlation between the models.

image 4

Checklist

  • Automated checks pass
  • Check if you need to add a changelog message
  • Check if you need to add tests
  • Check if you need to update docs

Fixes a bug in the `augur titers sub` tree annotations in the JSON
output where antigenic weights were assigned per branch for the opposite
substitution values than expected. Specifically, the bug caused each
branch to get the antigenic weight associated with substitutions in the
form of "<derived allele><position><ancestral allele>" instead of
"<ancestral allele><position><derived allele>". For example, when the
substitution model found antigenic weights for HA1 substitutions N193S
and S193N, the tree annotations would have assigned the N193S weight to
branches with the S193N substitution.

This commit expands an existing functional test to check for a
data-specific antigenic weight. After confirming this test failed, I
fixed the bug and confirmed that the test passed.

Note that this bug dates back to December 2018 when I first added the
code to annotate the tree with the substitution model weights [1]. This bug
had no impact on the accuracy of the underlying model itself, so we
would never have noticed it without manually comparing the titer drops
in the tree to the "substitution" array in the JSON output.

[1] 6721b7d#diff-96cfc90794aa092cbb7b8577ca92d5757a777325d78078da3761ea26ee4c5956R67
@huddlej huddlej requested a review from rneher July 23, 2024 22:12

codecov bot commented Jul 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.22%. Comparing base (be94e50) to head (4846fbb).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1555   +/-   ##
=======================================
  Coverage   70.22%   70.22%           
=======================================
  Files          74       74           
  Lines        7952     7952           
  Branches     1945     1945           
=======================================
  Hits         5584     5584           
  Misses       2082     2082           
  Partials      286      286           


Co-authored-by: Thomas Sibley <tsibley@fredhutch.org>

@rneher rneher left a comment

thanks, John! Good catch.

@huddlej huddlej merged commit bc5ea54 into master Jul 24, 2024
28 checks passed
@huddlej huddlej deleted the fix-titer-sub-tree-annotations branch July 24, 2024 15:53