Fix indel position offset in paired-end reverse reads#276
Merged
joshfactorial merged 2 commits intoncsa:developfrom May 5, 2026
Merged
Fix indel position offset in paired-end reverse reads#276joshfactorial merged 2 commits intoncsa:developfrom
joshfactorial merged 2 commits intoncsa:developfrom
Conversation
For read_2 (reverse), the reference_segment is constructed starting `padding` bases before self.position to allow room for deletions after reverse_complement. apply_mutations was computing the intra-segment index using self.position, which caused mutations to be placed padding (= read_len // 5) bases too early in the segment. After reverse_complement this manifested as indels appearing at a different reference coordinate in read_2 than in read_1. Fix: add segment_start attribute to Read, set to the actual reference coordinate of reference_segment[0]. apply_mutations now subtracts segment_start instead of position when computing the index. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Author
|
While this pull request fixes the issue a more robust fix would do:
This would require no padding and be less prone to bugs |
Collaborator
|
The code looks good. We need some regression tests. I added some on this branch: https://github.com/ncsa/NEAT/tree/colinhercus-fix-paired-end-indel-position - you can check and add those or others to this for the PR. |
Author
|
Thanks for developing NEAT, it's is an excellent read simulator with it's working for us now. We get excellent precision and recall of indels and other variants on all our tests. Sorry but I don't really have time to do anymore at the moment. |
Author
|
Thanks! |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
For read_2 (reverse), the reference_segment is constructed starting
paddingbases before self.position to allow room for deletions after reverse_complement. apply_mutations was computing the intra-segment index using self.position, which caused mutations to be placed padding (= read_len // 5) bases too early in the segment. After reverse_complement this manifested as indels appearing at a different reference coordinate in read_2 than in read_1.This results in one false positive indel call for each true positive.
Fix: add segment_start attribute to Read, set to the actual reference coordinate of reference_segment[0]. apply_mutations now subtracts segment_start instead of position when computing the index.
Image shows indel in two positions and at one loci all in the +strand reads and in the other all in the - strand reads.
Correction done with assistance from Claude