
Fix indel position offset in paired-end reverse reads #276

Merged
joshfactorial merged 2 commits into ncsa:develop from colinhercus:fix-paired-end-indel-position
May 5, 2026



@colinhercus colinhercus commented May 4, 2026

For read_2 (reverse), the reference_segment is constructed starting padding bases before self.position to allow room for deletions after reverse_complement. apply_mutations was computing the intra-segment index using self.position, which caused mutations to be placed padding (= read_len // 5) bases too early in the segment. After reverse_complement this manifested as indels appearing at a different reference coordinate in read_2 than in read_1.

This results in one false positive indel call for each true positive.

Fix: add segment_start attribute to Read, set to the actual reference coordinate of reference_segment[0]. apply_mutations now subtracts segment_start instead of position when computing the index.
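The offset can be reproduced with a small sketch. This is not NEAT's actual code: `apply_insertion`, `insertion_ref_coord`, the toy reference, and the chosen coordinates are all hypothetical, and only `position`, `padding`, `segment_start`, and `reference_segment` mirror names used in this description.

```python
# Hypothetical stand-in for the index computation in apply_mutations.
def apply_insertion(segment, origin, ref_pos, inserted):
    """Insert `inserted` after reference coordinate `ref_pos`,
    assuming segment[0] sits at reference coordinate `origin`."""
    idx = ref_pos - origin
    return segment[:idx + 1] + inserted + segment[idx + 1:]


def insertion_ref_coord(mutated, segment_start, inserted):
    """Reference coordinate of the base just before the inserted bases."""
    return segment_start + mutated.index(inserted) - 1


reference = "ACGTTGCAAGCTTAGGCTAACCGGTTAACGATCGATCGGA"  # toy 40-base reference
read_len = 10
padding = read_len // 5             # 2, as in the description
position = 20                       # reported start of read_2
segment_start = position - padding  # true coordinate of reference_segment[0]
reference_segment = reference[segment_start:position + read_len]

variant_pos = 24
marker = "NNN"  # placeholder insertion that cannot occur in the toy reference

# Before the fix: the index is computed relative to `position`.
buggy = apply_insertion(reference_segment, position, variant_pos, marker)
# After the fix: the index is computed relative to `segment_start`.
fixed = apply_insertion(reference_segment, segment_start, variant_pos, marker)

buggy_coord = insertion_ref_coord(buggy, segment_start, marker)
fixed_coord = insertion_ref_coord(fixed, segment_start, marker)
print(buggy_coord, fixed_coord)
```

In this toy setup the first call places the insertion `padding` bases upstream of `variant_pos`, while the second places it at `variant_pos`.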

The image shows the indel at two positions: at one locus it appears only in the + strand reads, and at the other only in the - strand reads.

image

Correction done with assistance from Claude

joshfactorial and others added 2 commits May 3, 2026 11:23
For read_2 (reverse), the reference_segment is constructed starting
`padding` bases before self.position to allow room for deletions after
reverse_complement. apply_mutations was computing the intra-segment
index using self.position, which caused mutations to be placed padding
(= read_len // 5) bases too early in the segment. After reverse_complement
this manifested as indels appearing at a different reference coordinate
in read_2 than in read_1.

Fix: add segment_start attribute to Read, set to the actual reference
coordinate of reference_segment[0]. apply_mutations now subtracts
segment_start instead of position when computing the index.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@colinhercus
Author

While this pull request fixes the issue, a more robust fix would:

  1. Generate the fragment length and extract the fragment
  2. Apply mutations to the fragment
  3. Generate read1 and read2 from either end of the fragment

This would require no padding and be less prone to bugs.
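A rough sketch of that fragment-first flow, under simplifying assumptions: `simulate_pair` and `reverse_complement` are hypothetical helpers rather than NEAT functions, only insertions are handled, and the fragment start and length are passed in instead of being sampled.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse-complement an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def simulate_pair(reference, frag_start, frag_len, read_len, insertions):
    """Extract a fragment, mutate it once, then cut both reads from its ends.

    `insertions` maps a reference coordinate to the bases inserted after it.
    """
    # 1. Extract the fragment.
    fragment = reference[frag_start:frag_start + frag_len]
    # 2. Apply mutations right to left so earlier indices stay valid.
    for ref_pos in sorted(insertions, reverse=True):
        idx = ref_pos - frag_start
        if 0 <= idx < len(fragment):
            fragment = fragment[:idx + 1] + insertions[ref_pos] + fragment[idx + 1:]
    # 3. Take read1 from the left end and read2 from the right end.
    read_1 = fragment[:read_len]
    read_2 = reverse_complement(fragment[-read_len:])
    return fragment, read_1, read_2


reference = "ACGTTGCAAGCTTAGGCTAACCGGTTAACGATCGATCGGA"  # toy 40-base reference
fragment, read_1, read_2 = simulate_pair(
    reference, frag_start=5, frag_len=20, read_len=15, insertions={14: "TTTT"}
)

# Both reads are cut from the same mutated fragment, so they agree
# wherever they overlap.
overlap = 2 * 15 - len(fragment)
print(read_1[-overlap:] == reverse_complement(read_2)[:overlap])
```

Because each variant is applied exactly once, to the shared fragment, there is no per-read coordinate arithmetic that could drift between read1 and read2.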

@joshfactorial
Collaborator

The code looks good. We need some regression tests. I added some on this branch: https://github.com/ncsa/NEAT/tree/colinhercus-fix-paired-end-indel-position - you can check and add those or others to this for the PR.

@joshfactorial joshfactorial self-requested a review May 4, 2026 13:49
@colinhercus
Author

Thanks for developing NEAT. It is an excellent read simulator, and it's working for us now. We get excellent precision and recall for indels and other variants on all our tests.

Sorry, but I don't really have time to do any more at the moment.

@joshfactorial joshfactorial changed the base branch from main to develop May 5, 2026 12:44
@joshfactorial joshfactorial merged commit 12c0330 into ncsa:develop May 5, 2026
1 check passed
@colinhercus colinhercus deleted the fix-paired-end-indel-position branch May 6, 2026 01:21
@colinhercus
Author

Thanks!
