Fix 6 effect-annotation bugs, add 32 regression tests #265

Merged
iskandr merged 1 commit into main from fix-classification-bugs on Apr 12, 2026

iskandr commented Apr 12, 2026

Summary

Closes or substantially addresses six reported bugs plus one visibility improvement, with 32 new regression tests (378 → 410 passing, zero regressions).

Classification bugs (wrong effect type returned)

  • #250 (Insertion effect from SNV?): An SNV in the stop codon was reported as Insertion(p.XinsR) instead of StopLoss when the 3' UTR begins with another stop codon. Root cause: translate_in_frame_mutation had branches for first_utr_stop_codon_index > 0 and == -1 but not == 0, so using_three_prime_utr stayed False and the StopLoss branch was never taken.
  • #205 (Stoploss confused as insert): Same root cause as #250, on a different transcript.
  • #201 (Synonymous Insertion before stop codon annotated as non silent insert): Insertion of ATATAA before the stop codon was reported as Insertion(p.1291insI) even though the resulting protein is identical to the reference (the inserted `TAA` creates a new stop at the same protein position). New silent check: if a premature stop is introduced and the new amino acids match the reference tail, classify as Silent.
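
For readers who want to see the shape of the #250 / #205 fix, here is a minimal sketch of the three-way case split on the index of the first stop codon in the 3' UTR. It is not the code from translate_in_frame_mutation: `first_stop_codon_index` and `extend_into_utr` are hypothetical helpers, and the sketch assumes (as the description implies) that the flag should end up True in all three cases.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon_index(seq):
    """Index (in codons) of the first in-frame stop codon, or -1 if none."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return -1


def extend_into_utr(mutant_codons, three_prime_utr):
    """Hypothetical helper: return (extended_sequence, using_three_prime_utr).

    Extends a mutated coding sequence whose stop codon was lost into the
    3' UTR, up to (but not including) the first in-frame UTR stop codon.
    """
    n_utr_codons = len(three_prime_utr) // 3
    utr = three_prime_utr[:n_utr_codons * 3]  # drop any trailing partial codon
    idx = first_stop_codon_index(utr)
    if idx > 0:
        # Stop codon found later in the UTR: translate up to it.
        return mutant_codons + utr[:idx * 3], True
    elif idx == 0:
        # The case that was missing: the UTR starts with a stop codon, so
        # nothing is appended, but the UTR was still consulted.
        return mutant_codons, True
    else:
        # idx == -1: no stop codon in the UTR, use all complete codons.
        return mutant_codons + utr, True
```

With the `idx == 0` branch absent, a UTR such as "TAAGGG" would fall through without setting the flag, which matches the symptom described above.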

Position / format bugs

Loader robustness

API ergonomics

Test plan

Related

Six reported bugs and one visibility improvement:

* #250 / #205: SNV in the stop codon classified as Insertion instead of
  StopLoss when the 3' UTR begins with a stop codon. Fixed by handling
  first_utr_stop_codon_index == 0 in translate_in_frame_mutation so
  using_three_prime_utr is set correctly.

* #201: An insertion before the stop codon that produces a protein
  identical to the reference was reported as an Insertion. Added a
  silent check in predict_in_frame_coding_effect: when a premature stop
  is introduced and the new amino acids match the reference tail, the
  mutation is reported as Silent.

* #208: Silent.aa_pos was off by one because aa_mutation_start_offset
  had been incremented past the shared prefix. The Silent constructor
  now receives aa_mutation_start_offset - n_aa_shared.

* #88: VCF symbolic alleles (<DEL>, <CN0>, <INS:ME:ALU>, ...) and
  breakend notation (G]17:198982]) caused load_vcf to crash. They are
  now skipped with a visible warnings.warn call that reports the count
  and points at the tracking issue (#264) for structural variant
  support.

* #216: PrematureStop.short_description now uses HGVS "p.{pos}ins{alt}*"
  notation when aa_ref is empty (the stop was introduced by an
  insertion) instead of the ambiguous "p.{pos}{alt}*".

* #217: Silent.short_description now conforms to the HGVS standard,
  returning "p.{aa_ref}{pos}=" instead of the literal string "silent".
  (The existing MAF test is updated to match.)

* #227: EffectCollection is now sorted by effect priority descending by
  default, so the first element of the collection is the most severe
  effect. Users can pass sort_key=False to disable sorting or their own
  callable to override.

32 new regression tests across six new test files; 410 tests pass
(was 378 before this change), zero regressions.
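
The sort_key behaviour described for #227 can be sketched as a standalone function. This is an illustration, not EffectCollection itself: the `Effect` tuple, the `order_effects` name, the priority numbers, the use of `None` as the "default" sentinel, and the ascending order for a custom callable are all assumptions.

```python
from collections import namedtuple

# Hypothetical stand-in for an effect object; priorities are made up.
Effect = namedtuple("Effect", ["name", "priority"])


def order_effects(effects, sort_key=None):
    """Order effects following the three documented sort_key modes."""
    if sort_key is False:
        # Sorting disabled: keep the caller's order.
        return list(effects)
    if sort_key is None:
        # Default: highest priority (most severe) first.
        return sorted(effects, key=lambda e: e.priority, reverse=True)
    # Custom callable overrides the default key.
    return sorted(effects, key=sort_key)


effects = [Effect("Silent", 1), Effect("StopLoss", 3), Effect("Intronic", 0)]
most_severe = order_effects(effects)[0]
```

Code that previously scanned the whole collection for the highest-priority effect can, under the new default, read the first element instead.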
iskandr merged commit e3f2716 into main on Apr 12, 2026
6 checks passed
iskandr added a commit that referenced this pull request Apr 12, 2026
Major release: several backward-incompatible fixes landed in PRs #263
and #265.

Breaking changes:

* Silent.short_description now returns HGVS "p.{ref}{pos}=" (e.g.
  "p.R6=") instead of the literal "silent" (#217).
* Silent.aa_pos no longer includes the shared-prefix offset; it now
  points at the actual synonymous codon (#208). Callers that
  compensated for the off-by-one need to stop doing so.
* PrematureStop.short_description returns "p.{pos}ins{alt}*" when
  aa_ref is empty instead of the ambiguous "p.{pos}{alt}*" (#216).
* EffectCollection is now sorted by effect priority (most severe first)
  by default. Pass sort_key=False to disable sorting or a custom
  callable to override (#227).
* Sequence-aware splice site classification (#262): variants at +1/+2
  or -1/-2 with a non-canonical reference base are now classified as
  IntronicSpliceSite rather than SpliceDonor/SpliceAcceptor.
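
For callers that parse short_description strings, the two new formats can be sketched as plain formatting functions. These are hypothetical helpers, not the library's properties: `pos` is taken to be the position exactly as it is printed, and the layout used when aa_ref is non-empty is an assumption, since the notes above only specify the empty-aa_ref case.

```python
def silent_short_description(aa_ref, pos):
    """HGVS-style synonymous description, e.g. "p.R6="."""
    return "p.{}{}=".format(aa_ref, pos)


def premature_stop_short_description(aa_ref, pos, aa_alt):
    """HGVS-style premature stop description."""
    if not aa_ref:
        # Stop introduced by an insertion: spell out "ins" explicitly.
        return "p.{}ins{}*".format(pos, aa_alt)
    # Assumption: with a reference residue, keep the ref/pos/alt layout.
    return "p.{}{}{}*".format(aa_ref, pos, aa_alt)
```

Downstream code that compared against the literal "silent" needs to switch to a pattern match on the trailing "=", or to check the effect's type instead of its string form.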

Other user-visible changes:

* VCF loader now skips symbolic alleles (<DEL>, <CN0>, etc.) and
  breakend notation with a warnings.warn call instead of crashing
  (#88, tracked for full support in #264).
* SNV in the stop codon with a stop-prefixed 3' UTR is now correctly
  classified as StopLoss instead of Insertion (#250, #205).
* Insertion before the stop codon that produces an identical protein
  is now correctly classified as Silent (#201).
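
The skip-and-warn behaviour for #88 can be approximated as follows. This is a sketch rather than the load_vcf implementation: the function names and the warning text are hypothetical, and detection relies only on the VCF conventions that symbolic alleles are wrapped in angle brackets and breakend ALTs contain square brackets.

```python
import warnings


def is_unsupported_alt(alt):
    """True for symbolic alleles (<DEL>, <CN0>, ...) and breakend notation."""
    is_symbolic = alt.startswith("<") and alt.endswith(">")
    is_breakend = "[" in alt or "]" in alt
    return is_symbolic or is_breakend


def keep_supported_alts(alts):
    """Drop unsupported ALT alleles and emit one warning with the count."""
    kept = [alt for alt in alts if not is_unsupported_alt(alt)]
    n_skipped = len(alts) - len(kept)
    if n_skipped:
        warnings.warn(
            "Skipped %d symbolic or breakend ALT allele(s); structural "
            "variant support is tracked in #264" % n_skipped)
    return kept
```

Emitting a single warning with a count, rather than one per record, keeps the output readable on VCFs that contain many structural variant calls.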