Fix 6 effect-annotation bugs, add 32 regression tests #265
Merged
Six reported bugs and one visibility improvement:

* #250 / #205: SNV in the stop codon classified as Insertion instead of StopLoss when the 3' UTR begins with a stop codon. Fixed by handling first_utr_stop_codon_index == 0 in translate_in_frame_mutation so using_three_prime_utr is set correctly.
* #201: An insertion before the stop codon that produces a protein identical to the reference was reported as an Insertion. Added a silent check in predict_in_frame_coding_effect: when a premature stop is introduced and the new amino acids match the reference tail, the mutation is reported as Silent.
* #208: Silent.aa_pos was off by one because aa_mutation_start_offset had been incremented past the shared prefix. The Silent constructor now receives aa_mutation_start_offset - n_aa_shared.
* #88: VCF symbolic alleles (<DEL>, <CN0>, <INS:ME:ALU>, ...) and breakend notation (G]17:198982]) caused load_vcf to crash. They are now skipped with a visible warnings.warn call that reports the count and points at the tracking issue (#264) for structural variant support.
* #216: PrematureStop.short_description now uses HGVS "p.{pos}ins{alt}*" notation when aa_ref is empty (the stop was introduced by an insertion) instead of the ambiguous "p.{pos}*".
* #217: Silent.short_description now conforms to the HGVS standard, returning "p.{aa_ref}{pos}=" instead of the literal string "silent". (The existing MAF test is updated to match.)
* #227: EffectCollection is now sorted by effect priority descending by default, so the first element of the collection is the most severe effect. Users can pass sort_key=False to disable sorting or their own callable to override.

32 new regression tests across six new test files; 410 tests pass (was 378 before this change), zero regressions.
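The skip-and-warn behavior described for #88 could look roughly like the sketch below. This is not the actual `load_vcf` code: `is_symbolic_or_breakend` and `filter_alts` are hypothetical helper names, and the detection rules (angle brackets for symbolic alleles, square brackets for breakends) are a simplifying assumption based on the examples in the commit message.

```python
import warnings


def is_symbolic_or_breakend(alt: str) -> bool:
    """Return True for ALT values that are not plain nucleotide strings.

    Hypothetical helper; the real loader may use different rules.
    """
    # Symbolic alleles are wrapped in angle brackets, e.g. <DEL> or <INS:ME:ALU>.
    if alt.startswith("<") and alt.endswith(">"):
        return True
    # Breakend notation embeds a mate position in square brackets,
    # e.g. G]17:198982].
    return "[" in alt or "]" in alt


def filter_alts(alts):
    """Drop symbolic/breakend ALT values and warn once with the skipped count."""
    kept = [alt for alt in alts if not is_symbolic_or_breakend(alt)]
    n_skipped = len(alts) - len(kept)
    if n_skipped:
        warnings.warn(
            f"Skipped {n_skipped} symbolic or breakend alleles; "
            "structural variant support is tracked in #264"
        )
    return kept, n_skipped
```

Emitting one warning with a count, rather than one warning per record, keeps the output readable for files with many structural variants.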
iskandr added a commit that referenced this pull request Apr 12, 2026
Major release: several backward-incompatible fixes landed in PRs #263 and #265.

Breaking changes:

* Silent.short_description now returns HGVS "p.{ref}{pos}=" (e.g. "p.R6=") instead of the literal "silent" (#217).
* Silent.aa_pos no longer includes the shared-prefix offset; it now points at the actual synonymous codon (#208). Callers that compensated for the off-by-one need to stop doing so.
* PrematureStop.short_description returns "p.{pos}ins{alt}*" when aa_ref is empty instead of the ambiguous "p.{pos}{alt}*" (#216).
* EffectCollection is now sorted by effect priority (most severe first) by default. Pass sort_key=False to disable sorting or a custom callable to override (#227).
* Sequence-aware splice site classification (#262): variants at +1/+2 or -1/-2 with a non-canonical reference base are now classified as IntronicSpliceSite rather than SpliceDonor/SpliceAcceptor.

Other user-visible changes:

* VCF loader now skips symbolic alleles (<DEL>, <CN0>, etc.) and breakend notation with a warnings.warn call instead of crashing (#88, tracked for full support in #264).
* SNV in the stop codon with a stop-prefixed 3' UTR is now correctly classified as StopLoss instead of Insertion (#250, #205).
* Insertion before the stop codon that produces an identical protein is now correctly classified as Silent (#201).
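To make the #201 case concrete, here is a minimal sketch of why such an insertion is silent. It deliberately uses a simpler technique than the fix itself: it translates the whole reference and mutant coding sequences up to the first stop and compares them, whereas the fix described above compares the new amino acids to the reference tail inside predict_in_frame_coding_effect. The helper names and the toy sequence are hypothetical.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_to_stop(cds: str) -> str:
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insertion_is_silent(ref_cds: str, offset: int, inserted: str) -> bool:
    """True if inserting `inserted` at `offset` leaves the protein unchanged."""
    mutant_cds = ref_cds[:offset] + inserted + ref_cds[offset:]
    return translate_to_stop(mutant_cds) == translate_to_stop(ref_cds)


# Toy reference: ATG AAA ATA TAA translates to M K I followed by a stop.
ref = "ATGAAAATATAA"
# Inserting ATATAA ahead of the final ATA codon yields ATG AAA ATA TAA ATA TAA:
# the inserted TAA is a new stop at the same protein position.
print(insertion_is_silent(ref, 6, "ATATAA"))
```

Inserting the same bases directly in front of the stop codon of this toy sequence (offset 9) adds an extra isoleucine, so that case is not silent.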
This was referenced Apr 12, 2026
Summary
Closes or substantially addresses six reported bugs plus one visibility improvement, with 32 new regression tests (378 → 410 passing, zero regressions).
Classification bugs (wrong effect type returned)
* #250 / #205: SNV in the stop codon was classified as `Insertion(p.XinsR)` instead of `StopLoss` when the 3' UTR begins with another stop codon. Root cause: `translate_in_frame_mutation` had branches for `first_utr_stop_codon_index > 0` and `== -1` but not `== 0`, so `using_three_prime_utr` stayed False and the StopLoss branch was never taken.
* #201: Inserting `ATATAA` before the stop codon was reported as `Insertion(p.1291insI)` even though the resulting protein is identical to the reference (the inserted `TAA` creates a new stop at the same protein position). New silent check: if a premature stop is introduced and the new amino acids match the reference tail, classify as Silent.

Position / format bugs
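As a rough illustration of the two HGVS formats named in #216 and #217, the strings could be built as follows. The function names are hypothetical and not the library's API, and `pos` is assumed to be the position exactly as it should be printed.

```python
def silent_description(aa_ref: str, pos: int) -> str:
    """HGVS-style synonymous notation, e.g. "p.R6=" (format from #217)."""
    return f"p.{aa_ref}{pos}="


def premature_stop_insertion_description(pos: int, aa_alt: str) -> str:
    """Notation for a stop introduced by an insertion, i.e. when aa_ref is
    empty (format from #216)."""
    return f"p.{pos}ins{aa_alt}*"


print(silent_description("R", 6))
print(premature_stop_insertion_description(1291, "I"))
```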
Loader robustness
* #88: VCF symbolic alleles (`<DEL>`, `<CN0>`, `<INS:ME:ALU>`) or breakend notation (`G]17:198982]`) caused `load_vcf` to crash. These are now skipped with a visible `warnings.warn` call that reports the total skipped count per file and points at the tracking issue (#264, "Support VCF symbolic alleles and breakends as extended variant types") for eventual structural variant support.

API ergonomics
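The default ordering introduced for #227 (most severe effect first, with `sort_key=False` to disable sorting or a callable to override) can be sketched as below. The `Effect` dataclass, the numeric priorities, and `sorted_effects` are stand-ins invented for illustration. How the real EffectCollection orders results under a custom callable is not stated in this PR, so the ascending sort used here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Effect:
    # Hypothetical stand-in for an effect object; priorities are made up.
    name: str
    priority: int


def sorted_effects(effects, sort_key=None):
    """Order effects the way the #227 description suggests.

    sort_key=None  -> highest priority (most severe) first
    sort_key=False -> keep the input order
    sort_key=f     -> sort ascending by f(effect) (assumed behavior)
    """
    if sort_key is False:
        return list(effects)
    if sort_key is None:
        return sorted(effects, key=lambda e: e.priority, reverse=True)
    return sorted(effects, key=sort_key)


effects = [Effect("Silent", 1), Effect("StopLoss", 3), Effect("Insertion", 2)]
print([e.name for e in sorted_effects(effects)])
print([e.name for e in sorted_effects(effects, sort_key=False)])
print([e.name for e in sorted_effects(effects, sort_key=lambda e: e.name)])
```

With the default, callers can take the first element as the most severe effect, which is the visibility improvement the PR describes.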
Test plan
Related