What's new
Resolves #169: three-tier protein-coding biotype ontology.
`pyensembl.Gene` and `pyensembl.Transcript` now expose three layered flags answering "does this entry make a polypeptide?". Each tier includes everything in the tier above it:
| Flag | Includes |
|---|---|
| `is_protein_coding` (unchanged) | strict canonical `protein_coding` only |
| `is_protein_coding_extended` (new) | + `IG_{C,D,J,V}_gene`, `TR_{C,D,J,V}_gene`, `polymorphic_pseudogene`, `translated_{processed,unprocessed}_pseudogene` |
| `is_translated` (new) | + `nonsense_mediated_decay`, `non_stop_decay` |
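Because each tier adds to the one above it, the table reads as three nested sets. The sketch below restates it in plain Python with the brace shorthand expanded. It mirrors the table, not pyensembl's source, and `narrowest_tier` is a hypothetical helper rather than part of the library.

```python
from typing import Optional

# Restates the release-note table as plain Python sets (a sketch, not pyensembl's code).
# Each tier is built from the previous one, so the nesting is explicit.
STRICT = frozenset({"protein_coding"})

EXTENDED = STRICT | frozenset(
    [f"IG_{seg}_gene" for seg in "CDJV"]      # IG_C_gene, IG_D_gene, IG_J_gene, IG_V_gene
    + [f"TR_{seg}_gene" for seg in "CDJV"]    # TR_C_gene, TR_D_gene, TR_J_gene, TR_V_gene
    + [
        "polymorphic_pseudogene",
        "translated_processed_pseudogene",
        "translated_unprocessed_pseudogene",
    ]
)

TRANSLATED = EXTENDED | frozenset({"nonsense_mediated_decay", "non_stop_decay"})


def narrowest_tier(biotype: str) -> Optional[str]:
    """Hypothetical helper: name the tightest flag that would be True for a biotype."""
    if biotype in STRICT:
        return "is_protein_coding"
    if biotype in EXTENDED:
        return "is_protein_coding_extended"
    if biotype in TRANSLATED:
        return "is_translated"
    return None  # e.g. lncRNA: no tier applies


for bt in ["protein_coding", "TR_V_gene", "nonsense_mediated_decay", "lncRNA"]:
    print(bt, "->", narrowest_tier(bt))
# protein_coding -> is_protein_coding
# TR_V_gene -> is_protein_coding_extended
# nonsense_mediated_decay -> is_translated
# lncRNA -> None
```

Building each tier as a union of the previous one keeps the subset relationship true by construction, which is the property the three flags are meant to have.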
The strict tier is unchanged, so downstream effect predictors such as varcode keep their existing behavior. Use `is_protein_coding_extended` when you also want IG/TR gene segments and translated pseudogenes (e.g. immunology workflows). Use `is_translated` when you care only about ribosome occupancy, regardless of whether the transcript is stably expressed (e.g. peptide search, top-variant-effect picking).
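A minimal sketch of switching tiers per workflow, assuming the new flags are read like the existing `is_protein_coding` property (plain attribute access, no call). The `FakeTranscript` stand-in, the `FLAG_FOR_WORKFLOW` mapping, and the `select` helper are all hypothetical, so the example runs without an Ensembl download; with real data you would pass `pyensembl.Transcript` objects instead.

```python
from dataclasses import dataclass

# Hypothetical mapping from the use cases described above to the flag to filter on.
FLAG_FOR_WORKFLOW = {
    "effect_prediction": "is_protein_coding",      # varcode-style, unchanged behavior
    "immunology": "is_protein_coding_extended",    # also keep IG/TR segments
    "peptide_search": "is_translated",             # anything ribosomes may read
}


@dataclass
class FakeTranscript:
    """Stand-in for pyensembl.Transcript exposing the three flags as plain attributes."""
    transcript_id: str
    is_protein_coding: bool
    is_protein_coding_extended: bool
    is_translated: bool


def select(transcripts, workflow):
    """Keep only transcripts whose flag for this workflow is True."""
    flag = FLAG_FOR_WORKFLOW[workflow]
    return [t for t in transcripts if getattr(t, flag)]


# Placeholder records: one per tier, plus a non-coding entry.
transcripts = [
    FakeTranscript("canonical", True, True, True),
    FakeTranscript("trbv_segment", False, True, True),
    FakeTranscript("nmd_isoform", False, False, True),
    FakeTranscript("lncrna", False, False, False),
]

for wf in FLAG_FOR_WORKFLOW:
    print(wf, [t.transcript_id for t in select(transcripts, wf)])
```

Keeping the flag name in a lookup table means a pipeline can widen or narrow its tier through configuration without touching the filtering code.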
The underlying biotype sets are exported from `pyensembl.locus_with_genome` as `PROTEIN_CODING_BIOTYPES`, `EXTENDED_PROTEIN_CODING_BIOTYPES`, and `TRANSLATED_BIOTYPES` for callers who want to derive their own categorization.
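For example, a caller can split the exported sets into non-overlapping bands with set differences. This sketch assumes the three names are importable as stated above and contain biotype strings. The fallback literals only mirror the table so the snippet also runs on older pyensembl releases or without pyensembl installed, and `BANDS` and `band_of` are hypothetical names.

```python
try:
    from pyensembl.locus_with_genome import (
        EXTENDED_PROTEIN_CODING_BIOTYPES,
        PROTEIN_CODING_BIOTYPES,
        TRANSLATED_BIOTYPES,
    )
except ImportError:
    # Fallback mirroring the release-note table (assumption, not the library's source).
    PROTEIN_CODING_BIOTYPES = {"protein_coding"}
    EXTENDED_PROTEIN_CODING_BIOTYPES = PROTEIN_CODING_BIOTYPES | {
        "IG_C_gene", "IG_D_gene", "IG_J_gene", "IG_V_gene",
        "TR_C_gene", "TR_D_gene", "TR_J_gene", "TR_V_gene",
        "polymorphic_pseudogene",
        "translated_processed_pseudogene",
        "translated_unprocessed_pseudogene",
    }
    TRANSLATED_BIOTYPES = EXTENDED_PROTEIN_CODING_BIOTYPES | {
        "nonsense_mediated_decay",
        "non_stop_decay",
    }

# Coerce to frozensets so set arithmetic works whatever container type is exported.
core = frozenset(PROTEIN_CODING_BIOTYPES)
extended = frozenset(EXTENDED_PROTEIN_CODING_BIOTYPES)
translated = frozenset(TRANSLATED_BIOTYPES)

# Non-overlapping bands: each band holds only what its tier adds to the previous one.
BANDS = {
    "canonical": core,
    "segment_or_pseudogene": extended - core,
    "decay": translated - extended,
}


def band_of(biotype):
    """Hypothetical helper: return the band a biotype falls into, or None."""
    for name, members in BANDS.items():
        if biotype in members:
            return name
    return None


for bt in ["protein_coding", "IG_V_gene", "non_stop_decay", "lncRNA"]:
    print(bt, "->", band_of(bt))
```

Working with differences rather than the cumulative sets gives each biotype exactly one label, which is convenient for grouping or plotting counts per band.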
Full Changelog: v2.10.0...v2.10.1