Fix Transcript.exons crash when GTF lacks exon_id attribute #331

Merged
iskandr merged 1 commit into main from fix-exon-id-missing
Apr 21, 2026

@iskandr iskandr commented Apr 21, 2026

Summary

  • Some GTFs (Ensembl release 54 and earlier, plus non-Ensembl GTFs) omit the exon_id attribute. pyensembl's installer already treats that column as optional (see `database.py:134`), but `Transcript.exons` still unconditionally SELECTed exon_id, so any call on such a genome crashed with:

    sqlite3.OperationalError: no such column: exon_id
    

    This surfaced in pirlygenes: its FN1 tests call `transcript.exons`, and the local pyensembl cache (an old release) has an exon table without the exon_id column.

  • `Transcript.exons` now checks `db.column_exists("exon", "exon_id")` and falls back to constructing Exon objects directly from the exon row, with a synthesized per-transcript ID of the form `"<transcript_id>_exon_<n>"`.

  • Exon objects returned from the fallback path carry the real contig/start/end/strand/gene coordinates from the GTF; only the id is synthetic.

  • Regression test builds a minimal Ensembl-style GTF with `exon_number` but no `exon_id` and verifies exon ordering and synthesized IDs.

  • Bumps to 2.6.7.
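
The check-then-fall-back pattern described above can be sketched roughly as follows. This is not the code from the PR: the `column_exists` helper is a hypothetical stand-in for `db.column_exists`, the table schema is invented for illustration, and taking `<n>` from `exon_number` is an assumption.

```python
import sqlite3


def column_exists(conn, table, column):
    """Hypothetical stand-in for db.column_exists: look the column up in the schema."""
    # PRAGMA table_info returns rows of (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def exon_ids_for_transcript(conn, transcript_id):
    """Return exon IDs in exon_number order, synthesizing them if exon_id is absent."""
    if column_exists(conn, "exon", "exon_id"):
        query = (
            "SELECT exon_id FROM exon "
            "WHERE transcript_id = ? ORDER BY exon_number"
        )
        return [row[0] for row in conn.execute(query, (transcript_id,))]
    # Fallback: never reference the missing column; build an ID per exon instead.
    # Assumption: <n> in "<transcript_id>_exon_<n>" is the exon_number.
    query = (
        "SELECT exon_number FROM exon "
        "WHERE transcript_id = ? ORDER BY exon_number"
    )
    return [
        f"{transcript_id}_exon_{number}"
        for (number,) in conn.execute(query, (transcript_id,))
    ]


# Illustrative schema only: an exon table created from a GTF with no exon_id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE exon ("
    "seqname TEXT, start_pos INTEGER, end_pos INTEGER, strand TEXT, "
    "transcript_id TEXT, exon_number INTEGER)"
)
conn.executemany(
    "INSERT INTO exon VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("1", 500, 600, "+", "ENST_TEST", 2),
        ("1", 100, 200, "+", "ENST_TEST", 1),
        ("1", 800, 900, "+", "ENST_TEST", 3),
    ],
)
synthetic_ids = exon_ids_for_transcript(conn, "ENST_TEST")
```

Checking the schema before building the query keeps the normal path unchanged for GTFs that do provide exon_id, which is the path CI exercises.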

Test plan

  • New `test_transcript_exons_without_exon_id` passes.
  • Full local suite (120 tests excluding HLA-A data-drift ones) passes.
  • CI green on Python 3.9–3.12 (exercises the normal path — release 75/77/93 GTFs all have exon_id).
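
For readers unfamiliar with the fixture shape, a GTF whose exon records carry `exon_number` but no `exon_id` might be generated along these lines. This is a hedged sketch, not the test added in the PR: `gtf_line`, `parse_attributes`, and the `GENE_TEST`/`ENST_TEST` identifiers are hypothetical, and only exon records are emitted.

```python
def gtf_line(feature, start, end, attributes, seqname="1", source="test", strand="+"):
    """Format one GTF record: nine tab-separated fields, attributes as key "value"; pairs."""
    attr_text = " ".join(f'{key} "{value}";' for key, value in attributes.items())
    fields = [seqname, source, feature, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(fields)


def parse_attributes(line):
    """Parse the ninth GTF field back into a dict (simple values only)."""
    attr_field = line.rstrip("\n").split("\t")[8]
    attrs = {}
    for item in attr_field.split(";"):
        item = item.strip()
        if item:
            key, value = item.split(" ", 1)
            attrs[key] = value.strip('"')
    return attrs


# Hypothetical identifiers; note that no exon_id attribute is ever written.
shared = {"gene_id": "GENE_TEST", "transcript_id": "ENST_TEST"}
lines = [
    gtf_line("exon", 100, 200, {**shared, "exon_number": "1"}),
    gtf_line("exon", 500, 600, {**shared, "exon_number": "2"}),
    gtf_line("exon", 800, 900, {**shared, "exon_number": "3"}),
]
exon_attrs = [parse_attributes(line) for line in lines]
gtf_text = "\n".join(lines) + "\n"
```

Writing `gtf_text` to a temporary file would give a small input for exercising the no-exon_id path without depending on an old Ensembl release being cached locally.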

Ensembl release 54 and some non-Ensembl GTFs (e.g. UCSC refseq/gencode)
omit the exon_id attribute. pyensembl's installer already treats the
column as optional (database.py:134), but Transcript.exons still
unconditionally SELECTed exon_id, crashing with
sqlite3.OperationalError: no such column: exon_id.

Transcript.exons now checks db.column_exists("exon", "exon_id") and
falls back to building Exon objects directly from the exon row with a
synthesized per-transcript ID of the form "<transcript_id>_exon_<n>".

Adds a regression test that builds an Ensembl-style GTF with
exon_number but no exon_id and verifies both exon ordering and
synthesized IDs.

Bumps to 2.6.7.
@coveralls

Coverage Status

coverage: 83.468% (+0.3%) from 83.208% — fix-exon-id-missing into main

@iskandr iskandr merged commit a35f787 into main Apr 21, 2026
10 checks passed
@iskandr iskandr deleted the fix-exon-id-missing branch April 21, 2026 16:46