
feat(gedi): add gedi/indexgenome and gedi/price modules#11693

Merged
apeltzer merged 10 commits into nf-core:master from pinin4fjords:gedi-add-modules on May 19, 2026

@pinin4fjords (Member) commented May 19, 2026

Adds two new modules wrapping the GEDI / PRICE toolkit for Ribo-seq translated-ORF discovery (Erhard et al. 2018, doi:10.1038/nmeth.4631).

gedi/indexgenome

Wraps `gedi -e IndexGenome` to build the `.oml` GEDI genome index from a FASTA + GTF pair.

  • Input: `tuple val(meta), path(fasta), path(gtf)`
  • Output: `tuple val(meta), path("${prefix}")` (`emit: index`): a directory containing `${meta.id}.oml` plus sidecar files. The directory name defaults to `meta.id` and can be overridden via `task.ext.prefix`.
  • Versions: emitted via `topic: versions`
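The naming rule for the index directory can be modelled in a few lines. This is a Python sketch of the behaviour described above, not the module's Nextflow code; `index_layout` and its `ext_prefix` argument are hypothetical stand-ins for `task.ext.prefix`.

```python
from pathlib import PurePosixPath
from typing import Optional, Tuple


def index_layout(meta: dict, ext_prefix: Optional[str] = None) -> Tuple[str, str]:
    """Illustrative model of the gedi/indexgenome output naming.

    The directory name follows the Nextflow idiom `task.ext.prefix ?: meta.id`;
    the .oml index inside it is always named after meta.id.
    """
    # Groovy's Elvis operator falls back on null or empty strings; `or` does the same here.
    prefix = ext_prefix or meta["id"]
    oml_path = PurePosixPath(prefix) / f"{meta['id']}.oml"
    return prefix, str(oml_path)


# Default: the directory is named after the reference id.
print(index_layout({"id": "homo_sapiens_chr20"}))
# With a (hypothetical) ext.prefix override, only the directory name changes.
print(index_layout({"id": "homo_sapiens_chr20"}, ext_prefix="price_index"))
```

Only the directory name follows the override; the `.oml` file keeps the reference id, which is what a downstream lookup by id depends on.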

gedi/price

Wraps `bamlist2cit` + `gedi -e Price`. Takes a cohort of Ribo-seq BAMs plus the GEDI genome index and emits PRICE's ORF predictions. PRICE estimates a single codon-position model shared across all input BAMs, so the whole cohort is processed in one invocation rather than per sample.

  • Inputs:
    • `tuple val(meta), path(bams, stageAs: 'bams/*'), path(bais, stageAs: 'bams/*')`: the Ribo-seq cohort
    • `tuple val(meta2), path(index)`: the index directory produced by `gedi/indexgenome`
  • Emits: `orfs_tsv`, plus optional `orfs_cit`, `orfs_metadata`, `codons_cit`, `model`, `signal` and `param` sidecars, and `topic: versions`
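Since PRICE fits one model over the whole cohort, per-sample BAM/BAI records have to be gathered into a single input tuple before the call. In a pipeline that is done with Nextflow channel operators; the Python below only models the reshaping. `collect_cohort`, the cohort id `"cohort"` and the sort-by-sample-id ordering are assumptions made for illustration.

```python
from pathlib import PurePosixPath
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[Dict[str, str], str, str]  # (meta, bam, bai)


def collect_cohort(samples: Sequence[Sample], cohort_id: str = "cohort"):
    """Gather per-sample (meta, bam, bai) records into one gedi/price input tuple.

    Illustrative only: in a pipeline this reshaping is done with channel operators.
    """
    # Sort by sample id so the file order is reproducible between runs (a choice made here).
    ordered = sorted(samples, key=lambda s: s[0]["id"])
    bams: List[str] = [bam for _, bam, _ in ordered]
    bais: List[str] = [bai for _, _, bai in ordered]

    # Both inputs use stageAs: 'bams/*', so every BAM and BAI lands in the same
    # directory and basenames must be unique across the two lists.
    staged = [PurePosixPath(p).name for p in bams + bais]
    if len(set(staged)) != len(staged):
        raise ValueError("input file names collide when staged into bams/")

    return {"id": cohort_id}, bams, bais


cohort = collect_cohort([
    ({"id": "s2"}, "/data/s2.bam", "/data/s2.bam.bai"),
    ({"id": "s1"}, "/data/s1.bam", "/data/s1.bam.bai"),
])
print(cohort)
```

Staging BAMs and indexes into one directory keeps each `.bai` beside its BAM, which is why the sketch treats duplicate basenames as an error rather than silently overwriting.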

Containers

Wave-built community containers backed by bioconda::gedi=1.0.6a:

  • community.wave.seqera.io/library/gedi_indexgenome:cfca16738f306c86
  • community.wave.seqera.io/library/gedi_price:2392624d5f803049

Test data

`gedi/price` is tested against a minimal Ribo-seq cohort (four samples, chr19+chr22, protein-coding-only reference) at `data/genomics/homo_sapiens/riboseq_expression/price/` (added via nf-core/test-datasets#2061). Every fixture file is under 4 MiB, about 11 MB in total. PRICE produces 380 ORF calls; the test snapshot captures the `orfs.tsv` line count for stability.
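The snapshot stores a line count rather than the table itself. Below is a small Python equivalent of that measurement, assuming `orfs.tsv` is plain text; this PR does not say whether the count includes a header row, so no expected value for the fixture cohort is hard-coded. `count_lines` and the demo file name are hypothetical.

```python
import tempfile
from pathlib import Path


def count_lines(path) -> int:
    """Count newline characters in a file, matching `wc -l` semantics."""
    total = 0
    with Path(path).open("rb") as fh:
        # Read in 64 KiB chunks so large ORF tables need not fit in memory.
        for chunk in iter(lambda: fh.read(1 << 16), b""):
            total += chunk.count(b"\n")
    return total


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.orfs.tsv"
    demo.write_text("col_a\tcol_b\n1\t2\n")
    print(count_lines(demo))
```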

Test plan

  • `nf-core modules test --profile docker gedi/indexgenome`: passes, snapshot stable across both runs
  • `nf-core modules test --profile docker gedi/price`: passes, snapshot stable across both runs
  • `nf-core modules lint --dir . gedi/{indexgenome,price}`: 0 failures; the only warnings are the Wave URL probe and cosmetic tag-version warnings, as for other Wave-container modules

Source: nf-core/riboseq#174.

pinin4fjords and others added 2 commits May 19, 2026 12:10
Adds two modules wrapping the GEDI / PRICE toolkit (`bioconda::gedi=1.0.6a`) for Ribo-seq translated-ORF discovery. PRICE (Erhard et al. 2018, doi:10.1038/nmeth.4631) calls translated ORFs from ribosome profiling data with near-cognate start codon detection.

`gedi/indexgenome` wraps `gedi -e IndexGenome`, producing the `.oml` genome index directory consumed by PRICE.

`gedi/price` wraps `bamlist2cit` + `gedi -e Price`, taking a cohort of Ribo-seq BAMs plus the genome index and emitting ORF predictions (`*.orfs.tsv` + `*.cit` + sidecars). It runs once over the whole cohort; PRICE is not per-sample.

Both modules use Wave-built community containers from `bioconda::gedi=1.0.6a`. The bioconda recipe was merged on 2026-05-16; the modules use the Wave containers directly for now.

Source: nf-core/riboseq#174.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Previously hard-coded the output directory as `price_index`. Switching to
`${prefix}` (default `${meta.id}`, overridable via `task.ext.prefix`) lets
callers control the directory name and matches the nf-core convention for
publishable directory outputs.

The default `${meta.id}` keeps the directory keyed to the reference id, so
when `gedi/price` opens `${index}/${meta2.id}.oml`, the lookup still
resolves, provided the meta ids match (already the case in the test chain).

Snapshot regenerated: the index directory name in the output snapshot
changes from `price_index` to the test's `homo_sapiens_chr20` (its meta.id).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
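The coupling described in this commit can be written as a pre-flight check: `gedi/price` opens `${index}/${meta2.id}.oml`, so the staged index must contain an `.oml` file named after the same id. `resolve_oml` is a hypothetical helper sketched in Python, not part of either module.

```python
import tempfile
from pathlib import Path


def resolve_oml(index_dir, meta2_id: str) -> Path:
    """Return the .oml file that gedi/price would open as ${index}/${meta2.id}.oml.

    Hypothetical pre-flight check: fails early with a readable message when the
    meta id given to gedi/price does not match the id used by gedi/indexgenome.
    """
    oml = Path(index_dir) / f"{meta2_id}.oml"
    if not oml.is_file():
        available = sorted(p.name for p in Path(index_dir).glob("*.oml"))
        raise FileNotFoundError(
            f"{oml} not found (index contains {available}); "
            "check that both modules receive the same meta id"
        )
    return oml


with tempfile.TemporaryDirectory() as tmp:
    # The directory name (here a custom prefix) plays no part in the lookup.
    index = Path(tmp) / "price_index"
    index.mkdir()
    (index / "homo_sapiens_chr20.oml").touch()
    print(resolve_oml(index, "homo_sapiens_chr20").name)  # prints homo_sapiens_chr20.oml
```

Because `${index}` is whatever directory was staged, overriding `task.ext.prefix` does not break the lookup; only a mismatch between the two meta ids does.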
pinin4fjords and others added 8 commits May 19, 2026 16:54
Replaces the stub-only PRICE test with an end-to-end test that runs
PRICE on a minimal cohort of four Ribo-seq samples (chr19+chr22,
protein-coding-only reference). The cohort produces 380 ORF calls;
snapshot captures the orfs.tsv line count for stability validation.

Fixtures published in nf-core/test-datasets#2061.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…factor

The earlier `${prefix}` refactor (commit 0ca4c45) changed the index
output declaration from `path("price_index")` to `path("${prefix}")`,
but the meta.yml output entry still hard-coded `price_index` — causing
CI lint to flag `correct_meta_outputs: Module meta.yml does not match
main.nf`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
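Here is a heavily simplified illustration of the mismatch lint reported, not the nf-core/tools implementation: every `path("...")` pattern declared in the `main.nf` output block should appear as an output entry in `meta.yml`. The meta.yml names are passed as a plain list to avoid a YAML dependency, and the sample output line (including `emit: index`) is reconstructed from this PR's description rather than copied from the module. `undocumented_outputs` is hypothetical.

```python
import re
from typing import Iterable, List

# Captures the quoted pattern in path("...") declarations of a Nextflow output block.
PATH_PATTERN = re.compile(r'path\("([^"]+)"\)')


def undocumented_outputs(output_block: str, meta_yml_names: Iterable[str]) -> List[str]:
    """Return output path patterns from main.nf that meta.yml does not list."""
    declared = set(PATH_PATTERN.findall(output_block))
    return sorted(declared - set(meta_yml_names))


main_nf_output = 'tuple val(meta), path("${prefix}"), emit: index'
print(undocumented_outputs(main_nf_output, ["price_index"]))  # before the meta.yml fix
print(undocumented_outputs(main_nf_output, ["${prefix}"]))    # after the fix
```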
… emit

After the `${prefix}` refactor (commit 0ca4c45) the index output line
was the only `tuple val(meta), path(...)` emit in the module, so the
52-space alignment padding it kept from when the path was `price_index`
no longer aligns with anything.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two cross-cutting fixes from review of nf-core#11693:

- Licence was Apache-2.0 in both meta.yml files; the upstream repo
  erhard-lab/gedi is GPL-3.0. Corrected.
- "GEDI (Gene Expression Data Integration)" was unverified — the
  upstream README/wiki/paper don't expand the acronym that way.
  Replaced with the upstream one-liner phrasing. PRICE meta.yml also
  adds the verified PRICE expansion (Probabilistic Inference of Codon
  Activities by an EM algorithm) from the GEDI wiki.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
nf-core/test-datasets#2061 merged; fixtures now live on the modules branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pinin4fjords pinin4fjords marked this pull request as ready for review May 19, 2026 16:54
@apeltzer apeltzer added this pull request to the merge queue May 19, 2026
Merged via the queue into nf-core:master with commit 51af5cb May 19, 2026
30 checks passed
manascripts pushed a commit to manascripts/modules that referenced this pull request on May 21, 2026
* feat(gedi): add gedi/indexgenome and gedi/price modules

* refactor(gedi/indexgenome): use ${prefix} for the index output directory

* test(gedi/price): add real test using minimised chr19+chr22 fixtures

* fix(gedi/indexgenome): update meta.yml output name after ${prefix} refactor

* style(gedi/indexgenome): collapse leftover alignment padding on index emit

* fix(gedi): correct licence (GPL-3.0) and Gedi description in meta.yml

* test(gedi/price): point fixtures at nf-core/test-datasets@modules

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 participants