
feat: add GPU support to ribodetector (dual-container) #11178

Merged
pinin4fjords merged 2 commits into master from ribodetector-gpu-dual-container on Apr 14, 2026
Conversation

@pinin4fjords (Member) commented Apr 14, 2026

Summary

  • Add GPU container switching to ribodetector: when task.accelerator is set, uses a CUDA-enabled container with pytorch-gpu; otherwise uses the existing CPU container
  • CPU users are unaffected (same ~546 MB container), GPU users get a ~3.5 GB container with CUDA
  • Both conda and container directives use ternaries based on task.accelerator
  • Fix a latent GString interpolation bug in the memory flag ($task.memory.toGiga() -> ${task.memory.toGiga()})
  • Fix stub gzip syntax for nf-core lint compliance
  • Add GPU-tagged tests (real + stub) following the parabricks test pattern
  • Improve both CPU and GPU tests: use nft-fastq plugin for order-independent sorted read name comparison, log content assertions for classification counts (4159 non-rRNA, 0 rRNA), ANSI color stripping
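A minimal sketch of what the ternary pattern described above might look like in the module (container URIs and the process body are placeholders, not the module's actual values):

```nextflow
process RIBODETECTOR {
    // Conda: switch environment file on task.accelerator
    conda "${task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml"}"

    // Container: same ternary, applied for both Singularity and Docker URIs
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container
        ? ( task.accelerator ? 'https://example.org/ribodetector-gpu.sif'
                             : 'https://example.org/ribodetector-cpu.sif' )
        : ( task.accelerator ? 'example.org/ribodetector-gpu:tag'
                             : 'example.org/ribodetector-cpu:tag' ) }"

    // ... inputs, outputs, script as in the module
}
```

Because `task.accelerator` is unset unless the pipeline requests a GPU, CPU-only users resolve to the existing container and see no change.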

Container size

The GPU container is ~3.5 GB vs ~546 MB for the CPU container. The size comes from pytorch-gpu bundling the CUDA toolkit, cuDNN, and GPU-specific pytorch builds. With the dual-container approach, CPU-only users are unaffected.

GPU environment

The environment.gpu.yml uses conda-forge::pytorch-gpu=1.11.0. The CPU environment.yml is unchanged.
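Based on that description, the GPU environment file might look roughly like this (only the `conda-forge::pytorch-gpu=1.11.0` pin and the standard-channels claim come from the PR; the ribodetector pin shown here is illustrative):

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::ribodetector=0.3.1 # illustrative version pin
  - conda-forge::pytorch-gpu=1.11.0
```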

ARM limitation

The GPU container is x86_64 only. This is a conda packaging limitation, not a hardware one (ARM + NVIDIA GPU setups like Jetson do exist):

  • All conda-forge ARM pytorch-gpu builds (oldest available is 2.1.2) depend on the __cuda virtual package, which requires CUDA drivers to be present at conda solve time. Wave's micromamba base image does not provide this, so the solve fails.
  • The x86 pytorch-gpu=1.11.0 works because it predates the __cuda convention. It depends on cudatoolkit instead, which is a regular installable package that gets bundled into the container.
  • Newer x86 builds (2.5.1+) also require __cuda and would fail the same way. The version pin to 1.11.0 is not just for reproducibility - it's the newest x86 version that avoids the __cuda gate.
  • The pytorch channel packages avoid __cuda entirely (they use pytorch-cuda and pytorch-mutex real packages instead), but the pytorch channel has no ARM builds at all.
  • We tested CONDA_OVERRIDE_CUDA=12.4 (which tells conda to pretend CUDA is present) but Wave does not support injecting environment variables at solve time, only at container runtime (--config-env).

Pipelines using this module should guard GPU ribodetector on ARM, as nf-core/rnaseq does:

```groovy
if (params.use_gpu_ribodetector && (params.arm ?: false)) {
    error("--use_gpu_ribodetector is not supported on ARM architecture.")
}
```

Resolution paths:

  1. Wave adds support for CONDA_OVERRIDE_CUDA at build/solve time
  2. Wave provides CUDA-aware base images
  3. conda-forge drops the __cuda requirement for ARM pytorch-gpu (unlikely, it serves a purpose)

Rationale

The ribodetector module already selects between GPU (ribodetector) and CPU (ribodetector_cpu) binaries based on task.accelerator, but the existing container only includes CPU pytorch, making the GPU binary fail with "CUDA unavailable". This PR adds the missing GPU container and wires up the selection.
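A sketch of that binary selection as described (the memory-flag interpolation matches the fix quoted in the summary; other flags are illustrative):

```nextflow
// Inside the module's script block: pick the binary from task.accelerator.
def ribodetector_cmd = task.accelerator ? 'ribodetector' : 'ribodetector_cpu'
"""
${ribodetector_cmd} \\
    -t ${task.cpus} \\
    -m ${task.memory.toGiga()} \\
    ... # remaining args as in the module
"""
```

With the dual-container change, the `ribodetector` branch now runs inside a container whose PyTorch build can actually see CUDA, instead of failing with "CUDA unavailable".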

Related pipeline PR: nf-core/rnaseq#1790

Test plan

  • CPU tests pass (conda + docker, verified locally and in CI)
  • nf-core lint passes
  • GPU stub tests pass (docker + singularity)
  • GPU real tests run ribodetector on NVIDIA GPU hardware with correct classification counts
  • GPU and CPU produce identical classification results (same read counts: 4159 non-rRNA, 0 rRNA)

🤖 Generated with Claude Code

Switch between GPU and CPU containers based on task.accelerator.
CPU users keep the existing ~546 MB container; GPU users get a ~3.5 GB
container with CUDA-enabled PyTorch from conda-forge::pytorch-gpu.

Module changes:
- Container ternary: switches between GPU and CPU containers for both
  Docker and Singularity based on task.accelerator
- Conda ternary: environment.gpu.yml (pytorch-gpu) when GPU,
  environment.yml (unchanged) when CPU
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}
- Fix stub gzip syntax for nf-core lint compliance

GPU environment uses only standard conda-forge + bioconda channels
(conda-forge::pytorch-gpu=1.11.0, built from the pytorch-cpu-feedstock
as a GPU output variant).

Test improvements (CPU and GPU, identical assertions):
- nft-fastq plugin for order-independent sorted read name comparison
- Log content assertions for classification counts (4159 non-rRNA,
  0 rRNA) with ANSI color stripping
- GPU tests tagged with "gpu" following the parabricks pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords force-pushed the ribodetector-gpu-dual-container branch from 68e09d8 to 88816cb on April 14, 2026 12:53
pinin4fjords marked this pull request as ready for review on April 14, 2026 13:01
@pinin4fjords (Member Author) commented:

Thanks @jfy133 !

pinin4fjords added this pull request to the merge queue on Apr 14, 2026
Merged via the queue into master with commit 20423f5 on Apr 14, 2026
39 checks passed
pinin4fjords deleted the ribodetector-gpu-dual-container branch on April 14, 2026 14:15
pinin4fjords added a commit to nf-core/website that referenced this pull request Apr 14, 2026
Add guidance for nf-core modules that support GPU acceleration:

- Software requirements: dual-container pattern using task.accelerator
  ternaries for both conda and container directives, environment.gpu.yml
  for GPU deps, ARM limitation note
- Resource requirements: GPU acceleration via task.accelerator, pipeline
  controls allocation not the module
- Testing: GPU test file conventions (main.gpu.nf.test), tagging with
  "gpu", nextflow.gpu.config, nft-fastq plugin for non-deterministic
  output, same assertions for GPU and CPU

Based on patterns established in nf-core/modules#11178 (ribodetector
GPU support) and existing parabricks conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/website that referenced this pull request Apr 14, 2026
Add guidance for nf-core modules that support GPU acceleration,
based on patterns from ribodetector (nf-core/modules#11178) and
parabricks.

Software requirements:
- Three container approaches (dual-container, single, vendor-provided)
- Dual-container ternary pattern for conda + container directives
- environment.gpu.yml convention
- Conda/mamba guard for vendor containers
- Binary selection and multi-GPU via task.accelerator
- ARM __cuda limitation note

Resource requirements:
- task.accelerator for GPU detection (module reads, pipeline sets)
- Tip pointing to pipeline-side containerOptions pattern

Testing:
- GPU test file conventions (main.gpu.nf.test, nextflow.gpu.config)
- gpu and gpu_highmem tags for CI runner selection
- Same assertions for GPU and CPU to catch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Replace local patch with the merged upstream module that includes
the dual-container GPU support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Add use_gpu_ribodetector boolean parameter. When enabled, requests a
GPU accelerator and applies container GPU flags for the RIBODETECTOR
process.

Module (from nf-core/modules#11178):
- Dual-container: GPU container (conda-forge::pytorch-gpu, ~3.5 GB)
  when task.accelerator is set, CPU container (~546 MB) otherwise
- Both conda and container directives use ternaries on task.accelerator
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}

Pipeline:
- Add use_gpu_ribodetector param with accelerator + containerOptions
  closures in the RIBODETECTOR process config
- Add validation: errors on ARM or wrong ribo_removal_tool
- Fix .nftignore pattern for ribodetector log files
- Add GPU test (real + stub) tagged "gpu", skip_bbsplit for determinism
- Generalize CI skip from SKIP_PARABRICKS to SKIP_GPU (backward compat)

Closes #1780

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
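The "accelerator + containerOptions closures" mentioned in that commit might look roughly like this on the pipeline side (selector, values, and engine flags are assumptions drawn from the commit text, not the pipeline's exact config):

```nextflow
// Illustrative pipeline-side process config for GPU ribodetector.
process {
    withName: 'RIBODETECTOR' {
        // Request a GPU only when the user opts in; otherwise leave
        // task.accelerator unset so the module picks the CPU container.
        accelerator      = { params.use_gpu_ribodetector ? 1 : null }
        // Pass the engine-appropriate GPU flag through to the container.
        containerOptions = { params.use_gpu_ribodetector
            ? ( workflow.containerEngine == 'singularity' ? '--nv' : '--gpus all' )
            : '' }
    }
}
```

Keeping allocation in the pipeline config (rather than the module) matches the pattern described here: the module only reads `task.accelerator`; the pipeline decides whether to set it.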
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Resolve modules.json conflict: take dev updates, restore ribodetector
SHA from nf-core/modules#11178.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>