
feat: add GPU support to ribodetector (dual-container) #11178

Merged
pinin4fjords merged 2 commits into master from ribodetector-gpu-dual-container on Apr 14, 2026
Conversation

@pinin4fjords (Member) commented Apr 14, 2026

Summary

  • Add GPU container switching to ribodetector: when task.accelerator is set, uses a CUDA-enabled container with pytorch-gpu; otherwise uses the existing CPU container
  • CPU users are unaffected (same ~546 MB container), GPU users get a ~3.5 GB container with CUDA
  • Both conda and container directives use ternaries based on task.accelerator
  • Fix a latent GString interpolation bug in the memory flag ($task.memory.toGiga() -> ${task.memory.toGiga()})
  • Fix stub gzip syntax for nf-core lint compliance
  • Add GPU-tagged tests (real + stub) following the parabricks test pattern
  • Improve both CPU and GPU tests: use nft-fastq plugin for order-independent sorted read name comparison, log content assertions for classification counts (4159 non-rRNA, 0 rRNA), ANSI color stripping
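A minimal sketch of what the ternary pattern described above might look like in the module (container URIs and the process body are placeholders, not the module's actual values):

```nextflow
process RIBODETECTOR {
    // Conda: switch environment file on task.accelerator
    conda "${task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml"}"

    // Container: same ternary, applied for both Singularity and Docker URIs
    container "${ workflow.containerEngine == 'singularity' && !task.ext.singularity_pull_docker_container
        ? ( task.accelerator ? 'https://example.org/ribodetector-gpu.sif'
                             : 'https://example.org/ribodetector-cpu.sif' )
        : ( task.accelerator ? 'example.org/ribodetector-gpu:tag'
                             : 'example.org/ribodetector-cpu:tag' ) }"

    // ... inputs, outputs, script as in the module
}
```

Because `task.accelerator` is unset unless the pipeline requests a GPU, CPU-only users resolve to the existing container and see no change.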

Container size

The GPU container is ~3.5 GB vs ~546 MB for the CPU container. The size comes from pytorch-gpu bundling the CUDA toolkit, cuDNN, and GPU-specific pytorch builds. With the dual-container approach, CPU-only users are unaffected.

GPU environment

The environment.gpu.yml uses conda-forge::pytorch-gpu=1.11.0. The CPU environment.yml is unchanged.
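Based on that description, the GPU environment file might look roughly like this (only the `conda-forge::pytorch-gpu=1.11.0` pin and the standard-channels claim come from the PR; the ribodetector pin shown here is illustrative):

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - bioconda::ribodetector=0.3.1 # illustrative version pin
  - conda-forge::pytorch-gpu=1.11.0
```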

ARM limitation

The GPU container is x86_64 only. This is a conda packaging limitation, not a hardware one (ARM + NVIDIA GPU setups like Jetson do exist):

  • All conda-forge ARM pytorch-gpu builds (oldest available is 2.1.2) depend on the __cuda virtual package, which requires CUDA drivers to be present at conda solve time. Wave's micromamba base image does not provide this, so the solve fails.
  • The x86 pytorch-gpu=1.11.0 works because it predates the __cuda convention. It depends on cudatoolkit instead, which is a regular installable package that gets bundled into the container.
  • Newer x86 builds (2.5.1+) also require __cuda and would fail the same way. The version pin to 1.11.0 is not just for reproducibility - it's the newest x86 version that avoids the __cuda gate.
  • The pytorch channel packages avoid __cuda entirely (they use pytorch-cuda and pytorch-mutex real packages instead), but the pytorch channel has no ARM builds at all.
  • We tested CONDA_OVERRIDE_CUDA=12.4 (which tells conda to pretend CUDA is present) but Wave does not support injecting environment variables at solve time, only at container runtime (--config-env).

Pipelines using this module should guard GPU ribodetector on ARM, as nf-core/rnaseq does:

```groovy
if (params.use_gpu_ribodetector && (params.arm ?: false)) {
    error("--use_gpu_ribodetector is not supported on ARM architecture.")
}
```

Resolution paths:

  1. Wave adds support for CONDA_OVERRIDE_CUDA at build/solve time
  2. Wave provides CUDA-aware base images
  3. conda-forge drops the __cuda requirement for ARM pytorch-gpu (unlikely, it serves a purpose)

Rationale

The ribodetector module already selects between GPU (ribodetector) and CPU (ribodetector_cpu) binaries based on task.accelerator, but the existing container only includes CPU pytorch, making the GPU binary fail with "CUDA unavailable". This PR adds the missing GPU container and wires up the selection.
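A sketch of that binary selection as described (the memory-flag interpolation matches the fix quoted in the summary; other flags are illustrative):

```nextflow
// Inside the module's script block: pick the binary from task.accelerator.
def ribodetector_cmd = task.accelerator ? 'ribodetector' : 'ribodetector_cpu'
"""
${ribodetector_cmd} \\
    -t ${task.cpus} \\
    -m ${task.memory.toGiga()} \\
    ... # remaining args as in the module
"""
```

With the dual-container change, the `ribodetector` branch now runs inside a container whose PyTorch build can actually see CUDA, instead of failing with "CUDA unavailable".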

Related pipeline PR: nf-core/rnaseq#1790

Test plan

  • CPU tests pass (conda + docker, verified locally and in CI)
  • nf-core lint passes
  • GPU stub tests pass (docker + singularity)
  • GPU real tests run ribodetector on NVIDIA GPU hardware with correct classification counts
  • GPU and CPU produce identical classification results (same read counts: 4159 non-rRNA, 0 rRNA)

🤖 Generated with Claude Code

Switch between GPU and CPU containers based on task.accelerator.
CPU users keep the existing ~546 MB container; GPU users get a ~3.5 GB
container with CUDA-enabled PyTorch from conda-forge::pytorch-gpu.

Module changes:
- Container ternary: switches between GPU and CPU containers for both
  Docker and Singularity based on task.accelerator
- Conda ternary: environment.gpu.yml (pytorch-gpu) when GPU,
  environment.yml (unchanged) when CPU
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}
- Fix stub gzip syntax for nf-core lint compliance

GPU environment uses only standard conda-forge + bioconda channels
(conda-forge::pytorch-gpu=1.11.0, built from the pytorch-cpu-feedstock
as a GPU output variant).

Test improvements (CPU and GPU, identical assertions):
- nft-fastq plugin for order-independent sorted read name comparison
- Log content assertions for classification counts (4159 non-rRNA,
  0 rRNA) with ANSI color stripping
- GPU tests tagged with "gpu" following the parabricks pattern

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords force-pushed the ribodetector-gpu-dual-container branch from 68e09d8 to 88816cb on April 14, 2026 12:53
pinin4fjords marked this pull request as ready for review on April 14, 2026 13:01
@pinin4fjords (Member Author) commented:

Thanks @jfy133 !

pinin4fjords added this pull request to the merge queue on Apr 14, 2026
Merged via the queue into master with commit 20423f5 on Apr 14, 2026
39 checks passed
pinin4fjords deleted the ribodetector-gpu-dual-container branch on April 14, 2026 14:15
pinin4fjords added a commit to nf-core/website that referenced this pull request Apr 14, 2026
Add guidance for nf-core modules that support GPU acceleration:

- Software requirements: dual-container pattern using task.accelerator
  ternaries for both conda and container directives, environment.gpu.yml
  for GPU deps, ARM limitation note
- Resource requirements: GPU acceleration via task.accelerator, pipeline
  controls allocation not the module
- Testing: GPU test file conventions (main.gpu.nf.test), tagging with
  "gpu", nextflow.gpu.config, nft-fastq plugin for non-deterministic
  output, same assertions for GPU and CPU

Based on patterns established in nf-core/modules#11178 (ribodetector
GPU support) and existing parabricks conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/website that referenced this pull request Apr 14, 2026
Add guidance for nf-core modules that support GPU acceleration,
based on patterns from ribodetector (nf-core/modules#11178) and
parabricks.

Software requirements:
- Three container approaches (dual-container, single, vendor-provided)
- Dual-container ternary pattern for conda + container directives
- environment.gpu.yml convention
- Conda/mamba guard for vendor containers
- Binary selection and multi-GPU via task.accelerator
- ARM __cuda limitation note

Resource requirements:
- task.accelerator for GPU detection (module reads, pipeline sets)
- Tip pointing to pipeline-side containerOptions pattern

Testing:
- GPU test file conventions (main.gpu.nf.test, nextflow.gpu.config)
- gpu and gpu_highmem tags for CI runner selection
- Same assertions for GPU and CPU to catch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Replace local patch with the merged upstream module that includes
the dual-container GPU support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Add use_gpu_ribodetector boolean parameter. When enabled, requests a
GPU accelerator and applies container GPU flags for the RIBODETECTOR
process.

Module (from nf-core/modules#11178):
- Dual-container: GPU container (conda-forge::pytorch-gpu, ~3.5 GB)
  when task.accelerator is set, CPU container (~546 MB) otherwise
- Both conda and container directives use ternaries on task.accelerator
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}

Pipeline:
- Add use_gpu_ribodetector param with accelerator + containerOptions
  closures in the RIBODETECTOR process config
- Add validation: errors on ARM or wrong ribo_removal_tool
- Fix .nftignore pattern for ribodetector log files
- Add GPU test (real + stub) tagged "gpu", skip_bbsplit for determinism
- Generalize CI skip from SKIP_PARABRICKS to SKIP_GPU (backward compat)

Closes #1780

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
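The "accelerator + containerOptions closures" mentioned in that commit might look roughly like this on the pipeline side (selector, values, and engine flags are assumptions drawn from the commit text, not the pipeline's exact config):

```nextflow
// Illustrative pipeline-side process config for GPU ribodetector.
process {
    withName: 'RIBODETECTOR' {
        // Request a GPU only when the user opts in; otherwise leave
        // task.accelerator unset so the module picks the CPU container.
        accelerator      = { params.use_gpu_ribodetector ? 1 : null }
        // Pass the engine-appropriate GPU flag through to the container.
        containerOptions = { params.use_gpu_ribodetector
            ? ( workflow.containerEngine == 'singularity' ? '--nv' : '--gpus all' )
            : '' }
    }
}
```

Keeping allocation in the pipeline config (rather than the module) matches the pattern described here: the module only reads `task.accelerator`; the pipeline decides whether to set it.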
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request Apr 14, 2026
Resolve modules.json conflict: take dev updates, restore ribodetector
SHA from nf-core/modules#11178.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>