feat: add GPU support to ribodetector (dual-container) #11178
Merged
pinin4fjords merged 2 commits into master on Apr 14, 2026
Conversation
Force-pushed ea874d4 to 042b648:
Switch between GPU and CPU containers based on task.accelerator.
CPU users keep the existing ~546 MB container; GPU users get a ~3.5 GB
container with CUDA-enabled PyTorch from conda-forge::pytorch-gpu.
Module changes:
- Container ternary: switches between GPU and CPU containers for both
Docker and Singularity based on task.accelerator
- Conda ternary: environment.gpu.yml (pytorch-gpu) when GPU,
environment.yml (unchanged) when CPU
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}
- Fix stub gzip syntax for nf-core lint compliance
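The container ternary described above can be sketched as follows; this is a hypothetical illustration with placeholder image URIs, not the module's actual registry paths:

```groovy
process RIBODETECTOR {
    // GPU environment file when an accelerator is requested, CPU file otherwise
    conda "${ task.accelerator ? "${moduleDir}/environment.gpu.yml" : "${moduleDir}/environment.yml" }"

    // Same ternary for both engines; the URIs below are placeholders
    container "${ workflow.containerEngine == 'singularity'
        ? ( task.accelerator ? 'https://depot.example.org/ribodetector-gpu.img'
                             : 'https://depot.example.org/ribodetector-cpu.img' )
        : ( task.accelerator ? 'registry.example.org/ribodetector:gpu'
                             : 'registry.example.org/ribodetector:cpu' ) }"

    // inputs, outputs and script omitted
}
```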
GPU environment uses only standard conda-forge + bioconda channels
(conda-forge::pytorch-gpu=1.11.0, built from the pytorch-cpu-feedstock
as a GPU output variant).
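The GString fix in the module changes above matters because unbraced interpolation only covers a dotted property path. A hedged sketch (the command line is illustrative, not the module's actual script):

```groovy
// In "$task.memory.toGiga()" Groovy interpolates only the property path
// task.memory.toGiga and leaves the trailing "()" as literal text, so the
// method call never happens. Braces make the whole call part of the
// interpolated expression and yield the memory value in GiB.
script:
"""
ribodetector_cpu -m ${task.memory.toGiga()} ...
"""
```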
Test improvements (CPU and GPU, identical assertions):
- nft-fastq plugin for order-independent sorted read name comparison
- Log content assertions for classification counts (4159 non-rRNA,
0 rRNA) with ANSI color stripping
- GPU tests tagged with "gpu" following the parabricks pattern
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Force-pushed 68e09d8 to 88816cb
jfy133 approved these changes on Apr 14, 2026

pinin4fjords (Member, Author): Thanks @jfy133!
pinin4fjords added a commit to nf-core/website that referenced this pull request on Apr 14, 2026:
Add guidance for nf-core modules that support GPU acceleration:
- Software requirements: dual-container pattern using task.accelerator ternaries for both conda and container directives, environment.gpu.yml for GPU deps, ARM limitation note
- Resource requirements: GPU acceleration via task.accelerator; the pipeline controls allocation, not the module
- Testing: GPU test file conventions (main.gpu.nf.test), tagging with "gpu", nextflow.gpu.config, nft-fastq plugin for non-deterministic output, same assertions for GPU and CPU

Based on patterns established in nf-core/modules#11178 (ribodetector GPU support) and existing parabricks conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/website that referenced this pull request on Apr 14, 2026:
Add guidance for nf-core modules that support GPU acceleration, based on patterns from ribodetector (nf-core/modules#11178) and parabricks.

Software requirements:
- Three container approaches (dual-container, single, vendor-provided)
- Dual-container ternary pattern for conda + container directives
- environment.gpu.yml convention
- Conda/mamba guard for vendor containers
- Binary selection and multi-GPU via task.accelerator
- ARM __cuda limitation note

Resource requirements:
- task.accelerator for GPU detection (module reads, pipeline sets)
- Tip pointing to the pipeline-side containerOptions pattern

Testing:
- GPU test file conventions (main.gpu.nf.test, nextflow.gpu.config)
- gpu and gpu_highmem tags for CI runner selection
- Same assertions for GPU and CPU to catch divergence

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request on Apr 14, 2026:
Replace local patch with the merged upstream module that includes the dual-container GPU support.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request on Apr 14, 2026:
Add use_gpu_ribodetector boolean parameter. When enabled, requests a GPU accelerator and applies container GPU flags for the RIBODETECTOR process.

Module (from nf-core/modules#11178):
- Dual-container: GPU container (conda-forge::pytorch-gpu, ~3.5 GB) when task.accelerator is set, CPU container (~546 MB) otherwise
- Both conda and container directives use ternaries on task.accelerator
- Fix GString bug: $task.memory.toGiga() -> ${task.memory.toGiga()}

Pipeline:
- Add use_gpu_ribodetector param with accelerator + containerOptions closures in the RIBODETECTOR process config
- Add validation: errors on ARM or wrong ribo_removal_tool
- Fix .nftignore pattern for ribodetector log files
- Add GPU test (real + stub) tagged "gpu", skip_bbsplit for determinism
- Generalize CI skip from SKIP_PARABRICKS to SKIP_GPU (backward compat)

Closes #1780

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
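The accelerator + containerOptions closures mentioned in the commit message might look roughly like this; a sketch that assumes the parameter name from the message, not code copied verbatim from nf-core/rnaseq:

```groovy
process {
    withName: 'RIBODETECTOR' {
        // Request a GPU only when the user opts in
        accelerator      = { params.use_gpu_ribodetector ? [request: 1] : null }
        // Pass the engine-appropriate GPU flag to the container runtime
        containerOptions = {
            params.use_gpu_ribodetector
                ? ( workflow.containerEngine == 'singularity' ? '--nv' : '--gpus all' )
                : ''
        }
    }
}
```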
pinin4fjords added a commit to nf-core/rnaseq that referenced this pull request on Apr 14, 2026:
Resolve modules.json conflict: take dev updates, restore ribodetector SHA from nf-core/modules#11178.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- When `task.accelerator` is set, uses a CUDA-enabled container with `pytorch-gpu`; otherwise uses the existing CPU container
- Both `conda` and `container` directives use ternaries based on `task.accelerator`
- Fixes a GString bug (`$task.memory.toGiga()` -> `${task.memory.toGiga()}`)
- Tests: `nft-fastq` plugin for order-independent sorted read name comparison, log content assertions for classification counts (4159 non-rRNA, 0 rRNA), ANSI color stripping

Container size
The GPU container is ~3.5 GB vs ~546 MB for the CPU container. The size comes from pytorch-gpu bundling the CUDA toolkit, cuDNN, and GPU-specific pytorch builds. With the dual-container approach, CPU-only users are unaffected.
GPU environment

The `environment.gpu.yml` uses `conda-forge::pytorch-gpu=1.11.0`. The CPU `environment.yml` is unchanged.

ARM limitation
The GPU container is x86_64 only. This is a conda packaging limitation, not a hardware one (ARM + NVIDIA GPU setups like Jetson do exist):

- ARM `pytorch-gpu` builds (oldest available is 2.1.2) depend on the `__cuda` virtual package, which requires CUDA drivers to be present at conda solve time. Wave's micromamba base image does not provide this, so the solve fails.
- `pytorch-gpu=1.11.0` works because it predates the `__cuda` convention. It depends on `cudatoolkit` instead, which is a regular installable package that gets bundled into the container.
- Newer x86 builds also depend on `__cuda` and would fail the same way. The version pin to 1.11.0 is not just for reproducibility: it is the newest x86 version that avoids the `__cuda` gate.
- The official pytorch channel avoids `__cuda` entirely (it uses the real `pytorch-cuda` and `pytorch-mutex` packages instead), but that channel has no ARM builds at all.
- Setting `CONDA_OVERRIDE_CUDA=12.4` (which tells conda to pretend CUDA is present) would allow the solve, but Wave does not support injecting environment variables at solve time, only at container runtime (`--config-env`).

Pipelines using this module should guard GPU ribodetector on ARM, as nf-core/rnaseq does:
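A minimal sketch of such a guard; the parameter name and error wording are assumptions, not the actual nf-core/rnaseq code:

```groovy
// Fail fast when GPU ribodetector is requested on an ARM host, since the
// x86_64-only GPU container cannot run there.
if (params.use_gpu_ribodetector && System.getProperty('os.arch') in ['aarch64', 'arm64']) {
    error "GPU ribodetector requires x86_64: conda-forge pytorch-gpu has no installable ARM build."
}
```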
Resolution paths:

- Wave gaining support for `CONDA_OVERRIDE_CUDA` at build/solve time
- Removing the `__cuda` requirement for ARM pytorch-gpu (unlikely, it serves a purpose)

Rationale
The ribodetector module already selects between GPU (`ribodetector`) and CPU (`ribodetector_cpu`) binaries based on `task.accelerator`, but the existing container only includes CPU PyTorch, making the GPU binary fail with "CUDA unavailable". This PR adds the missing GPU container and wires up the selection.

Related pipeline PR: nf-core/rnaseq#1790
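The pre-existing binary selection described above amounts to a ternary in the script block, roughly as follows (illustrative; command-line options elided):

```groovy
script:
// GPU binary when an accelerator is allocated, CPU binary otherwise;
// this selection predates the PR, which adds the container it needs.
def binary = task.accelerator ? 'ribodetector' : 'ribodetector_cpu'
"""
$binary ... # remaining options identical between GPU and CPU
"""
```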
Test plan
🤖 Generated with Claude Code