# conda-file builds cannot install packages requiring `__cuda` virtual package
## Summary
Wave's conda-file builds cannot install any conda-forge package that depends on the `__cuda` virtual package, because Wave's build servers do not have GPUs and the templates provide no mechanism to set `CONDA_OVERRIDE_CUDA` during the conda solve step. This affects all GPU-enabled packages on conda-forge from mid-2022 onwards, including PyTorch >= 1.12.1, JAX, TensorFlow, and others.
The standard conda/mamba workaround for building on CPU-only systems is to set `CONDA_OVERRIDE_CUDA=<version>` as an environment variable during the solve. Wave does not support this.
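For reference, this is how the override is normally applied outside Wave on a CPU-only machine (illustrative commands using the package from the reproduction below; the CUDA version value is arbitrary):

```sh
# Without the override the solve fails, because no GPU driver exports __cuda:
micromamba create -n gpu-env -c conda-forge 'pytorch-gpu>=2.0'
#   => __cuda =* *, which is missing on the system

# With the override, the solver assumes a CUDA 12.6 driver is present and succeeds:
CONDA_OVERRIDE_CUDA=12.6 micromamba create -n gpu-env -c conda-forge 'pytorch-gpu>=2.0'
```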
This was previously reported in the community forum (Problems with GPU-enabled wave images, Jan 2025) by Nico_Trummer, who hit the same `__cuda` resolution failure building JAX GPU containers. That thread is unresolved: Paolo responded about the build timeout aspect (#597), but the core `CONDA_OVERRIDE_CUDA` gap was not addressed. This issue documents the problem fully, with reproduction steps, tested workarounds, and concrete fix proposals.
## Reproduction
Given this `environment.gpu.yml`:

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::ribodetector=0.3.3"
  - "conda-forge::pytorch-gpu>=2.0"
```
```sh
wave --conda-file environment.gpu.yml --platform linux/amd64 --freeze --await
```

Fails with:

```
Could not solve for environment specs
pytorch-gpu >=2.0 is not installable because it requires
    pytorch [...], which requires
        __cuda =* *, which is missing on the system.
```
Build log: https://wave.seqera.io/view/builds/bd-bbf66b1b68ac0df5_1
## Why this matters
The `__cuda` virtual package requirement was introduced in conda-forge's PyTorch builds starting at version 1.12.1. We verified this by testing every major version on a Linux x86_64 system without `CONDA_OVERRIDE_CUDA` set (a sketch of the per-version check follows the table):
| pytorch-gpu version | Resolves without `__cuda`? | CUDA packaging |
|---------------------|----------------------------|----------------|
| 1.11.0 | Yes | Uses `cudatoolkit` (regular package) |
| 1.12.1 | No | Requires `__cuda` virtual package |
| 1.13.1 | No | Requires `__cuda` virtual package |
| 2.0.0 | No | Requires `__cuda` virtual package |
| 2.5.1 | No | Requires `__cuda` virtual package |
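The check behind each row amounts to a dry-run solve; a minimal sketch of such a sweep (our exact invocations may have differed):

```sh
# Dry-run solve per version on a CPU-only machine, CONDA_OVERRIDE_CUDA deliberately unset.
for v in 1.11.0 1.12.1 1.13.1 2.0.0 2.5.1; do
  if micromamba create --dry-run -n probe -c conda-forge "pytorch-gpu=$v" >/dev/null 2>&1; then
    echo "pytorch-gpu=$v: resolves without __cuda"
  else
    echo "pytorch-gpu=$v: solve fails (__cuda required)"
  fi
done
```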
This means Wave can only install `pytorch-gpu=1.11.0` (March 2022, CUDA 11.1) via `--conda-file`. All newer versions fail.
This is a real problem in production: the nf-core/rnaseq pipeline's GPU ribodetector test stalls for 2 hours under Singularity because the old PyTorch 1.11.0 container (CUDA 11.1 runtime) deadlocks on CUDA 12.x hosts. We need a container with modern PyTorch (CUDA 12.x), but Wave's `--conda-file` path cannot build one.
## What we tried
- `--config-env 'CONDA_OVERRIDE_CUDA=12.6'`: sets the variable in the final container image, not during the build/solve step, so the conda solver never sees it (see the sketch after this list). (Build log)
- `--conda-run-command`: per the Wave source (`TemplateUtils.java`, `addCommands` method), this appends Dockerfile `RUN` commands after the `micromamba install` step. It cannot influence the conda solve.
- `--conda-base-image nvidia/cuda:12.6.3-base-ubuntu24.04`: fails because the NVIDIA base image does not have micromamba installed. (Build log)
- Adding `__cuda>=12` as a dependency in the YAML: the solver correctly identifies it as a virtual package that must be provided by the system and refuses to install it. (Build log)
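To make the ordering concrete, here is a simplified sketch of the rendered conda-file Dockerfile (approximate; the real templates differ in detail), annotated with where each attempted knob takes effect:

```Dockerfile
FROM mambaorg/micromamba:<tag>
COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
# The solve happens in this RUN; nothing exports CONDA_OVERRIDE_CUDA into it.
RUN micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba clean -a -y
# Commands from --conda-run-command are appended here, after the solve.
RUN <user-supplied commands>
# --config-env never appears in the Dockerfile at all: it modifies the final
# image configuration, which the build-time solver cannot see.
```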
## Workaround: custom Dockerfile (works, but no community freeze)
A custom Dockerfile with `CONDA_OVERRIDE_CUDA` set inline works:

```Dockerfile
FROM mambaorg/micromamba:1.5.10-noble
COPY conda.yml /tmp/conda.yml
# Setting the override inline exposes it to the solver inside this RUN step
RUN CONDA_OVERRIDE_CUDA="12.6" micromamba install -y -n base -f /tmp/conda.yml && micromamba clean -a -y
USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
```
```sh
wave -f Dockerfile --context . --platform linux/amd64 --await --tower-token $TOKEN
# Returns: wave.seqera.io/wt/51f41f22dbb8/wave/build:96b4265a2148d918
```
We pulled this container and verified that it contains PyTorch 2.10.0 with CUDA 12.9 and that ribodetector 0.3.3 works correctly:
```console
$ docker run --rm <image> python -c "import torch; print(torch.__version__, torch.version.cuda)"
2.10.0 12.9
$ docker run --rm <image> ribodetector --version
ribodetector 0.3.3
```
However, the custom Dockerfile path requires `--build-repo` for `--freeze`, meaning it cannot be frozen to the community Wave registry (`community.wave.seqera.io/library/...`). This makes it unusable for nf-core modules, which rely on community-frozen container URLs.
## Proposed solutions
There are several ways to address this, ranging from minimal to comprehensive:
### Option A: Add a `{{conda_env_prefix}}` template placeholder (minimal, targeted)

Add a new placeholder to the Dockerfile templates that injects environment variables before `micromamba install`:
```diff
 FROM {{mamba_image}} AS build
 COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
-RUN micromamba install -y -n base -f /tmp/conda.yml \
+RUN {{conda_env_prefix}}micromamba install -y -n base -f /tmp/conda.yml \
```
Expose this via a new CLI flag (e.g., `--conda-solve-env KEY=VALUE`) and a Nextflow config option. The placeholder would render as `CONDA_OVERRIDE_CUDA="12.6" ` when set, or as an empty string when not; example renderings and usage follow the change list below.
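Rendered output would look like this (hypothetical, following the proposed placeholder semantics):

```Dockerfile
# with --conda-solve-env CONDA_OVERRIDE_CUDA=12.6:
RUN CONDA_OVERRIDE_CUDA="12.6" micromamba install -y -n base -f /tmp/conda.yml \
# without the flag, the placeholder renders empty and the line is unchanged:
RUN micromamba install -y -n base -f /tmp/conda.yml \
```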
This requires changes to:

- all 8 conda templates (v1/v2, Docker/Singularity, file/packages)
- `CondaOpts.java` (new field)
- `TemplateUtils.java` (new binding)
- the Wave CLI (new flag)
- the Nextflow Wave config (new option)
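End-to-end usage would then look something like this (hypothetical; `--conda-solve-env` is the flag proposed above, not an existing option):

```sh
wave --conda-file environment.gpu.yml --platform linux/amd64 \
  --conda-solve-env CONDA_OVERRIDE_CUDA=12.6 --freeze --await
```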
### Option B: Set `CONDA_OVERRIDE_CUDA` unconditionally in templates
Ruled out: testing shows that for packages with both CPU and GPU variants (e.g. bare `pytorch` without the `-gpu`/`-cpu` suffix), the override causes the solver to prefer CUDA builds, silently pulling hundreds of MB of CUDA toolkit into containers that never intended to use a GPU (illustrated below).
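The effect is easy to demonstrate with dry-run solves (illustrative; the exact packages pulled in depend on current repodata):

```sh
# Plain pytorch with no override: resolves to the CPU build.
micromamba create --dry-run -n probe -c conda-forge pytorch

# With the override, the solver prefers the CUDA variant, and the solution
# includes cuda-*/cudnn packages even though no GPU package was requested.
CONDA_OVERRIDE_CUDA=12.6 micromamba create --dry-run -n probe -c conda-forge pytorch
```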
### Option C: Detect GPU packages and set the override automatically

Wave could inspect the conda environment spec for known GPU metapackages (`pytorch-gpu`, `jaxlib`, `tensorflow-gpu`, etc.) or `__cuda` dependencies and automatically set `CONDA_OVERRIDE_CUDA` during the solve. This would be the most user-friendly option, but it requires more implementation effort; a naive sketch follows.
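For illustration only (the real implementation would live in Wave's template/build code and should probably consult repodata rather than grep the spec):

```sh
# Hypothetical pre-solve hook: export the override when the spec
# names a known GPU metapackage.
if grep -Eq 'pytorch-gpu|tensorflow-gpu|jaxlib' /tmp/conda.yml; then
  export CONDA_OVERRIDE_CUDA=12
fi
micromamba install -y -n base -f /tmp/conda.yml
```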
### Option D: Two-pass solve with automatic retry

Run the solve normally. If it fails and the error output contains `__cuda`, retry with `CONDA_OVERRIDE_CUDA` set. The template's `RUN` command would become something like:
```sh
# First pass: a normal solve; non-GPU environments succeed here with no overhead.
# On failure, a dry-run re-solve checks whether __cuda is the culprit and,
# if so, the install is retried with the override set.
micromamba install -y -n base -f /tmp/conda.yml || \
  (micromamba install --dry-run -y -n base -f /tmp/conda.yml 2>&1 | grep -q __cuda \
    && CONDA_OVERRIDE_CUDA="12" micromamba install -y -n base -f /tmp/conda.yml)
```
This requires no new CLI flags, no package list, and no repodata inspection: the solver itself discovers whether `__cuda` is needed. The cost is one extra failed solve (~4s) for GPU environments only; non-GPU environments succeed on the first pass with zero overhead. This could be implemented entirely within the existing templates.
## Related

- Community forum: Problems with GPU-enabled wave images (Jan 2025)
- #597 (build timeout)