conda-file builds cannot install packages requiring __cuda virtual package #1026

@pinin4fjords


Summary

Wave's conda-file builds cannot install any conda-forge package that depends on the __cuda virtual package, because Wave's build servers do not have GPUs and the templates provide no mechanism to set CONDA_OVERRIDE_CUDA during the conda solve step. This affects all GPU-enabled packages on conda-forge from mid-2022 onwards, including PyTorch >= 1.12.1, JAX, TensorFlow, and others.

The standard conda/mamba workaround for building on CPU-only systems is setting CONDA_OVERRIDE_CUDA=<version> as an environment variable during the solve. Wave does not support this.
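For illustration, this is what the override looks like on a local CPU-only machine (the environment name and CUDA version are examples; micromamba is assumed to be installed):

```shell
# Prefix the solve command so the solver assumes CUDA 12.6 is present.
# The override affects only dependency resolution; no GPU driver is needed.
CONDA_OVERRIDE_CUDA=12.6 micromamba create -y -n gpu-env \
    -c conda-forge 'pytorch-gpu>=2.0'
```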

This was previously reported in the community forum (Problems with GPU-enabled wave images, Jan 2025) by Nico_Trummer, who hit the same __cuda resolution failure building JAX GPU containers. That thread remains unresolved: Paolo responded about the build timeout aspect (#597), but the core CONDA_OVERRIDE_CUDA gap was not addressed. This issue documents the problem fully, with reproduction steps, tested workarounds, and concrete fix proposals.

Reproduction

Given this environment.gpu.yml:

```yaml
channels:
  - conda-forge
  - bioconda
dependencies:
  - "bioconda::ribodetector=0.3.3"
  - "conda-forge::pytorch-gpu>=2.0"
```

the build command:

```console
wave --conda-file environment.gpu.yml --platform linux/amd64 --freeze --await
```

Fails with:

```
Could not solve for environment specs
pytorch-gpu >=2.0 is not installable because it requires
  pytorch [...], which requires
    __cuda =* *, which is missing on the system.
```

Build log: https://wave.seqera.io/view/builds/bd-bbf66b1b68ac0df5_1

Why this matters

The __cuda virtual package requirement was introduced in conda-forge's PyTorch builds starting at version 1.12.1. We verified this by testing every major version on a Linux x86_64 system without CONDA_OVERRIDE_CUDA set:

| pytorch-gpu version | Resolves without `__cuda`? | CUDA packaging |
|---------------------|----------------------------|----------------|
| 1.11.0 | Yes | Uses `cudatoolkit` (regular package) |
| 1.12.1 | No  | Requires `__cuda` virtual package |
| 1.13.1 | No  | Requires `__cuda` virtual package |
| 2.0.0  | No  | Requires `__cuda` virtual package |
| 2.5.1  | No  | Requires `__cuda` virtual package |

This means Wave can only install pytorch-gpu=1.11.0 (March 2022, CUDA 11.1) via --conda-file. All newer versions fail.

This is a real problem in production: the nf-core/rnaseq pipeline's GPU ribodetector test stalls for 2 hours under Singularity because the old PyTorch 1.11.0 container (CUDA 11.1 runtime) deadlocks on CUDA 12.x hosts. We need a container with modern PyTorch (CUDA 12.x), but Wave's --conda-file path cannot build one.

What we tried

- `--config-env 'CONDA_OVERRIDE_CUDA=12.6'`: sets the env var in the final container image, not during the build/solve step. The conda solver never sees it. (Build log)

- `--conda-run-command`: per the Wave source (`TemplateUtils.java`, `addCommands` method), this appends Dockerfile `RUN` commands after the `micromamba install` step. It cannot influence the conda solve.

- `--conda-base-image nvidia/cuda:12.6.3-base-ubuntu24.04`: fails because the NVIDIA base image does not have micromamba installed. (Build log)

- Adding `__cuda>=12` as a dependency in the YAML: the solver correctly identifies it as a virtual package that must be provided by the system and refuses to install it. (Build log)

Workaround: custom Dockerfile (works, but no community freeze)

A custom Dockerfile with CONDA_OVERRIDE_CUDA set inline works:

```Dockerfile
FROM mambaorg/micromamba:1.5.10-noble
COPY conda.yml /tmp/conda.yml
RUN CONDA_OVERRIDE_CUDA="12.6" micromamba install -y -n base -f /tmp/conda.yml && micromamba clean -a -y
USER root
ENV PATH="$MAMBA_ROOT_PREFIX/bin:$PATH"
```

```console
wave -f Dockerfile --context . --platform linux/amd64 --await --tower-token $TOKEN
# Returns: wave.seqera.io/wt/51f41f22dbb8/wave/build:96b4265a2148d918
```

We pulled this container and verified that it contains PyTorch 2.10.0 with CUDA 12.9 and that ribodetector 0.3.3 works correctly:

```console
$ docker run --rm <image> python -c "import torch; print(torch.__version__, torch.version.cuda)"
2.10.0 12.9

$ docker run --rm <image> ribodetector --version
ribodetector 0.3.3
```

However, the custom Dockerfile path requires --build-repo for --freeze, meaning it cannot be frozen to the community Wave registry (community.wave.seqera.io/library/...). This makes it unusable for nf-core modules, which rely on community-frozen container URLs.

Proposed solutions

There are several ways to address this, ranging from minimal to comprehensive:

Option A: Add {{conda_env_prefix}} template placeholder (minimal, targeted)

Add a new placeholder to the Dockerfile templates that injects environment variables before micromamba install:

```diff
 FROM {{mamba_image}} AS build
 COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
-RUN micromamba install -y -n base -f /tmp/conda.yml \
+RUN {{conda_env_prefix}}micromamba install -y -n base -f /tmp/conda.yml \
```

Expose this via a new CLI flag (e.g., --conda-solve-env KEY=VALUE) and Nextflow config option. The placeholder would render as CONDA_OVERRIDE_CUDA="12.6" when set, or empty string when not.
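For example, with a hypothetical `--conda-solve-env CONDA_OVERRIDE_CUDA=12.6` set, the template could render to something like the following (template shape based on the diff above; exact layout is illustrative):

```Dockerfile
# Hypothetical rendering of the conda-file template with the new placeholder:
FROM mambaorg/micromamba:1.5.10-noble AS build
COPY --chown=$MAMBA_USER:$MAMBA_USER conda.yml /tmp/conda.yml
RUN CONDA_OVERRIDE_CUDA="12.6" micromamba install -y -n base -f /tmp/conda.yml \
    && micromamba clean -a -y
```

With the flag unset, the placeholder renders as an empty string and the `RUN` line is byte-identical to today's templates, so existing builds are unaffected.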

This requires changes to:

  • All 8 conda templates (v1/v2, Docker/Singularity, file/packages)
  • CondaOpts.java (new field)
  • TemplateUtils.java (new binding)
  • Wave CLI (new flag)
  • Nextflow Wave config (new option)

Option B: Set CONDA_OVERRIDE_CUDA unconditionally in templates

Ruled out: testing shows that for packages with both CPU and GPU variants (e.g. bare pytorch without the -gpu/-cpu suffix), the override causes the solver to prefer CUDA builds, silently pulling in hundreds of MB of CUDA toolkit into containers that never intended to use a GPU.

Option C: Detect GPU packages and set override automatically

Wave could inspect the conda environment spec for known GPU metapackages (pytorch-gpu, jaxlib, tensorflow-gpu, etc.) or __cuda dependencies and automatically set CONDA_OVERRIDE_CUDA during the solve. This would be the most user-friendly option but requires more implementation effort.
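A minimal sketch of such detection, assuming a simple string match against a curated list (the package set and function name here are illustrative, not Wave code):

```python
# Sketch: decide from a conda env spec whether the solve needs CONDA_OVERRIDE_CUDA.
# The set of GPU hint packages is illustrative and would need curation.
GPU_HINTS = {"pytorch-gpu", "tensorflow-gpu", "jaxlib", "cuda-version", "__cuda"}

def needs_cuda_override(dependencies):
    """Return True if any dependency names a known GPU metapackage.

    Entries look like 'conda-forge::pytorch-gpu>=2.0' or '__cuda>=12'.
    """
    for dep in dependencies:
        name = dep.split("::")[-1]          # drop optional 'channel::' prefix
        for sep in ("=", ">", "<", "!", " "):
            name = name.split(sep)[0]       # drop version constraint
        if name in GPU_HINTS:
            return True
    return False

print(needs_cuda_override(["bioconda::ribodetector=0.3.3",
                           "conda-forge::pytorch-gpu>=2.0"]))  # True
print(needs_cuda_override(["bioconda::samtools=1.20"]))        # False
```

The main drawback is that any curated list goes stale as new GPU metapackages appear on conda-forge.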

Option D: Two-pass solve with automatic retry

Run the solve normally. If it fails and the error output contains __cuda, retry with CONDA_OVERRIDE_CUDA set. The template's RUN command would become something like:

```shell
micromamba install -y -n base -f /tmp/conda.yml || \
  (micromamba install --dry-run -y -n base -f /tmp/conda.yml 2>&1 | grep -q __cuda \
   && CONDA_OVERRIDE_CUDA="12" micromamba install -y -n base -f /tmp/conda.yml)
```

This requires no new CLI flags, no package list, and no repodata inspection. The solver itself discovers whether __cuda is needed. The cost is one extra failed solve (~4s) for GPU environments only; non-GPU environments succeed on the first pass with zero overhead. This could be implemented entirely within the existing templates.
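The control flow can be exercised with a stub standing in for micromamba (the `solve` function below is a hypothetical stand-in, not real solver behavior):

```shell
# Stub demonstration of Option D's fallback: the first solve fails, its
# output mentions __cuda, so the retry runs with the override set.
solve() {
    if [ -n "$CONDA_OVERRIDE_CUDA" ]; then
        echo "solved with CUDA $CONDA_OVERRIDE_CUDA"
    else
        echo "__cuda is missing on the system" >&2
        return 1
    fi
}

solve || \
  (solve 2>&1 | grep -q __cuda \
   && CONDA_OVERRIDE_CUDA="12" solve)
# stdout: solved with CUDA 12
```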
