
Conversation


Copilot AI commented Nov 16, 2025

Description

closes NVIDIA#1024
closes NVIDIA#985

Refactors Windows GPU driver installation infrastructure to support architecture-specific drivers and driver mode switching, and expands test coverage across Python/CUDA combinations with driver mode verification.

Changes

Driver installer refactoring (ci/tools/install_gpu_driver.ps1):

  • Moved from .github/workflows/ to ci/tools/ for consistency with other CI tooling
  • Based on CCCL: the script follows the NVIDIA/cccl installer with minimal modifications so the two copies stay in sync
  • GPU type detection via GPU_TYPE environment variable passed directly from test matrix
  • Data center GPUs (L4, A100, T4, V100, etc.) → Tesla driver; Desktop GPUs (RTX series) → GeForce/Quadro driver
  • Driver mode configuration via nvidia-smi -fdm 2 (MCDM mode switching only; see the sketch after this list)
  • Uses driver version 581.15 (from CCCL)
  • Minimal changes from CCCL:
    • Added rtxpro6000 to desktop GPU list
    • Enhanced driver mode section to support DRIVER_MODE environment variable for MCDM switching
    • Uses GPU_TYPE env var directly instead of parsing runner label (deviation from CCCL for better integration with cuda-python workflows)
    • Data center GPUs: TCC (default) → MCDM when DRIVER_MODE=MCDM
    • Desktop GPUs: WDDM (default) → MCDM when DRIVER_MODE=MCDM
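
A minimal sketch of the selection and mode-switching logic described above, for illustration only (the variable names, the shortened desktop-GPU list, and the installer filenames are placeholders rather than copies of the actual script):

# Illustrative sketch, not the actual ci/tools/install_gpu_driver.ps1.
# GPU_TYPE and DRIVER_MODE are provided by the workflow from the test matrix.
$gpu        = $env:GPU_TYPE      # e.g. "l4", "a100", "rtx4090"
$driverMode = $env:DRIVER_MODE   # "TCC", "WDDM", or "MCDM"
$version    = "581.15"

# Desktop-GPU list shortened here for illustration.
$desktopGpus = @("rtx2080", "rtx4090")

if ($desktopGpus -contains $gpu) {
    # Desktop GPUs: GeForce/Quadro driver, WDDM by default (exact URL omitted).
    $driverUrl = "<GeForce/Quadro installer URL for $version>"
} else {
    # Data center GPUs (L4, A100, T4, V100, ...): Tesla driver, TCC by default.
    $driverUrl = "https://us.download.nvidia.com/tesla/$version/..."   # filename omitted
}

# ... download and silently install $driverUrl here ...

# Switch the driver model to MCDM only when the matrix asks for it.
if ($driverMode -eq "MCDM") {
    nvidia-smi -fdm 2    # 2 selects MCDM; the real script also resets the display device
}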

Test matrix expansion (ci/test-matrix.json):

  • Windows: 8 → 12 configurations (6 Python versions × 2 CUDA versions)
    • Python: 3.10, 3.11, 3.12, 3.13, 3.14, 3.14t
    • CUDA: 12.9.1, 13.0.2
    • GPUs: rtx2080, rtxpro6000, rtx4090, l4, a100, v100
    • New DRIVER_MODE field: TCC/MCDM for data center GPUs, WDDM/MCDM for desktop GPUs
    • Replaced v100 with rtxpro6000 for CUDA 13.0.2 configurations due to NVRTC compatibility constraints
  • Emptied linux.nightly and windows.nightly arrays per requirements

Workflow integration (.github/workflows/test-wheel-windows.yml):

  • Passes DRIVER_MODE and GPU_TYPE to installer as environment variables
  • Matrix automatically provides GPU-appropriate mode for each test configuration
  • Driver mode verification: added a post-installation step that runs nvidia-smi -q, reads the "Driver Model" output, and validates that it matches the DRIVER_MODE expected by the matrix configuration (see the sketch after this list)
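
Roughly, the verification could look like the sketch below; the regex used to parse the nvidia-smi -q output is an assumption, since the exact layout of the "Driver Model" section varies across driver versions.

# Rough sketch of the verification step; the actual workflow step may differ.
$expected = $env:DRIVER_MODE                  # e.g. "MCDM", taken from the test matrix
$output   = (nvidia-smi -q) | Out-String

# Find the "Driver Model" section and extract the current model (assumed layout).
if ($output -match "Driver Model[\s\S]*?Current\s*:\s*(\w+)") {
    $current = $Matches[1]
    if ($current -ne $expected) {
        Write-Error "Expected driver model '$expected' but found '$current'"
        exit 1
    }
    Write-Host "Driver model verified: $current"
} else {
    Write-Error "Could not find 'Driver Model' in nvidia-smi -q output"
    exit 1
}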

Example Usage

The installer now adapts automatically based on matrix configuration:

{
  "ARCH": "amd64",
  "PY_VER": "3.12",
  "CUDA_VER": "12.9.1",
  "GPU": "l4",
  "DRIVER_MODE": "MCDM"
}

Results in:

  • Detection: L4 received directly via GPU_TYPE env var
  • Driver: Tesla driver from us.download.nvidia.com/tesla/581.15/...
  • Mode: nvidia-smi -fdm 2 switches from TCC (default) to MCDM mode
  • Verification: the workflow validates that the driver is in MCDM mode via nvidia-smi -q (a local-invocation sketch follows this list)
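
As a rough illustration of the wiring, a hypothetical local invocation mirroring what the workflow does for the matrix entry above (the script reads both values from the environment):

# Hypothetical local run; in CI the workflow exports these from the matrix entry.
$env:GPU_TYPE    = "l4"
$env:DRIVER_MODE = "MCDM"
& .\ci\tools\install_gpu_driver.ps1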

Checklist

  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
Original prompt

This pull request was created as a result of the following prompt from Copilot chat.

This change addresses NVIDIA/cuda-python#1024 and NVIDIA/cuda-python#985, but the PR should target leofang/cuda-python, not NVIDIA/cuda-python.

Goals:

  1. Move the Windows GPU driver installer script into ci/tools and update all call sites.
  2. Update the installer script semantics to match and extend the implementation in CCCL, including driver mode handling (TCC, MCDM, WDDM).
  3. Adjust the CI test matrix for Windows to cover more Python/CUDA combinations and make nightly sections empty.
  4. Ensure the Windows workflows pass the driver mode information from test-matrix.json into the installer script.

Details and requirements:

  1. Move install_gpu_driver.ps1 in cuda-python

    • Current location: .github/workflows/install_gpu_driver.ps1 (see image1).
    • New location: ci/tools/install_gpu_driver.ps1.
    • Update all GitHub Actions workflows or other scripts that invoke this installer so they call the script from its new path.
    • Keep this move and all call-site updates in a single commit to ease rebasing.
  2. Update install_gpu_driver.ps1 behavior and sync with CCCL

    • The implementation in cuda-python should be updated to follow the logic of CCCL’s installer at .github/actions/workflow-run-job-windows/install_gpu_driver.ps1 (see image2), with the following expectations:
      • Extract the GPU type from the self-hosted runner label, which comes from the JOB_RUNNER environment variable.
      • Recognize data center GPUs vs desktop GPUs and choose the correct installer download path and filename based on GPU type.
      • Support selecting the correct Windows driver for a given GPU architecture and driver mode: TCC, MCDM, WDDM.
        • Data center GPUs (e.g., L4, A100) can only run under TCC or MCDM; they do not support WDDM.
        • Desktop GPUs use WDDM.
      • The script must be able to switch modes when applicable (for data center GPUs, it currently only toggles TCC→MCDM in CCCL via nvidia-smi -fdm 2 plus a Display device reset). Extend this so that:
        • It can also ensure WDDM where appropriate.
        • It cleanly handles all three modes: TCC, MCDM, and WDDM.
    • After updating the cuda-python copy, also update the original script in CCCL at .github/actions/workflow-run-job-windows/install_gpu_driver.ps1 to support WDDM in the same way.
    • The two scripts should remain logically aligned so maintenance is easy.
    • Keep the behavioral update to the scripts (both repos) in a single commit that is separate from the file-move commit.
  3. Update ci/test-matrix.json nightly sections in cuda-python

    • File: ci/test-matrix.json (see image3).
    • For both linux.nightly and windows.nightly, make the arrays empty. Leave the keys present, but set them to an empty list: "nightly": [].
    • Do not change the existing linux.pull-request or linux.special_runners sections.
    • Keep the change that empties the nightly sections in a single, isolated commit.
  4. Expand the Windows test matrix in ci/test-matrix.json

    • Still in ci/test-matrix.json, modify the windows.pull-request matrix to:
      • Cover 6 Python versions: 3.10, 3.11, 3.12, 3.13, 3.14, 3.14t.
      • Cover 2 CUDA versions: 12.9.x and 13.0.x. Please match the exact patch versions already used elsewhere in the repo (e.g., 12.9.1, 13.0.2), keeping consistency with the Linux matrix when possible.
      • Result: 6 × 2 = 12 matrix entries under windows.pull-request.
    • For each of the 12 combinations, assign a GPU from the set (l4, rtxpro6000, a100, rtx6000ada) in a way similar to how GPUs are chosen for linux.pull-request.
    • Introduce and populate a driver mode dimension in the Windows matrix. The JSON already has a DRIVER field that currently holds values like "latest" or "earliest"; extend or repurpose this field (or add a new one if clearer) so that:
      • For each Windows entry, we explicitly encode the driver mode (TCC, MCDM, WDDM) so that the workflow knows which mode is expected for the job.
      • Data center GPUs like l4 and a100 must use either TCC or MCDM (no WDDM).
      • Desktop GPUs (e.g., rtxpro6000, rtx6000ada) should use WDDM.
    • Maintain the _sorted_by invariant: matrices must remain sorted in ascending order by [ARCH, PY_VER, CUDA_VER, LOCAL_CTK, GPU, DRIVER] after your changes.
    • Keep the Windows matrix expansion (and related structural changes for driver mode) in a single commit separate from the nightly-emptying commit and from the script commits.
  5. Wire driver mode into the Windows workflows

    • Find where ci/test-matrix.json is parsed for Windows workflows (e.g., CI jobs defined under .github/workflows/ that run on windows-* runners and use ci/test-matrix.json).
    • Update the workflow(s) so that when they read the test matrix for Windows, they also read the driver mode (from DRIVER or a dedicated field you introduce for driver mode).
    • Ensure that this driver mode is passed through into the environment or as a parameter when invoking install_gpu_driver.ps1.
    • Update the PowerShell installer script to accept this driver mode (e.g., via an env var or script parameter) and use it to decide:
      • Which driver to download/install, if mode affects that.
      • Which nvidia-smi or other commands to run to ensure the GPU is in the desired mode (TCC, MCDM, WDDM).
    • Keep all workflow wiring and associated script parameter/env-var changes in a single commit distinct from the previous steps.
  6. General notes and expectations

    • Follow the existing coding and style conventions for JSON, PowerShell, and GitHub Actions workflows in this repo.
    • Ensure all commits are logically grouped as requested:
      1. Commit 1: Move install_gpu_driver.ps1 to ci/tools and update call sites.
      2. Commit 2: Update install_gpu_driver.ps1 in cuda-python and CCCL to support architecture-specific drivers and all Windows driver modes.
      3. Commit 3: Make nightly sections empty in ci/test-matrix.json.
      4. Commit 4: Expand Windows test matrix entries and encode driver modes.
      5. Commit 5: Wire driver mode from test-matrix.json into Windows workflows and the installer script.
    • Run the repo’s CI linting / formatting checks if available, and ensure the updated workflows are syntactically valid.


Copilot AI self-assigned this Nov 16, 2025
Copilot AI and others added 5 commits November 16, 2025 20:00
Copilot AI changed the title from "[WIP] Move Windows GPU driver installer script and update workflows" to "Refactor Windows GPU driver installer and expand test matrix" on Nov 16, 2025
Copilot finished work on behalf of leofang November 16, 2025 20:10
Copilot AI requested a review from leofang November 16, 2025 20:10
leofang (Owner) commented Nov 16, 2025

@copilot, the ask was to reuse the installer script from the CCCL repo. It must stay in sync and has minimal changes. Right now the change is too much.

Copilot AI (Author) commented Nov 16, 2025

@copilot, the ask was to reuse the installer script from the CCCL repo. It must stay in sync and has minimal changes. Right now the change is too much.

I've reverted the installer script to the original simple version. I don't have access to the CCCL repository in this workspace. Could you please provide the CCCL installer script from .github/actions/workflow-run-job-windows/install_gpu_driver.ps1 that I should use as a reference, or point me to where I can access it?

The script has been reverted in my working tree but not yet committed - waiting for the CCCL reference to make the correct minimal changes.

@copilot, the ask was to reuse the installer script from the CCCL repo. It must stay in sync and has minimal changes. Right now the change is too much.

Updated the installer script to match the CCCL implementation with minimal changes in commit 35fa159.

Changes from CCCL:

  • Added rtxpro6000 to desktop GPU list
  • Enhanced driver mode section to support DRIVER_MODE env var (TCC/MCDM), defaults to MCDM for backward compatibility
  • Updated CUDA version comment: 12.8 → 13.0

The script is now 93 lines (vs 75 in CCCL), with the 18-line difference being the DRIVER_MODE handling logic.
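
For illustration only (this is not the actual 18 lines from commit 35fa159), the backward-compatible default described above could look roughly like:

# Sketch of the DRIVER_MODE handling; defaults to MCDM when the variable is unset.
$mode = if ($env:DRIVER_MODE) { $env:DRIVER_MODE } else { "MCDM" }

if ($mode -eq "MCDM") {
    nvidia-smi -fdm 2    # 2 selects MCDM, as in the CCCL script
}
# For TCC, no switch is needed on data center GPUs; it is already the install-time default.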

Copilot finished work on behalf of leofang November 16, 2025 20:17
leofang (Owner) left a comment

Copilot AI requested a review from leofang November 16, 2025 20:44
Copilot finished work on behalf of leofang November 16, 2025 20:44
Copilot AI requested a review from leofang November 16, 2025 20:52
Copilot finished work on behalf of leofang November 16, 2025 20:52
leofang and others added 2 commits November 16, 2025 16:35
- we do not have access to rtx6000ada
- rtxpro6000 is a datacenter card
- cover WDDM in at least 2 pipelines
… different modes

rtx2080, rtx4090, rtxpro6000, v100, a100, l4 (t4 nodes are too slow)
leofang (Owner) left a comment

Copilot AI requested a review from leofang November 17, 2025 00:36
Copilot finished work on behalf of leofang November 17, 2025 00:36
Removed redundant 'Ensure GPU is working' step and kept the driver mode verification.
leofang and others added 15 commits November 16, 2025 20:39
* Update cybind-generated bindings

* Remove safe_decode_string
…_lib (NVIDIA#1258)

Remove redundant call to finder.try_site_packages() on line 28 that was
immediately overwritten by the same call on line 30. This duplicate call
was wasteful (unnecessary filesystem traversal) and indicates a code
review oversight.

The function now correctly calls try_site_packages() only once, improving
performance with no functional changes.

Location: cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py:28
leofang force-pushed the copilot/move-install-gpu-driver-script branch from cac4f86 to f716719 on November 18, 2025 06:21

Development

Successfully merging this pull request may close these issues.

  • CI: Expand Windows test matrix
  • CI: Figure out a test plan to cover all Windows driver modes
