Refactor Windows GPU driver installer and expand test matrix #4
base: master
Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
…r modes Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
@copilot, the ask was to reuse the installer script from the CCCL repo. It must stay in sync and have minimal changes. Right now the change is too much.
I've reverted the installer script to the original simple version. I don't have access to the CCCL repository in this workspace. Could you please provide the CCCL installer script from …?
The script has been reverted in my working tree but not yet committed; I'm waiting for the CCCL reference to make the correct minimal changes.
Updated the installer script to match the CCCL implementation with minimal changes in commit 35fa159. Changes from CCCL:
The script is now 93 lines (vs 75 in CCCL), with the 18-line difference being the DRIVER_MODE handling logic.
…r mode support Co-authored-by: leofang <5534781+leofang@users.noreply.github.com>
leofang
left a comment
- we do not have access to rtx6000ada
- rtxpro6000 is a datacenter card
- cover WDDM in at least 2 pipelines
… different modes rtx2080, rtx4090, rtxpro6000, v100, a100, l4 (t4 nodes are too slow)
leofang
left a comment
* Update cybind-generated bindings
* Remove safe_decode_string
…public APIs taking a device to accept either. (NVIDIA#1238)
…_lib (NVIDIA#1258) Remove redundant call to finder.try_site_packages() on line 28 that was immediately overwritten by the same call on line 30. This duplicate call was wasteful (unnecessary filesystem traversal) and indicates a code review oversight. The function now correctly calls try_site_packages() only once, improving performance with no functional changes. Location: cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_nvidia_dynamic_lib.py:28
cac4f86 to f716719
Updated GPU configurations for Python versions 3.13 and 3.14.
Description
closes NVIDIA#1024
closes NVIDIA#985
Refactors Windows GPU driver installation infrastructure to support architecture-specific drivers and driver mode switching, and expands test coverage across Python/CUDA combinations with driver mode verification.
Changes
Driver installer refactoring (`ci/tools/install_gpu_driver.ps1`):
- … `.github/workflows/` to `ci/tools/` for consistency with other CI tooling
- `GPU_TYPE` environment variable passed directly from test matrix
- `nvidia-smi -fdm 2`: MCDM mode switching only
- … `rtxpro6000` to desktop GPU list
- `DRIVER_MODE` environment variable for MCDM switching
- … `GPU_TYPE` env var directly instead of parsing runner label (deviation from CCCL for better integration with cuda-python workflows)
- … `DRIVER_MODE=MCDM`
- … `DRIVER_MODE=MCDM`

Test matrix expansion (`ci/test-matrix.json`):
- … `DRIVER_MODE` field: TCC/MCDM for data center GPUs, WDDM/MCDM for desktop GPUs
- … `linux.nightly` and `windows.nightly` arrays per requirements

Workflow integration (`.github/workflows/test-wheel-windows.yml`):
- … `DRIVER_MODE` and `GPU_TYPE` to installer as environment variables
- … `nvidia-smi -q`, checks the "Driver Model" output, and validates it matches the expected `DRIVER_MODE` from the matrix configuration

Example Usage
The installer now adapts automatically based on matrix configuration:

    { "ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.9.1", "GPU": "l4", "DRIVER_MODE": "MCDM" }

Results in:
- … `GPU_TYPE` env var
- … `us.download.nvidia.com/tesla/581.15/...`
- `nvidia-smi -fdm 2` switches from TCC (default) to MCDM mode
- … `nvidia-smi -q`

Checklist
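The driver-mode verification described under "Workflow integration" could be sketched roughly as follows. This is only an illustration in Python, not the workflow's actual step: the function names `current_driver_model` and `driver_mode_matches` are hypothetical, and the code assumes that `nvidia-smi -q` prints a `Driver Model` heading followed by indented `Current : <mode>` and `Pending : <mode>` lines.

```python
import re
from typing import Optional


def current_driver_model(nvidia_smi_q_output: str) -> Optional[str]:
    """Return the 'Current' value found under a 'Driver Model' heading.

    Assumes the layout described above; returns None if no such
    section is found in the text.
    """
    in_section = False
    for line in nvidia_smi_q_output.splitlines():
        stripped = line.strip()
        if stripped == "Driver Model":
            in_section = True
            continue
        if in_section:
            match = re.match(r"Current\s*:\s*(\S+)", stripped)
            if match:
                return match.group(1)
            # Any line other than 'Pending' ends the section.
            if not stripped.startswith("Pending"):
                in_section = False
    return None


def driver_mode_matches(nvidia_smi_q_output: str, expected_mode: str) -> bool:
    """Compare the reported driver model with the expected DRIVER_MODE."""
    actual = current_driver_model(nvidia_smi_q_output)
    return actual is not None and actual.upper() == expected_mode.upper()
```

In a CI step, the text could come from `subprocess.run(["nvidia-smi", "-q"], capture_output=True, text=True).stdout`, with the expected value read from the `DRIVER_MODE` environment variable, and the job failing when `driver_mode_matches` returns `False`.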
Original prompt
This change addresses NVIDIA/cuda-python issues #1024 and #985, but the PR should target leofang/cuda-python, not NVIDIA/cuda-python.
Goals:
- … `ci/tools` and update all call sites.
- … `test-matrix.json` into the installer script.

Details and requirements:

Move install_gpu_driver.ps1 in cuda-python
- … `.github/workflows/install_gpu_driver.ps1` (see … `ci/tools/install_gpu_driver.ps1`.

Update install_gpu_driver.ps1 behavior and sync with CCCL
- … `cuda-python` should be updated to follow the logic of CCCL’s installer at `.github/actions/workflow-run-job-windows/install_gpu_driver.ps1` (see …
- … `JOB_RUNNER` environment variable.
- … `nvidia-smi -fdm 2` plus a Display device reset). Extend this so that:
- … `cuda-python` copy, also update the original script in CCCL at `.github/actions/workflow-run-job-windows/install_gpu_driver.ps1` to support WDDM in the same way.

Update `ci/test-matrix.json` nightly sections in cuda-python
- … `ci/test-matrix.json` (see …
- … `linux.nightly` and `windows.nightly`, make the arrays empty. Leave the keys present, but set them to an empty list: `"nightly": []`.
- … `linux.pull-request` or `linux.special_runners` sections.

Expand the Windows test matrix in `ci/test-matrix.json`
- … `ci/test-matrix.json`, modify the `windows.pull-request` matrix to:
  - … `3.10`, `3.11`, `3.12`, `3.13`, `3.14`, `3.14t`.
  - … `12.9.x` and `13.0.x`. Please match the exact patch versions already used elsewhere in the repo (e.g., `12.9.1`, `13.0.2`), keeping consistency with the Linux matrix when possible.
  - … `windows.pull-request`.
  - … `GPU` from the set `(l4, rtxpro6000, a100, rtx6000ada)` in a way similar to how GPUs are chosen for `linux.pull-request`.
  - … `DRIVER` field that currently holds values like `"latest"` or `"earliest"`; extend or repurpose this field (or add a new one if clearer) so that:
    - … `l4` and `a100` must use either TCC or MCDM (no WDDM).
    - … `rtxpro6000`, `rtx6000ada`) should use WDDM.
  - … `_sorted_by` invariant: matrices must remain sorted in ascending order by `[ARCH, PY_VER, CUDA_VER, LOCAL_CTK, GPU, DRIVER]` after your changes.

Wire driver mode into the Windows workflows
- … `ci/test-matrix.json` is parsed for Windows workflows (e.g., CI jobs defined under `.github/workflows/` that run on `windows-*` runners and use `ci/test-matrix.json`).
- … `DRIVER` or a dedicated field you...

This pull request was created as a result of the following prompt from Copilot chat.
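The two matrix constraints named in the prompt (the `_sorted_by` ordering and the driver modes allowed per GPU class) could be checked with a small script along these lines. It is a sketch under stated assumptions rather than the repository's tooling: the helper names `is_sorted` and `invalid_driver_modes` are hypothetical, the GPU classification sets are illustrative, the allowed modes follow the PR description (TCC/MCDM for data center GPUs, WDDM/MCDM for desktop GPUs), and values are compared as plain strings.

```python
from typing import Dict, List

# Sort keys quoted from the prompt's _sorted_by invariant.
SORT_KEYS = ["ARCH", "PY_VER", "CUDA_VER", "LOCAL_CTK", "GPU", "DRIVER"]

# Assumed classification, for illustration only; any GPU not listed
# here is treated as a desktop GPU.
DATA_CENTER_GPUS = {"l4", "a100", "v100"}

# Allowed driver modes per GPU class, following the PR description.
ALLOWED_MODES = {
    "data_center": {"TCC", "MCDM"},
    "desktop": {"WDDM", "MCDM"},
}


def is_sorted(entries: List[Dict[str, str]]) -> bool:
    """Check ascending order by SORT_KEYS, comparing values as strings."""
    rows = [tuple(str(e.get(k, "")) for k in SORT_KEYS) for e in entries]
    return all(a <= b for a, b in zip(rows, rows[1:]))


def invalid_driver_modes(entries: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Return the entries whose DRIVER_MODE is not allowed for their GPU class."""
    bad = []
    for entry in entries:
        gpu_class = "data_center" if entry["GPU"] in DATA_CENTER_GPUS else "desktop"
        if entry["DRIVER_MODE"] not in ALLOWED_MODES[gpu_class]:
            bad.append(entry)
    return bad
```

Such a check could run over the `windows.pull-request` array after loading `ci/test-matrix.json` with `json.load`. Plain string comparison is a simplification: it would need adjusting if the real invariant orders version fields numerically.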