@adityagesh
Collaborator

No description provided.

@adityagesh adityagesh requested a review from LiliDeng as a code owner November 26, 2025 05:32
@adityagesh adityagesh force-pushed the aditya/gpu_regression_fix branch from b35f262 to cc89971 Compare November 26, 2025 07:14
@adityagesh
Collaborator Author

Testing is ongoing. This PR potentially impacts many distros and variants.

@adityagesh adityagesh changed the title from "GPU: Add unsupported distro list, add NVIDIA GPU driver installation for Debian" to "GPU: Add unsupported distro list, add NVIDIA GPU driver installation for Debian, SUSE" Nov 26, 2025
@adityagesh adityagesh marked this pull request as draft November 26, 2025 11:36
@adityagesh adityagesh force-pushed the aditya/gpu_regression_fix branch 5 times, most recently from ecf55d7 to 447f69c Compare December 3, 2025 06:26
@adityagesh adityagesh changed the title from "GPU: Add unsupported distro list, add NVIDIA GPU driver installation for Debian, SUSE" to "GPU: Add unsupported distro list, add NVIDIA GPU driver installation for Debian" Dec 3, 2025
@adityagesh adityagesh force-pushed the aditya/gpu_regression_fix branch from 447f69c to 6c67f3f Compare December 3, 2025 06:33
@adityagesh adityagesh marked this pull request as ready for review December 3, 2025 09:41
@adityagesh
Collaborator Author

The SUSE changes are not included in this PR because of multiple issues encountered while enabling SUSE. That work may need more time.

Contributor

Copilot AI left a comment

Pull request overview

This PR adds NVIDIA GPU driver installation support for Debian distributions and improves the overall GPU driver installation architecture. It refactors the Ubuntu installation to share common installation logic with Debian through a new _install_cuda_debian_based helper method, and improves test coverage by explicitly declaring unsupported operating systems in test requirements rather than checking at runtime.

Key Changes

  • Added Debian support for NVIDIA CUDA driver installation with version checks for Debian 10, 11, and 12
  • Refactored Ubuntu and Debian installation methods to share common logic via _install_cuda_debian_based
  • Changed _install_package_from_url from private to public method (install_package_from_url) in the Posix operating system class
  • Moved GPU test OS compatibility checks from runtime validation to declarative test requirements
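The shared-helper refactor described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the function names, the `InstallPlan` type, the repository URL layout, and the `cuda-drivers` package name are all assumptions made for the example. Only the supported Debian versions (10, 11, 12) come from the summary above.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed repository layout; not taken from the PR's code.
CUDA_REPO_BASE = "https://developer.download.nvidia.com/compute/cuda/repos"
# Debian major versions named in the PR summary.
SUPPORTED_DEBIAN_MAJORS = frozenset({10, 11, 12})


@dataclass(frozen=True)
class InstallPlan:
    """Hypothetical description of what a distro-specific install would do."""

    repo_url: str
    packages: Tuple[str, ...]


def plan_cuda_debian_based(repo_name: str, arch: str = "x86_64") -> InstallPlan:
    """Hypothetical shared helper used by both the Ubuntu and Debian paths."""
    return InstallPlan(
        repo_url=f"{CUDA_REPO_BASE}/{repo_name}/{arch}",
        packages=("cuda-drivers",),  # assumed package name
    )


def plan_cuda_ubuntu(release: str) -> InstallPlan:
    # "22.04" becomes the repo directory name "ubuntu2204".
    return plan_cuda_debian_based("ubuntu" + release.replace(".", ""))


def plan_cuda_debian(major: int) -> InstallPlan:
    # The version check lives in the distro-specific wrapper, not the helper.
    if major not in SUPPORTED_DEBIAN_MAJORS:
        raise ValueError(f"Debian {major} is not supported for CUDA install")
    return plan_cuda_debian_based(f"debian{major}")
```

Keeping the version check in the Debian wrapper leaves the shared helper free of distro-specific branching, which is the point of the refactor as the summary describes it.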

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| lisa/tools/gpu_drivers.py | Added Debian OS support, new constants for CUDA repos, refactored Ubuntu/Debian installation to share common logic, updated dependency installation for Debian, and changed private method calls to public |
| lisa/operating_system.py | Changed _install_package_from_url method visibility from private to public |
| lisa/microsoft/testsuites/gpu/gpusuite.py | Added explicit unsupported OS lists (AlmaLinux, Oracle, Suse) to test requirements and removed redundant runtime OS checks from _check_driver_installed |
| lisa/microsoft/testsuites/cvm/cvm_attestation_tool.py | Updated method call from private _install_package_from_url to public install_package_from_url |
| lisa/sut_orchestrator/azure/features.py | Updated TODO comment to clarify separation of concerns between cloud and OS-level GPU requirements |

Testing Recommendations (LISA Guidelines):

Key Test Cases:

verify_load_gpu_driver|verify_gpu_extension_installation|verify_gpu_cuda_with_pytorch

Impacted LISA Features:

Gpu

Tested Azure Marketplace Images:

debian debian-11 11-gen2 latest
debian debian-12 12 latest
debian debian-12 12-gen2 latest
canonical 0001-com-ubuntu-server-jammy 22_04-lts-gen2 latest
canonical ubuntu-24_04-lts server latest

@LiliDeng
Collaborator

LiliDeng commented Dec 8, 2025

@adityagesh please rebase the code, and let me know once the PR is ready for review.

@LiliDeng
Collaborator

LiliDeng commented Dec 8, 2025

The test fails when run against debian debian-13 13 0.20250911.2232, please double-check.

Move logic to skip GPU cases based on distro
to case requirement
Checking for GPU support is not supposed to happen in the
verification process. The is_supported checks are spread across
multiple GPU functions. We need to gradually reduce this
inconsistency.
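The idea in this commit message, declaring unsupported distros on the test case instead of checking inside the verification code, can be sketched as below. This is a minimal standalone illustration: the `requirement` decorator and `run_case` runner are hypothetical stand-ins written for this example, not LISA's API. The distro names are the ones listed in the review summary.

```python
from typing import Callable, Iterable


def requirement(unsupported_os: Iterable[str] = ()) -> Callable:
    """Hypothetical decorator that records unsupported OS names on a test."""
    blocked = frozenset(unsupported_os)

    def wrap(func: Callable) -> Callable:
        func.unsupported_os = blocked
        return func

    return wrap


def run_case(case: Callable, os_name: str) -> str:
    """Hypothetical runner: decides to skip before the test body runs."""
    if os_name in getattr(case, "unsupported_os", frozenset()):
        return "SKIPPED"
    case(os_name)
    return "PASSED"


@requirement(unsupported_os=["AlmaLinux", "Oracle", "Suse"])
def verify_load_gpu_driver(os_name: str) -> None:
    # The body holds only verification; it never asks "is this OS supported?".
    assert os_name
```

With the skip decision made from metadata, every GPU case is filtered the same way, and the verification functions no longer need their own scattered support checks.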
@adityagesh adityagesh force-pushed the aditya/gpu_regression_fix branch from 6c67f3f to 85e67fb Compare December 8, 2025 19:19
adityagesh and others added 2 commits December 9, 2025 00:58
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@adityagesh
Collaborator Author

adityagesh commented Dec 9, 2025

The test fails when run against debian debian-13 13 0.20250911.2232, please double-check.

When the PR was raised, the CUDA packages weren't available. The package is available now, but after installation the driver doesn't seem to be installed correctly. I think the package for Debian 13 is still not fully ready.

nvidia-smi --version
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
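A check for this failure mode could look like the sketch below. It is an assumption-laden illustration rather than anything from the PR: the function names are hypothetical, and it treats either a non-zero exit code or the message quoted above as "driver not responding".

```python
import subprocess
from typing import Sequence

# Substring of the failure message quoted in this thread.
FAILURE_MARKER = "couldn't communicate with the NVIDIA driver"


def driver_responding(returncode: int, output: str) -> bool:
    """Return True only if the command succeeded without the failure message."""
    return returncode == 0 and FAILURE_MARKER not in output


def check_nvidia_smi(command: Sequence[str] = ("nvidia-smi",)) -> bool:
    """Run the given command and classify its result; hypothetical helper."""
    try:
        result = subprocess.run(
            list(command), capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # The binary is missing, so the driver package is not installed.
        return False
    return driver_responding(result.returncode, result.stdout + result.stderr)
```

Separating the pure classification from the subprocess call makes the logic testable on machines without a GPU.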

@LiliDeng LiliDeng merged commit 2ac566c into main Dec 10, 2025
58 checks passed
@LiliDeng LiliDeng deleted the aditya/gpu_regression_fix branch December 10, 2025 05:19