[k8s][docs] Clarify nvidia runtime is required for k8s #2957

Merged 2 commits on Jan 9, 2024

Conversation

romilbhardwaj
Collaborator

Clarifies that nvidia must also be set as the default runtime for the container engine. This is a common gotcha on RKE2, since users may follow Nvidia's default installation instructions instead of the ones recommended for RKE2.

Tested (run the relevant ones):

  • Locally rendered docs
  • Tested on RKE2 cluster

@romilbhardwaj romilbhardwaj added the k8s Kubernetes related items label Jan 9, 2024
@@ -169,9 +169,12 @@ Setting up GPU support
~~~~~~~~~~~~~~~~~~~~~~
If your Kubernetes cluster has Nvidia GPUs, ensure that:

-1. The Nvidia device plugin is installed (i.e., ``nvidia.com/gpu`` resource is available on each node).
+1. The Nvidia GPU operator is installed (i.e., ``nvidia.com/gpu`` resource is available on each node) and is ``nvidia`` is set as the default runtime for your container engine. See `Nvidia's installation guide <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#install-nvidia-gpu-operator>`_ for more details.
Collaborator

Suggested change
-1. The Nvidia GPU operator is installed (i.e., ``nvidia.com/gpu`` resource is available on each node) and is ``nvidia`` is set as the default runtime for your container engine. See `Nvidia's installation guide <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#install-nvidia-gpu-operator>`_ for more details.
+1. The Nvidia GPU operator is installed (i.e., ``nvidia.com/gpu`` resource is available on each node) and ``nvidia`` is set as the default runtime for your container engine. See `Nvidia's installation guide <https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html#install-nvidia-gpu-operator>`_ for more details.

Is it possible to give a quick tip on how to check the second condition ("nvidia is set as the default runtime for your container engine")?

Collaborator Author

I checked; unfortunately, it depends on the specific container runtime in use (and can be overridden by k8s). The best way to check is to run a small job, so I added a note and a k8s YAML for that.
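
For readers following along, a small check job of the kind described above might look like the sketch below. It is an illustration under stated assumptions, not necessarily the YAML added in this PR: the pod name `gpu-runtime-check` and the image tag are hypothetical choices, and any CUDA base image that relies on the runtime to inject `nvidia-smi` should behave similarly.

```yaml
# Hypothetical check pod; the name and image tag are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-runtime-check
spec:
  restartPolicy: Never            # run once; do not restart on failure
  containers:
    - name: nvidia-smi
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed tag
      command: ["nvidia-smi"]     # expected to fail if the runtime does not inject the driver tools
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU from the cluster
```

After `kubectl apply -f` on this file, `kubectl get pod gpu-runtime-check` and `kubectl logs gpu-runtime-check` show the outcome. If the pod completes and the logs list a GPU, the nvidia runtime is most likely in effect. If the container fails to start because `nvidia-smi` cannot be found, the default runtime is probably not nvidia.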

Collaborator

@concretevitamin concretevitamin left a comment

LGTM, thanks @romilbhardwaj.

@romilbhardwaj romilbhardwaj merged commit 1e53317 into master Jan 9, 2024
19 checks passed
@romilbhardwaj romilbhardwaj deleted the k8s_nvidiaruntime_docs branch January 9, 2024 01:39