new flag -"-gpu" to enable Nvidia container runtime #17314

Merged
merged 13 commits into from
Oct 6, 2023

@spowelljr (Member) commented Sep 27, 2023

Rework of #17287 that removes the nvidia-docker container-runtime and uses the --gpus flag instead.

$ minikube start --gpus all
😄  minikube v1.31.2 on Debian rodete
✨  Using the docker driver based on user configuration
📌  Using Docker driver with root privileges
👍  Starting control plane node minikube in cluster minikube
🚜  Pulling base image ...
🔥  Creating docker container (CPUs=2, Memory=32100MB) ...
❗  Using GPUs with the Docker driver is experimental, if you experience any issues please report them at: https://github.com/kubernetes/minikube/issues/new/choose
🛠️   Installing the NVIDIA Container Toolkit...
🐳  Preparing Kubernetes v1.28.2 on Docker 24.0.6 ...
    ▪ Generating certificates and keys ...
    ▪ Booting up control plane ...
    ▪ Configuring RBAC rules ...
🔗  Configuring bridge CNI (Container Networking Interface) ...
🔎  Verifying Kubernetes components...
    ▪ Using image nvcr.io/nvidia/k8s-device-plugin:v0.14.1
    ▪ Using image gcr.io/k8s-minikube/storage-provisioner:v5
🌟  Enabled addons: storage-provisioner, nvidia-device-plugin, default-storageclass
🏄  Done! kubectl is now configured to use "minikube" cluster and "default" namespace by default

$ cat << EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-version-check
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-version-check
    image: "nvidia/cuda:11.0.3-base-ubuntu20.04"
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: "1"
EOF
pod/nvidia-version-check created

$ kubectl logs nvidia-version-check
Fri Sep 22 18:45:31 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P1000        On   | 00000000:65:00.0 Off |                  N/A |
| 34%   26C    P8    N/A /  47W |     15MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

$ minikube start --driver kvm --gpus all
😄  minikube v1.31.2 on Debian rodete (kvm/amd64)
✨  Using the kvm2 driver based on user configuration

❌  Exiting due to MK_USAGE: The gpus flag can only be used with the docker driver and docker container-runtime

$ minikube start --gpus cat
😄  minikube v1.31.2 on Debian rodete
✨  Automatically selected the docker driver

❌  Exiting due to MK_USAGE: The gpus flag must be passed a value of "nvidia" or "all"
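
The two failures above imply two validation rules for the flag. A minimal sketch of those rules in Go follows. It is not the code from this PR: `validateGPUs` and its parameters are hypothetical names, and the order of the two checks is an assumption, since the output above never shows a case where both rules are violated at once. Only the error strings are taken from the logs.

```go
package main

import (
	"errors"
	"fmt"
)

// validateGPUs is a hypothetical helper (not the function from the PR).
// It mirrors the two MK_USAGE messages shown in the logs above.
func validateGPUs(gpus, driver, runtime string) error {
	if gpus == "" {
		return nil // flag not set, nothing to validate
	}
	// Assumption: the value is checked before the driver/runtime.
	if gpus != "nvidia" && gpus != "all" {
		return errors.New(`The gpus flag must be passed a value of "nvidia" or "all"`)
	}
	if driver != "docker" || runtime != "docker" {
		return errors.New("The gpus flag can only be used with the docker driver and docker container-runtime")
	}
	return nil
}

func main() {
	cases := []struct{ gpus, driver, runtime string }{
		{"all", "docker", "docker"}, // accepted
		{"all", "kvm2", "docker"},   // wrong driver
		{"cat", "docker", "docker"}, // invalid value
	}
	for _, c := range cases {
		err := validateGPUs(c.gpus, c.driver, c.runtime)
		fmt.Printf("--gpus=%q driver=%s runtime=%s -> %v\n", c.gpus, c.driver, c.runtime, err)
	}
}
```

Returning an error instead of exiting directly keeps the check easy to unit test; the caller can decide how to report it.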

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 27, 2023
@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Sep 27, 2023
@medyagh changed the title from "Automate installing NVIDIA Container Toolkit w/ flag" to 'new flag -"-gpu" to enable Nvidia container runtime' Oct 3, 2023
@medyagh (Member) commented Oct 3, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added the ok-to-test Indicates a non-member PR verified by an org member that is safe to test. label Oct 3, 2023
@spowelljr force-pushed the gpusFlag branch 3 times, most recently from 8569da1 to 81acfe3, October 4, 2023 22:10

@medyagh (Member) left a comment

Let's also add a warning for users that this feature is Beta and that we would like to get their feedback.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2023
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 6, 2023
@medyagh (Member) left a comment

Thank you!

@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: medyagh, spowelljr

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@minikube-pr-bot

kvm2 driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 51.1s    | 51.4s               |
| enable ingress | 27.8s    | 28.4s               |
+----------------+----------+---------------------+

Times for minikube start: 51.1s 50.8s 52.5s 49.7s 51.5s
Times for minikube (PR 17314) start: 51.5s 51.7s 52.3s 51.3s 50.4s

Times for minikube ingress: 28.2s 28.1s 26.3s 28.5s 27.7s
Times for minikube (PR 17314) ingress: 28.1s 28.2s 28.5s 28.6s 28.7s

docker driver with docker runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 24.0s    | 23.9s               |
| enable ingress | 20.7s    | 21.2s               |
+----------------+----------+---------------------+

Times for minikube ingress: 20.8s 20.3s 20.8s 20.8s 20.8s
Times for minikube (PR 17314) ingress: 20.8s 22.8s 20.9s 20.8s 20.4s

Times for minikube start: 24.6s 24.2s 24.8s 21.7s 24.6s
Times for minikube (PR 17314) start: 21.7s 25.1s 25.7s 22.0s 24.9s

docker driver with containerd runtime

+----------------+----------+---------------------+
|    COMMAND     | MINIKUBE | MINIKUBE (PR 17314) |
+----------------+----------+---------------------+
| minikube start | 22.6s    | 21.4s               |
| enable ingress | 32.1s    | 31.9s               |
+----------------+----------+---------------------+

Times for minikube start: 23.2s 19.9s 23.4s 22.9s 23.4s
Times for minikube (PR 17314) start: 20.8s 22.7s 23.3s 19.9s 20.3s

Times for minikube ingress: 31.3s 31.3s 47.3s 31.3s 19.3s
Times for minikube (PR 17314) ingress: 31.3s 31.3s 18.4s 47.3s 31.4s

@minikube-pr-bot

These are the flake rates of all failed tests.

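+-------------------------+-----------------------------------------------+----------------+
| Environment             | Failed Tests                                  | Flake Rate (%) |
+-------------------------+-----------------------------------------------+----------------+
| KVM_Linux_containerd    | TestAddons/parallel/Ingress                   | 0.00           |
| KVM_Linux               | TestNoKubernetes/serial/StartNoArgs           | 4.84           |
| Docker_Linux_crio_arm64 | TestPause/serial/SecondStartNoReconfiguration | 10.40          |
| Hyper-V_Windows         | TestRunningBinaryUpgrade                      | 32.00          |
+-------------------------+-----------------------------------------------+----------------+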

To see the flake rates of all tests by environment, click here.

@medyagh medyagh merged commit 3fabfbe into kubernetes:master Oct 6, 2023
24 of 37 checks passed
@spowelljr spowelljr deleted the gpusFlag branch October 9, 2023 18:36
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.