This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

NVIDIA driver installation support on GPU instances #645

Merged
8 commits merged into kubernetes-retired:master on May 22, 2017
Conversation

everpeace
Contributor

@everpeace everpeace commented May 11, 2017

Hello community! Thank you for developing this great tool! I'm glad to have the opportunity to contribute to this project, since my colleague @mumoshu always encourages me.

As everybody knows, AWS offers NVIDIA GPU-ready instance type families (P2 and G2), and Kubernetes has supported GPU resource scheduling since 1.6. However, NVIDIA drivers are not installed in the default CoreOS AMI used by kube-aws. So, let's support it!

This PR implements automatic installation of the NVIDIA GPU driver. I borrowed some of the driver installation scripts from /Clarifai/coreos-nvidia.

Design summary

Configuration and what will happen

The new configuration for this feature is really simple: worker.nodePool[i].gpu.nvidia.{enabled,version} is introduced in cluster.yaml.

  • The default value of enabled is false.
  • The user will be warned if
    • they set enabled: true when instanceType doesn't support GPUs (in this case the setting is ignored), or
    • they set enabled: false when instanceType does support GPUs.
  • When enabled: true is set on a GPU-supported instance type,
    • the NVIDIA driver is installed automatically on each node in the node pool.
    • The installation happens just before kubelet.service starts (see below).
    • kubelet starts with --feature-gates="Accelerators=true",
    • and containers can then mount the NVIDIA driver like this.
  • Several labels are assigned to the node so that workloads can be scheduled onto the appropriate GPU model and driver version using nodeAffinity:
    • alpha.kubernetes.io/nvidia-gpu-name=<GPU hardware type name>
    • kube-aws.coreos.com/gpu=nvidia
    • kube-aws.coreos.com/nvidia-gpu-version=<version>
    • Because substitution is not used in the unit definition, I introduced /etc/default/kubelet for defining these label values in this commit (a hedged sketch of this wiring follows the list).
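
A minimal, hypothetical sketch of that wiring, expressed as a cloud-config fragment. The variable name NODE_LABELS and the example label values are illustrative assumptions, not the PR's actual cloud-config-worker content:

```
write_files:
  - path: /etc/default/kubelet
    permissions: "0644"
    content: |
      # Example values only; the real file is generated per node / node pool.
      NODE_LABELS=kube-aws.coreos.com/gpu=nvidia,kube-aws.coreos.com/nvidia-gpu-version=375.66,alpha.kubernetes.io/nvidia-gpu-name=Tesla-K80
# kubelet.service could then load this file with EnvironmentFile=/etc/default/kubelet and pass
# --feature-gates="Accelerators=true" --node-labels="${NODE_LABELS}" on its ExecStart line.
```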

Driver installation process

Most of the installation scripts are borrowed from /Clarifai/coreos-nvidia. In particular, for device node installation I referenced Clarifai/coreos-nvidia#4. Here is a summary of the installation process (a condensed unit sketch follows this list).

  • kubelet.service requires nvidia-start.service.
  • nvidia-start.service invokes build-and-install.sh, which installs the NVIDIA drivers and kernel module files, via ExecStartPre. nvidia-start.service then creates the device nodes (nvidiactl and nvidia0,1,...). Other dynamic device nodes are handled by udevadm (the configuration is in this rule file).
    • nvidia-start.service is Type=oneshot because kubelet.service should wait until nvidia-start.sh has completely succeeded.
    • A Restart policy cannot be used with Type=oneshot, so nvidia-start.service doesn't rely on systemd's retry feature; a manual retry.sh is used instead.
  • nvidia-persistenced is also enabled to speed up startup. This service is started/stopped via udevadm too.
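
For illustration, here is a condensed sketch of that unit relationship as a cloud-config coreos.units entry. The unit body below (description, the nvidia-start.sh path, and the retry.sh invocation style) is an assumption; the PR's cloud-config-worker contains the real definitions.

```
coreos:
  units:
    - name: nvidia-start.service
      enable: true
      content: |
        [Unit]
        Description=Build, install and start the NVIDIA driver (sketch)
        Before=kubelet.service

        [Service]
        # oneshot + RemainAfterExit so that kubelet.service, which declares
        # Requires= and After= on this unit, only starts once installation has succeeded.
        Type=oneshot
        RemainAfterExit=true
        # Restart= cannot be combined with Type=oneshot, hence the manual retry wrapper.
        # Assuming retry.sh re-runs the command given as its arguments until it succeeds.
        ExecStartPre=/opt/nvidia-build/util/retry.sh /opt/nvidia-build/build-and-install.sh
        # Assumed path: the script that loads the module and creates /dev/nvidiactl, /dev/nvidia0, ...
        ExecStart=/opt/nvidia-build/nvidia-start.sh

        [Install]
        RequiredBy=kubelet.service
```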

How to try

  1. build kube-aws on this branch
  2. kube-aws up with the minimal node pool configuration below
    worker:
     nodePools:
      - name: p2xlarge
        count: 1
        instanceType: p2.xlarge
        rootVolume:
          size: 30
          type: gp2
        gpu:
          nvidia:
            enabled: true
            version: "375.66"
    
  3. check kubectl get nodes --show-labels. You'll see one node with GPU-related labels.
  4. try starting this pod (a hedged sketch of such a pod spec follows these steps)
    kubectl create -f pod.yaml
    
  5. the log reports that a sample matrix multiplication is computed on the GPUs.
    kubectl logs gpu-pod
    
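A hedged sketch of what a GPU test pod along these lines could look like (the referenced pod.yaml gist is the authoritative example; the image, command, in-container mount path, and driver-version value here are illustrative assumptions):

```
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  # Schedule only onto nodes carrying the labels added by this PR.
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - {key: kube-aws.coreos.com/gpu, operator: In, values: ["nvidia"]}
              - {key: kube-aws.coreos.com/nvidia-gpu-version, operator: In, values: ["375.66"]}
  containers:
    - name: cuda
      image: nvidia/cuda:8.0-runtime               # illustrative image
      command: ["/usr/local/nvidia/bin/nvidia-smi"]
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib
      resources:
        limits:
          alpha.kubernetes.io/nvidia-gpu: 1        # GPU resource behind the Accelerators feature gate
      volumeMounts:
        - name: nvidia-driver
          mountPath: /usr/local/nvidia             # assumed in-container path
          readOnly: true
  volumes:
    - name: nvidia-driver
      hostPath:
        path: /opt/nvidia/current                  # host driver install prefix (see the nvidia-persistenced unit)
```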

Feedback is always welcome!

@k8s-ci-robot
Contributor

Thanks for your pull request. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please follow instructions at https://github.com/kubernetes/kubernetes/wiki/CLA-FAQ to sign the CLA.

It may take a couple minutes for the CLA signature to be fully registered; after that, please reply here with a new comment and we'll verify. Thanks.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. label May 11, 2017
@codecov-io

codecov-io commented May 11, 2017

Codecov Report

Merging #645 into master will decrease coverage by 1.12%.
The diff coverage is 0%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #645      +/-   ##
==========================================
- Coverage   38.26%   37.14%   -1.13%     
==========================================
  Files          51       51              
  Lines        3316     3201     -115     
==========================================
- Hits         1269     1189      -80     
+ Misses       1845     1836       -9     
+ Partials      202      176      -26
Impacted Files Coverage Δ
model/gpu.go 0% <0%> (ø)
model/node_pool_config.go 20.28% <0%> (-0.93%) ⬇️
core/controlplane/config/credential.go 57.14% <0%> (-3.09%) ⬇️
core/controlplane/config/tls_config.go
core/controlplane/config/token_config.go
core/controlplane/config/encrypted_assets.go 73.09% <0%> (ø)
core/controlplane/config/config.go 56.42% <0%> (+0.44%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f22f2cf...0546e94.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: no Indicates the PR's author has not signed the CNCF CLA. labels May 11, 2017
@mumoshu
Contributor

mumoshu commented May 12, 2017

cc @jollinshead @redbaron I believe you're currently running batch workloads on your kube-aws clusters. Are you also willing to run machine learning workloads utilizing GPUs? 😃

@mumoshu mumoshu changed the title from "Support Nvidia driver installation support on GPU instances." to "NVIDIA driver installation support on GPU instances" May 12, 2017
# # (Experimental) GPU Driver installation support
# # Currently, only the NVIDIA driver is supported.
# # This setting takes effect only when the node's instance family is p2 or g2.
# # Otherwise, installation will be skipped even if enabled.
Contributor

According to https://github.com/kubernetes-incubator/kube-aws/pull/645/files#diff-5e5dcac90c0e906cb335a42b0352ce9cR47, it seems like kube-aws emits a validation error when GPU support is enabled on a node pool with an instance type other than p2 or g2?

Contributor

Ah, sorry! I missed that there is only a warning.

Anyway, I believe we'd better make it an error rather than a warning, because the user clearly intends to enable GPU support but kube-aws was unable to do so.
WDYT?

Contributor Author

Oh, OK. Yes, I agree with you. kube-aws should prohibit enabled: true with instance types that don't support GPUs. I will update my code.

ExecStart=/opt/nvidia/current/bin/nvidia-persistenced --user nvidia-persistenced --no-persistence-mode --verbose
ExecStopPost=/bin/rm -rf /var/run/nvidia-persistenced

- path: /opt/nvidia-build/nvidia-start.service
Contributor

Just for my education, would you mind sharing how this systemd unit gets installed into systemd?
Can we set up the systemd unit via the units section of cloud-config like the others?

Contributor Author

Please see the comment; nvidia-install.sh does that.

tar -C ${ARTIFACT_DIR} -cvj ${TOOLS} > tools-${VERSION}.tar.bz2
tar -C ${ARTIFACT_DIR}/kernel -cvj $(basename -a ${ARTIFACT_DIR}/kernel/*.ko) > modules-${COMBINED_VERSION}.tar.bz2

- path: /opt/nvidia-build/build.sh
Contributor

AFAICS, coreos-nvidia supports cross-building against any version of Container Linux.
Just curious, but could we run build.sh locally, e.g. in a Vagrant machine hosting Container Linux, export the built assets to disk, and then embed them in cloud-config or put them on S3 for faster startup of GPU-enabled nodes?

Contributor Author

@everpeace everpeace May 12, 2017

Yeah, probably. I will try this in my local Vagrant!

Contributor Author

As you pointed out, building the libraries, kernel modules, and NVIDIA tools succeeded in Vagrant!
Putting pre-built binaries onto GPU nodes directly makes startup faster. However, the kube-aws up process would become more complex, and it would require users to install Vagrant and VirtualBox.

Honestly speaking, the build process takes several minutes (probably 5 to 10 minutes, depending on how fast the CoreOS dev container and the NVIDIA installer download). I believe this duration would be acceptable for many users because a GPU node pool usually doesn't need to scale as quickly as a normal node pool which hosts service pods.

What do you think? Do you prefer a local build and shipping pre-built binaries directly for faster startup?

Contributor

I believe this duration would be acceptable for many users because a GPU node pool usually doesn't need to scale as quickly as a normal node pool which hosts service pods.

I completely agree with you here 👍
The local build feature could be an extra thing we may or may not add in the future.

}

// This function is used when rendering cloud-config-worker
func (c NvidiaSetting) IsEnabledOn(instanceType string) bool {
Contributor

I like the nice naming 👍

# # Make sure to choose 'docker' as the container runtime when enabling this feature.
# gpu:
# nvidia:
# enabled: true
Contributor

I guess an installed driver can become unusable once Container Linux is updated afterwards, due to an updated kernel.
Do you think so too?
Then, I believe we'd better document it - maybe something like:

Ensure that automatic Container Linux updates are disabled (they are disabled by default, btw). Otherwise the installed driver may stop working when an OS update results in an updated kernel.

would work.

Contributor Author

Yes, you're absolutely right. I will put this in the comment.
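
For the documentation note above, a generic Container Linux cloud-config sketch of disabling automatic updates (this is standard Container Linux configuration rather than anything added by this PR, and kube-aws may expose its own setting for it):

```
coreos:
  update:
    # Never reboot into a newly downloaded (and possibly kernel-incompatible) OS image automatically.
    reboot-strategy: "off"
  units:
    # Optionally stop and mask the update engine so new images are not even downloaded.
    - name: update-engine.service
      command: stop
      mask: true
```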

@mumoshu mumoshu modified the milestones: backlog, wip, tbd May 12, 2017
@mumoshu mumoshu added this to To be reminded in v0.9.7 May 12, 2017
@mumoshu mumoshu modified the milestones: tbd, v0.9.7-rc.<tbd> May 12, 2017
cp *.service /etc/systemd/system/
systemctl daemon-reload
systemctl enable nvidia-start.service
systemctl start nvidia-start.service
Contributor Author

nvidia-install.sh installs nvidia-start.service and nvidia-persistenced.service into systemd, and the script only starts nvidia-start.service. That unit insmods the nvidia module, and udevadm then spawns several actions defined in 71-nvidia.rules, which include starting nvidia-persistenced.service.
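
For illustration, one plausible form of rule that a file like 71-nvidia.rules could contain, shown as a cloud-config write_files entry. The install path and the exact rule are assumptions; the rules actually shipped by this PR and by Clarifai/coreos-nvidia may differ:

```
write_files:
  - path: /etc/udev/rules.d/71-nvidia.rules   # assumed install location
    content: |
      # When the nvidia kernel module is loaded, have systemd pull in nvidia-persistenced.
      ACTION=="add", SUBSYSTEM=="module", KERNEL=="nvidia", TAG+="systemd", ENV{SYSTEMD_WANTS}+="nvidia-persistenced.service"
```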

Contributor

@mumoshu mumoshu May 12, 2017

Thanks! So, basically we're copying systemd unit files written via write_files into /etc/systemd/system?
Then, it should be possible to just write those units directly into the units section, right?

Contributor Author

@everpeace everpeace May 12, 2017

If we put nvidia-start.service in the units section, can we control when the service starts? Unless we define enable explicitly in the unit definition, it doesn't start automatically, right?

Contributor

I guess so.

You can even enable it by default.
More concretely, my idea is modifying nvidia-start.service to something like:

[Unit]
Description=Start NVIDIA daemon(?)
After=local-fs.target
Before=kubelet.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/opt/bin/retry /opt/nvidia/current/bin/nvidia-start.sh
ExecStart=/bin/true
[Install]
RequiredBy=kubelet.service

And trigger it via a systemd dependency from a newly introduced nvidia-install.service:

[Unit]
Description=Start NVIDIA daemon(?)
After=local-fs.target
Before=nvidia-start.service
[Service]
Type=oneshot
RemainAfterExit=true
ExecStartPre=/opt/bin/retry /opt/nvidia/current/bin/nvidia-install.sh
ExecStart=/bin/true
[Install]
RequiredBy=nvidia-start.service

where /opt/bin/retry is:

#!/usr/bin/bash

set -e

# could be improved to a finite loop
while true; do
  if "$@"; then
    exit 0
  fi
  echo retrying "$@"
  # could be improved to an exponential backoff
  sleep 1
done

and omit ExecStartPre=/opt/nvidia-build/build-and-install.sh from kubelet.service.


systemctl daemon-reload
systemctl enable nvidia-start.service
systemctl start nvidia-start.service
Contributor

It may sound like a nit but -

If we could transform the build-and-install.sh in ExecStartPre into a systemd unit as suggested in #645 (comment), then systemctl daemon-reload, enable, and start could be omitted altogether, and such dependencies could be handled completely by systemd rather than by a bash script?

Contributor Author

@mumoshu
Yes, sounds nice! I'll fix this.

deleted the `systemctl` commands from the bash script. Instead, the unit dependency above is introduced.

nvidia-install.service, which just invokes build-and-install.sh, is implemented as type=oneshot because nvidia-start should wait until nvidia-install.service has succeeded completely.
To enable retrying build-and-install.sh, /opt/nvidia-build/util/retry.sh is introduced, because type=oneshot and Restart=always can't be combined in systemd.
…ld-and-install.sh via ExecStartPre with retry.sh

kubelet.service 'Requires' and 'After' nvidia-start.service.
@everpeace
Contributor Author

I updated 'Driver installation process' in this PR summary because of 2710606 and 0546e94

@everpeace
Contributor Author

@mumoshu I updated the systemd units' dependencies. I'd be glad if you could take a look!

@mumoshu mumoshu merged commit ae54601 into kubernetes-retired:master May 22, 2017
@mumoshu
Contributor

mumoshu commented May 22, 2017

@everpeace LGTM. Thanks for your efforts on the great feature 👍

@mumoshu mumoshu modified the milestones: v0.9.7-rc.1, v0.9.7-rc.<tbd> May 22, 2017
@everpeace everpeace mentioned this pull request May 25, 2017
camilb added a commit to camilb/kube-aws that referenced this pull request May 25, 2017
* kubernetes-incubator/master:
  Fix "install-kube-system" script when "clusterAutoscaler" is disabled.
  Remove obsolete etcd locking logic
  Re: cluster-autoscaler support
  Make `go test` timeout longer enough for Travis Fixes kubernetes-retired#667
  NVIDIA driver installation support on GPU instances (kubernetes-retired#645)
  Make kubelet flags more consistent
  Fix taint being assigned as labels
  Avoid unnecessary node replacements when TLS bootstrapping is enabled (kubernetes-retired#639)
  Update Kubernetes dashboard to v1.6.1. Update calico to v2.2.1.
  Fix typo in help message
@kylegato
Contributor

Has anyone tested this w/ the new G3 instances yet?

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018
…ed#645)


## Full changelog

* add /etc/default/kubelet to worker nodes.

* add nvidia driver installation support.

* add gpu related config test.

* it should be an error when the user sets gpu.nvidia.enabled: true with GPU-unsupported instance types.

This change is caused by:
kubernetes-retired#645 (comment)

* add note which warns that driver may stop working when OS is updated.

This change is caused by:
kubernetes-retired#645 (comment)

* move nvidia-{start, persistenced}.service to `coreos.units` section.

moved creation of the nvidia-persistenced user to the `users` section, too.

This change is caused by:
kubernetes-retired#645 (comment)

* introduce unit dependency: kubelet --> nvidia-start --> nvidia-install

deleted the `systemctl` commands from the bash script. Instead, the unit dependency above is introduced.

nvidia-install.service, which just invokes build-and-install.sh, is implemented as type=oneshot because nvidia-start should wait until nvidia-install.service has succeeded completely.
To enable retrying build-and-install.sh, /opt/nvidia-build/util/retry.sh is introduced, because type=oneshot and Restart=always can't be combined in systemd.

* delete nvidia-install.service; now nvidia-start.service invokes build-and-install.sh via ExecStartPre with retry.sh

kubelet.service 'Requires' and 'After' nvidia-start.service.