
Pre-install nvidia container runtime + drivers on GPU instances #11628

Merged — 9 commits merged into kubernetes:master on Sep 11, 2021

Conversation

@olemarkus (Member) commented on May 30, 2021:

  • install the nvidia container runtime
  • install nvidia drivers
  • make it opt-in, to avoid clashing with those who prebake images or use the GPU operator or similar (see the sketch below)
  • add the device plugin as an addon
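
For reference, a minimal sketch of what the opt-in API surface might look like. Type and field names here are illustrative guesses, not necessarily what this PR adds; as discussed later in the thread, the setting ends up nested under the containerd config.

```go
package kops

// NvidiaGPUConfig is an illustrative sketch of the opt-in knob described
// above; the real kOps field names may differ.
type NvidiaGPUConfig struct {
	// Enabled turns on installation of the nvidia container runtime,
	// drivers, and device plugin addon on GPU instances.
	Enabled *bool `json:"enabled,omitempty"`
	// DriverPackage optionally overrides the driver package that is
	// installed (hypothetical field, for illustration only).
	DriverPackage string `json:"driverPackage,omitempty"`
}
```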

@k8s-ci-robot added the do-not-merge/work-in-progress, cncf-cla: yes, size/L, and area/nodeup labels on May 30, 2021
@olemarkus (Member, Author):

Worth mentioning that we override (or at least modify) the containerd configuration override. kOps only knows whether the instance will have GPUs during nodeup, while cloudup sets the config override to the kOps default config by default (hence the containerd config override is always set to the assumed "final" config).

@olemarkus force-pushed the gpu-runtime branch 2 times, most recently from 9798157 to 57fc757, on May 30, 2021 09:36
@k8s-ci-robot added the size/XL label and removed the size/L label on May 30, 2021
@olemarkus requested a review from hakman on May 30, 2021 09:43
@k8s-ci-robot added the needs-rebase label on May 31, 2021
@hakman (Member) commented on Jun 9, 2021:

The idea is good in general; we should make this easy for people.

In particular, here are some of my thoughts:

  • I think the vision for nodeup is to become dumb, as dumb as possible. This is why we moved the containerd logic out of it. I think you can call it defensive programming, but it's probably a longer topic that would be good for tomorrow's office hours.
  • Maybe add this as a new option of the container runtime, allowing the runtime to be overridden to something other than runc. Describing instance types on every node just for this purpose seems wasteful. It can also be done much earlier, when generating the cloud model.
  • The APT source task might be better as an APT key task plus a separate file task for the repo file.
  • The APT list is focused on Ubuntu; there is a different list for debian10, for example.
  • The APT key should be sent as a file asset URL, same as the containerd package.
  • The containerd config is currently done outside nodeup. The reason it didn't change was to avoid moving logic into nodeup. Again, a topic for office hours. Either it should move completely into nodeup, or it should be managed outside it as we do now.

@olemarkus (Member, Author):

* I think the vision for nodeup is to become dumb, as dumb as possible. This is why we moved the containerd logic out of it. I think you can call it defensive programming, but it's probably a longer topic that would be good for tomorrow's office hours.

* Maybe add this as a new option of the container runtime, allowing the runtime to be overridden to something other than runc. Describing instance types on every node just for this purpose seems wasteful. It can also be done much earlier, when generating the cloud model.

The above has a challenge if someone creates a mixed instance group with both GPU and non-GPU instance types. I am not sure why anyone would do that, but it is possible, and if the nvidia runtime is used on a non-GPU instance, it will break.

That being said, I guess we could do some hardware probing instead. It just seemed more reliable to describe the instance type.

* The APT source task might be better as an APT key task plus a separate file task for the repo file.

* The APT list is focused on Ubuntu; there is a different list for debian10, for example.

Yeah, I am aware. I plan on adding other distros as needed and, in the interim, solving this with some distro checks and documentation.

* The APT key should be sent as a file asset URL, same as the containerd package.

Thanks. I didn't like the way I did this one. A file asset URL is a much better idea.

* The containerd config is currently done outside nodeup. The reason it didn't change was to avoid moving logic into nodeup. Again, a topic for office hours. Either it should move completely into nodeup, or it should be managed outside it as we do now.

If we find a way to do this in cloudup, I'd also like to see if the final config can be put in the config aux. I don't like using the override flag as a carry mechanism from cloudup to nodeup.

@hakman (Member) commented on Jun 9, 2021:

* I think the vision for nodeup is to become dumb, as dumb as possible. This is why we moved the containerd logic out of it. I think you can call it defensive programming, but it's probably a longer topic that would be good for tomorrow's office hours.

* Maybe add this as a new option of the container runtime, allowing the runtime to be overridden to something other than runc. Describing instance types on every node just for this purpose seems wasteful. It can also be done much earlier, when generating the cloud model.

The above has a challenge if someone creates a mixed instance group with both GPU and non-GPU instance types. I am not sure why anyone would do that, but it is possible, and if the nvidia runtime is used on a non-GPU instance, it will break.

That being said, I guess we could do some hardware probing instead. It just seemed more reliable to describe the instance type.

No need for that; we have validation. If we can validate that an instance group does not mix ARM and AMD instance types, we can validate GPU instances the same way.
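
For illustration only, a rough sketch of that kind of instance group check; the helper and package names are hypothetical, not actual kOps code:

```go
package validation

import "fmt"

// validateNoMixedGPU is a hypothetical sketch: reject instance groups that
// mix GPU and non-GPU instance types, mirroring the existing mixed
// ARM/AMD architecture validation. hasGPU would be backed by instance-type
// data (for example from DescribeInstanceTypes).
func validateNoMixedGPU(instanceTypes []string, hasGPU func(string) bool) error {
	gpuCount := 0
	for _, it := range instanceTypes {
		if hasGPU(it) {
			gpuCount++
		}
	}
	if gpuCount != 0 && gpuCount != len(instanceTypes) {
		return fmt.Errorf("instance group mixes GPU and non-GPU instance types")
	}
	return nil
}
```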

* The APT source task might be better as an APT key task plus a separate file task for the repo file.

* The APT list is focused on Ubuntu; there is a different list for debian10, for example.

Yeah, I am aware. I plan on adding other distros as needed and, in the interim, solving this with some distro checks and documentation.

I meant that the code checks for the Debian family instead of specifically for Ubuntu.

* The APT key should be sent as a file asset URL, same as the containerd package.

Thanks. I didn't like the way I did this one. A file asset URL is a much better idea.

👍

* The containerd config is currently done outside nodeup. The reason it didn't change was to avoid moving logic into nodeup. Again, a topic for office hours. Either it should move completely into nodeup, or it should be managed outside it as we do now.

If we find a way to do this in cloudup, I'd also like to see if the final config can be put in the config aux. I don't like using the override flag as a carry mechanism from cloudup to nodeup.

I agree, but to keep this on track we can do it later, when the config aux is merged. I agree that aux would be a better place for it.

@olemarkus (Member, Author):

So if we don't allow mixing GPU and non-GPU instance types, building the config in cloudup should be fine.
I would need to send some setting to nodeup telling it to do those package installs. Again, this seems like something worth putting in the config aux.

@hakman (Member) commented on Jun 9, 2021:

A prerequisite would be to allow configuring containerd per IG; I'm not sure that's possible now. Once it is, the package installs can just be done in the ContainerRuntime builder. For now, package installs would still be hardcoded based on OS type, I guess.
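
As a rough illustration of what "hardcoded based on OS type" could look like in the builder; the helper, signature, and package names are hypothetical, not from this PR:

```go
package containerd

import "fmt"

// nvidiaPackages is a hypothetical helper: pick the nvidia driver and
// container-runtime packages to install based on the node's OS family.
func nvidiaPackages(osFamily string) ([]string, error) {
	switch osFamily {
	case "ubuntu":
		// Illustrative package names; real ones would likely be configurable.
		return []string{"nvidia-headless-460-server", "nvidia-container-runtime"}, nil
	default:
		return nil, fmt.Errorf("nvidia packages not yet handled for %s", osFamily)
	}
}
```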

@olemarkus (Member, Author):

Right. I think that is a challenge with how the nodeup config is built. The containerd config should probably be built from the IG and cluster spec and then written to one of the configs, rather than being written back to the IG/cluster spec. But we are a long way away from that.

I think continuing on the current path makes sense for now. Nothing here is exposed to the user, so things can be moved later on (maybe even before 1.22 GA).

@k8s-ci-robot k8s-ci-robot added area/api and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jun 29, 2021
@olemarkus (Member, Author):

I didn't do many of the APT changes. Using assets was challenging, as they are arch-dependent, which doesn't make sense in this case. It could also run into runtime issues if we use hashes, since upstream can change keys, and it is the TLS cert we trust here anyway.

@olemarkus (Member, Author):

I'll defer the device plugin addon to another PR.

@olemarkus changed the title from "[WIP] Pre-install nvidia container runtime + drivers on GPU instances" to "Pre-install nvidia container runtime + drivers on GPU instances" on Jun 29, 2021
@k8s-ci-robot removed the do-not-merge/work-in-progress label on Jun 29, 2021
@johngmyers (Member):

Determining the arch of an instance and whether or not it has a GPU appears to be firmly in the bailiwick of nodeup. I don't see why there's an objection to putting such logic in there. Making the admin configure this through the API is just causing them grief.

I do not understand or agree with this desire to make nodeup dumb. It should do the things it is well suited for and not do the things it is not well suited for.

@hakman (Member) commented on Jun 30, 2021:

GPU support is something that very few operators need or use. Checking every node in every cluster for that capability just for this is not exactly ideal. Moreover, this has to work on the various supported platforms, not just AWS.

@johngmyers (Member):

Can't nodeup do the equivalent of lspci or some such?
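
For what it's worth, a hardware probe along those lines could be as simple as scanning sysfs for NVIDIA's PCI vendor ID; a sketch, not code from this PR:

```go
package gpu

import (
	"os"
	"path/filepath"
	"strings"
)

// hasNvidiaPCIDevice is a sketch of the lspci-style probe suggested above:
// scan /sys/bus/pci/devices and look for NVIDIA's PCI vendor ID (0x10de).
func hasNvidiaPCIDevice() (bool, error) {
	vendorFiles, err := filepath.Glob("/sys/bus/pci/devices/*/vendor")
	if err != nil {
		return false, err
	}
	for _, path := range vendorFiles {
		b, err := os.ReadFile(path)
		if err != nil {
			continue
		}
		if strings.TrimSpace(string(b)) == "0x10de" {
			return true, nil
		}
	}
	return false, nil
}
```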

ig.Spec.NodeLabels = make(map[string]string)
}
ig.Spec.NodeLabels["kops.k8s.io/gpu"] = "1"
ig.Spec.Taints = append(ig.Spec.Taints, "nvidia.com/gpu:NoSchedule")
Review comment (Member) on the diff above:

I think this is a behavioral change? i.e. workloads that previously ran on GPU instances will need to add this toleration to remain schedulable?

Or is there a webhook or similar that will add this automatically?
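
For context, a workload tolerating that taint would need something along these lines; a sketch using the upstream corev1 types, not code from this PR:

```go
package example

import corev1 "k8s.io/api/core/v1"

// gpuToleration matches the "nvidia.com/gpu:NoSchedule" taint that this PR
// adds to GPU instance groups.
var gpuToleration = corev1.Toleration{
	Key:      "nvidia.com/gpu",
	Operator: corev1.TolerationOpExists,
	Effect:   corev1.TaintEffectNoSchedule,
}
```

Pods would carry this in spec.tolerations, typically alongside an nvidia.com/gpu resource request.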

@olemarkus (Member, Author) replied:

Good catch. I think this one should additionally be gated by NvidiaGPU.enabled, so that one has explicitly opted in to the kOps way of doing things. I'll add docs to match as well.

@@ -765,6 +765,7 @@ func addNodeupPermissions(p *Policy, enableHookSupport bool) {
addASLifecyclePolicies(p, enableHookSupport)
p.unconditionalAction.Insert(
"ec2:DescribeInstances", // aws.go
"ec2:DescribeInstanceTypes",
Review comment (Member) on the diff above:

I was originally a little sad that we couldn't get this from the metadata or DescribeInstances. But I came around to your point of view: instance types are generic data that won't really be sensitive, and if there's one permission we should get rid of, it's probably DescribeInstances :-)
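
For illustration, checking GPU capability via that permission might look roughly like this with aws-sdk-go v1; a sketch, not necessarily how this PR implements it:

```go
package gpu

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/ec2"
	"github.com/aws/aws-sdk-go/service/ec2/ec2iface"
)

// instanceTypeHasGPU reports whether DescribeInstanceTypes lists GPU
// hardware for the given instance type.
func instanceTypeHasGPU(svc ec2iface.EC2API, instanceType string) (bool, error) {
	out, err := svc.DescribeInstanceTypes(&ec2.DescribeInstanceTypesInput{
		InstanceTypes: []*string{aws.String(instanceType)},
	})
	if err != nil {
		return false, err
	}
	for _, it := range out.InstanceTypes {
		if it.GpuInfo != nil && len(it.GpuInfo.Gpus) > 0 {
			return true, nil
		}
	}
	return false, nil
}
```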

@justinsb (Member) commented on Sep 5, 2021:

I think this looks great, and as we agreed in office hours we can nest it under containerd.

My one remaining concern is the automatic taint: I'm worried that we'll break upgrading clusters that use GPU instances until users add that toleration. Or is there some other mechanism at play here, e.g. everyone is already using this toleration, or there's a webhook that adds it?

@olemarkus (Member, Author):

These concerns should be addressed now.

@justinsb (Member):

Thanks @olemarkus

This lgtm now and we did agree on this at our last discussion; I'm going to approve but hold until after office hours.

/approve
/lgtm
/hold

@k8s-ci-robot added the do-not-merge/hold label on Sep 10, 2021
@k8s-ci-robot added the lgtm label on Sep 10, 2021
@k8s-ci-robot (Contributor):

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinsb

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the approved label on Sep 10, 2021
@hakman (Member) commented on Sep 10, 2021:

/lgtm

@olemarkus (Member, Author):

/hold cancel

@k8s-ci-robot removed the do-not-merge/hold label on Sep 11, 2021
@k8s-ci-robot merged commit 1b431b4 into kubernetes:master on Sep 11, 2021
@k8s-ci-robot modified the milestones: v1.22, v1.23 on Sep 11, 2021
k8s-ci-robot added a commit that referenced this pull request on Sep 16, 2021: Automated cherry pick of #11628: Add nvidia configuration to the api (…628-origin-release-1.22)
Labels: approved, area/addons, area/api, area/documentation, area/nodeup, area/provider/aws, cncf-cla: yes, lgtm, size/XXL