add kernel version to node label #51006
Conversation
Hi @xilabao. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with `/ok-to-test`. I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
It seems we can get the kernel version from the node: node.Status.NodeInfo.KernelVersion.
@FengyunPan the change is for using node affinity.
/ok-to-test
/lgtm
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: dchen1107, xilabao. No associated issue. Update the pull-request body to add a reference to an issue, or get approval with `/approve no-issue`. The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these OWNERS files.
You can indicate your approval by writing `/approve` in a comment.
@xilabao can you file an issue for this request? Also, I think the change is small, but it deserves a release note.
/test pull-kubernetes-node-e2e
/retest
/retest
Review the full test history for this PR.
cc @liggitt as we talked about the labels the node should be able to set when registering itself.
We're actively trying to reduce the self-labeling capabilities of nodes (see kubernetes/community#911), since it lets them steer workloads to themselves. Every label a node sets currently will have to be whitelisted. If this is used so workloads can target nodes that are capable of running the workload, that's fine. If it's used by an administrator to find nodes with a vulnerable kernel, or to steer pods away from nodes with a vulnerable kernel, that's not fine, since it is self-reported by the node. I also have some questions about the cardinality and maintenance of this label.
cc @kubernetes/sig-node-pr-reviews
cc @kubernetes/sig-auth-pr-reviews
I agree with @vishh that it's a trivial feature, and it's useful for me: I want to create pods to run some test cases related to the kernel. Just as @liggitt says, the kernel version can be considered a fundamental thing; maybe we can add it to the default labels.
/retest
re: #51006 (comment), for scheduling to nodes with specific device versions, is the expectation that kubelets will start adding sets of labels for the various attributes of devices they are making available for scheduling? This seems like the first of many attributes like this, and I was hoping a whitelist of labels under node control would start small (mostly for backwards compatibility), and stay small or shrink, rather than continuously grow.
Yes. We are introducing a new extension point called device plugins at the kubelet level. Deploying device plugins requires "root" privileges on the node as of now. Device plugins are expected to advertise node labels to aid in scheduling. Just having a string resource name for scheduling is insufficient for many use cases.
The goal is for root privileges on a node to not automatically translate to root privileges on the cluster (against the API or other nodes). Control over all labels and taints allows steering workloads that can transitively grant access to high-value credentials.
Do you have pointers to those designs/docs (the ones I saw didn't call out populating node labels)? It would be helpful to see the overall direction for this. Finding a balance might be as simple as putting the device labels under a label key prefix/namespace that makes it clear the label was under the node's control, whitelisting that prefix, and documenting that those should not be used for security-related partitioning of a cluster. If so, it would be ideal to use such a prefix/namespace for this.
```go
LabelOS   = "beta.kubernetes.io/os"
LabelArch = "beta.kubernetes.io/arch"
```
It's safe to assume there are workloads in the real world that assume the presence of these labels today (with the beta keys). When combined with making controller selectors immutable (#50808 & #50719), we have to think through the implications of changing these, and how someone would modify their existing workloads to use the non-beta labels non-disruptively. Do we need to announce deprecation of the beta labels, and set both beta and non-beta labels for some period? cc @kubernetes/api-reviewers
I don't think this is safe within our definitions of API stability. This breaks anyone using those labels without warning.
@liggitt @smarterclayton and others, I have created an issue (#51756) for the beta label.
```diff
-LabelZoneFailureDomain = "failure-domain.beta.kubernetes.io/zone"
-LabelZoneRegion        = "failure-domain.beta.kubernetes.io/region"
+LabelZoneFailureDomain = "failure-domain.kubernetes.io/zone"
+LabelZoneRegion        = "failure-domain.kubernetes.io/region"
```
These are baked into the default scheduler startup options (see DefaultFailureDomains and the kube-scheduler --failure-domains flag). I think changing this can break scheduling to version-skewed kubelets, at least.
@liggitt https://github.com/kubernetes/community/blob/master/contributors/design-proposals/device-plugin.md Please file issues for any label changes. The API is still very much alpha, so we can integrate your suggestions without any concern.
/retest
/test pull-kubernetes-e2e-gce-bazel
I would love to see the host Linux distribution propagated through as well, perhaps by calling lsb_release.

edit to add a source blurb:

```go
func EnumerateDistro() string {
	std, err := exec.Command("lsb_release", "-i", "-s", "-r").CombinedOutput()
	if err != nil {
		return ""
	}
	trimmed := strings.TrimSpace(string(std))
	splitted := strings.Split(trimmed, "\n")
	if len(splitted) != 2 {
		return ""
	}
	id := splitted[0]
	release := splitted[1]
	return fmt.Sprintf("%s %s", id, release)
}
```

edit 2, or read from /etc/os-release:

```go
func EnumerateDistro() string {
	file, err := os.Open("/etc/os-release")
	if err != nil {
		return ""
	}
	defer file.Close()
	id := ""
	release := ""
	scanner := bufio.NewScanner(file)
	for scanner.Scan() {
		line := scanner.Text()
		if strings.HasPrefix(line, "ID=") {
			id = line[3:]
		}
		if strings.HasPrefix(line, "VERSION_ID=") {
			release = strings.Replace(line[11:], "\"", "", -1)
		}
		if release != "" && id != "" {
			break
		}
	}
	return fmt.Sprintf("%s %s", id, release)
}
```
I don't see how this is usable in node affinity unless versions can be compared with anything other than equality. Encouraging users to express exactly "I need a kernel version of 4.11.9-300.fc26.x86_64" seems to me like a step in the wrong direction. Doing so will discourage users from updating their kernels to avoid breaking affinity (thus making it harder for users of k8s to remain secure), could lead to incredibly large affinity constraints... and at what benefit?

I can't think of a single use-case that wouldn't be served better by a finer-grained label. Let's take @derekwaynecarr's example of gpu support (which depends on some kernel features presumably). It seems to me like a better way to solve a specific problem like the above would be to have a specific label for it. I can't think of a single case where using an affinity rule based on the exact kernel version would be a better choice than having a dedicated helper component which applies a more specific label based on the combination of kernel config and kernel version. Given the kernel version is already available in the node's status…

@rphillips let's not mix that in with this; that should be a separate issue and PR imo.
Why does k8s have to be opinionated about how a user upgrades kernels? If a deployment wants to prevent users from depending on kernel versions, then we can either give an option to not expose it, or have such deployments run admission controllers that prevent usage of the kernel version label. Portable linux containers are expected to be agnostic of the kernel version. If they care about kernel versions, then they need to be treated carefully.

This would not provide any standard APIs for developing portable system addons that depend on kernel version.
@vishh depending on a specific kernel version is not portable by definition, so it is unclear what you are saying here. It is almost never a good idea to depend on a specific kernel version, unless you are a component that is tightly coupled with the kernel (i.e. a kernel module); in all other cases you want to depend on a feature and its userspace API. And this is what @euank is saying: what you want to expose and depend on is a feature, which you can discover and expose at runtime by checking whether the kernel supports it. Not by exposing a kernel version, but a (perhaps versioned) feature name.

The device plugin document indeed hints throughout that there is a (versioned) device "name" being exposed to the kubelet. That's what may be exposed; a specific kernel makes little sense, especially if what you want to advertise is hardware that may or may not be present. A kernel version does not guarantee you anything about what hardware (and therefore devices) is available, or whether it works with that kernel version. Exposing a kernel version can only have the consequence of breaking configurations when upgrades are performed.
Those are both already exposed in a standard way under
Neither would exposing the kernel version provide portability, since it's inherently non-portable, especially given distros like RHEL/CentOS which backport large features without significantly changing the kernel version.
If we're going down the kernel-version or kubelet-version path (and I am not sure we should), it seems we need multiple labels to allow less precise version locking, e.g. given "4.11.9-300.fc26.x86_64", something like a hierarchy of progressively coarser labels.

Likewise, we probably need kubelet version in the same style. The lack of inequalities in labels still makes this clunky, and no matter what it is not exactly portable. Alternatively, we can make node-software-version an explicit criterion, formalize these fields into the API, and allow inequalities. That's a LOT more work...
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Stale issues rot after 30d of inactivity. If this issue is safe to close now please do so with `/close`. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
Rotten issues close after 30d of inactivity. Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
What this PR does / why we need it:
Some pods may need a newer kernel version, so we don't need to upgrade the kernel on every node. The label will be updated when the node starts up.
Which issue this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close that issue when PR gets merged): fixes #51234
Special notes for your reviewer:
Release note: