Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CRD-based custom node labeling #553

Merged
merged 1 commit into from
Nov 23, 2021

Conversation

marquiz
Copy link
Contributor

@marquiz marquiz commented Jul 8, 2021

Enable Custom Resource based label creation in nfd-master. This extends
the previously implemented controller stub for watching NodeFeatureRule
objects. NFD-master watches NodeFeatureRule objects in the cluster and
processes the rules on every incoming labeling request from workers.
The functionality relies on the "raw features" (identical to how
nfd-worker handles custom rules) submitted by nfd-worker, making it
independent of the label source configuration of the worker. This means
that the labeling functions as expected even if all sources in the
worker are disabled.

NOTE: nfd-master is stateless and re-labeling only happens on the
reception of SetLabelsRequest from the worker – i.e. on intervals
specified by the core.sleepInterval configuration option (or
-sleep-interval cmdline flag) of each nfd-worker instance. This means
that modification/creation of NodeFeatureRule objects does not
automatically update the node labels. Instead, the changes only come
visible when workers send their labeling requests.

This PR implements #468, i.e. CRD-based custom node labeling, and builds on top of previous work:

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. labels Jul 8, 2021
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: marquiz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jul 8, 2021
@marquiz marquiz changed the title Devel/custom crd CRD-based custom node labeling Jul 8, 2021
@kailun-qin
Copy link

/cc @kailun-qin

@k8s-ci-robot
Copy link
Contributor

@kailun-qin: GitHub didn't allow me to request PR reviews from the following users: kailun-qin.

Note that only kubernetes-sigs members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @kailun-qin

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@marquiz marquiz force-pushed the devel/custom-crd branch 2 times, most recently from bfb8bad to 8a7494c Compare July 9, 2021 13:49
@marquiz
Copy link
Contributor Author

marquiz commented Jul 9, 2021

All feedback is very much appreciated. Especially on the soundness of the whole concept.

Some concerns/questions from the top of my head below, just talking out loud...

Data format

I've been pondering the data representation of "raw" features, i.e. is it good (enough). In this patchset NFD now under the hood uses a unified data model of all the "raw features" (not talking about labels here) it is detecting.

Basically we divide feature types into three categories:

  1. "key-only features": features that don't have any value other than "present" or "not present". Examples are CPUID flags or loaded kernel nodules
  2. "value features": features that have a value associated with it. Examples are kernel config flags or os-release information.
  3. "instance features": features that cannot be represented by a single key-value pair (or key-only) but consist of mulitple attributes (key-value pairs). Examples are pci and usb devices

On the higher level, data representation of all detected features (from all features sources) is fixed into a three level hierarchy:
<source-name>.<feature-name>.<single-feature-instance>.

In go code this is a bit more complicated looks something like this (in a pseudo-ish golang):

// Name of the feature source e.g. "cpu" or "pci"
type FeatureSourceName string

// Name of the feature under the feature source e.g. "cpuid" (cpu) or "device" (pci)
type FeatureName string

// All features from all the feature sources
type AllFeatures map[FeatureSourceName]struct {
	// "key-only" features like cpuid or loaded kernel modules
	Keys map[FeatureName]struct {
		Features map[string]struct{}
	}

	// "key-value" features e.g. kernel version or kernel config flags
	Values map[FeatureName]struct {
		Features map[string]string
	}

	// "instance features" like pci or usb devices
	Instances map[FeatureName]struct {
		Features []struct {
			Attributes map[string]string
		}
	}
}

This data representation trickles down everywhere and to the rule format and CRD, too. The unified representation is nice in the sense that it reduces the lines of code dramatically and also simplifies the CRD schema plus makes it stable. Basically any new features added to NFD is automatically supported (because of the unified data representation and unified rule schema) with essentially no changes required in the CRD spec or rules.

My only concern is that is this flexible enough as we're fixed with the three-level hierarchy and cannot specify arbitrary data structures for the features. For example, Intel Speed Select Technology we now have cpu["sst"].Features["bf.enabled"] even though cpu.sst.bf.enabled (just nested structs and a bool in the end) might be more logical representation. However, the downside is that every feature needs a rule type of its own, similar to what we currently have in master (the custom rules). This also means that any added feature potentially changes the CRD API making is potentially way messier.

MatchExpressions

The key attribute of matchexpressions is implicit os the set of expressions are are presented as a map - the map key is used as the key attribute of the matchexpression. This has the limitation that you cannot specify more than one expression per key. I originally thought that this does not matter but now realized that there is actually one case where you would need this property: if you want to check that a value is between two values - in this case you'd need both Lt and Gt expressions. Talking out loud: maybe this should be changed...

Naming

I'm not happy with all the terms/naming that I've come up with. Need to think through it.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 4, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 10, 2021
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/invalid-commit-message Indicates that a PR should not merge because it has an invalid commit message. label Aug 10, 2021
@ArangoGutierrez
Copy link
Contributor

Not yet ready for review?
I want to jump into this PR as soon as it moves out of draft !

@marquiz
Copy link
Contributor Author

marquiz commented Aug 11, 2021

Not yet ready for review?
I want to jump into this PR as soon as it moves out of draft !

#464 would be the first part. But even that is not ready yet, missing unit tests and documentation. I will work on these in the coming weeks, for sure, as I want to get these finally in shape, too.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2021
@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Aug 19, 2021
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 20, 2021
@marquiz marquiz force-pushed the devel/custom-crd branch 6 times, most recently from 6e2c98a to 0df5b14 Compare November 18, 2021 12:49
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. and removed size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Nov 18, 2021
@ArangoGutierrez
Copy link
Contributor

/assign

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 22, 2021
@ArangoGutierrez
Copy link
Contributor

@marquiz rebase needed

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Nov 23, 2021
Enable Custom Resource based label creation in nfd-master. This extends
the previously implemented controller stub for watching NodeFeatureRule
objects. NFD-master watches NodeFeatureRule objects in the cluster and
processes the rules on every incoming labeling request from workers.
The functionality relies on the "raw features" (identical to how
nfd-worker handles custom rules) submitted by nfd-worker, making it
independent of the label source configuration of the worker. This means
that the labeling functions as expected even if all sources in the
worker are disabled.

NOTE: nfd-master is stateless and re-labeling only happens on the
reception of SetLabelsRequest from the worker – i.e. on intervals
specified by the core.sleepInterval configuration option (or
-sleep-interval cmdline flag) of each nfd-worker instance. This means
that modification/creation of NodeFeatureRule objects does not
automatically update the node labels. Instead, the changes only come
visible when workers send their labeling requests.
@marquiz
Copy link
Contributor Author

marquiz commented Nov 23, 2021

#656 was merged -> rebased and PR description updated
/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2021
@marquiz
Copy link
Contributor Author

marquiz commented Nov 23, 2021

/retest

@zvonkok
Copy link
Contributor

zvonkok commented Nov 23, 2021

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 23, 2021
@k8s-ci-robot k8s-ci-robot merged commit aa0074f into kubernetes-sigs:master Nov 23, 2021
@marquiz marquiz deleted the devel/custom-crd branch November 23, 2021 12:33
@marquiz marquiz mentioned this pull request Dec 22, 2021
22 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants