k8s v1.28.2 and NFD updates #1539

Merged: 4 commits, Sep 21, 2023


mythi
Contributor

mythi commented Sep 19, 2023

No description provided.

@mythi
Contributor Author

mythi commented Sep 19, 2023

@uniemimu PTAL 5cca930

@tkatila
Contributor

tkatila commented Sep 19, 2023

$ kubectl get node -o json | jq .metadata.labels | grep gpu.intel.com
  "gpu.intel.com/cards": "card0",
  "gpu.intel.com/device-id.0300-9a49.count": "1",
  "gpu.intel.com/device-id.0300-9a49.present": "true",
  "gpu.intel.com/gpu-numbers": "0",
  "gpu.intel.com/memory.max": "0",
  "gpu.intel.com/millicores": "1000",
  "gpu.intel.com/tiles": "1",
$ kubectl get node -o json | jq .status.allocatable
{
  "cpu": "8",
  "ephemeral-storage": "450548187210",
  "gpu.intel.com/i915": "8",
  "gpu.intel.com/i915_monitoring": "1",
  "gpu.intel.com/memory.max": "0",
  "gpu.intel.com/millicores": "1000",
  "gpu.intel.com/tiles": "1",
  "hugepages-1Gi": "0",
  "hugepages-2Mi": "0",
  "memory": "32474604Ki",
  "pods": "110"
}

I recall the extended resources vanished from the node labels previously. Now they seem to stay in labels. I wonder if it's by design?
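The duplication described above can be checked mechanically by comparing the two JSON objects. A minimal Go sketch, assuming the `.metadata.labels` and `.status.allocatable` objects have been captured as JSON strings; `labelsAlsoAllocatable` is a hypothetical helper, not part of the project:

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// labelsAlsoAllocatable (hypothetical helper) returns the keys with the given
// prefix that are present both as node labels and as allocatable resources.
func labelsAlsoAllocatable(labelsJSON, allocatableJSON, prefix string) ([]string, error) {
	var labels, allocatable map[string]string
	if err := json.Unmarshal([]byte(labelsJSON), &labels); err != nil {
		return nil, fmt.Errorf("labels: %w", err)
	}
	if err := json.Unmarshal([]byte(allocatableJSON), &allocatable); err != nil {
		return nil, fmt.Errorf("allocatable: %w", err)
	}
	var both []string
	for key := range labels {
		if !strings.HasPrefix(key, prefix) {
			continue
		}
		if _, ok := allocatable[key]; ok {
			both = append(both, key)
		}
	}
	sort.Strings(both) // map iteration order is random; sort for stable output
	return both, nil
}

func main() {
	// Values copied from the kubectl output above (trimmed for brevity).
	labels := `{
	  "gpu.intel.com/cards": "card0",
	  "gpu.intel.com/device-id.0300-9a49.count": "1",
	  "gpu.intel.com/gpu-numbers": "0",
	  "gpu.intel.com/memory.max": "0",
	  "gpu.intel.com/millicores": "1000",
	  "gpu.intel.com/tiles": "1"
	}`
	allocatable := `{
	  "cpu": "8",
	  "gpu.intel.com/i915": "8",
	  "gpu.intel.com/i915_monitoring": "1",
	  "gpu.intel.com/memory.max": "0",
	  "gpu.intel.com/millicores": "1000",
	  "gpu.intel.com/tiles": "1",
	  "pods": "110"
	}`
	both, err := labelsAlsoAllocatable(labels, allocatable, "gpu.intel.com/")
	if err != nil {
		panic(err)
	}
	for _, key := range both {
		fmt.Println(key) // memory.max, millicores and tiles appear in both
	}
}
```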

@eero-t
Contributor

eero-t commented Sep 19, 2023

As an occasional project contributor... I think it would be good to add some note about how NFD is intended to be used to DEVEL.md.

@mythi
Contributor Author

mythi commented Sep 19, 2023

> As an occasional project contributor... I think it would be good to add some note about how NFD is intended to be used to DEVEL.md.

@eero-t GPU plugin documentation talks about it and we also have similar steps documented as part of the operator deployment.

@mythi
Contributor Author

mythi commented Sep 19, 2023

> I recall the extended resources vanished from the node labels previously. Now they seem to stay in labels. I wonder if it's by design?

Fixed by adding a custom nfd-worker.conf where "local" (and all other feature sources) are disabled as "label sources".

@codecov-commenter

codecov-commenter commented Sep 19, 2023

Codecov Report

Merging #1539 (d7b5db8) into main (ba3ded1) will decrease coverage by 0.11%.
Report is 8 commits behind head on main.
The diff coverage is 20.00%.

❗ Current head d7b5db8 differs from pull request most recent head 8687d2c. Consider uploading reports for the commit 8687d2c to get more accurate results

@@            Coverage Diff             @@
##             main    #1539      +/-   ##
==========================================
- Coverage   49.58%   49.48%   -0.11%     
==========================================
  Files          42       42              
  Lines        4917     4923       +6     
==========================================
- Hits         2438     2436       -2     
- Misses       2339     2345       +6     
- Partials      140      142       +2     
Files Changed                       Coverage Δ
pkg/controllers/dlb/controller.go   17.35% <0.00%> (ø)
pkg/controllers/dsa/controller.go   6.77% <0.00%> (ø)
pkg/controllers/fpga/controller.go  16.66% <0.00%> (ø)
pkg/controllers/gpu/controller.go   47.70% <0.00%> (ø)
pkg/controllers/iaa/controller.go   6.77% <0.00%> (ø)
pkg/controllers/qat/controller.go   9.72% <0.00%> (ø)
pkg/controllers/sgx/controller.go   10.74% <0.00%> (ø)
pkg/controllers/reconciler.go       4.56% <28.57%> (+0.30%) ⬆️
cmd/qat_plugin/dpdkdrv/dpdkdrv.go   83.27% <100.00%> (ø)

... and 1 file with indirect coverage changes


@eero-t
Contributor

eero-t commented Sep 19, 2023

> @eero-t GPU plugin documentation talks about it and we also have similar steps documented as part of the operator deployment.

I did not mean end user documentation of installing NFD rules (updated in this PR), but some mention in internal documentation on how (NFD) labels are supposed to be configured for the plugins, e.g. in the DEVEL.md "Checklist for New Device Plugins" section.

@mythi
Contributor Author

mythi commented Sep 19, 2023

> > @eero-t GPU plugin documentation talks about it and we also have similar steps documented as part of the operator deployment.

> I did not mean end user documentation of installing NFD rules (updated in this PR), but some mention in internal documentation on how (NFD) labels are supposed to be configured for the plugins, e.g. in the DEVEL.md "Checklist for New Device Plugins" section.

Added: "+9. Plugin NodeFeatureRules added for Node Feature Discovery labeling."

mythi force-pushed the PR-2023-046 branch 2 times, most recently from a08fc8c to 8687d2c, September 19, 2023 14:15
tkatila previously approved these changes Sep 19, 2023
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
With recent NFD versions (v0.13+), it's no longer necessary to
start NFD with custom nfd-master args/RBAC settings to get numeric
labels registered as extended resources.

The same can be specified via NodeFeatureRules, which also work for
the "local" source with feature files.

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
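To make the "local" source with feature files concrete: NFD reads feature files made of `name=value` lines, and with v0.13+ a NodeFeatureRule can turn numeric values among them into extended resources. The Go sketch below only illustrates the selection idea: it parses feature-file content and keeps the entries with non-negative integer values as candidates. `parseFeatureFile` and `numericFeatures` are hypothetical helpers, not NFD code. The handling of comments and of names without a value (treated as "true") is an assumption about the format. In a real deployment NFD performs the matching itself, as specified by the NodeFeatureRule.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseFeatureFile (hypothetical) parses "name=value" lines. Blank lines and
// lines starting with '#' are skipped; a name without '=' gets the value
// "true" (assumed feature-file behaviour).
func parseFeatureFile(content string) (map[string]string, error) {
	features := map[string]string{}
	scanner := bufio.NewScanner(strings.NewReader(content))
	for scanner.Scan() {
		line := strings.TrimSpace(scanner.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		name, value, found := strings.Cut(line, "=")
		if !found {
			value = "true"
		}
		features[name] = value
	}
	return features, scanner.Err()
}

// numericFeatures (hypothetical) returns the sorted names whose values parse
// as non-negative integers, i.e. candidates for extended resources.
func numericFeatures(features map[string]string) []string {
	var names []string
	for name, value := range features {
		if n, err := strconv.ParseInt(value, 10, 64); err == nil && n >= 0 {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	// Illustrative content modelled on the GPU labels shown earlier in this PR.
	content := `# example feature file
gpu.intel.com/millicores=1000
gpu.intel.com/memory.max=0
gpu.intel.com/tiles=1
gpu.intel.com/cards=card0
`
	features, err := parseFeatureFile(content)
	if err != nil {
		panic(err)
	}
	fmt.Println(numericFeatures(features))
}
```

The non-numeric `gpu.intel.com/cards=card0` entry is dropped, because an extended resource needs an integer quantity.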
Currently, the QAT plugin warns when it finds a PCI ID that is
not an enabled QAT device. This is too verbose, so lower the
log priority to "Info".

Signed-off-by: Mikko Ylinen <mikko.ylinen@intel.com>
@mythi
Contributor Author

mythi commented Sep 21, 2023

@hj-johannes-lee PTAL

tkatila merged commit b38141d into intel:main Sep 21, 2023
72 checks passed
4 participants