Document NFD for GPU Labeling #44915

Merged
merged 2 commits into from
Jan 30, 2024

Conversation

ArangoGutierrez
Contributor

This patch documents the official k8s-sig NFD (Node Feature Discovery) project for automated node labeling, and restructures the Automatic node labelling section of the scheduling-gpus page.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language labels Jan 27, 2024
@k8s-ci-robot k8s-ci-robot added the sig/docs Categorizes an issue or PR as relevant to SIG Docs. label Jan 27, 2024
@k8s-ci-robot k8s-ci-robot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 27, 2024

netlify bot commented Jan 27, 2024

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 07b14de
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/65b92ba44707530008b71a00
😎 Deploy Preview https://deploy-preview-44915--kubernetes-io-main-staging.netlify.app

@ArangoGutierrez
Contributor Author

/cc @marquiz

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 27, 2024
@ArangoGutierrez ArangoGutierrez force-pushed the automatedlabels branch 3 times, most recently from e7e734c to b9cabb2 Compare January 27, 2024 16:39

@mikemckiernan mikemckiernan left a comment

Nits and a (likely unwelcome) widening of the scope to identify some of the existing content as the manual method of labeling. I might be waltzing you into work that the doc maintainers don't want, so let's see what they say. Thanks as always for updating the docs!

content/en/docs/tasks/manage-gpus/scheduling-gpus.md Outdated Show resolved Hide resolved
@sftim
Contributor

sftim commented Jan 28, 2024

Thanks

/sig node

@k8s-ci-robot k8s-ci-robot added the sig/node Categorizes an issue or PR as relevant to SIG Node. label Jan 28, 2024
@bart0sh bart0sh added this to Triage in SIG Node PR Triage Jan 28, 2024
@bart0sh bart0sh moved this from Triage to Needs Reviewer in SIG Node PR Triage Jan 28, 2024
Contributor

@sftim sftim left a comment

Hi; thanks for the pull request.

Here's my feedback on the changes.

content/en/docs/tasks/manage-gpus/scheduling-gpus.md Outdated Show resolved Hide resolved
Comment on lines 97 to 123
kind: Pod
metadata:
  name: example-vector-add
spec:
  # You can use Kubernetes node affinity to schedule this Pod onto a node
  # that provides the kind of GPU that its container needs in order to work
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: "gpu.gpu-vendor.example/installed-memory"
            operator: Gt # (greater than)
            values: ["40535"]
          - key: "gpu.gpu-vendor.example/family"
            operator: In
            values:
            - Helium # example product family
            - Neon # example product family
  restartPolicy: Never
  containers:
    - name: example-vector-add
      image: "registry.example/example-vector-add:v42"
      resources:
        limits:
          gpu-vendor.example/example-gpu: 1 # requesting 1 GPU
{{< /highlight >}}
Contributor

This seems to just be an extension of the example above, with the added affinity section, correct?

However, from reading the docs top-to-bottom, this implies that a label called gpu.gpu-vendor.example/installed-memory will somehow magically be added to the node by NFD (which isn't true; an NFD plugin is needed to add this).

We should extend the text to make it clear that an NFD plugin is needed (in addition to NFD itself) in order to provide any vendor specific labels.
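
To illustrate why the missing labels matter, here is a minimal Python sketch (not part of the PR) of how the two match expressions in the example behave against a node's labels. The helper `matches` and the node label values are hypothetical; the sketch covers only the `In` and `Gt` operators and relies on the Kubernetes behavior that a missing label satisfies neither of them.

```python
def matches(node_labels, expressions):
    """Return True only if every match expression is satisfied by the labels.

    Hypothetical helper for illustration; handles only In and Gt.
    """
    for expr in expressions:
        value = node_labels.get(expr["key"])
        op = expr["operator"]
        if value is None:
            # A label that is absent cannot satisfy In or Gt.
            return False
        if op == "In":
            ok = value in expr["values"]
        elif op == "Gt":
            try:
                ok = int(value) > int(expr["values"][0])
            except ValueError:
                # A non-integer label value is treated as not matching.
                ok = False
        else:
            raise ValueError(f"operator not covered by this sketch: {op}")
        if not ok:
            return False
    return True


# The two expressions from the Pod example under review.
EXPRESSIONS = [
    {"key": "gpu.gpu-vendor.example/installed-memory",
     "operator": "Gt", "values": ["40535"]},
    {"key": "gpu.gpu-vendor.example/family",
     "operator": "In", "values": ["Helium", "Neon"]},
]

# Assumed node without any vendor-specific labels.
unlabeled_node = {"kubernetes.io/os": "linux"}

# Assumed node after a vendor plugin has added its labels (values made up).
labeled_node = {
    "kubernetes.io/os": "linux",
    "gpu.gpu-vendor.example/installed-memory": "81920",
    "gpu.gpu-vendor.example/family": "Neon",
}

print(matches(unlabeled_node, EXPRESSIONS))
print(matches(labeled_node, EXPRESSIONS))
```

Under these assumptions, the Pod's required node affinity can only be satisfied once something has actually published the vendor labels on the node.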

Contributor Author

How about now?

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
Contributor

@sftim sftim left a comment

This looks right.

It looks OK to publish this as a change to the live site.

/lgtm
/approve

Thanks everyone!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f051228a1757be5c042884074869c5865761ae5d

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eero-t, sftim

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 30, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@sftim
Contributor

sftim commented Jan 30, 2024

Actually I fixed some highlighting.

@sftim
Contributor

sftim commented Jan 30, 2024

OK, it's less wrong now.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 30, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: f7cb87194f3090aa8dc8aac3251584f53886dc86

@k8s-ci-robot k8s-ci-robot merged commit 114fa30 into kubernetes:main Jan 30, 2024
6 checks passed
SIG Node PR Triage automation moved this from Needs Reviewer to Done Jan 30, 2024
@ArangoGutierrez
Contributor Author

OK, it's less wrong now. /lgtm

Thanks a lot!

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/node Categorizes an issue or PR as relevant to SIG Node. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
Archived in project

9 participants