Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add labels for workload parent #1032

Merged
merged 4 commits into from Aug 3, 2023

Conversation

achernevskii
Copy link
Contributor

@achernevskii achernevskii commented Aug 2, 2023

What type of PR is this?

/kind feature

What this PR does / why we need it:

  • Added parent name and group-kind labels for workload objects.

  • If parent's name, or group-kind exceeds the limit of 63 characters length, respective labels won't be added to a workload object.

Which issue(s) this PR fixes:

Fixes #992

Special notes for your reviewer:

Does this PR introduce a user-facing change?

Workload objects have the label `kueue.x-k8s.io/job-uid` where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, JobSet

* Added parent name and group-kind labels for workload
  objects.

* If parent's name, or group-kind exceeds the limit of 63
  characters length, respective labels won't be added to a
  workload object.
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Aug 2, 2023
@netlify
Copy link

netlify bot commented Aug 2, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit 0350ad8
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/64cbe270f7ec60000815cac7

@k8s-ci-robot
Copy link
Contributor

Welcome @achernevskii!

It looks like this is your first PR to kubernetes-sigs/kueue 🎉. Please refer to our pull request process documentation to help your PR have a smooth ride to approval.

You will be prompted by a bot to use commands during the review process. Do not be afraid to follow the prompts! It is okay to experiment. Here is the bot commands documentation.

You can also check if kubernetes-sigs/kueue has its own contribution guidelines.

You may want to refer to our testing guide if you run into trouble with your tests not passing.

If you are having difficulty getting your pull request seen, please follow the recommended escalation practices. Also, for tips and tricks in the contribution process you may want to read the Kubernetes contributor cheat sheet. We want to make sure your contribution gets all the attention it needs!

Thank you, and welcome to Kubernetes. 😃

@k8s-ci-robot k8s-ci-robot added the needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. label Aug 2, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @achernevskii. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Aug 2, 2023
@trasc
Copy link
Contributor

trasc commented Aug 2, 2023

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Aug 2, 2023
* UID shouldn't exceed the 63 character label value length
  limit, and it uniquely identifies a parent.
@achernevskii achernevskii changed the title Added labels for workload parent Add labels for workload parent Aug 2, 2023
pkg/controller/constants/constants.go Outdated Show resolved Hide resolved

if utf8.RuneCountInString(jobGVK.GroupKind().String()) < 64 {
wl.Labels[controllerconsts.ParentGroupKindLabel] = jobGVK.GroupKind().String()
if uid := string(job.Object().GetUID()); utf8.RuneCountInString(uid) < 64 && uid != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8s validation just uses len
https://github.com/kubernetes/kubernetes/blob/2c6c4566eff972d6c1320b5f8ad795f88c822d09/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L169

I think using the rune len would be weaker.

I'm wondering if we should just try to put the label and let the apiserver return an error. Note that there is a regex check, so that could potentially fail as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if we should just try to put the label and let the apiserver return an error.

We can do that, but I would rather create a workload without a job-uid label, since this functionality is not essential for the controller. We could also log a warning if a label can't be added to a workload. Please let me know which of those three options is the best in your opinion.

Note that there is a regex check, so that could potentially fail as well.

According to this page, UIDs are standardized as ISO/IEC 9834-8 and as ITU-T X.667. And ISO/IEC 9834-8 says the following:

UUIDs forming a component of an OID are represented in ASN.1 value notation as the decimal representation of their integer value, but for all other display purposes it is more usual to represent them with hexadecimal digits with a hyphen separating the different fields within the 16-octet UUID.

Regex that kuberentes is using for label validation is the following (([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?, if I'm not mistaken. This regex should match the UID format, so I think we're good here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k8s validation just uses len

Replaced RuneCountInString call with len call in 866a09a

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do that, but I would rather create a workload without a job-uid label, since this functionality is not essential for the controller.

In that case, use IsValidLabelValue from https://github.com/kubernetes/kubernetes/blob/2c6c4566eff972d6c1320b5f8ad795f88c822d09/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L167

We could also log a warning if a label can't be added to a workload.

Logging is not visible to users. Webhooks support warnings https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/#best-practices-and-warnings:~:text=Admission%20webhooks%20can%20optionally%20return%20warning%20messages%20that%20are%20returned%20to%20the%20requesting%20client%20in%20HTTP%20Warning%20headers%20with%20a%20warning%20code%20of%20299.%20Warnings%20can%20be%20sent%20with%20allowed%20or%20rejected%20admission%20responses.

But in our case, it's not end users creating Workloads, but users. Maybe a log is not so bad so administrators can take action.

UIDs are standardized as ISO/IEC 9834-8 and as ITU-T X.667.

I think there is a way where an end user can override the UID and potentially use something that wouldn't pass label validation. However this should be very uncommon.
The log should be enough :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced len label check with IsValidLabelValue, and added the log message in 0350ad8.

Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, just nits

@@ -32,4 +32,8 @@ const (
// ignores this Job from admission, and takes control of its suspension
// status based on the admission status of the parent workload.
ParentWorkloadAnnotation = "kueue.x-k8s.io/parent-workload"

// JobUIDLabel is the label key in the workload resource, that holds the UID of
// a parent job.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// a parent job.
// the owner job.

nit: Similarly, to avoid confusion with parent which is a relation between jobs in Kueue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed the comment in 0350ad8.

@@ -165,6 +165,11 @@ func (w *WorkloadWrapper) ReclaimablePods(rps ...kueue.ReclaimablePod) *Workload
return w
}

func (w *WorkloadWrapper) SetLabels(l map[string]string) *WorkloadWrapper {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (w *WorkloadWrapper) SetLabels(l map[string]string) *WorkloadWrapper {
func (w *WorkloadWrapper) Labels(l map[string]string) *WorkloadWrapper {

nit, but we generally don't use for functions like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed this method in 0350ad8.

* Replace len label check with IsValidLabelValue from k8s
  apimachinery, which is used on the server side to
  validate the label values.

* Add a log message, for the case, when UID label
  validation returns any errors.
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve

wl.Labels[controllerconsts.JobUIDLabel] = jobUid
} else {
log.V(2).Info(
"Validation of the owner job UID label has failed. Creating workload without the label.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"Validation of the owner job UID label has failed. Creating workload without the label.",
"Validation of the owner job UID label has failed. Creating workload without the label",

} else {
log.V(2).Info(
"Validation of the owner job UID label has failed. Creating workload without the label.",
"ValidationErrors", errs,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"ValidationErrors", errs,
"err", errs.ToAggregate(),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a ErrorListtype here, IsValidLabelValue will return a slice of strings, so I can't call this method here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhhh, that's why the value is not present either

log.V(2).Info(
"Validation of the owner job UID label has failed. Creating workload without the label.",
"ValidationErrors", errs,
"LabelValue", jobUid,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will probably already be part of the error, so no need to include again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the log message, that I'm getting while unit testing:

"Validation of the owner job UID label has failed. Creating workload without the label" 
"job"="ns/job" 
"ValidationErrors"=["must be no more than 63 characters"] 
"LabelValue"="long-uidlong-uidlong-uidlong-uidlong-uidlong-uidlong-uidlong-uid"

Label is not included in the error here.

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: achernevskii, alculquicondor

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 3, 2023
@alculquicondor
Copy link
Contributor

Please add a release note.

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 3, 2023
@k8s-ci-robot k8s-ci-robot merged commit b2d395b into kubernetes-sigs:main Aug 3, 2023
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v0.5 milestone Aug 3, 2023
@alculquicondor
Copy link
Contributor

/release-note-edit

Workload objects have the label `kueue.x-k8s.io/job-uid` where the value matches the uid of the parent job, whether that's a Job, MPIJob, RayJob, JobSet

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Aug 3, 2023
@achernevskii achernevskii deleted the feature/workload_labels branch August 3, 2023 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add labels to the Workload object matching the job details
5 participants