Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-168-2: Pending workloads visibility #1300

Merged
merged 1 commit into from
Nov 16, 2023

Conversation

PBundyra
Copy link
Contributor

What type of PR is this?

/kind feature

What this PR does / why we need it:

Introduces a new API to expose information about the position of pending workloads in both ClusterQueue and LocalQueue.

Which issue(s) this PR fixes:

Part of #168

Special notes for your reviewer:

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 30, 2023
@netlify
Copy link

netlify bot commented Oct 30, 2023

Deploy Preview for kubernetes-sigs-kueue canceled.

Name Link
🔨 Latest commit bb3df08
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-sigs-kueue/deploys/6555ed88d35acb0008c37fd6

@k8s-ci-robot
Copy link
Contributor

Hi @PBundyra. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 30, 2023
@PBundyra
Copy link
Contributor Author

/assign @mimowo
/assign @mwielgus

@tenzen-y
Copy link
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Oct 31, 2023
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
@PBundyra
Copy link
Contributor Author

/assign @tenzen-y


### Goals

- Support listing in order all pending workloads in a ClusterQueue, no matter the size of the queue, and without delay,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will the implementation consider the ability to support paged queries? I think this is still pretty important, especially with a lot of workloads.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the K8s Aggregation Layer does not support pagination out of the box, we don't want to commit to it since it would require significant effort. However, it can be implemented on the client side. We also expose the endpoint to fetch information about a single workload, so there is a way to query its position without listing all the remaining pending workloads.

I've updated the non-goals and API Details sections to cover this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify what we mean by pagination.
Let's say I want to list 1000 elements now, and I can only see 100 at a time. Do you keep the 1000 elements somewhere in memory, with some key id, and allow users to query that list in 100 increments? In other words, you get a consistent view if you use the same id?

I don't think we need that. Simply listing elements from position X to Y, at this time, is enough.

Can we do something simple like having 2 query parameters for the API:

  • first position
  • number of elements (and this is some sane value by default: probably around 1000?)

Doing it in the client side (let's say a dashboard) could be wasteful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know at the moment how to have extra query parameters, so went for the simple proposal without pagination at all. In the prototype we implement it here:

func (m *pendingWorkloadsInCQ) Get(ctx context.Context, name string, opts *metav1.GetOptions) (runtime.Object, error) {
var wls []v1alpha1.PendingWorkloadSummary
for _, val := range m.kueueMgr.GetPendingWorkloadsInfo(name) {
wls = append(wls, v1alpha1.PendingWorkloadSummary{
ObjectMeta: metav1.ObjectMeta{
Name: val.Name,
Namespace: val.Namespace,
},
})
}
return &v1alpha1.PendingWorkloadSummaryList{Items: wls}, nil
}
. However, at this layer we don't already have access to http.Request to see the query parameters. It might be possible, but it is not obvious how to get them. AFAIK metrics server also does not do pagination, so we don't have a good example of passing the query parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something simple like having 2 query parameters for the API:

first position
number of elements (and this is some sane value by default: probably around 1000?)

Doing it in the client side (let's say a dashboard) could be wasteful.

I also think it's better to support on the server side or not support. On the client side is not a good choice.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cross linking: The Pod log subresource implements query parameters https://github.com/kubernetes/kubernetes/blob/029452198566a41bc39d04a1ec5bad3f37621a1c/pkg/registry/core/pod/rest/log.go#L78

It's a different interface.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, implementing a different interface allows us to pass query parameters. Hence, following @alculquicondor proposal, we will enable users to fetch pending workloads from position X to Y.

keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
@PBundyra PBundyra force-pushed the kep-168-2 branch 4 times, most recently from 7c5e70e to 4171fe6 Compare October 31, 2023 15:05

## Proposal

- Add new API,
Copy link
Contributor

@mimowo mimowo Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The last point does not seem important enough to mention in this section.

Also, I would suggest to formulate to capture the main technical aspects of the proposal, without going too much detail like in the "Design Details" section (or something along the lines):

"Add a new Extension API server to expose on-demand endpoints for fetching information about pending workloads."

Then, it may also be worth adding something like:
"The returned information about pending workloads includes all the necessary information relevant for their position in the queue, along with the position itself. There are three such endpoints: (1) to list the pending workloads in ClusterQueue, (2) list the pending workloads in LocalQueue, and (3) get a specific pending workload."

keps/168-2-pending-workloads-visibility/kep.yaml Outdated Show resolved Hide resolved
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Left some nits, but it is IMO good enough to:
/assign @alculquicondor

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved

### Goals

- Support listing in order all pending workloads in a ClusterQueue, no matter the size of the queue, and without delay,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's clarify what we mean by pagination.
Let's say I want to list 1000 elements now, and I can only see 100 at a time. Do you keep the 1000 elements somewhere in memory, with some key id, and allow users to query that list in 100 increments? In other words, you get a consistent view if you use the same id?

I don't think we need that. Simply listing elements from position X to Y, at this time, is enough.

Can we do something simple like having 2 query parameters for the API:

  • first position
  • number of elements (and this is some sane value by default: probably around 1000?)

Doing it in the client side (let's say a dashboard) could be wasteful.


We introduce a new API that will extend the existing one.

There will be separate endpoints for administrators and regular users. Each endpoint exposes information about a pending workload, such as:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that this mean that there will be 2 endpoints for clusterqueue, 2 for localqueue, etc?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC there is only one endpoint for ClusterQueue, and one for LocalQueue. These translate in an implied way into endpoints for admins and users, but this may depend on RBAC. We may rephrase this sentence to clarify.
For example: "There will be separate endpoints exposing the information about pending workloads for LocalQueues, and ClusterQueues". Or something along the lines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarified

keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
#### List all pending workloads in ClusterQueue

```
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would something like this be possible?

Suggested change
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME
GET /apis/visibility.kueue.x-k8s.io/VERSION/clusterqueues/CQ_NAME/pending_workloads

Copy link
Contributor

@mimowo mimowo Oct 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tried like this, but there seems to be a limitation of the framework, that the path can only have one fixed fragment. The fragment is the key in this map in the prototype:

visibilityServerResources := map[string]rest.Storage{
"clusterqueues": pendingWorkloadsInCQ,
}
.

Then, depending on the value returned here:

we may yet have the namespace fragment on the path.

However, I don't think we can have the path you suggest here within the framework.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out @alculquicondor. Indeed, it's possible to introduce that kind of subresource so I'll change the KEP


#### Fetch information about a single Workload
```
GET /apis/pending-workloads.kueue.x-k8s.io/VERSION/namespaces/WL_NAMESPACE/workloads/WL_NAME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the workload is admitted or finished? would you see some information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at the moment. However, for admitted or finished workloads users don't need extra information that is dynamically changing, such as position in the queue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, let's clarify in the KEP.
Maybe this is an argument for keeping the name of the API group as pending-workloads.kueue.x-k8s.io.

Let's go with that and we can iterate in the future as we see the need for new "visibility APIs".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've clarified that this endpoint refers to a single pending workload

@tenzen-y
Copy link
Member

Maybe, I can review this KEP in the next week.

@tenzen-y
Copy link
Member

I came back here today.

Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/approve
I'll leave lgtm to @mimowo

keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alculquicondor, PBundyra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2023
Copy link
Contributor

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please squash

Copy link
Member

@tenzen-y tenzen-y left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than #1300 (comment), LGTM.

@PBundyra PBundyra force-pushed the kep-168-2 branch 2 times, most recently from 8b7ca36 to 49a6991 Compare November 16, 2023 10:05
Copy link
Contributor

@mimowo mimowo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Just some nits

keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
keps/168-2-pending-workloads-visibility/README.md Outdated Show resolved Hide resolved
Update keps/168-2-pending-workloads-visibility/README.md

Co-authored-by: Yuki Iwai <yuki.iwai.tz@gmail.com>

Update keps/168-2-pending-workloads-visibility/README.md

Co-authored-by: Michał Woźniak <mimowo@users.noreply.github.com>

Update keps/168-2-pending-workloads-visibility/kep.yaml

Co-authored-by: Aldo Culquicondor <1299064+alculquicondor@users.noreply.github.com>
@mimowo
Copy link
Contributor

mimowo commented Nov 16, 2023

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Nov 16, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 2d1ff78485b8093d033a843de53e2267bc3a5ade

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants