-
Notifications
You must be signed in to change notification settings - Fork 237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add QueueingStrategy in Capacity api #80
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before reviewing further, I would like to discuss at what level we should have this option. I think we need it at the capacity level to start with since multiple queues can point to the same capacity.
As a followup, and as we get more feedback on how people will be using the system, we could we could add a queue-level option that inherits the capacity level one by default, but can override it within that queue. There is a lot of nuance that we need to discuss in a detailed separate design doc though.
api/v1alpha1/queue_types.go
Outdated
// queue. This field is immutable. | ||
// +kubebuilder:default=createTimeFIFO | ||
// +kubebuilder:validation:Enum=createTimeFIFO | ||
SortingStrategy SortingStrategy `json:"sortingStrategy,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need to think about order across queues referencing the same capacity. I feel that we should start by having this option in Capacity not queue.
pkg/queue/queue.go
Outdated
|
||
var lessFun func(a, b workload.Info) bool | ||
|
||
// TODO(denkensk): Add a Simple Framework to support different queuing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use only references to issues, not github handles.
api/v1alpha1/queue_types.go
Outdated
@@ -25,11 +25,24 @@ type QueueSpec struct { | |||
// capacity is a reference to a Capacity object form which resources | |||
// are allocated to this queue. | |||
Capacity CapacityReference `json:"capacity,omitempty"` | |||
|
|||
// SortingStrategy indicates the queueing strategy of the workloads in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
naming: if we move this to Capacity, I would name it QueueingStrategy and name the field Queueing.
api/v1alpha1/queue_types.go
Outdated
|
||
const ( | ||
// CreateTimeFIFO means that workloads are sorted strictly by creation time. | ||
CreateTimeFIFO SortingStrategy = "createTimeFIFO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What would we name the approach based on re-queueing time? Perhaps we can name them StrictFIFO and BestEffortFIFO? I think creation time is implicit in the FIFO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API enums should start with a capital letter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to start at the Capacity level, given that there is a sorting criteria there too.
Although I admit that implementation will be harder, as the Capacity might not exist when the Queue is created. But it's solvable.
api/v1alpha1/queue_types.go
Outdated
|
||
const ( | ||
// CreateTimeFIFO means that workloads are sorted strictly by creation time. | ||
CreateTimeFIFO SortingStrategy = "createTimeFIFO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
API enums should start with a capital letter.
pkg/queue/queue_test.go
Outdated
@@ -26,7 +28,7 @@ import ( | |||
) | |||
|
|||
func TestFIFOQueue(t *testing.T) { | |||
q := newQueue() | |||
q, _ := newQueue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't silence the error
I need to make sure I'm understanding this correctly. By capacity level policy do you mean within or across queues? If it's within a queue, I think we can set the policy in the capacity and when we want to create a new queue, the queue will inherit it from the capacity. If it's across queues, that's addressing fairness rather than starvation. It can't solve the problem here in #46 |
Think of Capacity as a "global" queue of all the queues pointing to the capacity; so the order should be enforced on all workloads sent to the same capacity even if they were submitted to different queues. |
I also think a queueing strategy at the capacity level is needed. But there are still some things to discuss here. Could you give more comments for it? @ahg-g @alculquicondor Personal opinion: If the strategy at the capacity level act directly on all workloads belongs to the capacity, This way the queue does not serve the purpose of dividing the tenants. A tenant submits a lot of workloads before others so that no other tenant's workloads can be scheduled. This is no different from the one-to-one correspondence between capacity and queue. |
I don't think we should globally sort all the entries related to a capacity. But we can sort the entries within a queue, and also sort the heads of the queues, when they enter the scheduling cycle, based on the same criteria |
I mentioned there is a lot of nuance and we need a more detailed doc listing pros/cons and use cases, but lets continue to discuss here. At some point we actually wanted to merge queue and capacity into one API and so fairness was always supposed to happen across all workloads submitted to the same capacity; but then we identified two reasons for continuing to have a namespaced queue API:
Given the above, the namespaced Queue API wasn't meant to offer isolation between tenants since tenants could be given the permission to create/delete queues, and it is not something always limited to the Batch Admin. Ordering within a queue is not going to offer FIFO access to capacity even if we sort the heads based on the same criteria and that could also be a problem/limitation especially if a tenant was given the permission to create queues. For the time being, I am ok if we sort within queues and then sort the heads; but we may change this in a significant way later based on the following:
|
Thanks for your detailed introduction. @ahg-g If this is our design purpose, I got it. The queue is not the isolated unit, the capacity is the isolated unit. Queue may be a classification for jobs under the same tenant. Right? |
Yes; perhaps Capacity is confusing as a name, may be we should call it |
@alculquicondor is this going to be a significant change? if so, can you suggest a simpler path? perhaps the two step ordering aldo suggested earlier could be one step in that direction. |
Maybe ElasticQuota or ClusterElasticQuota ? elastic indicates resource sharing? Also there is a difference in nomenclature with ResourceQuota or ClusterResourceQuota? |
Quota in the name doesn't clarify that this is a queue (ordered entities waiting to get started); while it is not unusual to associate "Queue" with usage limits. Also, we want to introduce budgets, not just quotas (assuming quota is defined as a limit at a specific point in time) |
ResourceQuotas are namespaced objects, ClusterResourceQuotas are not (in other words, they are cluster-scoped)
Do you mean having a single heap (not saying queue to avoid confusion) per Capacity? I think it is a reasonable simplification. We will still need a heap per Queue once we implement limits at that level, but at the moment they are not providing much. It will be significant code moving around, but not much re-implementation. Better now than later. |
Thanks, then I suggest we do the following:
|
SG. And how about the rename of |
I created #85, lets discuss there. |
They probably need to happen in the opposite order. I would like to do 2, if you don't mind @denkensk |
07fb695
to
451cf54
Compare
@alculquicondor Don't mind, and could you provide some basic implementation at your first pr? |
I think the order may not be so strict, we can merge this pr first (because it's ready) and don't use it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh, I missed that you changed the PR to only have the API field.
@@ -132,8 +132,21 @@ type CapacitySpec struct { | |||
// The name style is similar to label keys. These are just names to link QCs | |||
// together, and they are meaningless otherwise. | |||
Cohort string `json:"cohort,omitempty"` | |||
|
|||
// QueueingStrategy indicates the queueing strategy of the workloads | |||
// across the queues in this Capacity. This field is immutable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Document the meaning of the values here too. The constants don't form part of the API documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated.
pkg/capacity/capacity.go
Outdated
@@ -83,6 +83,11 @@ type Capacity struct { | |||
// Those keys define the affinity terms of a workload | |||
// that can be matched against the flavors. | |||
LabelKeys map[corev1.ResourceName]sets.String | |||
|
|||
// TODO introduce a single heap for per Capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO(#87)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
default: StrictFIFO | ||
description: 'QueueingStrategy indicates the queueing strategy of | ||
the workloads across the queues in this Capacity. This field is | ||
immutable. Current Supported Strategies: - StrictFIFO: workloads |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you need an extra empty line in the gocomment :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(⊙o⊙)… Added
3640662
to
b72890f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: alculquicondor, denkensk The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
/lgtm |
What type of PR is this?
/kind feature
/kind api-change
What this PR does / why we need it:
Add QueueingStrategy in Capacity api
Which issue(s) this PR fixes:
First part #8
Special notes for your reviewer: