Conversation
Signed-off-by: Yee Hing Tong <wild-endeavor@users.noreply.github.com>
Codecov Report
@@ Coverage Diff @@
## master #396 +/- ##
==========================================
+ Coverage 76.11% 78.49% +2.37%
==========================================
Files 18 18
Lines 1390 1195 -195
==========================================
- Hits 1058 938 -120
+ Misses 280 205 -75
Partials 52 52
Flags with carried forward coverage won't be shown. See 18 files with indirect coverage changes.
// If set, these limits will be used as the default limits for tasks that do not specify limits. If not set, but
// defaults (field 1) are set, then the defaults will be used as the limits.
TaskResourceSpec default_limits = 3;
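For illustration only, here is a minimal Go sketch of the fallback semantics described in that comment. The struct and function names below are hypothetical stand-ins, not the actual generated flyteidl types.

package main

import "fmt"

// Hypothetical mirrors of the proto messages, for illustration only.
type TaskResourceSpec struct {
	CPU    string
	Memory string
}

type TaskResourceAttributes struct {
	Defaults      *TaskResourceSpec // field 1: default requests
	Limits        *TaskResourceSpec // field 2: platform hard limits
	DefaultLimits *TaskResourceSpec // field 3: proposed default limits
}

// effectiveDefaultLimits applies the fallback described in the proto comment:
// use default_limits when set, otherwise fall back to defaults (field 1).
func effectiveDefaultLimits(attrs TaskResourceAttributes) *TaskResourceSpec {
	if attrs.DefaultLimits != nil {
		return attrs.DefaultLimits
	}
	return attrs.Defaults
}

func main() {
	attrs := TaskResourceAttributes{
		Defaults: &TaskResourceSpec{CPU: "500m", Memory: "1Gi"},
	}
	// default_limits is unset, so the defaults are used as the limits.
	fmt.Println(*effectiveDefaultLimits(attrs))
}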
Because this naming has tripped people up somewhat already, what if we differentiated between defaults and limits like so:
message TaskResourceAttributes {
    TaskResourceSpec defaults = 1 [deprecated = true];
    TaskResourceSpec limits = 2;
    TaskResourceDefaults default_values = 1;
}

message TaskResourceDefaults {
    TaskResourceSpec requests = 1;
    TaskResourceSpec limits = 2;
}
So it's clear exactly what is a default and what is a limit.
+1 to changing this terminology somehow. Also, didn't realize until now that changing the name of a proto field is easy if you use the same id number?!?! Can we maybe specify something like:
message TaskResourceAttributes {
    TaskResourceSpec defaults = 1 [deprecated = true];
    TaskResourceSpec default_requests = 1; // reuses id 1, renaming the deprecated field
    TaskResourceSpec limits = 2;           // IIUC these are the hard limits imposed by the platform
    TaskResourceSpec default_limits = 3;
}
// If set, these limits will be used as the default limits for tasks that do not specify limits. If not set, but
// defaults (field 1) are set, then the defaults will be used as the limits.
TaskResourceSpec default_limits = 3;
My concern with having different requests and limits is that the Kubernetes scheduler will schedule multiple pods on the same node (oversubscribe), and this will lead to pod evictions. Pod evictions may cause deletion of the pod, even ignoring finalizers, and certainly when inject-finalizer is False. This is bad and will cause propeller to think that the pod was externally deleted and relaunch it.
If the pod stays but is evicted, then either demystify pending or demystify error should take care of it.
You can specify pod priority to control the order of evictions:
https://kubernetes.io/docs/concepts/scheduling-eviction/node-pressure-eviction/
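For reference, a rough sketch of the two cases being compared, Burstable vs Guaranteed QoS, using the Kubernetes Go types; the resource values are made up.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	// Requests < limits: Burstable QoS. The scheduler places pods by their
	// requests, so a node can be oversubscribed on memory and the kubelet may
	// evict these pods under node pressure. Pod priority (PriorityClassName
	// on the PodSpec) also factors into which pods are evicted first.
	burstable := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("500m"),
			corev1.ResourceMemory: resource.MustParse("1Gi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("2"),
			corev1.ResourceMemory: resource.MustParse("4Gi"),
		},
	}

	// Requests == limits for every container: Guaranteed QoS, which is evicted
	// last under node pressure. This corresponds to Admin's current behavior of
	// setting limits equal to requests.
	guaranteed := corev1.ResourceRequirements{
		Requests: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1"),
			corev1.ResourceMemory: resource.MustParse("2Gi"),
		},
		Limits: corev1.ResourceList{
			corev1.ResourceCPU:    resource.MustParse("1"),
			corev1.ResourceMemory: resource.MustParse("2Gi"),
		},
	}

	fmt.Println(burstable.Requests, guaranteed.Requests)
}

As noted further down in this thread, the oversubscription and eviction concern applies to memory; CPU contention only results in throttling.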
So from that document: "Guaranteed pods are guaranteed only when requests and limits are specified for all the containers and they are equal". I didn't know this. But eviction can still happen, right? Was this guaranteeing behavior the reason why we set limits to requests in the first place?
We've had numerous requests to allow limits to grow unbounded. We can also make this change and add an option to Admin to continue the current behavior.
The Guaranteed pods section in the docs is VERY interesting @yee. Maybe it's worth running some benchmarks (e.g. large fan-out map tasks) where resource requests != limits and requests == limits, to see if non-Guaranteed QoS Pods are evicted at a high rate in practice. Would be very interested in seeing how this applies on EKS vs GKE.
Yes, agreed. One day we will do this.
Just keep in mind that this is handled very differently for CPU and memory!
Closing this entirely; we will address this in the future as we get a better handle on evictions. The concern raised was that by setting a default limit, we open ourselves up to oversubscription on the cluster, which reduces reproducibility because of evictions, and makes things unclear because users won't know what level should be used. By setting limits to the defaults, we fall into the Guaranteed bucket of K8s pods, which get evicted less often, and OOMs become reproducible. The Admin PR in progress will continue: we will support Pods that do not have limits by requiring users to expressly set them to 0.
Just to clarify: all the oversubscription and evictions due to pressure do NOT apply to CPU.
TL;DR
Add a default limit TaskResourceSpec section to TaskResourceAttributes. There's been lots of confusion around Admin's existing behavior of setting a task's limits to the same as the requests. Adding an explicit default limit section to Admin will fix this. Limits from now on will just be used for checking that the default requests and default limits conform to the system-level limit.
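A minimal sketch of the kind of conformance check described above, assuming resource quantities are compared with Kubernetes' resource.Quantity; withinLimit and the example values are hypothetical, not Admin's actual validation code.

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// withinLimit reports whether a default request or default limit value stays
// within the platform (system-level) limit. Hypothetical helper for illustration.
func withinLimit(value, limit resource.Quantity) bool {
	return value.Cmp(limit) <= 0
}

func main() {
	defaultCPULimit := resource.MustParse("2")
	systemCPULimit := resource.MustParse("4")
	fmt.Println(withinLimit(defaultCPULimit, systemCPULimit)) // true: 2 <= 4
}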
Type
Are all requirements met?
Complete description
NA
Tracking Issue
flyteorg/flyte#3574
flyteorg/flyte#3467