
Pod Overhead: account resources tied to the pod sandbox, but not specific containers #688

Open
tallclair opened this issue Jan 14, 2019 · 23 comments

@tallclair (Member) commented Jan 14, 2019

Enhancement Description

Please keep this description up to date; it helps the Enhancements team efficiently track the evolution of this enhancement.

@tallclair (Member, Author) commented Jan 14, 2019

/help
I'm looking for someone who is interested in picking back up this proposal. Specifically, the design proposal needs to be reworked in the context of RuntimeClass, and the details need to be worked through with the sig-node community before we can move to implementation.

/cc @egernst

@egernst (Contributor) commented Jan 15, 2019

Thanks @tallclair. I may have questions on the process, but am happy to pick this up.

@tallclair (Member, Author) commented Jan 15, 2019

\o/
/remove-help
/assign @egernst

@k8s-ci-robot (Contributor) commented Jan 15, 2019

@tallclair: GitHub didn't allow me to assign the following users: egernst.

Note that only kubernetes members and repo collaborators can be assigned and that issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

\o/
/remove-help
/assign @egernst

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

egernst referenced this issue Jan 15, 2019: REQUEST: New membership for egernst #360 (closed, 6 of 6 tasks complete)
@tallclair (Member, Author) commented Jan 17, 2019

/assign @egernst

@egernst (Contributor) commented Feb 15, 2019

Hey @tallclair et al., I made several suggestions for the WIP RFC at https://docs.google.com/document/d/1EJKT4gyl58-kzt2bnwkv08MIUZ6lkDpXcxkHqCvvAp4/edit?usp=sharing

I added a section for updating the runtimeClass CRD, and explained how the values would be obtained from the new suggested runtimeClass fields rather than configured in the runtimeController. The suggested runtimeController scope is also greatly reduced, allowing this first iteration to just handle adding the pod overhead.

PTAL.
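For readers following the doc, a sketch of what a RuntimeClass carrying its own overhead values might look like. The `overhead`/`podFixed` stanza, the handler name, and the quantities below are illustrative assumptions drawn from the draft discussion, not a settled API:

```yaml
apiVersion: node.k8s.io/v1beta1
kind: RuntimeClass
metadata:
  name: kata-containers
# The CRI handler this class maps to (hypothetical Kata Containers handler).
handler: kata
# Hypothetical overhead stanza: a fixed per-pod resource cost that is
# accounted on top of the containers' own requests and limits.
overhead:
  podFixed:
    cpu: 250m
    memory: 120Mi
```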

@vinaykul commented Mar 26, 2019

@tallclair @egernst I quickly went through the design proposal. We are working on In-Place Vertical Scaling for Pods, and I want to clarify a couple of things.

In the context of in-place resize, do you see PodSpec.Overhead as something VPA should be aware of via metrics reporting? And perhaps, in the future, track this field and make recommendations for it? It would be a good idea to nail down early on whether this field should be mutable by an external entity.

CC: @kgolab @bskiba @schylek

@kacole2 (Member) commented Apr 12, 2019

Hello @tallclair, I'm the Enhancement Lead for 1.15. It looks like there is no KEP accepted yet. Is it safe to assume this will not make the enhancement freeze deadline for 1.15?

egernst added a commit to egernst/enhancements that referenced this issue Apr 12, 2019

kep: pod-overhead: clarify handling of Overhead
Updates to pod-overhead based on review discussion:
 - Clarify what Overhead is in the pod spec, and behavior when
 this is manually defined without a runtimeClass.
 - Clarify ResourceQuota changes necessary
 - Add in CRI API change to make pod details available
 - Define feature gate
 - Update runtimeClass definition

Fixes: kubernetes#688

Signed-off-by: Eric Ernst <eric.ernst@intel.com>
@tallclair (Member, Author) commented Apr 12, 2019

We're still planning on getting this into 1.15. Don't we still have close to 3 weeks to get the KEP in? KEP is here, btw: https://github.com/kubernetes/enhancements/blob/master/keps/sig-node/20190226-pod-overhead.md

tallclair added this to the v1.15 milestone Apr 12, 2019

egernst added a commit to egernst/enhancements that referenced this issue Apr 16, 2019

kep: pod-overhead: clarify handling of Overhead

egernst added a commit to egernst/enhancements that referenced this issue Apr 18, 2019

kep: pod-overhead: clarify handling of Overhead

egernst added a commit to egernst/enhancements that referenced this issue Apr 18, 2019

kep: pod-overhead: clarify handling of Overhead

egernst added a commit to egernst/enhancements that referenced this issue Apr 18, 2019

kep: pod-overhead: clarify handling of Overhead

tallclair reopened this Apr 19, 2019

bowei added a commit to bowei/enhancements that referenced this issue Apr 25, 2019

kep: pod-overhead: clarify handling of Overhead
@tallclair (Member, Author) commented May 2, 2019

A piece that is missing from the KEP is a discussion of how overhead interacts with pod QoS.

In my opinion, applying the RuntimeClass overhead should preserve the pod's QoS class, which I think means the overhead requests and limits need to match for Guaranteed pods. If they need to match for Guaranteed, does it make sense to allow them to differ for Burstable pods? Furthermore, a pod limit is meaningless as currently defined unless all containers in the pod also have limits.

Given this, I'm wondering whether it makes sense to simplify overhead to be a single number (per resource), not differentiating between requests and limits. I.e. change the type from *ResourceRequirements to ResourceList. How the overhead is used would depend on the container requests and limits:

  • No requests or limits (BestEffort) - overhead is entirely ignored. Should RuntimeClass controller leave it out, or still set it on the pod, and have it be ignored by the scheduler & kubelet?
  • Some requests or limits (Burstable) - overhead is included with requests when scheduling
  • All containers have limits, no requests (Burstable) - overhead is included in limits set on the pod cgroup. Overhead is not included in request (0)?
  • All containers have limits (Burstable or Guaranteed) - overhead is added to total requests and limits for scheduling and limiting the pod cgroup
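The arithmetic implied by these cases can be sketched in a few lines. This is a hypothetical illustration (plain dicts standing in for ResourceList, made-up function name), not the scheduler's actual code:

```python
def effective_request(container_requests, overhead):
    """Sum per-container requests, then add the pod-level overhead.

    container_requests: list of dicts, e.g. [{"cpu": 0.5, "memory": 128}]
    overhead: a single ResourceList-style dict per resource, e.g. {"cpu": 0.25}
    (names and units here are illustrative, not the real API types).
    """
    total = {}
    for req in container_requests:
        for resource, qty in req.items():
            total[resource] = total.get(resource, 0) + qty
    if not total:
        return total  # BestEffort: no requests at all, overhead is ignored
    # Burstable/Guaranteed: the overhead counts toward scheduling.
    for resource, qty in overhead.items():
        total[resource] = total.get(resource, 0) + qty
    return total
```

The same helper would apply to limits for the pod cgroup, with the caveat from the bullets above that a pod limit is only meaningful when every container sets one.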

@egernst @derekwaynecarr @dchen1107 @bsalamat WDYT?

@tallclair (Member, Author) commented May 2, 2019

An alternative implementation of the above suggestion would be to keep overhead as-is on the PodSpec, and apply the logic in the RuntimeClass controller when the overhead is set. This makes the actual overhead settings explicit, but makes the admission ordering more important (e.g. for interplay between overhead and LimitRanger, or other things that manipulate requests & limits).
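A minimal sketch of that controller-side flow, using a hypothetical function and plain-dict stand-ins for the API objects (not the real admission plugin; the mismatch-rejection policy is one possible choice, per the discussion of manually defined overhead):

```python
def apply_runtime_class_overhead(pod, runtime_classes):
    """Admission-time sketch: copy a RuntimeClass's overhead onto the pod.

    pod: dict with optional "runtimeClassName" and "overhead" keys.
    runtime_classes: mapping of class name -> overhead dict,
                     e.g. {"kata": {"cpu": 0.25, "memory": 120}}.
    """
    name = pod.get("runtimeClassName")
    if name is None or name not in runtime_classes:
        return pod  # no RuntimeClass overhead to apply
    expected = runtime_classes[name]
    declared = pod.get("overhead")
    if declared is None:
        # Make the overhead explicit on the PodSpec so downstream
        # components (scheduler, kubelet) need not resolve it again.
        pod["overhead"] = expected
    elif declared != expected:
        # Reject a manually set overhead that disagrees with the
        # pod's RuntimeClass (hypothetical policy).
        raise ValueError("pod overhead does not match RuntimeClass")
    return pod
```

Making the resolved overhead explicit on the PodSpec is what creates the admission-ordering concern noted above: anything that later mutates requests and limits (e.g. LimitRanger) sees, and must not double-count, the already-applied overhead.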

@daminisatya (Member) commented May 18, 2019

Hey, @tallclair @egernst I'm the v1.15 docs release shadow.

I see that you are targeting this enhancement for the 1.15 release. Does this require any new docs (or modifications)?

Just a friendly reminder we're looking for a PR against k/website (branch dev-1.15) due by Thursday, May 30th. It would be great if it's the start of the full documentation, but even a placeholder PR is acceptable. Let me know if you have any questions! 😄

Cheers!

@tallclair (Member, Author) commented May 20, 2019

Yes, this should get docs (tagged alpha). Thanks for the reminder.

@daminisatya (Member) commented May 27, 2019

@tallclair Thank you. Can you share a PR for the documentation, if there is one?

@egernst (Contributor) commented May 27, 2019 (comment minimized; body not shown)

@kacole2 (Member) commented May 28, 2019

Hi @egernst @tallclair. Code Freeze is Thursday, May 30th 2019 @ EOD PST. All enhancements going into the release must be code-complete, including tests, and have docs PRs open.

Please list all current k/k PRs so they can be tracked going into freeze. If the PRs aren't merged by freeze, this feature will slip for the 1.15 release cycle. Only release-blocking issues and PRs will be allowed in the milestone.

If you know this will slip, please reply back and let us know. Thanks!

@kacole2 (Member) commented May 30, 2019

Hi @egernst @tallclair, today is code freeze for the 1.15 release cycle. I do not see a reply for any k/k PRs to track for this merge. It's now being marked as At Risk in the 1.15 Enhancement Tracking Sheet. If there is no response, or you respond with PRs to track and they are not merged by EOD PST, this will be dropped from the 1.15 Milestone. After this point, only release-blocking issues and PRs will be allowed in the milestone with an exception.

@egernst (Contributor) commented May 30, 2019 (comment minimized; body not shown)

@egernst (Contributor) commented May 31, 2019

The API changes are all approved now, but they need a rebase on top of what merged last night. Unfortunately I am flying for the next several hours, so I cannot do this until midday PST.

Can we extend the time period for this feature, or at least for the API changes? Is there a formal process for filing for an extension? Help?

@kacole2 (Member) commented May 31, 2019

@egernst code freeze has been extended to EOD today. If the issues have LGTM labels, then you are all set.

@kacole2 (Member) commented Jun 3, 2019

@egernst we are in code freeze and this didn't make it in time. I'm going to remove it from the 1.15 list. Please file an exception if you think it still needs to be added.

/milestone clear

k8s-ci-robot removed this from the v1.15 milestone Jun 3, 2019

kacole2 added the tracked/no label and removed the tracked/yes label Jun 3, 2019

logicalhan added a commit to logicalhan/enhancements that referenced this issue Jun 5, 2019

kep: pod-overhead: clarify handling of Overhead
@kacole2 (Member) commented Jul 9, 2019

Hi @egernst @tallclair, I'm the 1.16 Enhancement Lead. Is this feature going to be graduating through the alpha/beta/stable stages in 1.16? Please let me know so it can be added to the 1.16 Tracking Spreadsheet. If it's not graduating, I will remove it from the milestone and change the tracked label.

Once coding begins or if it already has, please list all relevant k/k PRs in this issue so they can be tracked properly.

Milestone dates are Enhancement Freeze 7/30 and Code Freeze 8/29.

Thank you.

@egernst (Contributor) commented Jul 9, 2019

Hi @kacole2 - This will be added as an alpha feature in 1.16. As I cannot edit the issue description, I'm updating details on applicable k/k PRs below:

kacole2 added this to the v1.16 milestone Jul 9, 2019

kacole2 added the tracked/yes label and removed the tracked/no label Jul 9, 2019
