
Add support for request #12035

Merged
merged 1 commit into from Aug 7, 2015

Conversation

AnanyaKumar
Contributor

@@ -156,6 +156,18 @@ func addDefaultingFuncs() {
obj.APIVersion = "v1"
}
},
func(obj *ResourceRequirements) {
if obj.Limits != nil {
Contributor

Add a note that this sets the requests to the limit if the request was not specified.

Contributor Author

Done.
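
For context, here is a minimal sketch of the defaulting function under discussion, assuming ResourceList is a map from resource name to quantity; the body and copy semantics are illustrative, not necessarily the merged code:

func(obj *ResourceRequirements) {
    // Set requests to limits if requests are not specified.
    if obj.Limits != nil {
        if obj.Requests == nil {
            obj.Requests = ResourceList{}
        }
        for key, value := range obj.Limits {
            if _, exists := obj.Requests[key]; !exists {
                // A deep copy of the quantity may be needed here, depending on
                // how Quantity is implemented.
                obj.Requests[key] = value
            }
        }
    }
},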

@k8s-bot

k8s-bot commented Jul 30, 2015

GCE e2e build/test passed for commit f517513f56268764a28a8e3d69829b2d1f059139.

@k8s-bot

k8s-bot commented Jul 30, 2015

GCE e2e build/test failed for commit 840a47880ac1387c7e8a251e51eabfcb5f8ab8e3.

@AnanyaKumar
Contributor Author

I'll figure out what's wrong in the integration/e2e tests, handle LimitRanger, and then post an update soon. Please review after the update :)

@pmorie
Member

pmorie commented Jul 31, 2015

@derekwaynecarr


allErrs = append(allErrs, validateBasicResource(quantity).Prefix(fmt.Sprintf("Resource %s: ", resourceName))...)
}
requestQuantity, exists := requirements.Requests[resourceName]
if exists && quantity.MilliValue() < requestQuantity.MilliValue() {
Member

The error message should use the right units for the resource. Seeing a converted value that looks different than the input value is confusing.

Member

Also this needs a unit test.

Member

I also think you need to make the quantity comparison different depending on whether the resource is cpu or memory.

I think for memory you want to do quantity.Value() < requestQuantity.Value(), and for cpu you want to use MilliValue(). Always using MilliValue() can cause an overflow.

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/resource/quantity.go#L352

Contributor Author

Done 2 and 3
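
For illustration, the resource-aware comparison (with an error message that keeps the user's original units) might look roughly like this; the error-construction helper, qualified constant, and field path are assumptions, not the merged code:

// Compare limit vs. request in units that fit the resource: milli-units for
// cpu, whole values for memory, to avoid int64 overflow from MilliValue().
var limitBelowRequest bool
if resourceName == api.ResourceCPU {
    limitBelowRequest = quantity.MilliValue() < requestQuantity.MilliValue()
} else {
    limitBelowRequest = quantity.Value() < requestQuantity.Value()
}
if exists && limitBelowRequest {
    // quantity.String() preserves the units the user supplied, so the error
    // does not show a confusingly converted value.
    allErrs = append(allErrs, errs.NewFieldInvalid(
        fmt.Sprintf("limits[%s]", resourceName), quantity.String(),
        "limit must be greater than or equal to request"))
}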

@derekwaynecarr
Member

I don't think you need to touch LimitRanger yet since you are making Requests == Limits in this PR. I am fine submitting a follow-on to make Limits equal the hard cap, and if we want a different defaulting mechanism for Limits/Requests, I will look to spec that out before end of week.

@AnanyaKumar
Contributor Author

@derekwaynecarr Thanks for the review - I'll make the changes you suggested.

I think there is an issue with LimitRanger. Suppose a container comes in with a memory request of 1GB and an unspecified limit, and the default memory limit for the namespace (enforced by LimitRanger) is 500MB. Then the container ends up with (request: 1GB, limit: 500MB), i.e. it becomes invalid.

The only problematic case is when the request is specified and the limit isn't. I can think of two sensible options to deal with this:

  1. Since the user explicitly requested 1GB, and we want to give the user what she asked for, set the limit to 1GB.
  2. Since the limit was unspecified, set the limit to the default limit (500MB). To keep the container valid, reduce the request to 500MB as well.

I think option 1 makes more sense, but I'm happy to hear your thoughts. More formally, when the request is specified and the limit is not: if request < default limit, then limit := default limit; otherwise limit := request.
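
Option 1 as rough pseudocode (hypothetical LimitRanger behavior; all names here are placeholders, not existing code):

// Only relevant when a request is specified and a limit is not.
if requestSpecified && !limitSpecified {
    if request.Cmp(defaultLimit) < 0 {
        limit = defaultLimit // request fits under the namespace default
    } else {
        limit = request // never cap the user below what they asked for
    }
}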

@derekwaynecarr
Member

Option 3
We set the limit equal to the default (if unspecified), and if it turns out that the request is greater than the limit, the user gets a proper validation error.

Food for thought
I think we need to take a step back and ask a few questions before we decide how best to proceed, rather than make a decision based on how our current Salt scripts provision/use LimitRange resources in the default or kube-system namespace.

  1. Do we expect users to run tier 1-3 pods all in the same Namespace?
  2. Is it actually easier for users to scope a Namespace to only run one tier of pod at a time?
  3. If admins want to support all tiers in a single Namespace, why would an admin add the LimitRange to that Namespace and ever apply a LimitRange.Default?

The answer to 1 is really up to us to recommend. If we answer yes here, then we should strive to keep the meaning of LimitRange.Default in that Namespace as simple as possible, which is why I suggest Option 3.

The answer to 2 is maybe yes. Maybe we should think about that as part of your larger design proposal. It would potentially simplify a number of things, and even if not enforced, it may end up being a good way to partition the cluster for users to understand. Right now, I do not see how tier 3 pods are supportable in any namespace that provides a LimitRange with a min/max, since that means requests could never be left unset.

The answer to 3 is that it depends. I think an admin may still want to apply a LimitRange to ensure they do not accept pods with odd shapes that consume too few resources. Think of 500 pods that all use Docker's minimum of 4MB of memory. The Kubelet has a limit on the number of pods it supports, so you may end up with a lot of unused node resources in a 5-node cluster. In this case, it would make sense to set a LimitRange.Min, and potentially a LimitRange.Max, but no LimitRange.Default.

My initial bias:

  1. Request MUST be less than LimitRange.Max (if specified)
  2. Request MUST be greater than LimitRange.Min (if specified)
  3. Limit (if omitted) MUST default to LimitRange.Default (if specified)

These rules must apply at both the pod and container scopes for a LimitRange.
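
Illustrated for a single resource at container scope (the Min/Max/Default field names come from LimitRange; the surrounding wiring and error text are hypothetical):

// Rules 1 & 2: the request must fall within the [Min, Max] bounds when specified.
if max, ok := item.Max[resourceName]; ok && request.Cmp(max) > 0 {
    return fmt.Errorf("%s request exceeds LimitRange.Max", resourceName)
}
if min, ok := item.Min[resourceName]; ok && request.Cmp(min) < 0 {
    return fmt.Errorf("%s request is below LimitRange.Min", resourceName)
}
// Rule 3: an omitted limit defaults to LimitRange.Default when specified.
if _, ok := limits[resourceName]; !ok {
    if def, ok := item.Default[resourceName]; ok {
        limits[resourceName] = def
    }
}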

If we want to have a means to default Request, we should debate the following:

  1. Request (if Limit is omitted) could default to LimitRange.Min (if specified)
  2. Request could default to LimitRange.DefaultRequest (if specified, but this is a new field)

Since the Request represents a Min and not a Max for usage, this seems more natural.

At one point, I had mapped Request to LimitRange.Min (if specified), but since Request was not in scope for 1.0, I pulled it out at the last moment.

Maybe this discussion should be happening in a separate issue that relates LimitRange to QoS which is what I was working on authoring, and why I asked that you not change LimitRanger at this time.

Hope that helps.

cc @bgrant0607 @erictune @smarterclayton as I am working on a development proposal to more succinctly state the above.

@derekwaynecarr
Member

Another caveat: as I noted in VC, we also want to support a user creating a pod with requests but no limits, where the server applies the limit as a function of the request, most likely via some admission control plugin. We fully anticipate some users never setting a Limit, but we will still apply a limit on some projects behind the scenes.

@AnanyaKumar
Contributor Author

@derekwaynecarr Sounds good - you know a lot more than I do about LimitRanger and its use cases, so I look forward to your design. Anyway, it seems like option 3 is what happens right now, so I'll leave it as is.

The flow of logic for API objects appears to be: apply defaults -> admission control -> validation. There's only one issue for now: if you create a container without requests or limits, LimitRanger will apply the default limit, but the request will be left unspecified. In other words, because defaulting happens before admission control, the request will not be set to the limit. This would affect how the scheduler schedules pods, so it won't be backwards compatible.

I just wanted to make a note of this - if you're fine with the behavior as an intermediate step, I'm fine with it too :) If you think I should patch the behavior, I could add a 3-4 line patch too!
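
Purely to illustrate the kind of small patch being offered (hypothetical, not part of this PR): after admission control applies a default limit, an unset request could be mirrored from it. The api.ResourceList type and exact placement are assumptions:

// Hypothetical follow-up: copy a LimitRanger-applied limit into an unset
// request so the scheduler still sees request == limit.
for name, limit := range container.Resources.Limits {
    if _, ok := container.Resources.Requests[name]; !ok {
        if container.Resources.Requests == nil {
            container.Resources.Requests = api.ResourceList{}
        }
        container.Resources.Requests[name] = limit
    }
}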

@mikedanese added the area/api label Jul 31, 2015
@mikedanese
Member

Assigning to @bgrant0607 since it's an API change. Feel free to reassign to the appropriate reviewer.

@bgrant0607
Member

cc @davidopp

@k8s-bot

k8s-bot commented Jul 31, 2015

GCE e2e build/test passed for commit 288ff7f6baf31b79577e699142418a2c9980716d.

@derekwaynecarr
Member

I am submitting a parallel PR today that makes Limits a CPU quota. I am concerned that if this PR goes in before that one, systems with existing pods whose restart policy is Always would no longer constrain those containers. This PR is fine for new pods that come into the system through the defaulting logic, as I understand it.


@derekwaynecarr
Member

I am concerned about this PR going in for clusters that have existing data with running pods that used Limits. For new data, I see that this PR will default a nil Request to the Limit, but that is insufficient in my view when I have an existing deployment with running pods whose restartPolicy=Always. If one of those containers were to restart after upgrading, they would no longer have any cpushare limiting.

My concerns would be alleviated if we could cap those existing containers with cpu-quota. That feature was only added in Docker 1.7 (moby/moby#10736), and I think we only prereq Docker 1.6 at this time. So we would need to change that, and also update go-dockerclient to support setting cpu-quota (which it does not appear to do today: https://github.com/fsouza/go-dockerclient/blob/master/container.go#L178).

Make sense? Objections?

@pmorie
Member

pmorie commented Jul 31, 2015

@AnanyaKumar This PR / your eventual squashed commit should have a more descriptive name.

@derekwaynecarr
Member

cc @eparis

@pmorie
Member

pmorie commented Jul 31, 2015

@derekwaynecarr This makes me wonder whether we should have a precondition while bootstrapping the kubelet that checks the docker version on the host and at least warns you if some behaviors may not work.

@pmorie
Member

pmorie commented Jul 31, 2015

@AnanyaKumar @derekwaynecarr If we move forward with this, someone should do the legwork to explore whether rkt has support for CPU quota and create an issue for it if not. It looks like systemd-nspawn supports setting cgroup CPUShares.

@derekwaynecarr
Member

Not my day: go-dockerclient does have CpuQuota on HostConfig, so the only blocker looks to be that we would need to prereq Docker 1.7. Will put together my parallel PR to map Limits to quota now.

@@ -156,6 +156,19 @@ func addDefaultingFuncs() {
obj.APIVersion = "v1"
}
},
func(obj *ResourceRequirements) {
// Set requests to limits if limits are not specified.
Member

s/if limits are not specified/if requests are not specified/

@davidopp
Member

Looks basically fine to me.

I am concerned about this PR going in for clusters that have existing data with running pods that used Limits. For new data, I see that this PR will default a nil Request to the Limit, but that is insufficient in my view when I have an existing deployment with running pods whose restartPolicy=Always. If one of those containers were to restart after upgrading, they would no longer have any cpushare limiting.

This is a good point. Maybe the Kubelet should set cpushares from limit when request is unset? IIUC Kubelet should never see a Pod in that condition (request unset, limit set) except when we upgrade the Kubelet with this PR.

@bgrant0607
Member

It was late and I was jetlagged. :-)

Yes, using limit when there is no request is equivalent to setting request from limit by default, so that seems reasonable.

Case 3: no request and no limit -- can't set shares based on request if there isn't one :-)

@k8s-bot

k8s-bot commented Aug 5, 2015

GCE e2e build/test passed for commit 22464ae660cca48b617e90c14715f6c4cb6c631f.

@AnanyaKumar
Contributor Author

@bgrant0607 Oh whoops, you're right about case 3! :)

@k8s-bot

k8s-bot commented Aug 5, 2015

GCE e2e build/test passed for commit 7f7637aeb748486fb1a2e3ca3ed9d1e741d59e83.

cpuShares := milliCPUToShares(container.Resources.Limits.Cpu().MilliValue())
cpuRequest := container.Resources.Requests.Cpu()
var cpuShares int64
if cpuRequest.Amount != nil {
Member

Please add a comment that explains why this is doing what it is doing.

Contributor Author

Done.
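
For reference, the kind of explanatory comment and fallback being discussed might read roughly like this (a sketch; the cpuLimit variable, comment wording, and milliCPUToShares clamping behavior are as I understand them, not quoted from the merged code):

cpuRequest := container.Resources.Requests.Cpu()
cpuLimit := container.Resources.Limits.Cpu()
var cpuShares int64
if cpuRequest.Amount != nil {
    // Normal case: cgroup cpu shares are derived from the request.
    cpuShares = milliCPUToShares(cpuRequest.MilliValue())
} else {
    // Backwards compatibility: pods created before this change may carry a
    // limit but no request (new objects get request defaulted to limit by
    // the API server). Fall back to the limit so a restarted container keeps
    // its cpu-share weighting; with neither set, milliCPUToShares returns
    // its minimum share value.
    cpuShares = milliCPUToShares(cpuLimit.MilliValue())
}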

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit 39935c7f937935c8b47393d34a9cd180dbadb076.

@bgrant0607
Member

Need to update the swagger again.

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 2b414de78b186fd71d4ff4470ad1ac99ef5a5a64.

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit ef1e576.

@AnanyaKumar
Contributor Author

@bgrant0607 Yup, I did, probably right around the same time you commented!

@bgrant0607
Member

LGTM.

@derekwaynecarr How would you like to coordinate merges of the quota and LimitRange changes?

@AnanyaKumar
Contributor Author

@derekwaynecarr I think the changes here are backwards compatible regardless of LimitRange changes, so this PR should be safe to merge? Let me know what you think! :)

@derekwaynecarr
Member

@bgrant0607 - I want to physically test this PR to ensure that the defaults are actually getting applied, since we lack a clear e2e test in this area. I'm asking for one more day so I can test tomorrow morning EST. It looks like we should be fine from a backwards-compatibility standpoint, but I am cautious.

@bgrant0607
Member

SGTM. Will hold off on applying the lgtm label.

@derekwaynecarr
Member

LGTM - defaulting worked well after doing an upgrade from old to new. Will send the follow-on PRs for LimitRange/ResourceQuota next week. Thanks @AnanyaKumar!

@bgrant0607
Member

LGTM

@bgrant0607 added the lgtm label Aug 7, 2015
@davidopp
Member

davidopp commented Aug 7, 2015

LGTM

satnam6502 added a commit that referenced this pull request Aug 7, 2015
@satnam6502 merged commit bee48f4 into kubernetes:master Aug 7, 2015