
Add support for request #12035

Merged
merged 1 commit into from Aug 7, 2015

Conversation

AnanyaKumar
Contributor

@@ -156,6 +156,18 @@ func addDefaultingFuncs() {
obj.APIVersion = "v1"
}
},
func(obj *ResourceRequirements) {
if obj.Limits != nil {
Contributor

Add a note that this sets the requests to the limit if the request was not specified.

Contributor Author

Done.
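
For context, here is a minimal sketch of the defaulting function under discussion, assuming ResourceList is a map from resource name to quantity; the body and copy semantics are illustrative, not necessarily the merged code:

func(obj *ResourceRequirements) {
    // Set requests to limits if requests are not specified.
    if obj.Limits != nil {
        if obj.Requests == nil {
            obj.Requests = ResourceList{}
        }
        for key, value := range obj.Limits {
            if _, exists := obj.Requests[key]; !exists {
                // A deep copy of the quantity may be needed here, depending on
                // how Quantity is implemented.
                obj.Requests[key] = value
            }
        }
    }
},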

@k8s-bot

k8s-bot commented Jul 30, 2015

GCE e2e build/test passed for commit f517513f56268764a28a8e3d69829b2d1f059139.

@k8s-bot

k8s-bot commented Jul 30, 2015

GCE e2e build/test failed for commit 840a47880ac1387c7e8a251e51eabfcb5f8ab8e3.

@AnanyaKumar
Contributor Author

I'll figure out what's wrong in the integration/e2e tests, handle LimitRanger, and then post an update soon. Please review after the update :)

@pmorie
Member

pmorie commented Jul 31, 2015

@derekwaynecarr


allErrs = append(allErrs, validateBasicResource(quantity).Prefix(fmt.Sprintf("Resource %s: ", resourceName))...)
}
requestQuantity, exists := requirements.Requests[resourceName]
if exists && quantity.MilliValue() < requestQuantity.MilliValue() {
Member

The error message should use the right units for the resource. Seeing a converted value that looks different than the input value is confusing.

Member

Also this needs a unit test.

Member

I also think you need to make the quantity comparison different depending on whether the resource is cpu or memory.

I think for memory you want to do quantity.Value() < requestQuantity.Value(), and for cpu you want to use MilliValue(). Always using MilliValue() can cause an overflow.

https://github.com/GoogleCloudPlatform/kubernetes/blob/master/pkg/api/resource/quantity.go#L352

Contributor Author

Done 2 and 3
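
For illustration, the resource-aware comparison (with an error message that keeps the user's original units) might look roughly like this; the error-construction helper, qualified constant, and field path are assumptions, not the merged code:

// Compare limit vs. request in units that fit the resource: milli-units for
// cpu, whole values for memory, to avoid int64 overflow from MilliValue().
var limitBelowRequest bool
if resourceName == api.ResourceCPU {
    limitBelowRequest = quantity.MilliValue() < requestQuantity.MilliValue()
} else {
    limitBelowRequest = quantity.Value() < requestQuantity.Value()
}
if exists && limitBelowRequest {
    // quantity.String() preserves the units the user supplied, so the error
    // does not show a confusingly converted value.
    allErrs = append(allErrs, errs.NewFieldInvalid(
        fmt.Sprintf("limits[%s]", resourceName), quantity.String(),
        "limit must be greater than or equal to request"))
}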

@derekwaynecarr
Member

I don't think you need to touch LimitRanger yet since you are making Requests == Limits in this PR. I am fine submitting a follow-on to make Limits equal the hard cap, and if we want a different defaulting mechanism for Limits/Requests, I will look to spec that out before end of week.

@AnanyaKumar
Contributor Author

@derekwaynecarr Thanks for the review - I'll make the changes you suggested.

I think there is an issue with LimitRanger. Suppose a container comes in with a memory request of 1GB and an unspecified limit, and the default memory limit for the namespace (enforced by LimitRanger) is 500MB. Then the container ends up with (request: 1GB, limit: 500MB), i.e. it becomes invalid.

The only problematic case is when the request is specified and the limit isn't. I can think of two sensible options to deal with this:

  1. Since the user explicitly requested 1GB, and we want to give the user what she asked for, set the limit to 1GB.
  2. Since the limit was unspecified, set the limit to the default limit (500MB). To keep the container valid, reduce the request to 500MB as well.

I think option 1 makes more sense, but I'm happy to hear your thoughts. More formally, when the request is specified and the limit is not: if request < default limit, then limit := default limit; otherwise limit := request.
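
Option 1 as rough pseudocode (hypothetical LimitRanger behavior; all names here are placeholders, not existing code):

// Only relevant when a request is specified and a limit is not.
if requestSpecified && !limitSpecified {
    if request.Cmp(defaultLimit) < 0 {
        limit = defaultLimit // request fits under the namespace default
    } else {
        limit = request // never cap the user below what they asked for
    }
}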

@derekwaynecarr
Member

Option 3
We set the limit equal to the default (if unspecified), and if it turns out that the request is greater than the limit, the user gets a proper validation error.

Food for thought
I think we need to take a step back and ask a few questions before we decide how best to proceed, rather than make a decision based on how our current Salt scripts provision/use LimitRange resources in the default or kube-system namespace.

  1. Do we expect users to run tier 1-3 pods all in the same Namespace?
  2. Is it actually easier for users to scope a Namespace to only run one tier of pod at a time?
  3. If admins want to support all tiers in a single Namespace, why would an admin add the LimitRange to that Namespace and ever apply a LimitRange.Default?

The answer to 1 is really up to us to recommend. If we answer yes here, then we should strive to keep the meaning of LimitRange.Default in that Namespace as simple as possible, which is why I suggest Option 3.

The answer to 2 is maybe yes. Maybe we should think about that as part of your larger design proposal. It would potentially simplify a number of things, and even if not enforced, it may end up being a good way to partition the cluster for users to understand. Right now, I do not see how tier 3 pods are supportable in any namespace that provides a LimitRange with a min/max, since that means requests could never be left unset.

The answer to 3 is that it depends. I think an admin may still want to apply a LimitRange to ensure they do not accept pods with odd shapes that consume too few resources. Think of 500 pods that all use Docker's minimum of 4MB of memory. The Kubelet has a limit on the number of pods it supports, so you may end up with a lot of unused node resources in a 5-node cluster. In this case, it would make sense to set a LimitRange.Min, and potentially a LimitRange.Max, but no LimitRange.Default.

My initial bias:

  1. Request MUST be less than LimitRange.Max (if specified)
  2. Request MUST be greater than LimitRange.Min (if specified)
  3. Limit (if omitted) MUST default to LimitRange.Default (if specified)

These rules must apply at both the pod and container scopes for a LimitRange.
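
Illustrated for a single resource at container scope (the Min/Max/Default field names come from LimitRange; the surrounding wiring and error text are hypothetical):

// Rules 1 & 2: the request must fall within the [Min, Max] bounds when specified.
if max, ok := item.Max[resourceName]; ok && request.Cmp(max) > 0 {
    return fmt.Errorf("%s request exceeds LimitRange.Max", resourceName)
}
if min, ok := item.Min[resourceName]; ok && request.Cmp(min) < 0 {
    return fmt.Errorf("%s request is below LimitRange.Min", resourceName)
}
// Rule 3: an omitted limit defaults to LimitRange.Default when specified.
if _, ok := limits[resourceName]; !ok {
    if def, ok := item.Default[resourceName]; ok {
        limits[resourceName] = def
    }
}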

If we want to have a means to default Request, we should debate the following:

  1. Request (if Limit is omitted) could default to LimitRange.Min (if specified)
  2. Request could default to LimitRange.DefaultRequest (if specified, but this is a new field)

Since the Request represents a Min and not a Max for usage, this seems more natural.

At one point, I had mapped Request to LimitRange.Min (if specified), but since Request was not in scope for 1.0, I pulled it out at the last moment.

Maybe this discussion should be happening in a separate issue that relates LimitRange to QoS which is what I was working on authoring, and why I asked that you not change LimitRanger at this time.

Hope that helps.

cc @bgrant0607 @erictune @smarterclayton as I am working on a development proposal to more succinctly state the above.

@derekwaynecarr
Member

Another caveat: as I noted in VC, we also want to support a user creating a pod with requests but no limits, where the server applies the limit as a function of the request, most likely via some admission control plugin. We fully anticipate some users never setting a Limit, but we will still apply a limit on some projects behind the scenes.

@AnanyaKumar
Contributor Author

@derekwaynecarr Sounds good - you know a lot more than I do about LimitRanger and its use cases, so I look forward to your design. Anyway, it seems like option 3 is what happens right now, so I'll leave it as is.

The flow of logic for API objects appears to be: apply defaults -> admission control -> validation. There's only one issue for now: if you create a container without requests or limits, LimitRanger will apply the default limit, but the request will be left unspecified. In other words, because defaulting happens before admission control, the request will not be set to the limit. This would affect how the scheduler schedules pods, so it won't be backwards compatible.

I just wanted to make a note of this - if you're fine with the behavior as an intermediate step, I'm fine with it too :) If you think I should patch the behavior, I could add a 3-4 line patch too!
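
Purely to illustrate the kind of small patch being offered (hypothetical, not part of this PR): after admission control applies a default limit, an unset request could be mirrored from it. The api.ResourceList type and exact placement are assumptions:

// Hypothetical follow-up: copy a LimitRanger-applied limit into an unset
// request so the scheduler still sees request == limit.
for name, limit := range container.Resources.Limits {
    if _, ok := container.Resources.Requests[name]; !ok {
        if container.Resources.Requests == nil {
            container.Resources.Requests = api.ResourceList{}
        }
        container.Resources.Requests[name] = limit
    }
}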

@mikedanese added the area/api label Jul 31, 2015
@mikedanese
Member

Assigning to @bgrant0607 since it's an API change. Feel free to reassign to the appropriate reviewer.

@bgrant0607
Member

cc @davidopp

@k8s-bot

k8s-bot commented Jul 31, 2015

GCE e2e build/test passed for commit 288ff7f6baf31b79577e699142418a2c9980716d.

@derekwaynecarr
Member

I am submitting a parallel PR today that makes Limits a CPU quota. I am concerned that if this PR goes in before that one, systems with existing pods whose restart policy is Always would no longer constrain those containers. This PR is fine for new pods that come into the system through the defaulting logic, as I understand it.


@derekwaynecarr
Member

I am concerned about this PR going in for clusters that have existing data with running pods that used Limits. For new data, I see that this PR will default a nil Request to the Limit, but that is insufficient in my view when I have an existing deployment with running pods whose restartPolicy=Always. If one of those containers were to restart after upgrading, they would no longer have any cpushare limiting.

My concerns would be alleviated if we could cap those existing containers with cpu-quota. That feature was only added in Docker 1.7 (moby/moby#10736), and I think we only prereq Docker 1.6 at this time. So we would need to change that, and also update go-dockerclient to support setting cpu-quota (which it does not appear to do today: https://github.com/fsouza/go-dockerclient/blob/master/container.go#L178).

Make sense? Objections?

@pmorie
Member

pmorie commented Jul 31, 2015

@AnanyaKumar This PR / your eventual squashed commit should have a more descriptive name.

@derekwaynecarr
Member

cc @eparis

@pmorie
Member

pmorie commented Jul 31, 2015

@derekwaynecarr This makes me wonder whether we should have a precondition while bootstrapping the kubelet that checks the docker version on the host and at least warns you if some behaviors may not work.

@pmorie
Member

pmorie commented Jul 31, 2015

@AnanyaKumar @derekwaynecarr If we move forward with this, someone should do the legwork to explore whether rkt has support for CPU quota and create an issue for it if not. It looks like systemd-nspawn supports setting cgroup CPUShares.

@derekwaynecarr
Member

Not my day: go-dockerclient does have CpuQuota on HostConfig, so the only blocker looks to be that we would need to prereq Docker 1.7. Will put together my parallel PR to map Limits to quota now.

@@ -156,6 +156,19 @@ func addDefaultingFuncs() {
obj.APIVersion = "v1"
}
},
func(obj *ResourceRequirements) {
// Set requests to limits if limits are not specified.
Member

s/if limits are not specified/if requests are not specified/

@davidopp
Member

Looks basically fine to me.

I am concerned about this PR going in for clusters that have existing data with running pods that used Limits. For new data, I see that this PR will default a nil Request to the Limit, but that is insufficient in my view when I have an existing deployment with running pods whose restartPolicy=Always. If one of those containers were to restart after upgrading, they would no longer have any cpushare limiting.

This is a good point. Maybe the Kubelet should set cpushares from limit when request is unset? IIUC Kubelet should never see a Pod in that condition (request unset, limit set) except when we upgrade the Kubelet with this PR.

@bgrant0607
Member

It was late and I was jetlagged. :-)

Yes, using limit when there is no request is equivalent to setting request from limit by default, so that seems reasonable.

Case 3: no request and no limit -- can't set shares based on request if there isn't one :-)

@k8s-bot

k8s-bot commented Aug 5, 2015

GCE e2e build/test passed for commit 22464ae660cca48b617e90c14715f6c4cb6c631f.

@AnanyaKumar
Contributor Author

@bgrant0607 Oh whoops, you're right about case 3! :)

@k8s-bot

k8s-bot commented Aug 5, 2015

GCE e2e build/test passed for commit 7f7637aeb748486fb1a2e3ca3ed9d1e741d59e83.

cpuShares := milliCPUToShares(container.Resources.Limits.Cpu().MilliValue())
cpuRequest := container.Resources.Requests.Cpu()
var cpuShares int64
if cpuRequest.Amount != nil {
Member

Please add a comment that explains why this is doing what it is doing.

Contributor Author

Done.
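
For reference, the kind of explanatory comment and fallback being discussed might read roughly like this (a sketch; the cpuLimit variable, comment wording, and milliCPUToShares clamping behavior are as I understand them, not quoted from the merged code):

cpuRequest := container.Resources.Requests.Cpu()
cpuLimit := container.Resources.Limits.Cpu()
var cpuShares int64
if cpuRequest.Amount != nil {
    // Normal case: cgroup cpu shares are derived from the request.
    cpuShares = milliCPUToShares(cpuRequest.MilliValue())
} else {
    // Backwards compatibility: pods created before this change may carry a
    // limit but no request (new objects get request defaulted to limit by
    // the API server). Fall back to the limit so a restarted container keeps
    // its cpu-share weighting; with neither set, milliCPUToShares returns
    // its minimum share value.
    cpuShares = milliCPUToShares(cpuLimit.MilliValue())
}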

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test failed for commit 39935c7f937935c8b47393d34a9cd180dbadb076.

@bgrant0607
Member

Need to update the swagger again.

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit 2b414de78b186fd71d4ff4470ad1ac99ef5a5a64.

@k8s-bot

k8s-bot commented Aug 6, 2015

GCE e2e build/test passed for commit ef1e576.

@AnanyaKumar
Contributor Author

@bgrant0607 Yup, I did, probably right around the same time you commented!

@bgrant0607
Member

LGTM.

@derekwaynecarr How would you like to coordinate merges of the quota and LimitRange changes?

@AnanyaKumar
Contributor Author

@derekwaynecarr I think the changes here are backwards compatible regardless of LimitRange changes, so this PR should be safe to merge? Let me know what you think! :)

@derekwaynecarr
Member

@bgrant0607 - I want to physically test this PR to ensure that the defaults are actually getting applied, since we lack a clear e2e test in this area. I'm asking for one more day so I can test tomorrow morning EST. It looks like we should be fine from a backwards-compatibility standpoint, but I am cautious.

@bgrant0607
Member

SGTM. Will hold off on applying the lgtm label.

@derekwaynecarr
Member

LGTM - defaulting worked well after doing an upgrade from old to new. Will send the follow-on PRs for LimitRange/ResourceQuota next week. Thanks @AnanyaKumar!

@bgrant0607
Member

LGTM

@bgrant0607 added the lgtm label Aug 7, 2015
@davidopp
Member

davidopp commented Aug 7, 2015

LGTM

satnam6502 added a commit that referenced this pull request Aug 7, 2015
@satnam6502 merged commit bee48f4 into kubernetes:master Aug 7, 2015