
Deploy kube-dns with cluster-proportional-autoscaler #33239

Merged
merged 2 commits into from
Nov 8, 2016

Conversation

MrHohn
Member

@MrHohn MrHohn commented Sep 22, 2016

This PR integrates cluster-proportional-autoscaler with kube-dns for DNS horizontal autoscaling.

Fixes #28648 and #27781.



@k8s-github-robot k8s-github-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. release-note-label-needed labels Sep 22, 2016
@MrHohn
Member Author

MrHohn commented Sep 22, 2016

@k8s-bot gce e2e test this issue: #31561

@MrHohn
Member Author

MrHohn commented Sep 22, 2016

@k8s-bot kubemark test this issue: #32789

@MrHohn
Member Author

MrHohn commented Sep 22, 2016

@k8s-bot gke e2e test this issue: #28721

@k8s-github-robot

@MrHohn
You must link to the test flake issue which caused you to request this manual re-test.
Re-test requests should be in the form of: k8s-bot test this issue: #<number>
Here is the list of open test flakes.

@roberthbailey
Contributor

Review status: 0 of 20 files reviewed at latest revision, 3 unresolved discussions.


cluster/addons/dns/Makefile, line 32 [r1] (raw file):

  sed -f transforms2sed.sed $<  | sed s/__SOURCE_FILENAME__/$</g > $@

transform: skydns-rc.yaml.in skydns-rc.yaml.sed skydns-svc.yaml.in skydns-svc.yaml.sed kubedns-autoscaler-configmap.yaml.in kubedns-autoscaler-configmap.yaml.sed kubedns-autoscaler-deployment.yaml.in kubedns-autoscaler-deployment.yaml.sed

I don't see any parameterized variables in any of the new files that warrant the base -> yaml.in and base -> yaml.sed transformations. Until we need variable substitution, it might be simpler to just have kubedns-autoscaler-configmap.yaml and kubedns-autoscaler-deployment.yaml as plain yaml files.
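For context, the two sed passes in the Makefile rule amount to token substitution; a rough Python sketch (function and rule names hypothetical, not the actual build tooling) shows why files with no templated tokens gain nothing from the transform:

```python
def apply_transform(text: str, sed_rules: dict, source_filename: str) -> str:
    """Sketch of what the Makefile's two sed passes do: first apply the
    transforms2sed.sed substitutions, then replace every
    __SOURCE_FILENAME__ token with the input file name."""
    for pattern, replacement in sed_rules.items():
        text = text.replace(pattern, replacement)
    return text.replace("__SOURCE_FILENAME__", source_filename)

# A file with no templated tokens passes through unchanged -- which is
# why plain .yaml files would suffice for the new manifests:
print(apply_transform("replicas: 1\n", {}, "kubedns-autoscaler-deployment.yaml"))
```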


cluster/images/hyperkube/static-pods/addon-manager-multinode.json, line 14 [r1] (raw file):

      {
        "name": "kube-addon-manager",
        "image": "gcr.io/zihongz-kubernetes-codelab/kube-addon-manager-ARCH:v5.2",

Did you intend to commit this change?


cluster/images/hyperkube/static-pods/addon-manager-singlenode.json, line 14 [r1] (raw file):

      {
        "name": "kube-addon-manager",
        "image": "gcr.io/zihongz-kubernetes-codelab/kube-addon-manager-ARCH:v5.2",

And this one?



@MrHohn
Member Author

MrHohn commented Sep 23, 2016

Review status: 0 of 20 files reviewed at latest revision, 3 unresolved discussions.


cluster/addons/dns/Makefile, line 32 at r1 (raw file):

Previously, roberthbailey (Robert Bailey) wrote…

I don't see any parameterized variables in any of the new files that warrant the base -> yaml.in and base -> yaml.sed transformations. Until we need variable substitution, it might be simpler to just have kubedns-autoscaler-configmap.yaml and kubedns-autoscaler-deployment.yaml as plain yaml files.

You are right, I will change it.

cluster/images/hyperkube/static-pods/addon-manager-multinode.json, line 14 at r1 (raw file):

Previously, roberthbailey (Robert Bailey) wrote…

Did you intend to commit this change?

No. I'm not sure when the new version of the Addon Manager (which supports ConfigMap) will be pushed. This private one is for testing only.


@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 25, 2016
@MrHohn
Member Author

MrHohn commented Sep 30, 2016

@bowei

@MrHohn
Member Author

MrHohn commented Sep 30, 2016

For now, the ConfigMap params (cores_to_replicas_map and nodes_to_replicas_map) used for kube-dns are not well examined or tested.

As @bowei indicated, we should regenerate these params based on real metrics. Before that, we need to document these interim params properly.

@thockin
Member

thockin commented Oct 4, 2016

Review status: 0 of 20 files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.


cluster/addons/dns/kubedns-autoscaler-configmap.yaml.base, line 32 at r1 (raw file):

      [
        [ 1,1 ],
        [ 4,2 ],

We don't need a 2nd replica for 4 cores. Maybe for 64 cores
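The point about the thresholds can be illustrated with a small sketch of a ladder-style lookup (the function name and shape are illustrative, not the actual autoscaler implementation): the replica count comes from the largest threshold not exceeding the current core count, so moving the second rung from 4 to 64 cores keeps small clusters at a single replica.

```python
def ladder_lookup(value, ladder):
    """Return the replica count for the largest threshold <= value.

    ladder: list of [threshold, replicas] pairs, sorted ascending.
    """
    replicas = 1
    for threshold, count in ladder:
        if value >= threshold:
            replicas = count
        else:
            break
    return replicas

# With the map under review, 4 cores already gets a 2nd replica:
print(ladder_lookup(4, [[1, 1], [4, 2]]))   # 2
# Raising the step to 64 cores keeps small clusters at 1 replica:
print(ladder_lookup(4, [[1, 1], [64, 2]]))  # 1
```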


cluster/addons/dns/kubedns-autoscaler-deployment.yaml.base, line 20 at r1 (raw file):

kind: Deployment
metadata:
  name: kube-dns-cluster-proportional-autoscaler-v1

You don't need the "v1" gunk in the name for a deployment.


cluster/addons/dns/kubedns-autoscaler-deployment.yaml.base, line 24 at r1 (raw file):

  labels:
    k8s-app: kubedns-autoscaler
    version: v1

I don't think you need this either


cluster/addons/dns/kubedns-autoscaler-deployment.yaml.base, line 38 at r1 (raw file):

        image: gcr.io/zihongz-kubernetes-codelab/cluster-proportional-autoscaler:0.6
        resources:
          limits:

I would leave limits off, and let them default. I especially would not do burstable for memory. Get a sense of how much you think this really needs to set request properly.


cluster/addons/dns/kubedns-autoscaler-deployment.yaml.base, line 48 at r1 (raw file):

          - --namespace=kube-system
          - --configmap=kube-dns-cluster-proportional-autoscaler-params
          - --target=rc/kube-dns-v19

This means we have to update this file when we update DNS. The only maybe-better answer I see is to set a selector here, and I don't think we want that. Open to ideas.


cluster/addons/dns/kubedns-autoscaler-deployment.yaml.sed, line 1 at r1 (raw file):

# Copyright 2016 The Kubernetes Authors.

I don't think you need the in/base/sed for these files - they don't have any templated content. Just yaml files should suffice.


cluster/saltbase/salt/kube-addons/init.sls, line 95 at r1 (raw file):

  file.managed:
    - source: salt://kube-addons/dns/kubedns-autoscaler-configmap.yaml.in
    - template: jinja

This is not a jinja file, so it doesn't need this.



@thockin
Member

thockin commented Oct 4, 2016

Is the hscaler itself committed?


Reviewed 20 of 20 files at r1.
Review status: all files reviewed at latest revision, 10 unresolved discussions, some commit checks failed.



@MrHohn
Member Author

MrHohn commented Oct 4, 2016

Thanks for all the comments. Not yet; actually, that PR is the one waiting for review.

@MrHohn MrHohn changed the title [wip] Deploy kube-dns with cluster-proportional-autoscaler Deploy kube-dns with cluster-proportional-autoscaler Oct 27, 2016
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 27, 2016
@MrHohn
Member Author

MrHohn commented Oct 27, 2016

The new commits deploy kube-dns with cluster-proportional-autoscaler by making the autoscaler itself an addon. This feature is turned on by default for GCI and Container-VM on GCE.

Note that this autoscaler still uses a ConfigMap to retrieve the autoscaling parameters, which are passed into the autoscaler pod at startup. The ConfigMap itself is not an addon, and the Addon Manager will not reconcile it, so customers can modify the autoscaling parameters. If the ConfigMap is accidentally deleted, the autoscaler will re-create it from the default parameters passed in.

Besides, a basic e2e test for this autoscaling feature is added. It covers cases such as cluster size changes, parameter changes, ConfigMap deletion, and autoscaler pod deletion. Total running time is around 5 minutes.
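The re-creation behavior described above can be sketched as follows (a plain dict stands in for the API server's ConfigMap store; the function and ConfigMap names are hypothetical):

```python
def ensure_params(configmaps: dict, name: str, default_params: dict) -> dict:
    """If the params ConfigMap is missing, write it back from the defaults
    the autoscaler pod was started with; otherwise keep the user's edits."""
    if name not in configmaps:
        configmaps[name] = dict(default_params)  # re-create from defaults
    return configmaps[name]

# Simulated accidental deletion: the store is empty, so the defaults
# are restored on the next poll.
store = {}
params = ensure_params(store, "kube-dns-autoscaler",
                       {"ladder": '{"coresToReplicas": [[1, 1], [64, 2]]}'})
print(params)
```

Because the Addon Manager does not reconcile the ConfigMap, any user edits already present in the store survive subsequent polls untouched.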

@thockin @bowei

@MrHohn MrHohn force-pushed the dns-autoscaler branch 2 times, most recently from 13ffc58 to d0a4e03 on October 27, 2016 21:23
containers:
- name: autoscaler
image: gcr.io/google_containers/cluster-proportional-autoscaler-amd64:0.8.5
imagePullPolicy: Always
Member Author


Sorry, will remove this line.

@MrHohn MrHohn force-pushed the dns-autoscaler branch 2 times, most recently from 5523410 to 4c859c0 on October 28, 2016 17:57
@MrHohn
Member Author

MrHohn commented Oct 28, 2016

@rmmh Sorry to bother you. I ran ./hack/update_owners.py, but ./hack/verify-test-owners.sh still failed on Jenkins.

@MrHohn
Member Author

MrHohn commented Nov 7, 2016

The new commit sets cpu (20m) and memory (10Mi) requests for the autoscaler pod. Limits are left unset.

The GKE config is updated. The corresponding env variable is also piped down to the other cloud providers, but DNS autoscaling is turned on by default only on GCE.

@k8s-ci-robot
Contributor

Jenkins Kubemark GCE e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot kubemark e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GKE smoke e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCI GCE e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE etcd3 e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gce etcd3 e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCE e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot cvm gce e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

@k8s-ci-robot
Contributor

Jenkins GCI GKE smoke e2e failed for commit b7eeebf11538187412041464f93b2d86aef0181c. Full PR test history.

The magic incantation to run this job again is @k8s-bot gci gke e2e test this. Please help us cut down flakes by linking to an open flake issue when you hit one in your PR.

The e2e tests cover cases like cluster size changes, parameter
changes, ConfigMap deletion, autoscaler pod deletion, etc.
They are separated into a fast part (which can run in parallel) and
a slow part (put in [Serial]). The fast part of the e2e tests takes
around 50 seconds to run.
@MrHohn
Member Author

MrHohn commented Nov 7, 2016

Tests passed this time.

@bprashanth
Contributor

@bprashanth bprashanth added lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note-none Denotes a PR that doesn't merit a release note. and removed release-note-label-needed labels Nov 7, 2016
@bprashanth bprashanth added this to the v1.5 milestone Nov 7, 2016
@bprashanth bprashanth added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Nov 8, 2016
@bprashanth
Contributor

I'm bumping to P2 because we need this to go in before #36008, but I want to lgtm that one, as it's the last day before code freeze.

@bprashanth bprashanth added the priority/backlog Higher priority than priority/awaiting-more-evidence. label Nov 8, 2016
@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-github-robot

Automatic merge from submit-queue

@k8s-github-robot k8s-github-robot merged commit a0c34ee into kubernetes:master Nov 8, 2016
k8s-github-robot pushed a commit that referenced this pull request Nov 9, 2016
Automatic merge from submit-queue

CRI: Support string user name.

#33239 and #34811 together broke the cri e2e test. https://k8s-testgrid.appspot.com/google-gce#gci-gce-cri

The reason is that:
1) In dockershim and dockertools, we assume that `Image.Config.User` is an integer. However, when users build an image with `USER nobody:nobody` or `USER root:root`, the field becomes `nobody:nobody` or `root:root`. This makes dockershim always return an error.
2) The new kube-dns-autoscaler image is using `USER nobody:nobody`. (See https://github.com/kubernetes-incubator/cluster-proportional-autoscaler/blob/master/Dockerfile.in#L21)

This doesn't break the normal e2e test, because in dockertools [we only inspect image uid if `RunAsNonRoot` is set](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/dockertools/docker_manager.go#L2333-L2338), which is just a coincidence. However, in kuberuntime, [we always inspect image uid first](https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/kuberuntime/kuberuntime_container.go#L141).

This PR adds literal `root` and `nobody` support. One problem is that `nobody` is not quite the same in different OS distros. Usually it should be `65534`, but some os distro doesn't follow that. For example, Fedora is using `99`. (See https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/Q5GCKZ7Q7PAUQW66EV7IBJGSRJWYXBBH/?sort=date)

Possible solution:
* Option 1: ~~Just use `65534`. This is fine because currently we only need to know whether the user is root or not.~~ Actually, we need to pass the user id to the runtime when creating a container.
* Option 2: Return the uid as string in CRI, and let kuberuntime handle the string directly.

This PR is using option 1.
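A minimal sketch of option 1, assuming `nobody` maps to 65534 (which, as noted above, does not hold on every distro); the function name is hypothetical and this is not the actual kubelet code:

```python
def get_uid(image_user: str) -> int:
    """Map a literal image USER value to a uid (option 1 sketch).

    Handles the 'user:group' form by taking only the user part.
    Assumes 'nobody' is 65534, which holds on most distros but not all
    (Fedora historically used 99).
    """
    user = image_user.split(":")[0]  # "nobody:nobody" -> "nobody"
    if user == "root":
        return 0
    if user == "nobody":
        return 65534
    return int(user)  # raises ValueError for other non-numeric users

print(get_uid("nobody:nobody"))  # 65534
print(get_uid("root:root"))      # 0
```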

@yujuhong @feiskyer 
/cc @kubernetes/sig-node
/cc @MrHohn
@MrHohn
Member Author

MrHohn commented Nov 10, 2016

Fixes #27781.

k8s-github-robot pushed a commit that referenced this pull request Nov 10, 2016
Automatic merge from submit-queue

Migrates addons from RCs to Deployments

Fixes #33698.

Below addons are being migrated:
- kube-dns
- GLBC default backend
- Dashboard UI
- Kibana

For the new Deployments, the version suffixes are removed from their names. Version-related labels are also removed because they are confusing and no longer needed, given how Deployments and the new Addon Manager work.

The `replicas` field in the `kube-dns` Deployment manifest is removed for the incoming DNS horizontal autoscaling feature #33239.

The `replicas` field in the `Dashboard` Deployment manifest is also removed because the rescheduler e2e test manually scales it.

Some resource-limit-related fields in `heapster-controller.yaml` are removed, as they will be set up by the `addon resizer` containers. Detailed reasons are in #34513.

Three e2e tests are modified:
- `rescheduler.go`: Changed to resize Dashboard UI Deployment instead of ReplicationController.
- `addon_update.go`: Some namespace related changes in order to make it compatible with the new Addon Manager.
- `dns_autoscaling.go`: Changed to examine kube-dns Deployment instead of ReplicationController.

Both of the two tests above passed on my own cluster. The upgrade process --- from old Addons with RCs to new Addons with Deployments --- was also tested and worked as expected.

The last commit upgrades Addon Manager to v6.0. It is still a work in progress and currently waiting for #35220 to be finished. (The Addon Manager image in use comes from a non-official registry, but it mostly works except for some corner cases.)

@piosz @gmarek could you please review the heapster part and the rescheduler test?

@mikedanese @thockin 

cc @kubernetes/sig-cluster-lifecycle 

---

Notes:
- The kube-dns manifest still uses *-rc.yaml file names for the new Deployment. The stale names are preserved here to speed up review. A follow-up PR may re-organize kube-dns's file names after this.
- Heapster Deployment's name remains in the old fashion (with the `-v1.2.0` suffix) to avoid describing this upgrade transition explicitly. This way we don't need to attach fake apply labels to the old Deployments.
@MrHohn MrHohn deleted the dns-autoscaler branch May 16, 2017 23:54
Labels
lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.