This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

cluster autoscaler --skip-nodes-with-system-pods=false ? #1253

Closed
cmcconnell1 opened this issue Apr 24, 2018 · 13 comments


cmcconnell1 commented Apr 24, 2018

Hello All,

Apologies if I'm missing something, but on a new cluster recently deployed with kube-aws v0.9.10-rc.3, I seem to be missing autoscaler functionality. The problem is that this new cluster's autoscaler doesn't honor the min settings in our node pool configurations as specified in cluster.yaml: after we manually scale the node pool up and then run kube-aws update with the new, lower min setting, it never terminates the extra nodes.
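
For reference, the kind of node pool scaling settings I mean look roughly like this in cluster.yaml (a sketch only; the pool name and sizes are illustrative, and exact keys may vary by kube-aws version):

    worker:
      nodePools:
        - name: nodepool1
          autoScalingGroup:
            minSize: 1
            maxSize: 4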

Looking at the actual AS git code/repo:
ref: Cluster Autoscaler on AWS

By default, cluster autoscaler will not terminate nodes running pods in the kube-system namespace. You can override this default behaviour by passing in the --skip-nodes-with-system-pods=false flag.

autoscaler/cluster-autoscaler/simulator/cluster.go#L42,L43,L44

On that note, I received a private message / response to my post in the kube-aws IRC channel stating

thats known problem, you need to edit user-data files and add option for cluster-autoscaler --skip-nodes-with-system-pods=false

For kube-aws-deployed clusters, given the above fixes and suggestions, it's not clear how this change should actually be made.

Thanks

@cmcconnell1

I just checked all our current kube-aws-provisioned clusters and none of them have the above suggested workaround/fix config directive (note that they are all older versions). They all do have the following, though:
--skip-nodes-with-local-storage=false

Perhaps, if we were to hand-edit the userdata file of a kube-aws-provisioned cluster, it would go in this section below?

./userdata/cloud-config-controller
LINE#
4052                   command:
4053                     - ./cluster-autoscaler
4054                     - --v=4
4055                     - --stderrthreshold=info
4056                     - --cloud-provider=aws
4057                     - --skip-nodes-with-local-storage=false
4058                     - --expander=least-waste
4059                     - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/{{.ClusterName}}

Where we'd insert --skip-nodes-with-system-pods=false around line 4058 (in this example)?
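
i.e. the edited command block would presumably end up looking roughly like this (a sketch based on the snippet above; the exact flag position shouldn't matter):

    command:
      - ./cluster-autoscaler
      - --v=4
      - --stderrthreshold=info
      - --cloud-provider=aws
      - --skip-nodes-with-local-storage=false
      - --skip-nodes-with-system-pods=false
      - --expander=least-waste
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/{{.ClusterName}}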

I haven't tried this yet, as I need to provision a test cluster first since I cannot disturb the current cluster.
If anyone else has come across this, I'd appreciate any details.

Thanks

cmcconnell1 added a commit to cmcconnell1/kube-aws that referenced this issue Apr 30, 2018
modifies the behavior of the autoscaler to terminate worker and
controller nodes no longer needed when minSize is set manually.

Note that I had to delete the initial autoscaler pod after
deploying a new cluster with this modification in place.
After that initial autoscaler pod was deleted the second
pod functioned without errors and as expected.
I will update the issue comments with more details.

Fixes kubernetes-retired#1253
@cmcconnell1

One significant caveat/concern that I wanted to expand on and provide more details about:
After deploying a new cluster with the userdata modification from the PR, the initial autoscaler pod continued to crash. Perhaps this was coincidental, but I wanted to note what I saw.

I tested this on the most recent kube-aws version that we have: v0.9.10-rc.3

After the initial cluster deployment, inspecting the CA pod logs:

I0430 17:44:26.564742       1 auto_scaling.go:138] Failed to describe ASG tags for keys [k8s.io/cluster-autoscaler/enabled kubernetes.io/cluster/opsdev-pr] : RequestError: send request failed
caused by: Post https://autoscaling.us-west-1.amazonaws.com/: dial tcp: i/o timeout
F0430 17:44:26.564788       1 cloud_provider_builder.go:112] Failed to create AWS cloud provider: Failed to get ASGs: RequestError: send request failed
caused by: Post https://autoscaling.us-west-1.amazonaws.com/: dial tcp: i/o timeout
goroutine 65 [running]:
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.stacks(0xc4206db700, 0xc420a245a0, 0xec, 0x106)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:766 +0xa7
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.(*loggingT).output(0x5618fa0, 0xc400000003, 0xc420b368f0, 0x528d12b, 0x19, 0x70, 0x0)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:717 +0x348
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.(*loggingT).printf(0x5618fa0, 0x3, 0x3739198, 0x27, 0xc420b6ed20, 0x1, 0x1)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:655 +0x14f
k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog.Fatalf(0x3739198, 0x27, 0xc420b6ed20, 0x1, 0x1)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/github.com/golang/glog/glog.go:1145 +0x67
k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder.CloudProviderBuilder.Build(0x7ffe6838a85a, 0x3, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/cloudprovider/builder/cloud_provider_builder.go:112 +0x76a
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscalingContext(0xa, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0x0, 0x4e200, 0x0, 0x186a00000, 0x0, 0x7ffe6838a8db, ...)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaling_context.go:148 +0x466
k8s.io/autoscaler/cluster-autoscaler/core.NewStaticAutoscaler(0xa, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0x0, 0x4e200, 0x0, 0x186a00000, 0x0, 0x7ffe6838a8db, ...)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/static_autoscaler.go:56 +0x14d
k8s.io/autoscaler/cluster-autoscaler/core.(*AutoscalerBuilderImpl).Build(0xc42073f380, 0x412d18, 0xc420b6f850, 0x412d18, 0x1a0)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler_builder.go:71 +0x10e
k8s.io/autoscaler/cluster-autoscaler/core.NewPollingAutoscaler(0x5514020, 0xc42073f380, 0x78, 0x98, 0xc420b7a820)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/polling_autoscaler.go:38 +0x35
k8s.io/autoscaler/cluster-autoscaler/core.NewAutoscaler(0xa, 0x3fe0000000000000, 0x8bb2c97000, 0x1176592e000, 0x0, 0x4e200, 0x0, 0x186a00000, 0x0, 0x7ffe6838a8db, ...)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/core/autoscaler.go:64 +0x5f2
main.run(0xc420ae7c20)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:247 +0x263
main.main.func2(0xc4202a84e0)
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/main.go:345 +0x2a
created by k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection.(*LeaderElector).Run
  /gopath/src/k8s.io/autoscaler/cluster-autoscaler/vendor/k8s.io/client-go/tools/leaderelection/leaderelection.go:145 +0x97

I deleted the initial autoscaler pod, and the second (replacement) CA pod resolved the errors noted above.

kk delete po cluster-autoscaler-59998c8cbf-jvcx6
pod "cluster-autoscaler-59998c8cbf-jvcx6" deleted

Then validated that the correct updated options appear in the logs of the newly deployed cluster-autoscaler pod:

kk logs -f cluster-autoscaler-59998c8cbf-f9qpt

I0430 17:50:02.175080       1 flags.go:52] FLAG: --skip-nodes-with-local-storage="false"
I0430 17:50:02.175083       1 flags.go:52] FLAG: --skip-nodes-with-system-pods="false"

Once the pod was deleted and redeployed, its logs were clean and the updated cloud-config worked as expected, allowing scaling both up and down.
I also validated that with the cloud-config set back to the unmodified value, the autoscaler did not scale the scaled-up workers and controllers back down.


mumoshu commented May 1, 2018

@cmcconnell1 Thanks for the information! This is so helpful.

Have you deployed kiam or kube2iam, or enabled calico, or anything else you think is suspicious on your cluster? Just an idea, but I suspect CA may have a startup-ordering issue with kiam/kube2iam/calico/etc.

@cmcconnell1

Yes @mumoshu, we have kube2Iam enabled in both node pools and in the global (top-level) config:

egrep -C 1  'kube2Iam' ./cluster.yaml
        enabled: true
      kube2IamSupport:
        enabled: true
--
        enabled: true
      kube2IamSupport:
        enabled: true
--
  # This is intended to be used in combination with .controller.managedIamRoleName. See #297 for more information.
  kube2IamSupport:
    enabled: true

Looking at another fresh cluster just deployed to validate--this is immediately after deployment:

kk get po | grep 'cluster-autoscaler'
cluster-autoscaler-59998c8cbf-9hqwq 1/1 Running 3 8m

Saw three restarts of the initial CA pod.
Below is a gist with the logs from the third (3rd) CA pod auto-deployed with the code from the PR:
kubernetes-cluster-autoscaler-pod-logs-pr-1268

I waited until the sixth (6th) restart of the CA pod before killing it, and note that its logs look good.
Hope this is helpful.

Thanks!


mumoshu commented May 1, 2018

@cmcconnell1 Yes, it's super helpful!

I was reading kubernetes/kops#1796 (comment) - I remember that a custom DNS was a requirement in your environment.

If that's still the case, a few questions I'd appreciate answers to:

  • Which nodes do you deploy CA to? Controller or worker nodes, or maybe both?
  • To which node role did your CA get deployed at first? And to which node role after you killed it?


mumoshu commented May 1, 2018

@cmcconnell1 Regardless, would setting dnsPolicy: Default in the cluster-autoscaler deployment inside cloud-config-controller resolve the issue, by any chance?

@cmcconnell1

@mumoshu
I added dnsPolicy: Default to the cluster-autoscaler deployment in the cloud-config-controller and redeployed.

The new cluster's CA did have six (6) restarts, but its logs were clear of the previously noted errors once deployed.

    command:
      - ./cluster-autoscaler
      - --v=4
      - --stderrthreshold=info
      - --cloud-provider=aws
      - --skip-nodes-with-local-storage=false
      - --skip-nodes-with-system-pods=false
      - --expander=least-waste
      - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,kubernetes.io/cluster/{{.ClusterName}}
    env:
      - name: AWS_REGION
        value: {{.Region}}
    imagePullPolicy: "Always"
dnsPolicy: Default


mumoshu commented May 2, 2018

@cmcconnell1 Thanks! Seems better than before, right?

Now, perhaps you're seeing errors other than https://autoscaling.us-west-1.amazonaws.com/: dial tcp: i/o timeout, until CA finally stabilizes?

@cmcconnell1

@mumoshu I didn't see any (concerning) errors in the final (sixth) CA pod's logs. It remained stable and did not redeploy any more pods after around 20 minutes (the time of the sixth CA pod deployment). I was not able to watch the previous CA pods' logs, as I'm working on other projects at the same time.

What would be very helpful for me would be some docs and utilities (i.e. a kube-aws test/validation harness) for testing new kube-aws/Kubernetes clusters that could gather all requisite details for deep-dive analysis.
We know that it's difficult because we all have very different configurations and concerns.

kube-aws test/validation harness benefits:

  • It would help the kube-aws community standardize and share best practices.

    • When doing a PR, the kube-aws community could use standardized configurations and "blessed" test scripts/utilities. Currently I'm trying to do some quick initial validations manually while working on other projects, but I can't be sure I'm testing/validating everything necessary.
    • e.g. kube-aws test-harness kube2iam CA spot-fleet add-on-foo123, where each option after the keyword names a specific area in which the cluster would need to use a baseline minimal config and test.
  • It would help foster new and potential kube-aws users and get them up to speed much faster, i.e. we could have templates for things like multi-AZ node pools (which took some time for me to figure out initially), spot fleet, possibly add-ons, etc.

  • It would increase participation and collaboration by making it much easier to get involved (reducing time, etc.). I've been very hesitant due to time concerns and not knowing how I can effectively test everything correctly.

On that same note, could you recommend existing test/validation tools, such as Sonobuoy, that we could use to help us validate/test new kube-aws (and Kubernetes, etc.) versions?
I've used Sonobuoy, but would Sonobuoy's test results be sufficient validation for us when doing a PR?
Or would we need to run a new cluster through something like Sonobuoy plus additional tools?

Thanks!

@cmcconnell1
Copy link
Contributor Author

FWIW, I ran a Sonobuoy Scan against my PR test cluster and the results should be available here for a while.

RUN DATE: 2018-05-02
Time tests took to run: 01:01:10
Kubernetes Version: v1.9.3
Total Nodes: 3

The only issue my Sonobuoy Scan detected was a trivial one:
[k8s.io] KubeletManagedEtcHosts should test kubelet managed /etc/hosts file [Conformance]

I also manually scaled the worker node pools and controller up and back down successfully with our mods noted in the P/R.
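
For anyone wanting to repeat that scale test, the rough procedure is as follows (a sketch only; the kubectl commands for locating the CA pod are generic, since the deployment name and namespace depend on your setup):

    # 1. scale the node pool up, either by raising minSize/maxSize in cluster.yaml
    #    and rolling it out, or manually via the ASG
    kube-aws update
    # 2. lower minSize back down in cluster.yaml and roll out again
    kube-aws update
    # 3. watch the cluster autoscaler terminate the now-surplus nodes
    kubectl get nodes -w
    kubectl get pods --all-namespaces | grep cluster-autoscaler
    kubectl logs -f <cluster-autoscaler-pod> -n <its-namespace>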

Hope this helps.
Thanks


mumoshu commented May 3, 2018

On that same note, could you recommend existing recommended test/validation tools such as sonobuoy that we could use to help us validate/test new kube-aws (and kubernetes, etc.) versions?

Yes. I prefer it over our e2e/run script these days. Sonobuoy basically does the same thing in a more sophisticated manner than our previous e2e test runner 😉

I've used sonobuoy, but would sonobuoy's test results be sufficient validation for us when doing a P/R ?

Depends on the changes made in a PR, but basically:

  • If your change affects host/pod networking (VPC, security groups, subnets, etc.) or kube-system pods, running Sonobuoy would help find the various edge cases in which your k8s cluster won't function properly after the change (see the sketch after this list)
  • If your change affects host provisioning (systemd units, files, keys and certs, etc.), creating a brand-new cluster would help
  • If your change affects specific optional functionality of your k8s cluster, like CA, you should test it manually. That's because an optional feature like cluster autoscaling won't be covered by a Sonobuoy test. To be clear, you aren't forced to test every failure case and edge case if that is not the primary purpose of your PR. But it does help me confirm I'm not missing anything in my review 😉
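
For reference, a typical Sonobuoy run looks roughly like this (commands from memory; details may differ between Sonobuoy versions):

    sonobuoy run          # launch the conformance tests against the current kubectl context
    sonobuoy status       # poll until the run completes
    sonobuoy retrieve .   # download the results tarball
    # inspect plugins/e2e/results/e2e.log inside the tarball for failures
    sonobuoy delete       # clean up the sonobuoy namespace afterwards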

kube-aws test/validation harness

Your idea of the test harness sounds awesome!
I'd really like to discuss it further in another issue.
Maybe we'd want to explain it in the developer guide, or even in the pull request template?


mumoshu commented May 3, 2018

@cmcconnell1 Thank you so much for your efforts, anyway! Your PR LGTM now.

mumoshu pushed a commit that referenced this issue May 3, 2018
* autoscaler: update cloud-config-controller

modifies the behavior of the autoscaler to terminate worker and
controller nodes no longer needed when minSize is set manually.

Note that I had to delete the initial autoscaler pod after
deploying a new cluster with this modification in place.
After that initial autoscaler pod was deleted the second
pod functioned without errors and as expected.
I will update the issue comments with more details.

Fixes #1253

* update CA in cloud-config-controller with dnsPolicy: Default

prageethw commented Jan 12, 2019

If you're using the cluster-autoscaler Helm chart, set the values below to get it to work:

    --set extraArgs.skip-nodes-with-system-pods=0 
    --set extraArgs.skip-nodes-with-local-storage=0 
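
For example, a full install command might look roughly like this (the chart name, release name, and autodiscovery/region values below are illustrative; check the chart's README for the exact keys):

    helm install stable/cluster-autoscaler \
      --name cluster-autoscaler \
      --namespace kube-system \
      --set autoDiscovery.clusterName=<your-cluster-name> \
      --set awsRegion=<your-region> \
      --set extraArgs.skip-nodes-with-system-pods=0 \
      --set extraArgs.skip-nodes-with-local-storage=0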
