Problem Deploying Autoscaler with v1.5.1 #1796
@ese Do I have the config correct?
@pluttrell Is the group name correct? I believe it has to be the name of your node ASG, which it likely is not.
@yissachar Thanks for the suggestion. I deleted the old deployment and recreated it, this time using the name of my nodes ASG as the group name.
But I still see the same problem.
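For anyone following along: the ASG is wired into the autoscaler through its --nodes flag, in the form min:max:asg-name. A minimal sketch of the relevant container spec, using a hypothetical ASG name in place of the real one:

```yaml
# Sketch only: the ASG name below is a placeholder; it must match the
# Auto Scaling Group that actually backs your worker nodes.
command:
  - ./cluster-autoscaler
  - --v=4
  - --cloud-provider=aws
  - --nodes=1:10:nodes.my-cluster.example.com
env:
  - name: AWS_REGION
    value: us-east-1
```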
@pluttrell At a quick review, it seems to be a problem with your DNS.
@ese I'm using Route53. I have also deployed the Dashboard addon, which is fully accessible at https://api.${NAME}/ui, as is everything else in the cluster.
Was that a typo, @pluttrell?
@dmcnaught Thanks for pointing out my typo in the comment above, where I typed the name out instead of redacting it. I just corrected it. What I actually ran earlier had the full ASG name, copy-pasted, so I'm sure it was correct.
@pluttrell I can't reproduce it. It works fine for me with the kops 1.5.1 release.
After upgrading to 1.5.1, I worked on reproducing the problem and found that of 12 clusters created with the exact same steps, only 2 experienced it; the other 10 worked fine.
The problem might also come and go, or not be triggered until there's a scaling event. In a cluster that had previously not reported any errors, I intentionally deployed an exorbitant number of replicas to trigger a scaling event, and it failed while trying to scale up.
Was this ever resolved? I have exactly the same problem. Oddly, I get the same error if I put a nonsense name for the ASG and/or remove the IAM autoscaling permissions. Could someone confirm what error is displayed when these are incorrect (even though I'm confident mine are correct)? Kubernetes v1.5.3, server and client. Also, I can confirm the
Finally figured it out. Following the template at https://github.com/kubernetes/contrib/blob/master/cluster-autoscaler/cloudprovider/aws/README.md#1-asg-setup-min-1-max-10-asg-name-k8s-worker-asg-1 works, but the template at https://github.com/kubernetes/kops/tree/master/addons/cluster-autoscaler doesn't. Correcting for indentation, the diff is as follows (the working version is '<').
Which of those differences is the crucial one, I don't know.
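For reference, a sketch of the key elements of the working contrib-style manifest; the image tag, ASG name, and certificate path are illustrative:

```yaml
# Sketch of the contrib-style pod spec (illustrative values).
spec:
  containers:
    - name: cluster-autoscaler
      image: gcr.io/google_containers/cluster-autoscaler:v0.4.0
      command:
        - ./cluster-autoscaler
        - --v=4
        - --cloud-provider=aws
        - --skip-nodes-with-local-storage=false
        - --nodes=1:10:k8s-worker-asg-1
      env:
        - name: AWS_REGION
          value: us-east-1
      volumeMounts:
        # The AWS SDK needs CA certificates from the host to reach the AWS API.
        - name: ssl-certs
          mountPath: /etc/ssl/certs/ca-certificates.crt
          readOnly: true
  volumes:
    - name: ssl-certs
      hostPath:
        path: /etc/ssl/certs/ca-certificates.crt
```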
I get this error in my autoscaler log. What does this error mean? It's trying to connect to some unknown host.
Getting a similar error with kops 1.7.0, Kubernetes 1.7.5, and cluster-autoscaler 0.6.1, but only when trying to scale from 0 nodes. According to this, as of CA 0.6.1 I should be able to scale to/from 0, but I'm getting errors like this:
I'm using a deployment similar to this one, and it works as long as there is at least 1 node up.
Figured this out: a kube-dns pod was not running on the master node. To run it there, I had to add the master toleration to the kube-dns deployment (same as with the cluster-autoscaler deployment above). Once kube-dns was running on the master, the autoscaler was able to use it to get ASG info from AWS and scale up from 0 nodes.
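For anyone wanting to do the same, the toleration added to the kube-dns Deployment's pod spec looks roughly like this, assuming the standard master taint:

```yaml
# Allows the pod to schedule onto the master despite its NoSchedule taint;
# assumes the node-role.kubernetes.io/master taint key.
tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
```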
@andrewsykim do we need kube-dns on the master? See the comment above.
Makes more sense to set dnsPolicy: Default on the autoscaler pod, so it doesn't depend on kube-dns at all.
^ @7chenko any chance you can open a PR for this?
I'm seeing the same thing. What's the advice: kube-dns on the master, or dnsPolicy: Default? How do I accomplish either?
@kyleu we need the manifest for the autoscaler to be changed; it should not be using cluster DNS. Also, kube-dns should probably not live on the master.
Mine turned out to be caused by specifying the AZ (us-east-1a) instead of the region (us-east-1). The endpoint URL in the error showed my mistake, but I overlooked it.
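In other words, the AWS_REGION value must be a region, not an availability zone; the AWS autoscaling endpoint is built from it as autoscaling.&lt;region&gt;.amazonaws.com, so an AZ produces an unresolvable hostname:

```yaml
env:
  - name: AWS_REGION
    value: us-east-1     # correct: a region
    # value: us-east-1a  # wrong: an AZ, which yields the unresolvable host
    #                    # autoscaling.us-east-1a.amazonaws.com
```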
I faced the same issue and changed dnsPolicy from ClusterFirst to Default, which fixed it.
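For anyone else hitting this, it's a one-line change in the Deployment's pod template; a minimal sketch:

```yaml
spec:
  template:
    spec:
      # Default makes the pod inherit the node's DNS configuration instead
      # of routing lookups through kube-dns, so AWS API hostnames resolve
      # even when kube-dns is unreachable.
      dnsPolicy: Default
```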
I run kube-dns on the master because I also find that when it runs on the nodes it prevents scale-down (along with the kube-dns-autoscaler pod). What's the right way to avoid that? I do have --skip-nodes-with-system-pods=false.
For 1.7.x clusters, should we set dnsPolicy: Default as well?
From the start, cluster-autoscaler should run with dnsPolicy: Default.
Automatic merge from submit-queue. cluster-autoscaler should use dnsPolicy: Default. Fixes #1796.
For anyone reading this issue, take this into account; it also happened to me.
Original issue description: Using kops v1.5.0-beta2, if I deploy the Cluster Autoscaler as described here on AWS, it appears to fail. Here's exactly what I ran:
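(Roughly, the deployment follows the kops addons README; the manifest path and version below are illustrative placeholders, not the exact file used:)

```sh
# Apply the cluster-autoscaler addon manifest; the URL/version here is a
# hypothetical placeholder for the manifest actually applied.
kubectl apply -f https://raw.githubusercontent.com/kubernetes/kops/master/addons/cluster-autoscaler/v1.4.0.yaml
```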
Here is the log from the pod itself: