This repository has been archived by the owner on Sep 30, 2020. It is now read-only.

waitSignal cluster.yaml section is not respected in the template #371

Closed
javapapo opened this issue Mar 1, 2017 · 12 comments · Fixed by #386
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@javapapo

javapapo commented Mar 1, 2017

Assuming that you have the following section configured on your cluster.yaml

# When enabled, autoscaling groups managing controller nodes wait for nodes to be up and running.
# It is enabled by default.
waitSignal:
  enabled: false
  maxBatchSize: 1

Unfortunately, in the generated CloudFormation (after the stack is rendered), or even in the intermediate template, the section is always generated!

For example looking at the sections of node-pool.json.tmpl

{{if .WaitSignal.Enabled}}
          "WaitOnResourceSignals" : "true",
          "MaxBatchSize" : "{{.WaitSignal.MaxBatchSize}}",
          "PauseTime": "{{.CreateTimeout}}"
          {{else}}

it seems that this part is always rendered in the generated stack.json.

Since this part is causing problems in our AWS account/setup, we have to manually remove the above part from node-pool.json.tmpl so that it is not rendered at all.
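For what it's worth, the {{if}} conditional itself behaves correctly in Go's text/template: when the Enabled field that actually reaches the template is false, the guarded keys are skipped. A minimal standalone sketch (struct and template are trimmed stand-ins for illustration, not kube-aws' actual types), which suggests the bug is in how the cluster.yaml value is plumbed into the template data rather than in the template syntax:

```go
package main

import (
	"bytes"
	"fmt"
	"text/template"
)

// WaitSignal mirrors the cluster.yaml section in question
// (hypothetical type for illustration only).
type WaitSignal struct {
	Enabled      bool
	MaxBatchSize int
}

// A trimmed-down version of the node-pool.json.tmpl conditional: the
// WaitOnResourceSignals key should only appear when Enabled is true.
const tmplSrc = `{{if .WaitSignal.Enabled}}"WaitOnResourceSignals": "true", "MaxBatchSize": "{{.WaitSignal.MaxBatchSize}}"{{else}}"MaxBatchSize": "{{.WaitSignal.MaxBatchSize}}"{{end}}`

func render(ws WaitSignal) string {
	t := template.Must(template.New("updatePolicy").Parse(tmplSrc))
	var buf bytes.Buffer
	if err := t.Execute(&buf, struct{ WaitSignal WaitSignal }{ws}); err != nil {
		panic(err)
	}
	return buf.String()
}

func main() {
	// Disabled: the else branch renders, no WaitOnResourceSignals key.
	fmt.Println(render(WaitSignal{Enabled: false, MaxBatchSize: 1}))
	// Enabled: the full block renders.
	fmt.Println(render(WaitSignal{Enabled: true, MaxBatchSize: 1}))
}
```

If the flag parsed from cluster.yaml never makes it into the struct handed to Execute (e.g. it is dropped or overwritten during config merging), the {{if}} always sees true and the block is always emitted, which matches the symptom described here.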

Thanks for your time, and many many thanks for your great tool!

gianrubio added a commit to gianrubio/kube-aws that referenced this issue Mar 1, 2017
@Camsteack

Hi @gianrubio

This is not really what we need. The issue is that the waitSignal is not working and is causing the stack to fail, as it does not get a valid response from all the worker nodes, even though they are properly created.
We want to be able to disable it using

waitSignal:
  enabled: false

But it is not working so we have to manually remove it from the stack template.
Thanks a lot.

@gianrubio
Contributor

@Camsteack I accidentally pushed the code, I haven't finished yet. Sorry

@mumoshu
Contributor

mumoshu commented Mar 1, 2017 via email

@Camsteack

Camsteack commented Mar 1, 2017

We will try that. Thanks a lot @mumoshu and @gianrubio

@mumoshu
Contributor

mumoshu commented Mar 6, 2017

@javapapo @Camsteack Did it work for you? Basically, you have to differentiate between the top-level experimental.waitSignal and the node-pool-specific worker.nodePools[].waitSignal, according to their comments inside cluster.yaml.
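For reference, the two settings live at different paths in cluster.yaml. A sketch of both locations, following the key names mentioned above (the pool name is a made-up placeholder):

```yaml
# Top-level setting, under the experimental section:
experimental:
  waitSignal:
    enabled: false

# Node-pool-specific setting:
worker:
  nodePools:
    - name: mypool        # placeholder name
      waitSignal:
        enabled: false
```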

@mumoshu mumoshu added the triage/support Indicates an issue that is a support question. label Mar 6, 2017
@javapapo
Author

javapapo commented Mar 6, 2017

No, unfortunately

mumoshu added a commit to mumoshu/kube-aws that referenced this issue Mar 7, 2017
@mumoshu mumoshu added kind/bug Categorizes issue or PR as related to a bug. and removed triage/support Indicates an issue that is a support question. labels Mar 7, 2017
@mumoshu
Contributor

mumoshu commented Mar 7, 2017

Thanks for the confirmation @javapapo!
Now, adding more tests revealed a bug that made waitSignal impossible to disable.
The bug is fixed in #386

@mumoshu
Contributor

mumoshu commented Mar 7, 2017

@javapapo The fix is included in v0.9.5-rc.2.
Sorry for taking up so much of your time, but could you please confirm whether it works for you?

@javapapo
Author

javapapo commented Mar 7, 2017

Thank you for your hard work and the replies! Will try it asap

@javapapo
Author

javapapo commented Mar 7, 2017

So on the root level of my cluster yaml I have

waitSignal:
  enabled: false
  maxBatchSize: 1

On the generated

"Resources": {
    "Controllers": {
      "Type": "AWS::AutoScaling::AutoScalingGroup",

I see

"UpdatePolicy": {
        "AutoScalingRollingUpdate": {
          "MinInstancesInService": "2",
          "MaxBatchSize": "1",
          "WaitOnResourceSignals": "true",
          "PauseTime": "PT15M"
        }

Or should I just place it under:

controller:
  autoScalingGroup:
    waitSignal:
      enabled: false
      maxBatchSize: 1

@javapapo
Author

javapapo commented Mar 7, 2017

Apart from the above, the cluster creation was successful, so I think the issue is resolved.

@mumoshu
Contributor

mumoshu commented Mar 9, 2017

@javapapo

So on the root level of my cluster yaml I have

waitSignal:
  enabled: false
  maxBatchSize: 1

Yes, this is correct.
And sorry, but I've just noticed that I mistakenly missed the commit fixing this in the v0.9.5-rc.2 release!
v0.9.5-rc.3, released today, does include the fix.

camilb added a commit to camilb/kube-aws that referenced this issue Apr 5, 2017
* kubernetes-incubator/master: (29 commits)
  Emit errors when kube-aws sees unexpected keys in cluster.yaml Resolves kubernetes-retired#404
  Tag controller nodes appropriately with `kubernetes.io/role`. Resolves kubernetes-retired#370
  Make Container Linux AMI fetching a bit more reliable
  Stop locksmithd errors on etcd nodes
  Upgrade heapster to version 1.3.0 (kubernetes-retired#420)
  Auth token file support (kubernetes-retired#418)
  Update README.md
  Update README accordingly to the new git repo
  AWS China region support (kubernetes-retired#390)
  Conform as a Kubernetes Incubator Project
  Fixed typo in template
  upgrade aws-sdk to latest version Fix kubernetes-retired#388
  Upgrade Kubernetes version to v1.5.4
  Fix assumed public hostnames for EC2 instances in us-east-1
  Fix assumed public hostnames for EC2 instances in us-east-1
  typo
  fix: etcdDataVolumeEncrypted not creating encrypted volumes fixes kubernetes-retired#383
  Allow disabling wait signals fixes kubernetes-retired#371
  Update file paths in readme
  Fix an issue with glue security group documentation
  ...
redbaron pushed a commit to HotelsDotCom/kube-aws that referenced this issue Apr 6, 2017
* commit '09366deebc35f602e6d87ea69d7cb5e56d113a5f':
  fix: etcdDataVolumeEncrypted not creating encrypted volumes fixes kubernetes-retired#383
  Allow disabling wait signals fixes kubernetes-retired#371
  Update file paths in readme
  Fix an issue with glue security group documentation
  Update kubernetes-on-aws-prerequisites.md
  Add apiserver-count parameter in kube-apiserver config
kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this issue Mar 27, 2018