1. What kops version are you running? The command `kops version` will display this information.
Version 1.17.0 (git-a17511e6dd)
2. What Kubernetes version are you running? `kubectl version` will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
`kops rolling-update cluster --instance-group my-instance-group --yes`
5. What happened after the commands executed?
W0617 14:56:00.493511 22627 aws_cloud.go:673] ignoring instance as it is terminating: i-deadbeefdeadbeef0 in autoscaling group: my-instance-group.mycluster.example.com
cluster "mycluster.example.com" did not pass validation: machine "i-deadbeefdeadbeef0" has not yet joined cluster
6. What did you expect to happen?
I0617 14:37:57.477202 15279 instancegroups.go:268] Cluster did not pass validation, will try again in "30s" until duration "15m0s" expires: machine "i-deadbeefdeadbeef0" has not yet joined cluster.
7. Please provide your cluster manifest. Execute `kops get --name my.example.com -o yaml` to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
N/A
8. Please run the commands with the most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or into a gist and provide the gist link here.
N/A
9. Anything else we need to know?
We run a large cluster with cluster-autoscaling and a mixedInstancesPolicy / spot instances on a number of our instance groups. Additionally, when we need to roll the entire cluster (e.g. for Kubernetes updates), it's common for us to have multiple engineers rolling individual instance groups and monitoring the roll-out. It's therefore pretty common for the cluster to be in a state where some node, somewhere, is joining or leaving the cluster. When this happens, the `rolling-update` command refuses to execute and exits.
While refusing to terminate any nodes while the cluster is unstable is desirable behavior, the immediate exit is not. The `rolling-update` command is inconsistent here: if a given instance group is already in the middle of a rolling update, the command will instead print that it will `try again in "30s" until duration "15m0s" expires`. It does so because cluster instability is a natural consequence of terminating instances, and the retry loop provides a smoother experience than exiting after every instance it rolls.
This bug report (not really a bug, more a sore point in the UX that is presumably working as designed) asks that `rolling-update` behave the same way when it is starting to roll an instance group as it does in the middle of rolling one: poll `validate` until the cluster is ready or the timeout expires.
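For anyone hitting the same thing, here is a minimal workaround sketch, not kops code: the instance group name, 30s interval, and 15m deadline are placeholder assumptions, and it assumes `kops` is on the PATH with the cluster name and state store already configured in the environment. It shells out to `kops validate cluster` in a retry loop and only starts the rolling update once validation passes, mimicking the retry behavior `rolling-update` already has mid-roll:

```go
// validate-then-roll.go: a rough workaround sketch, not part of kops.
// Poll `kops validate cluster` until it succeeds (or a deadline expires),
// then start the rolling update.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// run executes a command, streaming its output to this process's stdout/stderr.
func run(name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	return cmd.Run()
}

func main() {
	const (
		interval = 30 * time.Second // placeholder retry interval
		timeout  = 15 * time.Minute // placeholder overall deadline
	)
	instanceGroup := "my-instance-group" // placeholder

	deadline := time.Now().Add(timeout)
	for {
		// kops validate cluster exits non-zero while, e.g., a machine has not yet joined.
		if err := run("kops", "validate", "cluster"); err == nil {
			break
		}
		if time.Now().After(deadline) {
			fmt.Fprintln(os.Stderr, "cluster did not become valid before the deadline")
			os.Exit(1)
		}
		fmt.Printf("cluster not yet valid, retrying in %s\n", interval)
		time.Sleep(interval)
	}

	if err := run("kops", "rolling-update", "cluster",
		"--instance-group", instanceGroup, "--yes"); err != nil {
		fmt.Fprintln(os.Stderr, "rolling update failed:", err)
		os.Exit(1)
	}
}
```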
ari-becker changed the title from "Cluster refuses to rolling-update when the cluster does not pass validation" to "Cluster refuses to start a rolling-update when the cluster does not pass validation" on Jun 17, 2020.