Rolling update panic #11774
Comments
/assign @johngmyers
The issue is that incrementing So this was introduced by #10740 and is technically a 1.21 regression.
Aha, thanks for the explanation. So is this a duplicate issue, or should I leave it open?
Leave it open
Have you had a chance to look into this, @johngmyers?
/bump we keep hitting this with kOps 1.21.1 (but our pipelines retry twice, so it's more of a minor inconvenience than anything)
hitting this too with kops 1.21.1 do you have |
@azman0101, yeah, we're typically using 25% or 60%.
Never saw this problem without rollingUpdate before
I am trying to reproduce this, but I can't. I've tried a few detachment race conditions and triggered autoscaling, both via cluster autoscaler (CAS) and manually in the console, while rotating, but nothing seems to trigger this issue. Can you provide a bit more info on:
I was able to trigger the bug with CA disabled. I was also able to trigger the bug when using Warmpool and without UpdateStrategy
What was your desired flag set to?
I'm wondering if this would be sufficient:
diff --git a/pkg/instancegroups/instancegroups.go b/pkg/instancegroups/instancegroups.go
index bf07a12bf4..e47364f48b 100644
--- a/pkg/instancegroups/instancegroups.go
+++ b/pkg/instancegroups/instancegroups.go
@@ -151,6 +151,10 @@ func (c *RollingUpdateCluster) rollingUpdateInstanceGroup(group *cloudinstances.
 	if maxSurge > 0 && !c.CloudOnly {
 		skippedNodes := 0
 		for numSurge := 1; numSurge <= maxSurge; numSurge++ {
+			if skippedNodes > numSurge-1 {
+				break
+			}
+
 			u := update[len(update)-numSurge+skippedNodes]
 			if u.Status != cloudinstances.CloudInstanceStatusDetached {
 				if err := c.detachInstance(u); err != nil {
Sorry @olemarkus, can't tell you anymore
This was my line of thinking too.
Still hitting this on kOps 1.23.2 |
/kind bug
1. What kops version are you running? The command kops version, will display this information.
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
kops rolling-update cluster --name <my cluster> --yes
5. What happened after the commands executed?
Works for a while then panics.
6. What did you expect to happen?
No panic.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
Not -v 10, but this is what I have:
9. Anything else do we need to know?
I had 3x APIServer instance groups, but the asset manifests were misconfigured by me, so those three nodes never joined the cluster. I terminated one manually then a bit later got a panic. The same thing happened when later I terminated the other two.
As the stack trace mentions, the problem seems to lie in the maths around here: https://github.com/kubernetes/kops/blob/v1.21.0-beta.3/pkg/instancegroups/instancegroups.go#L153-L154
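To make the index maths easier to reason about, here is a minimal standalone sketch in Go. It is not kOps code. It takes the index expression exactly as quoted in the diff in this thread (update[len(update)-numSurge+skippedNodes]) and the guard that diff proposes (skippedNodes > numSurge-1), and compares the two. The helper names surgeIndex, inBounds and guardTrips are hypothetical and exist only for this illustration, and the sketch does not claim to reproduce the panic reported here.

```go
package main

import "fmt"

// surgeIndex mirrors the index expression quoted in the diff:
// update[len(update)-numSurge+skippedNodes]. Hypothetical helper.
func surgeIndex(updateLen, numSurge, skippedNodes int) int {
	return updateLen - numSurge + skippedNodes
}

// inBounds reports whether idx is a valid index into a slice of length n.
func inBounds(idx, n int) bool {
	return idx >= 0 && idx < n
}

// guardTrips is the condition the proposed diff checks before indexing.
func guardTrips(numSurge, skippedNodes int) bool {
	return skippedNodes > numSurge-1
}

func main() {
	const updateLen = 3 // assumed slice length, for illustration only
	cases := []struct{ numSurge, skipped int }{
		{1, 0}, // nothing skipped yet
		{2, 1}, // one node skipped, second surge slot
		{1, 1}, // skipped has caught up with numSurge
		{4, 0}, // numSurge larger than the slice
	}
	for _, c := range cases {
		idx := surgeIndex(updateLen, c.numSurge, c.skipped)
		fmt.Printf("numSurge=%d skipped=%d index=%d inBounds=%t guardTrips=%t\n",
			c.numSurge, c.skipped, idx, inBounds(idx, updateLen), guardTrips(c.numSurge, c.skipped))
	}
}
```

With the expression as quoted, the index reaches len(update) or beyond exactly when skippedNodes >= numSurge, which is the same condition as the proposed guard, so the guard covers the upper bound. A negative index (numSurge - skippedNodes greater than len(update)) is not covered by the guard; whether that can happen depends on how maxSurge is bounded elsewhere, which this sketch does not model.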