-
Notifications
You must be signed in to change notification settings - Fork 39.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix rolling-update rollback from an unavailable rc #18583
Conversation
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
Labelling this PR as size/M |
Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist") If this message is too spammy, please complain to ixdy. |
6ade8a5
to
ffef160
Compare
@k8s-bot ok to test |
decrement := (oldAvailable + newAvailable) - minAvailable | ||
// Scale down as much as possible while maintaining minimum availability | ||
unavailableOldReplicas := oldRc.Spec.Replicas - oldAvailable | ||
decrement := unavailableOldReplicas + (oldAvailable + newAvailable) - minAvailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this just be oldRc.Spec.Replicas + newAvailable - minAvailable
? The math is the same.
Thanks for the pr @jsravn. cc @ironcladlou |
GCE e2e test build/test passed for commit ffef16033a83e187dd558d92ac6303b2f75982e8. |
The author of this PR is not in the whitelist for merge, can one of the admins add the 'ok-to-merge' label? |
ffef160
to
4a6eb44
Compare
@jlowdermilk I made the recommended change. |
GCE e2e test build/test passed for commit 4a6eb4440318de05f69880f6bacd2732985cda18. |
@@ -692,6 +693,45 @@ func TestUpdate_progressTimeout(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestUpdate_canRollbackFromAFailedRcToAReadyRc(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Were you unable to express this as a scenario in TestUpdate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll take a look. I assumed it can't be because the test above it, TestUpdate_progressTimeout
, is also a timeout scenario and not part of TestUpdate
.
@ironcladlou Made the change, thanks. |
Labelling this PR as size/S |
GCE e2e test build/test passed for commit dcdb8f91f447343bf427a9550b8553b50a5f858e. |
Awesome, thank you! |
This lgtm |
@jsravn, please squash commits. |
Rolling back from a broken update with only one replica fails with a timeout in the existing code. The problem is the scale down logic does not consider unavailable replicas in the old replication controller when calculating how much to scale down by. This leads to an obvious problem with a single replica when min unavailable is 1. The fix is to allow scaling down all unavailable replicas in the old controller, while still maintaining the min unavailable invariant.
dcdb8f9
to
e5a558c
Compare
GCE e2e test build/test passed for commit e5a558c. |
@jlowdermilk Squashed them. |
Thanks, lgtm. |
@k8s-bot test this Tests are more than 48 hours old. Re-running tests. |
GCE e2e test build/test passed for commit e5a558c. |
Fix rolling-update rollback from an unavailable rc
Thanks for fixing this @jsravn! |
Will do. |
…-upstream-release-1.1 Auto commit by PR queue bot
…-of-#18583-upstream-release-1.1 Auto commit by PR queue bot
…-of-#18583-upstream-release-1.1 Auto commit by PR queue bot
Rolling back from a broken update with only one replica fails with a
timeout in the existing code.
The problem is the scale down logic does not consider unavailable
replicas in the old replication controller when calculating how much to
scale down by. This leads to an obvious problem with a single replica
when min unavailable is 1.
The fix is to allow scaling down all unavailable replicas in the old
controller, while still maintaining the min unavailable invariant.
Fix #18530