-
Notifications
You must be signed in to change notification settings - Fork 38.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rollback subresource; add rollback logic to deployment controller #19686
Conversation
Labelling this PR as size/L |
GCE e2e test build/test passed for commit 8288159eaa5d364b9234090a1aa8a61c5c9d4f76. |
@@ -226,6 +226,14 @@ type DeploymentSpec struct { | |||
// Value of this key is hash of DeploymentSpec.PodTemplateSpec. | |||
// No label is added if this is set to empty string. | |||
UniqueLabelKey string `json:"uniqueLabelKey,omitempty"` | |||
|
|||
// The config this deployment is rolling back to. Will be cleared after rollback is done. | |||
Rollback *RollbackConfig `json:"rollback,omitempty"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this should be a part of the deployment. It feels more like a separate subresource.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about change this field to RollbackVersion
type DeploymentSpec struct {
...
// The version this deployment is rolling back to. Will be cleared after rollback is done.
RollbackVersion int
}
type RollbackConfig struct {
// The version to rollback to. Default to 0, which means rollback is done or not required.
Version int `json:"version,omitempty"`
}
Then we copy RollbackConfig.Version
to DeploymentSpec.RollbackVersion
when rolling back?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently Rollback.Version (int
) means the version to rollback to. 1
means roll back to version 1, 0
means it's unset and no-op. How do we specify that we want to roll back to the previous version without knowing the version number? How about -1
means roll back to the latest previous version and we don't accept other negative value?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like @Kargakis's proposal to make this a separate subresource.
Look at how pod/binding is implemented.
POST (Create) to /rollback should update the Deployment spec to match the specified revision. Think of it as a convenience mechanism for looking up the old revision details and updating Deployment to match.
(Note: I prefer the term "revision" vs. "version" because the latter is so overloaded in the system, and each RC is the result of a revision to the Deployment spec.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the reason that Binding contains ObjectMeta is so that annotations can be applied/updated. We should confirm that only the annotations field is used, but we should create a separate type for that purpose and file an issue to change Binding to use it also.
In this case, I think we may want to update the change-source annotation (and related annotations) on the Deployment. We could potentially automatically change it to mention the rollback, but I think it would be best to let kubectl decide the content of all the annotations it applies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed offline.
The reason this is here is because the actual rollback of the pod template is executed in the Deployment controller, not in the registry. That division of responsibility was chosen due to the complexity and cost of this operation. The controller already reads all the RCs and can cache which RCs correspond to which Deployment.
This is sort of an internal API between the apiserver and Deployment controller, so we discussed whether it should be conveyed via an annotation instead, but is also useful for the client, as a way to determine whether the rollback has been acted upon asynchronously yet or not. We will also want to generate an event once the rollback executes, conveying success or failure. Status alone could be quickly overwritten by another mutation.
As for single field vs nested struct: We don't expect users or even clients to specify this manually. The nested struct is useful as future-proofing IMO, in case we decide to support rollback using some key other than revision, such as podTemplateHash. The controller should nil the whole rollback
field.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
GCE e2e build/test failed for commit d80f01130e05ef202ab10ffdfe3484e64704c9e2. |
Labelling this PR as size/XL |
GCE e2e build/test failed for commit ef58ac534dd489d0e894843fa60d9fb794c79f98. |
} | ||
glog.Infof("Found rc with version %d: %+v", v, rc) | ||
if v == toVersion { | ||
glog.Infof("Found a match!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems spammy, make it a verbose log?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
GCE e2e build/test failed for commit 269cc2e0ca44de48b3aff1e0a0130b36af5370a3. |
@k8s-bot test this |
GCE e2e test build/test passed for commit 269cc2e0ca44de48b3aff1e0a0130b36af5370a3. |
Please ignore the first commit (from #19581) when reviewing. |
Thank you for working on this! One comment I have is that the current deploy proposal (and this implementation) requires that the old deployment still be around to be able to do a rollback, which to me severely cripples the usefulness of the command. In my experience, when you roll something out, it often takes some time for metrics/monitoring/other services to react before you realize you need to roll back. As such, it's quite possible that by the time you submit the rollback command, the last of the rollout has completed, therefore making the command ineffectual and arguably fairly unpredictable as to whether it has any effect or not. I believe this design is optimizing for the case where you know you need to rollback almost immediately, which in my experience happens only sometimes. It's just as often that not until the rollout is part way through or even just completed that you know that you need to roll back. As such, instead of just relying on a number, can we potentially store the entire previous config? This would make rolling back much more robust and useful and would always be available. You can still store and refer to the hash, but the full previous spec should be stored in the object. |
GCE e2e test build/test passed for commit df2deb5b91d7e1ff969d1613e8a65eaf85724a6b. |
@ghodss with the current design/implementation, you can still rollback even if the rollout is complete. When we do deployment rolling update, we only scale old RCs down but not delete them. Therefore, when the users want to roll back, we look at all the RCs of this deployment to determine the version to rollback to. As long as the RCs aren't deleted, even though they're scaled to 0 replicas, we can roll back to any version by copying the podTemplate back to the deployment. I do agree that saving the previous deployment's podTemplate makes this more robust. |
GCE e2e test build/test passed for commit 9f401b6c149e858073c1a822568ef5b239f779ad. |
if *toRevision == 0 { | ||
if *toRevision = lastRevision(allRCs); *toRevision == 0 { | ||
// If we still can't find the last revision, gives up rollback | ||
dc.emitRollbackWarningEvent(deployment, "DeploymentRollbackRevisionNotFound", "Unable to find last revision. Gives up rollback.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't notice this until I reviewed #19835:
"Gives up rollback"doesn't use the same verb form as "Unable to find the last revision". Same issue below.
Do we need additional text there at all? It seems obvious (to me, anyway) that if there is no previous revision, the rollback can't happen.
We could add something like "Rollback failed", "Rollback aborted", or somesuch, but I feel like they'd add more confusion rather than reduce it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bgrant0607 I updated rollback event messages. PTAL.
PR needs rebase |
GCE e2e build/test failed for commit f78358a36955470dd6f8105d3fd2129daa05447a. |
Better, thanks. LGTM. All the tests (even the Travis checks) failed after your rebase. |
PR needs rebase |
GCE e2e test build/test passed for commit 3396db9. |
@k8s-bot test this [submit-queue is verifying that this PR is safe to merge] |
GCE e2e test build/test passed for commit 3396db9. |
Automatic merge from submit-queue |
Auto commit by PR queue bot
@bgrant0607 @janetkuo Can someone please confirm that all Deployment enhancements meant for 1.2 are merged in kubernetes:master and stable enough ? I would like to start playing with them... |
@krmayankk There are still a number of bug fixes and incompatible API changes we're going to make over the next week or so. If you need it to be stable, I'd wait. |
Addresses #17168; depends on #19581
See proposal.
API:
cc @bgrant0607 @nikhiljindal @ironcladlou @Kargakis @kubernetes/sig-config