Add rollback subresource; add rollback logic to deployment controller #19686

Merged (1 commit) on Jan 31, 2016

janetkuo
Member

Addresses #17168; depends on #19581
See proposal.

API:

type DeploymentSpec struct {
...
    // The config this deployment is rolling back to. Will be cleared after rollback is done.
    RollbackTo *RollbackConfig `json:"rollbackTo,omitempty"`
}

// DeploymentRollback stores the information required to roll back a deployment.
type DeploymentRollback struct {
    unversioned.TypeMeta `json:",inline"`
    // Required: This must match the Name of a deployment.
    Name string `json:"name"`
    // The annotations to be updated on the deployment.
    UpdatedAnnotations map[string]string `json:"updatedAnnotations,omitempty"`
    // The config of this deployment rollback.
    RollbackTo RollbackConfig `json:"rollbackTo"`
}

type RollbackConfig struct {
    // The revision to roll back to. If set to 0, roll back to the last revision.
    Revision int64 `json:"revision,omitempty"`
}
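
As a rough illustration of the wire format, the sketch below builds a DeploymentRollback request body with Go's encoding/json. It re-declares trimmed copies of the types above so it compiles on its own; the TypeMeta stand-in (only kind and apiVersion), the helper name rollbackBody, and the apiVersion value "extensions/v1beta1" are assumptions for illustration, not taken from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypeMeta is a trimmed, hypothetical stand-in for unversioned.TypeMeta.
type TypeMeta struct {
	Kind       string `json:"kind,omitempty"`
	APIVersion string `json:"apiVersion,omitempty"`
}

// RollbackConfig mirrors the API type above.
type RollbackConfig struct {
	Revision int64 `json:"revision,omitempty"`
}

// DeploymentRollback mirrors the API type above. encoding/json ignores the
// "inline" option, but an embedded struct with an empty JSON name is
// flattened anyway, so kind and apiVersion end up at the top level.
type DeploymentRollback struct {
	TypeMeta           `json:",inline"`
	Name               string            `json:"name"`
	UpdatedAnnotations map[string]string `json:"updatedAnnotations,omitempty"`
	RollbackTo         RollbackConfig    `json:"rollbackTo"`
}

// rollbackBody returns the JSON a client could POST to the rollback
// subresource (hypothetical helper; the apiVersion value is assumed).
func rollbackBody(name string, revision int64) (string, error) {
	r := DeploymentRollback{
		TypeMeta:   TypeMeta{Kind: "DeploymentRollback", APIVersion: "extensions/v1beta1"},
		Name:       name,
		RollbackTo: RollbackConfig{Revision: revision},
	}
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	body, err := rollbackBody("nginx", 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Run as-is, this prints {"kind":"DeploymentRollback","apiVersion":"extensions/v1beta1","name":"nginx","rollbackTo":{"revision":2}}. Because revision is tagged omitempty, a request for revision 0 (roll back to the last revision) is sent as "rollbackTo":{}.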

cc @bgrant0607 @nikhiljindal @ironcladlou @Kargakis @kubernetes/sig-config

@k8s-github-robot

Labelling this PR as size/L

@k8s-github-robot k8s-github-robot added kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 15, 2016
@k8s-bot

k8s-bot commented Jan 15, 2016

GCE e2e test build/test passed for commit 8288159eaa5d364b9234090a1aa8a61c5c9d4f76.

@@ -226,6 +226,14 @@ type DeploymentSpec struct {
// Value of this key is hash of DeploymentSpec.PodTemplateSpec.
// No label is added if this is set to empty string.
UniqueLabelKey string `json:"uniqueLabelKey,omitempty"`

// The config this deployment is rolling back to. Will be cleared after rollback is done.
Rollback *RollbackConfig `json:"rollback,omitempty"`
Contributor

I am not sure this should be a part of the deployment. It feels more like a separate subresource.

Member Author

How about changing this field to RollbackVersion?

type DeploymentSpec struct {
    ...
    // The version this deployment is rolling back to. Will be cleared after rollback is done.
    RollbackVersion int
}

type RollbackConfig struct {
    // The version to roll back to. Defaults to 0, which means rollback is done or not required.
    Version int `json:"version,omitempty"`
}

Then we copy RollbackConfig.Version to DeploymentSpec.RollbackVersion when rolling back?

Member Author

Currently Rollback.Version (int) means the version to roll back to: 1 means roll back to version 1, and 0 means it's unset (a no-op). How do we specify that we want to roll back to the previous version without knowing the version number? How about letting -1 mean roll back to the latest previous version, and rejecting any other negative value?
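
The -1 sentinel floated here could be validated roughly as follows. This is a hypothetical sketch of this comment's proposal only: interpretRollbackVersion and rollbackToPrevious are illustrative names, and the API at the top of this PR ultimately made 0 mean "the last revision" instead.

```go
package main

import "fmt"

// rollbackToPrevious is the sentinel proposed in this comment (hypothetical):
// -1 means "roll back to the latest previous version".
const rollbackToPrevious = -1

// interpretRollbackVersion describes what a RollbackVersion value would mean
// under this proposal, rejecting every negative value other than -1.
func interpretRollbackVersion(v int) (string, error) {
	switch {
	case v == rollbackToPrevious:
		return "latest previous version", nil
	case v == 0:
		return "unset (no-op)", nil
	case v > 0:
		return fmt.Sprintf("version %d", v), nil
	default:
		return "", fmt.Errorf("invalid rollback version %d: only -1, 0, or a positive version is allowed", v)
	}
}

func main() {
	for _, v := range []int{-1, 0, 3, -2} {
		desc, err := interpretRollbackVersion(v)
		fmt.Println(v, desc, err)
	}
}
```

Validating up front keeps the controller from having to guess what an arbitrary negative number was meant to express.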

Contributor

If we are going to save rollback information in the spec, then after @ghodss's comment I may be convinced to save the latest successful* pod template as the rollback config.

*It wouldn't make sense to save a template as the rollback template if it was never successfully deployed, right?

cc: @ironcladlou @smarterclayton

Member

I like @Kargakis's proposal to make this a separate subresource.

Look at how pod/binding is implemented.

POST (Create) to /rollback should update the Deployment spec to match the specified revision. Think of it as a convenience mechanism for looking up the old revision details and updating Deployment to match.

(Note: I prefer the term "revision" vs. "version" because the latter is so overloaded in the system, and each RC is the result of a revision to the Deployment spec.)

Member

Note that the reason that Binding contains ObjectMeta is so that annotations can be applied/updated. We should confirm that only the annotations field is used, but we should create a separate type for that purpose and file an issue to change Binding to use it also.

In this case, I think we may want to update the change-source annotation (and related annotations) on the Deployment. We could potentially automatically change it to mention the rollback, but I think it would be best to let kubectl decide the content of all the annotations it applies.
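
Following the pod/binding pattern, a POST to the rollback subresource might do little more than record the request on the Deployment. The sketch below is hypothetical: applyRollback and the flattened Deployment type are illustrative (its RollbackTo field stands in for Spec.RollbackTo), and it assumes the division of labor from the offline discussion, where the apiserver only merges UpdatedAnnotations and sets the rollback field while the controller changes the pod template.

```go
package main

import "fmt"

// RollbackConfig mirrors the API type in this PR.
type RollbackConfig struct {
	Revision int64
}

// Deployment is a hypothetical, flattened stand-in; RollbackTo plays the
// role of Spec.RollbackTo.
type Deployment struct {
	Name        string
	Annotations map[string]string
	RollbackTo  *RollbackConfig
}

// DeploymentRollback mirrors the API type in this PR (TypeMeta omitted).
type DeploymentRollback struct {
	Name               string
	UpdatedAnnotations map[string]string
	RollbackTo         RollbackConfig
}

// applyRollback is a hypothetical handler for a POST to the rollback
// subresource. It only records the request; the pod template is changed
// later by the Deployment controller.
func applyRollback(d *Deployment, r DeploymentRollback) error {
	if r.Name != d.Name {
		return fmt.Errorf("rollback name %q does not match deployment %q", r.Name, d.Name)
	}
	// The client (e.g. kubectl) decides the annotation content; just merge it.
	if d.Annotations == nil && len(r.UpdatedAnnotations) > 0 {
		d.Annotations = make(map[string]string)
	}
	for k, v := range r.UpdatedAnnotations {
		d.Annotations[k] = v
	}
	// r is a copy, so pointing at its field does not alias the caller's value.
	d.RollbackTo = &r.RollbackTo
	return nil
}

func main() {
	d := &Deployment{Name: "nginx"}
	err := applyRollback(d, DeploymentRollback{
		Name:               "nginx",
		UpdatedAnnotations: map[string]string{"change-source": "rollback to revision 2"},
		RollbackTo:         RollbackConfig{Revision: 2},
	})
	fmt.Println(err, d.RollbackTo.Revision, d.Annotations["change-source"])
}
```

The annotation key "change-source" in the example is a placeholder, not the real annotation name.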

Member

Discussed offline.

The reason this is here is because the actual rollback of the pod template is executed in the Deployment controller, not in the registry. That division of responsibility was chosen due to the complexity and cost of this operation. The controller already reads all the RCs and can cache which RCs correspond to which Deployment.

This is sort of an internal API between the apiserver and the Deployment controller, so we discussed whether it should be conveyed via an annotation instead, but it is also useful for the client as a way to determine whether the rollback has been acted upon asynchronously yet. We will also want to generate an event once the rollback executes, conveying success or failure. Status alone could be quickly overwritten by another mutation.

As for single field vs nested struct: We don't expect users or even clients to specify this manually. The nested struct is useful as future-proofing IMO, in case we decide to support rollback using some key other than revision, such as podTemplateHash. The controller should nil the whole rollback field.
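
A minimal sketch of the controller-side step described here, under several assumptions: the types are trimmed, hypothetical stand-ins for the real RC and Deployment objects, each RC records its revision in an annotation (the key in revisionKey is assumed, not taken from this PR), and resolving revision 0 to the last revision is left out. RollbackTo is cleared on both success and failure so that a bad request is not retried forever, and the returned string is the kind of message an event might carry.

```go
package main

import (
	"fmt"
	"strconv"
)

// revisionKey is an assumed annotation key recording which Deployment
// revision an RC was created for.
const revisionKey = "deployment.kubernetes.io/revision"

// PodTemplate, RC, and Deployment are trimmed, hypothetical stand-ins.
type PodTemplate struct{ Image string }

type RC struct {
	Annotations map[string]string
	Template    PodTemplate
}

type RollbackConfig struct{ Revision int64 }

type Deployment struct {
	Name       string
	Template   PodTemplate     // stands in for Spec.Template
	RollbackTo *RollbackConfig // stands in for Spec.RollbackTo
}

// rollback copies the template of the RC at the requested revision back into
// the Deployment and always clears RollbackTo, on success or failure.
// Resolving revision 0 to the last revision is omitted here.
func rollback(d *Deployment, rcs []RC) string {
	if d.RollbackTo == nil {
		return "" // no rollback requested
	}
	want := d.RollbackTo.Revision
	d.RollbackTo = nil // nil the whole field, not just Revision
	for _, rc := range rcs {
		v, err := strconv.ParseInt(rc.Annotations[revisionKey], 10, 64)
		if err != nil {
			continue // skip RCs without a parsable revision
		}
		if v == want {
			d.Template = rc.Template
			return fmt.Sprintf("Rolled back deployment %q to revision %d", d.Name, want)
		}
	}
	return fmt.Sprintf("Unable to find revision %d", want)
}

func main() {
	rcs := []RC{
		{Annotations: map[string]string{revisionKey: "1"}, Template: PodTemplate{Image: "nginx:1.7"}},
		{Annotations: map[string]string{revisionKey: "2"}, Template: PodTemplate{Image: "nginx:1.9"}},
	}
	d := &Deployment{Name: "nginx", Template: rcs[1].Template, RollbackTo: &RollbackConfig{Revision: 1}}
	fmt.Println(rollback(d, rcs), d.Template.Image, d.RollbackTo == nil)
}
```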

Member Author

Done

@k8s-bot

k8s-bot commented Jan 15, 2016

GCE e2e build/test failed for commit d80f01130e05ef202ab10ffdfe3484e64704c9e2.

@janetkuo janetkuo changed the title WIP: Add rollback subresource; add rollback logic to deployment controller Add rollback subresource; add rollback logic to deployment controller Jan 19, 2016
@k8s-github-robot

Labelling this PR as size/XL

@k8s-github-robot k8s-github-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 19, 2016
@k8s-bot

k8s-bot commented Jan 19, 2016

GCE e2e build/test failed for commit ef58ac534dd489d0e894843fa60d9fb794c79f98.

}
glog.Infof("Found rc with version %d: %+v", v, rc)
if v == toVersion {
glog.Infof("Found a match!")
Contributor

this seems spammy, make it a verbose log?

Member Author

Done

@k8s-bot

k8s-bot commented Jan 19, 2016

GCE e2e build/test failed for commit 269cc2e0ca44de48b3aff1e0a0130b36af5370a3.

@janetkuo
Member Author

@k8s-bot test this

@k8s-bot

k8s-bot commented Jan 20, 2016

GCE e2e test build/test passed for commit 269cc2e0ca44de48b3aff1e0a0130b36af5370a3.

@janetkuo
Member Author

Please ignore the first commit (from #19581) when reviewing.

@ghodss
Contributor

ghodss commented Jan 20, 2016

Thank you for working on this! One comment I have is that the current deploy proposal (and this implementation) requires the old deployment to still be around in order to roll back, which to me severely cripples the usefulness of the command. In my experience, when you roll something out, it often takes some time for metrics, monitoring, or other services to react before you realize you need to roll back. So by the time you submit the rollback command, the rollout may well have completed, making the command ineffectual and arguably unpredictable as to whether it has any effect at all. I believe this design optimizes for the case where you know almost immediately that you need to roll back, which in my experience happens only sometimes. Just as often, you don't know you need to roll back until the rollout is partway through or has even completed.

As such, instead of just relying on a number, can we store the entire previous config? That would make rollback much more robust and useful, and it would always be available. You can still store and refer to the hash, but the full previous spec should be stored in the object.

@k8s-bot

k8s-bot commented Jan 20, 2016

GCE e2e test build/test passed for commit df2deb5b91d7e1ff969d1613e8a65eaf85724a6b.

@janetkuo
Member Author

@ghodss with the current design/implementation, you can still roll back even if the rollout is complete. During a deployment rolling update, we only scale old RCs down; we don't delete them. So when users want to roll back, we look at all the RCs of this deployment to determine the version to roll back to. As long as the RCs aren't deleted, even if they're scaled to 0 replicas, we can roll back to any version by copying the podTemplate back to the deployment.

I do agree that saving the previous deployment's podTemplate makes this more robust.
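
When the requested revision is 0, the controller also has to decide which of those RCs counts as the "last" revision. Assuming that means the highest revision strictly below the current (largest) one, a hypothetical helper over the RCs' revision numbers could look like this; lastRevision's input here is a plain slice of revisions rather than the RCs themselves.

```go
package main

import "fmt"

// lastRevision returns the highest revision strictly below the current
// (largest) one, or 0 if no earlier revision exists. Hypothetical helper;
// it assumes revisions are positive integers.
func lastRevision(revisions []int64) int64 {
	var cur, prev int64
	for _, v := range revisions {
		switch {
		case v > cur:
			prev, cur = cur, v
		case v < cur && v > prev:
			prev = v
		}
	}
	return prev
}

func main() {
	fmt.Println(lastRevision([]int64{1, 3, 2})) // prints 2
	fmt.Println(lastRevision([]int64{4}))       // prints 0
}
```

Returning 0 when there is no earlier revision lets the caller treat 0 as "not found" and emit a warning event instead of rolling back, which matches the `*toRevision == 0` check in the controller code reviewed in this PR.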

@janetkuo janetkuo added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2016
@k8s-bot

k8s-bot commented Jan 29, 2016

GCE e2e test build/test passed for commit 9f401b6c149e858073c1a822568ef5b239f779ad.

@janetkuo janetkuo removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jan 29, 2016
@janetkuo
Member Author

This depends on #19581 which depends on #20139. Will apply LGTM later.

@janetkuo janetkuo changed the title Add rollback subresource; add rollback logic to deployment controller Add rollback subresource; add rollback logic to deployment controller [LGTM but depends on #19581] Jan 29, 2016
if *toRevision == 0 {
if *toRevision = lastRevision(allRCs); *toRevision == 0 {
// If we still can't find the last revision, gives up rollback
dc.emitRollbackWarningEvent(deployment, "DeploymentRollbackRevisionNotFound", "Unable to find last revision. Gives up rollback.")
Member

Didn't notice this until I reviewed #19835:

"Gives up rollback" doesn't use the same verb form as "Unable to find the last revision". Same issue below.

Do we need additional text there at all? It seems obvious (to me, anyway) that if there is no previous revision, the rollback can't happen.

We could add something like "Rollback failed", "Rollback aborted", or somesuch, but I feel like they'd add more confusion rather than reduce it.

Member Author

@bgrant0607 I updated rollback event messages. PTAL.

@k8s-github-robot

PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2016
@janetkuo janetkuo removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2016
@k8s-bot

k8s-bot commented Jan 30, 2016

GCE e2e build/test failed for commit f78358a36955470dd6f8105d3fd2129daa05447a.

@bgrant0607
Member

Better, thanks. LGTM.

All the tests (even the Travis checks) failed after your rebase.

@k8s-github-robot

PR needs rebase

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jan 30, 2016
@janetkuo janetkuo changed the title Add rollback subresource; add rollback logic to deployment controller [LGTM but depends on #19581] Add rollback subresource; add rollback logic to deployment controller Jan 31, 2016
@janetkuo janetkuo added lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Jan 31, 2016
@k8s-bot

k8s-bot commented Jan 31, 2016

GCE e2e test build/test passed for commit 3396db9.

@k8s-github-robot

@k8s-bot test this [submit-queue is verifying that this PR is safe to merge]

@k8s-bot
Copy link

k8s-bot commented Jan 31, 2016

GCE e2e test build/test passed for commit 3396db9.

@k8s-github-robot

Automatic merge from submit-queue

k8s-github-robot pushed a commit that referenced this pull request Jan 31, 2016
Auto commit by PR queue bot
@k8s-github-robot k8s-github-robot merged commit 6a2a0e6 into kubernetes:master Jan 31, 2016
@krmayankk

@bgrant0607 @janetkuo Can someone please confirm that all Deployment enhancements meant for 1.2 are merged into kubernetes:master and stable enough? I would like to start playing with them...

@bgrant0607
Member

@krmayankk There are still a number of bug fixes and incompatible API changes we're going to make over the next week or so. If you need it to be stable, I'd wait.

@janetkuo janetkuo added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Mar 11, 2016
Labels
area/app-lifecycle kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

9 participants