# Federated Deployment
# Design Document

Authors: [Marcin Wielgus](mailto:mwielgus@google.com), [Nikhil Jindal](mailto:nikhiljindal@google.com)

## Introduction

The purpose of this document is to provide a detailed design for how Kubernetes Deployments should be handled in Federation, with special emphasis on rolling-update support for deployment updates.

The internal design is based on [Federated ReplicaSet](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/federated-replicasets.md).

## Background

With Federation we are trying to provide Kubernetes objects and functionality that span multiple clusters. One of the key objects in Kubernetes is Deployment. A Deployment allows users to maintain a defined number of pod replicas and conveniently update them when needed. The update functionality is the main thing that differentiates Deployments from ReplicaSets: a ReplicaSet replaces everything at once, while a Deployment allows a slow, rolling update and a rollback in case the update is unsuccessful.

## Requirements

Federated Deployments must provide similar functionality to regular Deployments, in particular:

+ [R1] Work well with kubectl rollout:
    + History - provide all of the revisions of a federated deployment.
    + Pause/Resume - provide the ability to pause and resume an ongoing update.
    + Status - provide rollout status information.
    + Undo - provide rollbacks to previous versions.
    + Require no/almost-no fixes in kubectl.

> **Review comment:** I guess federated resources have their own aliases in the command line, right?
>
> **Reply:** No. Same kubectl commands (like …).

+ [R2] Allow rolling updates:
    + One cluster at a time (preferred).
    + All clusters at a time (additional).
+ [R3] Be similar to what is currently implemented in regular Kubernetes.
+ [R4] Allow the same extra scheduling features as Federated ReplicaSet:
    + Specifying min/max/weight for each of the clusters.
    + Being able to rebalance replicas once a cluster is added or lost, or if there is no capacity.
+ [R5] Each of the underlying clusters should be able to work independently if the federation control plane is down, or if there are cross-cluster network issues. When the federation control plane is down, the clusters should be as easily updatable/manageable as if federation were present. For high availability, the federation control plane should be considered an "addon" that makes some extra features possible, not a component that provides or replaces core K8S functionality. This basically means that Federated Deployment should create fully functional deployments in the clusters.

## Design

Regular Deployments don't maintain replicas by themselves; they let ReplicaSets keep track of them. Each Deployment creates a separate ReplicaSet for each pod template version it has. To do a rolling upgrade, the Deployment slowly adds replicas to the ReplicaSet with the new pod template and takes them away from the old one. ReplicaSet names are of the form deployment_name + "-" + podspec_hash, so it is relatively easy to tell which ReplicaSet is responsible for pods with a given spec (if you are a machine, of course).
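
As a minimal sketch of that naming scheme (the FNV hash below is a stand-in assumption, not the hash function Kubernetes actually uses, and `replicaSetName` is an illustrative helper):

```go
// Illustrative only: derive a ReplicaSet name from its Deployment's name
// and a hash of the pod template, i.e. deployment_name + "-" + podspec_hash.
package main

import (
	"fmt"
	"hash/fnv"
)

func replicaSetName(deploymentName, podTemplate string) string {
	h := fnv.New32a()
	h.Write([]byte(podTemplate)) // hash the (serialized) pod template
	return fmt.Sprintf("%s-%d", deploymentName, h.Sum32())
}

func main() {
	// A Deployment named "nginx" might own a ReplicaSet like "nginx-701339712".
	fmt.Println(replicaSetName("nginx", "image: nginx:1.11"))
}
```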

> **Review comment:** As commented elsewhere, the pod template hash is not stable across Kubernetes versions. We will need to change the naming scheme for ReplicaSets. The simplest solution is deployment.name + deployment.templategeneration. Or hash that and append to deployment.name, but the new hash should differ in size from the old one so we can avoid conflicts with old replica sets that we cannot migrate.

<img src="federated-deployment-1.png" width="400"/>

When adding the federation layer on top of Deployments, things get more complicated. A Federated Deployment needs to control the regular Deployments. That's understandable and in line with other controllers. Now, what about ReplicaSets? Requirements [R1], [R2] and [R3] imply that there should be ReplicaSets at the federation level.

> **Review comment:** I don't necessarily think that the design of a federated deployment should assume it understands the implementations of a local deployment. We have talked significantly about having custom deployment strategies. That's information that the federator can't and shouldn't have. I would have started from the opposite position - that federated deployments should be modifying deployments in a cluster and watching them, vs doing anything that assumes it understands how the deployment works in that cluster.
>
> **Review comment:** My impression was that a federated deployment was going to monitor deployments in the cluster and aggregate statistics about those deployments in the shadow replicasets. I don't think the intent is for the shadow replicasets to be more than a convenient place to store data, and their use is just confusing things. Maybe another object or 3rd party resource would be a better match for this use case?
>
> **Reply:** @smarterclayton I agree with you and that's how we started. That is what we had in our initial design proposal (which is what is implemented now). In that design the deployment just delegates to underlying deployments. The problem with that is that rollback breaks if a cluster was added after the version we want to roll back to. For example: a cluster was added at version 5 and we now want to revert to version 3. The new cluster has no idea what the template was at revision 3. So we need a way to keep track of templates at all versions in federation. @marun You are right. The federated replicaset is just to store data. The reason we chose to store it as a replicaset instead of any other resource is because …

<img src="federated-deployment-2.png" width="500"/>

However, Federated ReplicaSets should not control the ReplicaSets in the underlying clusters, because those are already controlled by the local Deployments, and having two controllers would lead to race conditions. Two controllers independently trying to bring the current state to the same desired state will eventually get there, but along the way multiple back-and-forth steps can be executed. A user looking at this would be really confused and would get the impression that something is seriously broken.

On the other hand, Federated ReplicaSets should somehow reflect the status of the underlying clusters - like the total number of replicas that are there - so that kubectl rollout has proper data to work on.

<img src="federated-deployment-3.png" width="500"/>

Federated nginx-701339712 gets all of its statistics, as well as some spec elements (like the total number of replicas), from the ReplicaSets in the underlying clusters. Obviously, none of the spec changes should be pushed back. With this change the whole system would look like a well-working "whole", while in fact the Federated ReplicaSets would be "shadows" (or puppets) - their original functionality (creating ReplicaSets in clusters and controlling the balancing) would be turned off.

> **Review comment:** How about the deployment controller updating …
>
> **Review comment:** Honestly, I don't understand the benefit of having the federated controller update the federated replica set status based on local replica sets.

## Flow

As the above description may be a bit confusing, let's describe exactly what happens when a user creates a federated deployment:

1. A user creates a Federated Deployment.
2. The Federated Deployment controller is notified about the new Federated Deployment.
3. The Federated Deployment controller creates a Shadow Federated ReplicaSet.
4. The Federated Deployment controller creates Local Deployments in the underlying clusters.
5. The Local Deployments create Local ReplicaSets (in the underlying clusters).
6. Replicas are created by the Local ReplicaSets.
7. The status of the Local ReplicaSets is updated.
8. The Federated ReplicaSet controller learns about the changes in the underlying clusters.
9. The Shadow ReplicaSet status is updated based on the Local ReplicaSet statuses.

On update the situation looks similar:

1. A user updates a Federated Deployment.
2. The Federated Deployment controller is notified about the update.
3. The Federated Deployment controller creates a new Shadow Federated ReplicaSet for the updated spec.
4. The Federated Deployment controller updates the Local Deployments in the underlying clusters.
5. The Local Deployments create new Local ReplicaSets (in the underlying clusters).
6. Replicas are moved from the old Local ReplicaSet to the new Local ReplicaSet by the Local Deployment controller.
7. The status of the Local ReplicaSets is updated.
8. The Federated ReplicaSet controller learns about the changes in the underlying clusters.
9. The Shadow ReplicaSet status is updated based on the Local ReplicaSet statuses.
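
Putting both flows together, here is a rough, hypothetical sketch of the order of operations; all type and function names are illustrative assumptions, not the actual federation controller code:

```go
// Illustrative sketch of the create/update flow: the controller first
// ensures a shadow Federated ReplicaSet for the current pod template,
// then creates or updates the Local Deployments. Local Deployments manage
// Local ReplicaSets on their own; status flows back the other way.
package main

import "fmt"

type PodTemplate struct{ Spec string }

type FederatedDeployment struct {
	Name     string
	Template PodTemplate
}

func reconcile(fd FederatedDeployment, clusters []string) {
	fmt.Printf("ensure shadow FederatedReplicaSet for template %q\n", fd.Template.Spec)
	for _, cluster := range clusters {
		fmt.Printf("ensure Local Deployment %q in cluster %q\n", fd.Name, cluster)
	}
	// Local ReplicaSet statuses are then watched and aggregated back into
	// the shadow Federated ReplicaSet (steps 7-9 above).
}

func main() {
	fd := FederatedDeployment{Name: "nginx", Template: PodTemplate{Spec: "nginx:1.11"}}
	reconcile(fd, []string{"cluster-a", "cluster-b"})
}
```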

## Implementation

The implementation of the Federated Deployment controller is largely based on the existing Federated ReplicaSet controller. The following changes, however, will be added (on top of obvious changes like "s/replicaset/deployment/"):

+ For each of the pod templates, Federated Deployment will make sure that a shadow FederatedReplicaSet with the name deployment_name + "-" + pod_template_hash is created, with the appropriate revision number.

> **Review comment:** There are discussions going on about changing this. Also, between two cluster versions pod-template-hash may not be consistent, so there's no guarantee the federated RS matches up to the version the local D creates.
>
> **Review comment:** @mwielgus I assumed the shadow replica set would be hashing the template defined for the federated deployment, and wouldn't need to rely on a cluster-specific hash. Is that not the case?
>
> **Reply:** Hashing the podtemplate or any kind of API object is not reliable due to API changes. I will prepare a proposal during the weekend for switching to something stable. I am thinking of combining either the uid or the name of a deployment with its podtemplate generation (a new concept that is not there yet) to produce the new hash. Opened #384
>
> **Reply:** Thanks for pointing that out @Kargakis and @smarterclayton
>
> **Review comment:** Not a subcommand that I know of

+ Local Deployment checks will occur according to an annotation (if present) or alphabetically, so that the order of cluster updates is predictable.

+ The Federated Deployment controller with `strategy == RollingUpdate` updates Local Deployments one cluster at a time. If it spots a Local Deployment with a different pod template, it updates it only if no other update is going on. It determines whether an update is going on in a cluster by checking whether (see the sketch after this list):
    + `deployment.status.updatedReplicas != deployment.status.replicas`
    + `deployment.status.unavailableReplicas != 0`
    + `deployment.status.observedGeneration != deployment.metadata.generation`

  The Local Deployment will get a chance to be updated at a later time. Every change in a Local Deployment triggers a Federated Deployment reconciliation, so once one Local Deployment is done with its update and publishes its new state, the reconciliation will be triggered and the next Local Deployment update started. Obviously, if the Federated Deployment itself is updated, then all ongoing updates are stopped and a new one is started.

+ Federated Deployments with `strategy == Recreate` update all clusters at once.
+ Federated Deployments always update replica numbers (if needed) in the underlying Deployment specs, no matter whether an update is going on or not.
+ The rollback REST API is registered.
+ The Federated Deployment controller has support for deleting old Federated ReplicaSets (just like in regular Deployments). Federated ReplicaSets are deleted in a non-cascading way (which means that by deleting a shadow federated replicaset we don't delete the replicasets in the underlying clusters).
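
The sketch below illustrates the update-in-progress check and the predictable cluster ordering described above. The status field names mirror the Deployment API; the surrounding types and helper names are illustrative assumptions, not actual controller code:

```go
// Hedged sketch: pick the next cluster to update under RollingUpdate.
// A cluster counts as "updating" if any of the three status conditions
// from the list above holds; clusters are visited alphabetically.
package main

import (
	"fmt"
	"sort"
)

type DeploymentStatus struct {
	Replicas            int32
	UpdatedReplicas     int32
	UnavailableReplicas int32
	ObservedGeneration  int64
}

type LocalDeployment struct {
	Cluster    string
	Generation int64
	Status     DeploymentStatus
}

func updateInProgress(d LocalDeployment) bool {
	return d.Status.UpdatedReplicas != d.Status.Replicas ||
		d.Status.UnavailableReplicas != 0 ||
		d.Status.ObservedGeneration != d.Generation
}

// nextClusterToUpdate returns the first out-of-date cluster in alphabetical
// order, but only if no cluster is currently mid-rollout.
func nextClusterToUpdate(deployments []LocalDeployment, outdated map[string]bool) (string, bool) {
	sort.Slice(deployments, func(i, j int) bool {
		return deployments[i].Cluster < deployments[j].Cluster
	})
	for _, d := range deployments {
		if updateInProgress(d) {
			return "", false // some cluster is still rolling; wait for it
		}
	}
	for _, d := range deployments {
		if outdated[d.Cluster] {
			return d.Cluster, true
		}
	}
	return "", false
}

func main() {
	ds := []LocalDeployment{
		{Cluster: "b", Generation: 3, Status: DeploymentStatus{Replicas: 3, UpdatedReplicas: 3, ObservedGeneration: 3}},
		{Cluster: "a", Generation: 2, Status: DeploymentStatus{Replicas: 3, UpdatedReplicas: 3, ObservedGeneration: 2}},
	}
	fmt.Println(nextClusterToUpdate(ds, map[string]bool{"b": true})) // -> b true
}
```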

Moreover, Shadow Federated ReplicaSet (SFRS) support will be added to the Federated ReplicaSet controller:

+ SFRS are marked with an annotation. The presence of the annotation makes the FRS a shadow.
+ SFRS are created by Federated Deployments and contain the complete ObjectMeta, the ReplicaSetSpec (without the replica count) and the proper PodSpec.
+ An SFRS monitors all underlying clusters where the replicas could potentially be found and updates its ReplicaSetStatus and ReplicaSetSpec.Replicas based on the underlying cluster content (see the aggregation sketch below).
+ An SFRS never updates ReplicaSets in the underlying clusters.
+ An SFRS will be recreated by the Federated Deployment controller if a user deletes it.
+ The Federated Deployment controller will overwrite any direct user updates to an SFRS.
+ An SFRS will be deleted by the Federated Deployment controller when the Federated Deployment is deleted.
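
As a rough illustration of the shadow behavior, the sketch below aggregates Local ReplicaSet statuses into a single shadow status; the types and names are assumptions for readability, not the actual controller code:

```go
// Hedged sketch: a shadow FRS sums the statuses of the Local ReplicaSets
// across clusters. The result is only written into the shadow object;
// nothing is ever pushed back to the underlying clusters.
package main

import "fmt"

type ReplicaSetStatus struct {
	Replicas      int32
	ReadyReplicas int32
}

func aggregate(locals map[string]ReplicaSetStatus) ReplicaSetStatus {
	var total ReplicaSetStatus
	for _, s := range locals {
		total.Replicas += s.Replicas
		total.ReadyReplicas += s.ReadyReplicas
	}
	return total
}

func main() {
	locals := map[string]ReplicaSetStatus{
		"cluster-a": {Replicas: 2, ReadyReplicas: 2},
		"cluster-b": {Replicas: 3, ReadyReplicas: 2},
	}
	fmt.Printf("shadow FRS status: %+v\n", aggregate(locals))
}
```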

## Tests

Apart from tests similar to those done for FRS, FD tests need to check:
+ If a failure in an update stops the whole update.
+ If a new update is requested before the previous one completes.
+ If all of the rollout mechanics and kubectl commands work.
+ If Federated ReplicaSets have the correct statuses and replica counts.
+ If an FRS is not deleted unless its FD is deleted, and is deleted when its FD is deleted.
+ If the Federated Deployment controller overwrites any direct user updates to an SFRS.