Canary Deployments with Gloo Federation #6127
Comments
Need estimate or alternatives. |
Need to understand level of effort on this one @sam-heilbron |
As an alternative, I tested whether we can run two Gloo Federation instances at once, with the second instance running in the opposite cluster (where, again, all clusters need to be registered and all resources deployed). I didn't like the UX of this alternative, hence it is crossed out. But what I would like to circle back to is: how important is it to deploy Gloo Fed using the canary pattern? Gloo Federation reads the Gloo Edge instances running in the clusters, picks up the configuration applied by the user, and applies configuration in the clusters so that cross-cluster traffic is possible and failover works. From then on, there aren't ongoing changes that Gloo Federation needs to reconcile. Having a pre-prod environment to test upgrading Gloo Federation should be all that's needed. |
Gloo Fed is a privileged component that controls configuration for multiple edge control planes. I think the blast radius of a malfunctioning new version can be significant. For example, consider a bug in the orphan termination functionality that erases configuration from all federated clusters, leading to a complete system outage. There are also inherent compatibility risks when following canary deployment practices for the edge control and data planes in a federated environment. Gloo Fed CRDs and clients may be incompatible with edges that are still running an older version. AFAIK, k8s CRD versioning practices are not applied, breaking changes occur from time to time, and downgrading is difficult in Gloo Edge:
IMHO, the safest way to upgrade a federated environment is:
This scheme is not always feasible, especially when the federation clusters require state synchronization. The next best thing would be to support an in-cluster Gloo Fed canary deployment. These solutions would only work if the federated CRDs are properly versioned and deprecated.
If Gloo Fed is down:
It's not always possible to have a pre-prod environment that completely simulates production. |
That is an issue.
The third issue is the most likely to occur, but its impact is negligible. The implementation of Gloo Edge is purposefully different from Istio: Gloo Edge doesn't configure the gateway proxy with endpoints (IP addresses for every pod; a luxury that a service mesh cannot afford as it would cause excessive load on the DNS proxy). Summary: Gloo Fed will only make tweaks when you apply Gloo Fed CRDs or when you change the LoadBalancer service in one of the Gloo instances. (Those changes are not frequent, and at least shouldn't be made while you are updating Gloo Fed.) Without pre-prod environments, though, there is no alternative but to have some canary deployment approach to reduce the risk. |
Can limit the scope to having Gloo Fed backwards compatible with GE. |
Right. For example, the Gloo Mesh control plane is compatible with n-1 version relay agents to support rolling upgrade scenarios. Ideally, Gloo Fed should have similar compatibility with Gloo Edge. Otherwise, some form of protection is required to ensure that the state of n-1 GEs is not corrupted and that GF doesn't run into global failures due to an unexpected GE version under federation. |
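To make the n-1 idea above concrete, here is a minimal sketch of the kind of version gate being discussed. It is not Gloo Fed code: the function names, the `MAJOR.MINOR.PATCH` version format, and the policy itself (same major version, edge minor equal to or exactly one behind the federation minor) are all assumptions chosen for illustration.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    patch: int


def parse_version(text: str) -> Version:
    """Parse 'v1.11.3' or '1.11.3' into a Version tuple."""
    parts = text.lstrip("v").split(".")
    if len(parts) != 3:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {text!r}")
    major, minor, patch = (int(p) for p in parts)
    return Version(major, minor, patch)


def is_n_minus_1_compatible(fed: str, edge: str) -> bool:
    """Hypothetical policy: same major, edge minor equal to or one behind fed."""
    f, e = parse_version(fed), parse_version(edge)
    return f.major == e.major and 0 <= f.minor - e.minor <= 1


def incompatible_instances(fed: str, edges: dict[str, str]) -> list[str]:
    """Return the names of edge instances this fed version should not manage."""
    return sorted(
        name for name, version in edges.items()
        if not is_n_minus_1_compatible(fed, version)
    )
```

A federation instance applying a gate like this could refuse to reconcile (or only observe) the instances returned by `incompatible_instances`, which is one possible form of the "protection" mentioned above.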
breakdown of tasks (not necessarily in order):
|
This issue has been marked as stale because of no activity in the last 180 days. It will be closed in the next 180 days unless it is tagged "no stalebot" or other activity occurs. |
Version
1.11.*
Is your feature request related to a problem? Please describe.
Gloo Edge supports in-place canary deployment: multiple control planes can reconcile the same CRs and produce XDS for two distinct data planes.
With Gloo Federation, it should be possible to perform a blue-green deployment that does not create any upgrade risk to existing clusters. Furthermore, Gloo Federation itself should support a blue-green deployment model, where a new federation version can be tested before it assumes control over existing clusters.
Describe the solution you'd like
This can be achieved by deploying an additional gloo-fed instance and creating new edge clusters with the latest gloo-edge version. Traffic is gradually shifted from old clusters to new ones. The Canary deployment concepts can be applied to Gloo Federation:
GlooInstances
with a matching gloo-edge version
Describe alternatives you've considered
No response
Additional Context
No response