Garden Cluster Disaster Recovery (a.k.a. Gardener Ring) #233
Labels
area/disaster-recovery (Disaster recovery related)
component/gardener (Gardener)
kind/epic (Large multi-story topic)
lifecycle/stale (Denotes an issue or PR has remained open with no activity and has become stale.)
Comments
Jun 27, 2018: vlerenc added the component/gardener, area/disaster-recovery and kind/epic labels and removed the component/gardener label.
Oct 5, 2018: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Oct 19, 2018: rfranzke added a commit to rfranzke/gardener that referenced this issue.
Dec 5, 2018: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Feb 4, 2019: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Mar 26, 2019: richardyuwen pushed a commit to richardyuwen/gardener that referenced this issue.
Apr 6, 2019: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Jun 6, 2019: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Oct 1, 2019: gardener-robot-ci-1 added and removed the lifecycle/stale label.
Dec 1, 2019: ghost added the lifecycle/stale label.
Dec 2, 2019: ghost removed the lifecycle/stale label.
Jan 31, 2020: ghost added and removed the lifecycle/stale label.
Apr 1, 2020: ghost added the lifecycle/stale label.
Jun 1, 2020: ghost added the lifecycle/rotten label (Denotes an issue or PR that has aged beyond stale and will be auto-closed.) and removed the lifecycle/stale label.
Sep 24, 2020: vlerenc added the roadmap/external label and removed the lifecycle/rotten label.
Dec 17, 2020: gardener-robot added the lifecycle/stale label.
/close as it's unlikely that we will implement this ring due to complexity concerns
May 21, 2021: gardener-robot added the roadmap/standalone label (Roadmap for the (on-prem) standalone delivery, e.g. CDC, NS2, etc.).
Jun 8, 2021: vlerenc removed the roadmap/standalone and lifecycle/icebox labels.
Story
Motivation
Availability SLO and business continuity.
Acceptance Criteria
Implementation Proposal
Form a highly available, distributed, self-healing Gardener ring that watches its hosting garden clusters (which form a ring of seeds/shoots) and autonomously and automatically repairs lost clusters underneath itself. This is not so much pure disaster recovery as a mix of:
Idea for how this could be set up: use Minikube (on an IaaS VM with IaaS load balancers) to bootstrap a Gardener, create a ring of seeds (actually garden clusters), and transfer the control plane of the first seed into the last seed cluster. Then deploy an etcd cluster and an API server across the garden clusters, forming a virtual, distributed, node-less Kubernetes cluster; transfer the Gardener control plane into that ring and shut down the Minikube cluster again. The result should be three seeds (actually garden clusters), each watched by another, with a distributed Gardener using a distributed etcd on a distributed, virtual, node-less Kubernetes cluster consisting only of API servers. The whole setup becomes a self-healing Gardener ring.
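The ring topology and its failure tolerance can be modeled in a few lines. The Go sketch below is purely illustrative: `hostOf`, `watches`, `etcdQuorum` and `repairPlan` are hypothetical names, not Gardener APIs. It assumes that cluster i hosts the control plane of cluster (i+1) mod n (so the last cluster hosts the first) and that each garden cluster runs exactly one member of the distributed etcd. Under those assumptions it shows which surviving cluster would recreate a lost one, and why three clusters is the smallest ring whose etcd survives the loss of a cluster: etcd needs a majority of floor(n/2)+1 members.

```go
package main

import "fmt"

// Hypothetical model of the proposed ring, not Gardener code: cluster i
// hosts the control plane of cluster (i+1)%n, so the last cluster hosts
// the first one and every cluster is watched by exactly one other.

// hostOf returns the cluster that hosts (and would repair) cluster i.
func hostOf(i, n int) int {
	return (i + n - 1) % n
}

// watches returns the cluster whose control plane cluster i hosts.
func watches(i, n int) int {
	return (i + 1) % n
}

// etcdQuorum is the majority of n etcd members needed to make progress.
func etcdQuorum(n int) int {
	return n/2 + 1
}

// repairPlan maps each lost cluster to the surviving host that would
// recreate it and reports whether the survivors still hold an etcd quorum,
// assuming one etcd member per garden cluster. A lost cluster whose host
// is also lost has no entry in the plan.
func repairPlan(lost []bool) (map[int]int, bool) {
	n := len(lost)
	plan := make(map[int]int)
	alive := 0
	for i := 0; i < n; i++ {
		if !lost[i] {
			alive++
		}
	}
	for k := 0; k < n; k++ {
		if !lost[k] {
			continue
		}
		if h := hostOf(k, n); !lost[h] {
			plan[k] = h
		}
	}
	return plan, alive >= etcdQuorum(n)
}

func main() {
	const n = 3
	for i := 0; i < n; i++ {
		fmt.Printf("cluster %d hosts the control plane of cluster %d\n", i, watches(i, n))
	}
	// Lose cluster 1: cluster 0 recreates it; 2 of 3 etcd members remain.
	plan, quorum := repairPlan([]bool{false, true, false})
	fmt.Println(plan, quorum)
}
```

Running it prints the three hosting relationships followed by `map[1:0] true`: after losing cluster 1, cluster 0 would recreate it while the two remaining etcd members still form a quorum. Losing two of the three clusters breaks the quorum, which is one concrete facet of the complexity of such a ring.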
Positive side effect: we can reduce our investment in Kubify and focus even more on Gardener. We can also leverage Gardener functionality for the clusters that run Gardener itself, which is an immense improvement, since Gardener's day-2 operations are much stronger than Kubify's. In addition, Gardener is more thoroughly quality-tested and production-hardened than Kubify, which by comparison runs only a few (albeit critical) garden clusters.
Prerequisites/Requirements
Resources
Release Notes
Definition of Done