Quick Disaster Recovery options #531

Open
boedy opened this issue Sep 22, 2023 · 0 comments
Labels
question Further information is requested

boedy commented Sep 22, 2023

I've been exploring the multi-availability-zone setups offered by various cloud providers, aiming to architect a robust disaster recovery (DR) solution for datacenter failures. Since the Kubernetes control plane is highly available in most configurations and resilient to a datacenter outage, I'm keen on ensuring similar resilience for my storage layer.

My primary objective is to operate predominantly in one zone (datacenter) while maintaining an additional asynchronous replica of each resource definition in another zone. This setup would act as a safety net, enabling a swift switch to the standby zone with minimal RTO and RPO should the primary zone fail. Although latency between AZs is generally low, I'm specifically looking for an asynchronous solution so that performance in the primary zone is not affected by inter-zone communication delays. An asynchronous setup also provides a buffer against unforeseen network anomalies between zones.

While the piraeus-ha-controller has been instrumental for quick failovers, its quorum-based scheduling poses challenges. Specifically, achieving quorum becomes problematic if the primary zone goes offline. Additionally, the current placement parameters make it challenging, if not impossible, to schedule X replicas in zone A and Y replicas in zone B.
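
To illustrate the quorum problem, here is a minimal sketch. It assumes a simplified strict-majority rule (more than half of all replicas must be reachable); real DRBD quorum has further nuances such as tiebreakers, and the function names are hypothetical, not part of any Piraeus or LINSTOR API.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Assumed simplified rule: a strict majority of replicas must be reachable."""
    return 2 * reachable > total


def survives_zone_loss(replicas_a: int, replicas_b: int) -> dict[str, bool]:
    """Report whether the surviving zone keeps quorum when the other zone is lost."""
    total = replicas_a + replicas_b
    return {
        # Zone A is down, so only zone B's replicas remain reachable.
        "zone_a_lost": has_quorum(replicas_b, total),
        # Zone B is down, so only zone A's replicas remain reachable.
        "zone_b_lost": has_quorum(replicas_a, total),
    }


if __name__ == "__main__":
    # (replicas in primary zone A, replicas in standby zone B)
    for a, b in [(2, 1), (1, 1), (2, 2), (3, 2)]:
        print(f"A={a} B={b}: {survives_zone_loss(a, b)}")
```

Under this assumption, any layout that keeps the majority of replicas in the primary zone loses quorum in the standby zone as soon as the primary zone goes offline, which is exactly the failure case the DR setup is meant to cover.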

I've come across setups using DRBD with Pacemaker and Booth for similar requirements. It got me wondering whether we could have something akin to that, but tailored for a single Kubernetes cluster. Perhaps an additional controller could manage this.

I'm eager to get feedback on this and to learn whether any existing or upcoming features address this use case.

@WanzenBug WanzenBug added the question Further information is requested label Sep 25, 2023
3 participants
@WanzenBug @boedy and others