Quick Disaster Recovery options #531

boedy · 2023-09-22T15:33:51Z

I've been exploring the multi-availability-zone (multi-AZ) setups offered by various cloud providers, aiming to architect a robust disaster recovery (DR) solution for datacenter failures. The Kubernetes control plane is highly available (HA) in most configurations and survives a datacenter outage, so I'd like the same resilience for my storage layer.

My primary objective is to operate predominantly in one zone (datacenter) while maintaining an additional asynchronous replica of each resource definition in another zone. This replica would act as a safety net, allowing a swift switch to the standby zone with minimal recovery time (RTO) and data loss (RPO) should the primary zone fail. Latency between AZs is generally low, but I specifically want asynchronous replication so that write performance in the primary zone is not affected by inter-zone round trips. Asynchronous replication also provides a buffer against unforeseen network anomalies between zones.

The piraeus-ha-controller has been instrumental for quick failovers, but its quorum-based scheduling poses challenges for this layout. If most replicas live in the primary zone and that zone goes offline, the remaining replicas in the standby zone cannot achieve quorum. Additionally, the current placement parameters make it difficult, if not impossible, to schedule X replicas in zone A and Y replicas in zone B.
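
To make the quorum concern concrete, here is a minimal sketch of the arithmetic. It assumes a plain strict-majority rule (more than half of all replicas must be reachable); the actual quorum logic in DRBD has additional cases such as tiebreakers, and the zone names, replica counts, and function names below are purely hypothetical.

```python
def has_quorum(reachable: int, total: int) -> bool:
    """Strict majority: more than half of all replicas must be reachable."""
    return reachable > total / 2


def survives_zone_failure(placement: dict[str, int], failed_zone: str) -> bool:
    """Return True if the replicas outside `failed_zone` still form a majority."""
    total = sum(placement.values())
    reachable = total - placement.get(failed_zone, 0)
    return has_quorum(reachable, total)


# Hypothetical layout: 2 replicas in the primary zone, 1 async replica in standby.
placement = {"zone-a": 2, "zone-b": 1}

print(survives_zone_failure(placement, "zone-a"))  # False: 1 of 3 is a minority
print(survives_zone_failure(placement, "zone-b"))  # True: 2 of 3 is a majority
```

Under this assumption, any layout that keeps the majority of replicas in the primary zone loses quorum exactly when the primary zone fails, which is the case I want to recover from.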

I've come across setups using DRBD with Pacemaker and Booth for similar requirements. It got me wondering if we could have something akin to that but tailored for a single Kubernetes cluster environment. Perhaps an additional controller that could manage this.

I'm eager to get feedback on this and to learn if there are any existing or upcoming features that resonate with this vision.

WanzenBug added the "question" label (Further information is requested) on Sep 25, 2023
