I've been exploring the multi-availability-zone (multi-AZ) setups offered by various cloud providers, aiming to architect a robust DR solution for datacenter failures. Since the Kubernetes control plane is HA in most configurations and survives a datacenter outage, I'd like the same resilience for my storage layer.
My primary objective is to operate predominantly in one zone (datacenter) while maintaining an additional asynchronous replica for each resource definition in another zone. This setup would act as a safety net, enabling a swift switch to the standby zone with minimal RTO and RPO should the primary zone encounter issues. While the latency between AZs is generally low, I'm specifically looking for an asynchronous solution to ensure maximum performance in the primary zone without being impacted by any inter-zone communication delays. Additionally, even with low latency, the asynchronous setup provides a buffer against any unforeseen network anomalies between zones.
While the piraeus-ha-controller has been instrumental for quick failovers, its reliance on quorum poses challenges for this layout. Specifically, if most replicas live in the primary zone and that zone goes offline, the replicas left in the standby zone cannot reach quorum on their own. Additionally, the current placement parameters make it challenging, if not impossible, to schedule X replicas in zone A and Y replicas in zone B.
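To make the quorum concern concrete, here is a minimal Python sketch of the arithmetic. It assumes a plain strict-majority rule and a hypothetical placement of two replicas in zone A and one in zone B; it does not model DRBD's actual quorum implementation (tiebreakers, diskless nodes and so on), and the zone names and function names are made up for illustration.

```python
# Simplified model: a resource keeps quorum only while a strict majority
# of its replicas is reachable. This is an assumption for illustration,
# not DRBD's exact quorum rules.

def has_quorum(reachable: int, total: int) -> bool:
    """Return True if more than half of all replicas are reachable."""
    return 2 * reachable > total


def survivors_after_zone_loss(placement: dict[str, int], lost_zone: str) -> int:
    """Count the replicas that remain when one whole zone goes offline."""
    return sum(count for zone, count in placement.items() if zone != lost_zone)


# Hypothetical placement: X = 2 replicas in the primary zone, Y = 1 in standby.
placement = {"zone-a": 2, "zone-b": 1}
total = sum(placement.values())

for lost in placement:
    left = survivors_after_zone_loss(placement, lost)
    print(f"lose {lost}: {left}/{total} reachable, quorum={has_quorum(left, total)}")
```

Under these assumptions, losing the standby zone is harmless, but losing the primary zone leaves a single replica out of three, which is exactly the case where a quorum-based failover cannot proceed without some external arbitration.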
I've come across setups using DRBD with Pacemaker and Booth for similar requirements. It got me wondering if we could have something akin to that but tailored for a single Kubernetes cluster environment. Perhaps an additional controller that could manage this.
I'm eager to get feedback on this and to learn if there are any existing or upcoming features that resonate with this vision.