
Expand a PXC cluster with nodes from another Kubernetes cluster #166

@hors

Description

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Tell us about the feature
Allow users to add Galera cluster member nodes from a second Kubernetes cluster into an existing PXC cluster. This makes a single Galera cluster span multiple k8s clusters, giving it a unified quorum, shared write-set replication, and automatic failover across cluster boundaries, rather than requiring async replication between two independent PXC clusters.

This follows the same design as the PSMDB operator's externalNodes, adapted for Galera's multi-port replication protocol.
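A hypothetical CR fragment for what this could look like (the `externalNodes` field name is borrowed from the PSMDB operator; nothing here is an implemented PXC operator API, and all hosts are placeholders):

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
spec:
  pxc:
    size: 3
    # Hypothetical field: Galera members running in a second k8s
    # cluster, reachable via externally routable addresses.
    externalNodes:
      - host: pxc-node-0.cluster2.example.com
      - host: pxc-node-1.cluster2.example.com
```

Unlike PSMDB's single replication port, each external address would need the full Galera port set reachable across the cluster boundary (3306, 4567, 4568, 4444).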

Which product(s) is this request for?
PXC, Operators

Tell us about the problem
Today, connecting two PXC clusters requires asynchronous MySQL replication (replicationChannels). This has fundamental limitations for use cases where users want the two clusters to behave as one:

  • Replication lag — async replication is not synchronous; the replica cluster can fall behind
  • No unified quorum — each cluster maintains its own independent Galera quorum; a cluster failure does not trigger a coordinated failover
  • Split-brain risk — both clusters can accept writes independently during a network partition
  • Blue-green k8s upgrades — the primary use case behind this request: spin up a new k8s cluster, move PXC nodes into it one by one (as true Galera members, not replicas), then decommission the old cluster. Today this is impossible without async replication gaps and a hard cutover moment

For stateless apps, blue-green k8s upgrades are trivial. For PXC, the only current option is async replication between two distinct clusters, which introduces risk and complexity.
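For context, the async workaround available today uses the operator's cross-site replication feature. A sketch of the replica side of a `replicationChannels` configuration (channel name and host are placeholders; consult the operator docs for the exact schema):

```yaml
spec:
  pxc:
    replicationChannels:
      - name: main_to_dr          # placeholder channel name
        isSource: false           # this cluster consumes changes
        sourcesList:
          - host: cluster1-pxc.example.com  # placeholder source address
            port: 3306
            weight: 100
```

This gives two independent Galera clusters linked by async MySQL replication, with all the caveats listed above.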

Use Cases

  • Blue-green Kubernetes cluster upgrade — gradually move Galera nodes from old k8s cluster to new k8s cluster, node by node, while the cluster keeps serving traffic with full synchrony
  • Multi-cluster high availability — spread Galera nodes across two k8s clusters so a full k8s cluster failure does not lose quorum
  • Zero-downtime cluster migration — move PXC from one environment (e.g., on-prem k8s) to another (cloud k8s) without any async replication gap
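The multi-cluster HA case hinges on quorum arithmetic: Galera keeps its primary component only while a strict majority of members stays connected, so node counts per k8s cluster must be planned. A small sketch of that arithmetic (my own illustration, not operator code):

```python
def has_quorum(surviving: int, total: int) -> bool:
    """Galera retains the primary component only when a strict
    majority (> 50%) of cluster members remains connected."""
    return 2 * surviving > total

# 3 nodes in cluster1 + 2 external nodes in cluster2 = 5 total.
# Losing all of cluster2 leaves 3/5, so quorum survives.
print(has_quorum(3, 5))  # True

# A symmetric 2 + 2 split is unsafe: losing either side
# leaves exactly 50%, which is not a majority.
print(has_quorum(2, 4))  # False
```

This is why an odd total with the majority in one cluster (or a third tie-breaker location) is the safe layout.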

Acceptance Criteria

  • spec.pxc.externalNodes field accepted and validated in PerconaXtraDBCluster CR
  • External node addresses included in wsrep_cluster_address (gcomm://) generation
  • Per-pod services expose Galera ports (4567, 4568, 4444) in addition to 3306 when externalNodes is configured
  • External node successfully joins the Galera cluster via SST across cluster boundary
  • Galera quorum is maintained correctly with mixed local + external nodes
  • Removing an externalNodes entry removes the address from the generated gcomm:// string, and the node is cleanly excluded from the Galera cluster
  • Blue-green upgrade workflow documented step by step with concrete CR examples
  • Quorum planning guidance documented (safe vs unsafe node count combinations)
  • E2E test: 3-node cluster on cluster1 + 2 external nodes from cluster2 → verify single Galera quorum
  • E2E test: simulate blue-green upgrade (node-by-node migration between two clusters)
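For the per-pod service criterion, a sketch of what the generated Service might expose (illustrative only; names, service type, and selector conventions are assumptions, not the operator's actual output):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: cluster1-pxc-0          # hypothetical per-pod service name
spec:
  type: LoadBalancer            # must be routable from the other k8s cluster
  selector:
    statefulset.kubernetes.io/pod-name: cluster1-pxc-0
  ports:
    - name: mysql
      port: 3306
    - name: galera-replication  # wsrep group communication
      port: 4567
    - name: ist                 # incremental state transfer
      port: 4568
    - name: sst                 # full state snapshot transfer (e.g. xtrabackup)
      port: 4444
```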

Jira Link

Metadata

Assignees: No one assigned

Labels: Operators (Percona Kubernetes Operators), PXC (Percona XtraDB Cluster), Proposed

Projects: Status: To consider; Status: Researching

Milestone: No milestone