Only extend cluster database when reaching three members #6230
As it currently stands, when bringing up clustering, the first three members all immediately become database members. This isn't ideal because many people will only bring up a two-server cluster, ending up in the worst situation, where losing either of them leads to a broken cluster.
I believe it would instead be preferable to not have the second member act as a database member until a third member has joined, at which point both the second and third should be promoted to database members at the same time.
As part of this we should also update our clustering documentation to explain more strongly why two-member clusters should be avoided and to recommend more firmly that our users run clusters of at least three members.
@freeekanayaka this can all be done in LXD itself, correct?
We only need to modify the joining logic so we don't bring up the database unless there are at least three members in the cluster, at which point we should promote whatever members are needed to have a database backed by three of them.
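The promotion rule described above can be sketched as a small decision function. This is only an illustration of the proposed policy, not LXD's actual joining code: the names `targetDatabaseMembers` and `membersToPromote` are hypothetical, and the sketch assumes the database is capped at three members as described in this issue.

```go
package main

import "fmt"

// minDatabaseMembers is the smallest database size discussed in this issue
// that can survive the loss of one database member.
const minDatabaseMembers = 3

// targetDatabaseMembers returns how many cluster members should act as
// database members for a cluster of the given size under the proposed policy.
// Hypothetical helper for illustration only.
func targetDatabaseMembers(clusterSize int) int {
	if clusterSize < minDatabaseMembers {
		// One or two members: keep the database on the initial member only.
		return 1
	}
	return minDatabaseMembers
}

// membersToPromote returns how many additional members must be promoted,
// given how many database members the cluster currently has.
func membersToPromote(clusterSize, currentDatabaseMembers int) int {
	n := targetDatabaseMembers(clusterSize) - currentDatabaseMembers
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	// Simulate members joining one at a time, starting from a single node.
	database := 1
	for size := 1; size <= 5; size++ {
		promote := membersToPromote(size, database)
		database += promote
		fmt.Printf("members=%d promote=%d database=%d\n", size, promote, database)
	}
}
```

With this rule nothing is promoted when the second member joins, and two members are promoted together when the third joins.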
For this one, there is no API extension or much in the way of a user-visible change.
This is because the Raft database model we use requires consensus, and consensus with just two members is problematic: the loss of either one leaves the database stuck. Since losing either member of a two-member database means losing the database, keeping a single database member until a third is added actually improves the odds of recovery from 0% to 50% (the initial member can deal with the loss of the second).
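The 0% versus 50% figures follow from Raft's majority rule: a database with n members needs n/2 + 1 of them (integer division) to make progress. A minimal sketch of that arithmetic, using the hypothetical helpers `quorum` and `survivableSingleLosses` and assuming exactly one cluster member is lost at a time:

```go
package main

import "fmt"

// quorum returns the number of database members that must be available
// for a Raft group of the given size to reach consensus (a strict majority).
func quorum(databaseMembers int) int {
	return databaseMembers/2 + 1
}

// survivableSingleLosses counts how many of the clusterSize possible
// single-member failures leave the database able to reach quorum.
// Hypothetical helper for illustration only.
func survivableSingleLosses(clusterSize, databaseMembers int) int {
	// Losing a member that is not a database member never affects quorum.
	survivable := clusterSize - databaseMembers
	// Losing a database member is survivable only if the rest still form a majority.
	if databaseMembers-1 >= quorum(databaseMembers) {
		survivable += databaseMembers
	}
	return survivable
}

func main() {
	cases := []struct{ cluster, database int }{
		{2, 2}, // current behaviour with two members
		{2, 1}, // proposed behaviour with two members
		{3, 3}, // three members, all in the database
	}
	for _, c := range cases {
		ok := survivableSingleLosses(c.cluster, c.database)
		fmt.Printf("cluster=%d database=%d quorum=%d survivable=%d/%d\n",
			c.cluster, c.database, quorum(c.database), ok, c.cluster)
	}
}
```

In the two-member case, having both members in the database gives 0 survivable losses out of 2, while a single database member gives 1 out of 2, which is the 50% quoted above.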
The changes here are likely to all happen in
Testing this should be straightforward enough by running a test cluster in containers or VMs, joining more and more members each time checking
As for expected commits, I'm mainly expecting two for this case:
@freeekanayaka and I should be able to help you with any questions you have.
On top of what @stgraber said, I'll add that you might need to modify the tests in