Split brain issue #34

Open
Fodoj opened this issue Feb 5, 2016 · 9 comments

Comments

@Fodoj

Fodoj commented Feb 5, 2016

If I simultaneously start 20 nodes, each applying this module with the same cluster name, is there a chance that I will get a split-brain issue? After going through the source code, it seems like nothing would stop Couchbase from doing it.

@justicel
Owner

justicel commented Mar 4, 2016

Hi, sorry for the long response time; I've had some personal stuff going on recently. In theory, if you could start literally 20 nodes at once, that could result in a split brain in the configuration, yes. At the same time, though, I don't know of a way that would happen in practice. As long as you have existing nodes in the cluster, they will pick up the new members, add them, and migrate, but that won't all happen at the same time, simply because Puppet would not be able to run them all with the same timing.

Have you run into an issue specifically with this? I can also try to test something myself.

@Fodoj
Author

Fodoj commented Mar 4, 2016

I tested it myself, and split brain happens in 99% of cases :(


@justicel
Owner

justicel commented Mar 4, 2016

Huh. Weird. I'll look into it some more.

@dfairhurst

I think this is the key point:

As long as you have existing nodes in the cluster

In the case of spawning a completely new cluster (not adding to an existing one that already has nodes) with 20 new VMs, this is very likely to happen, since the VMs come up simultaneously.

@justicel
Owner

justicel commented Mar 4, 2016

@dfairhurst Fair enough. I'll work on engineering a solution for that particular problem.

@rdev5
Contributor

rdev5 commented Jul 11, 2017

Thoughts on waiting a random T seconds (e.g. sleep $(/usr/bin/shuf -i 1000-10000 -n 1)) in the module before starting/joining the cluster?

@justicel
Owner

Good idea! I'll consider how best to implement this so that it doesn't fall afoul of timeouts for exec, etc.
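
Rough, untested sketch of what I have in mind (the sleep range here is just an example; one constraint is that Puppet's exec timeout defaults to 300 seconds, so the jitter plus the join command has to finish inside that unless the timeout is raised):

#!/bin/bash

# Sketch only: random jitter before the node joins/rebalances, bounded well
# under Puppet's default 300-second exec timeout.
/usr/bin/sleep "$(/usr/bin/shuf -i 1-120 -n 1)"

# ...the couchbase-cli join/rebalance command for this node would follow here...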

@rdev5
Contributor

rdev5 commented Jul 11, 2017

Well, assuming this actually works, what about adding it to the couchbasenode.erb template such that subsequent entries would render as follows:

#!/bin/bash

touch /opt/couchbase/var/.installed

#Server node configurations below
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)

/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password' --server-add=couchbase02.example.com --server-add-username=couchbase --server-add-password='password'
/usr/bin/sleep $(/usr/bin/shuf -i 500-10000 -n 1)

@rdev5
Contributor

rdev5 commented Jul 11, 2017

Also worth noting:

DEPRECATED: Adding server from the rebalance command is deprecated and will be removed in future release, use the server-add command to add servers instead.

I was originally looking for a --wait option like the one they have for bucket-create, but now I'm curious whether server-add behaves any differently in mitigating this same issue more natively.
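
For reference, the non-deprecated flow would presumably render something like the following, with all nodes added via server-add first and a single rebalance at the end (hostnames and credentials mirror the rendered template above; I haven't verified whether this actually changes the race behaviour):

#!/bin/bash

# Sketch only: add the nodes first, then rebalance once.
/opt/couchbase/bin/couchbase-cli server-add -c localhost -u couchbase -p 'password' --server-add=couchbase01.example.com --server-add-username=couchbase --server-add-password='password'
/opt/couchbase/bin/couchbase-cli server-add -c localhost -u couchbase -p 'password' --server-add=couchbase02.example.com --server-add-username=couchbase --server-add-password='password'

# Single rebalance after all nodes have been added.
/opt/couchbase/bin/couchbase-cli rebalance -c localhost -u couchbase -p 'password'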
