
Thin arbiter quorum for 2-way replication #352

Closed
pranithk opened this issue Nov 8, 2017 · 15 comments
Labels: CB: afr, CB: arbiter, DocApproved, SpecApproved

Comments

@pranithk
Member

pranithk commented Nov 8, 2017

The Thin-arbiter design document describes a new kind of quorum for 2-way replication in AFR, which adds quorum at the brick level. One use case for this is the stretch-cluster scenario, but users can also adopt it if it fits their needs better than arbiter. Once this becomes stable, we should start looking at deprecating plain 2-way replication, IMHO.

Please add your comments to the document itself, so that the complete discussion is in the document.

@slesru

slesru commented Nov 8, 2017

It is not at all clear to me how this can deprecate 2-way replication; we use just 2 machines in our cluster with it.

@ShyamsundarR
Contributor

@pranithk What are the implications of this approach when upgrading a cluster? Today, the upgrade process is to upgrade one of the replica bricks and then wait for healing to catch up before upgrading the other brick, as described here.

So with the thin arbiter, does this change at all? If it does, how does the upgrade procedure change (including upgrading the arbiter in future releases)?

@pranithk
Member Author

pranithk commented Dec 7, 2017

@slesru Sorry for the delay in responding; I was on vacation for a while.
2-way replication leads to split-brains; with only 2 nodes there is no way to avoid them. Avoiding split-brains requires at least 3 nodes, so we now have 3 variants that avoid them (a CLI sketch follows this list):

  1. 3-way replication
  2. arbiter
  3. thin-arbiter
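
For concreteness, here is a hedged CLI sketch of how each variant is created; hostnames, brick paths, and the volume name are placeholders, and the thin-arbiter syntax shown is the one from the later upstream docs, so treat it as illustrative rather than final:

```sh
# 1. 3-way replication: three full data copies
gluster volume create myvol replica 3 h1:/bricks/b1 h2:/bricks/b2 h3:/bricks/b3

# 2. arbiter: two data copies plus one metadata-only arbiter brick
gluster volume create myvol replica 3 arbiter 1 h1:/bricks/b1 h2:/bricks/b2 h3:/bricks/arb

# 3. thin arbiter: two data copies plus a tie-breaker node that stores only
#    a per-replica id file (syntax as later documented upstream)
gluster volume create myvol replica 2 thin-arbiter 1 h1:/bricks/b1 h2:/bricks/b2 ta:/bricks/ta
```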

@pranithk
Member Author

pranithk commented Dec 7, 2017

@ShyamsundarR The upgrade procedure is the same; at least, I cannot see any conflicting cases that would prevent it at the moment. A rough sketch of the steps is below.
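
A minimal sketch of that flow, assuming the standard rolling-upgrade procedure referenced above; service names, the package manager, and the volume name are placeholders:

```sh
# On the first server: stop gluster, upgrade the packages, restart
systemctl stop glusterd && pkill glusterfsd
yum update glusterfs-server
systemctl start glusterd

# Wait for self-heal to catch up before upgrading the second server
gluster volume heal myvol info   # repeat until no entries are pending

# Then repeat the same steps on the second server (and, in future
# releases, on the thin-arbiter node)
```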

@amarts
Member

amarts commented Dec 12, 2017

Recording here, for a wider audience, some discussions I had with the team offline.

Challenge: management-layer latency

One of the main goals of thin arbiter is to allow the 3rd node to live in a remote/cloud setup, which can have higher latency. But our management layer would become very slow if we manage the thin arbiter through the glusterd (or gd2) process.

  • Possible solution: provide a separate, always-running command, say gluster-tiebreaker, which can be a script that runs glusterfs itself as a client process (connecting to one or more glusterd processes via backup volfile servers); a sketch of this idea follows below.

The volume create can then accept this arbiter node as a common option, which will ensure the same node is added to all the replica pairs.
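
A rough sketch of the gluster-tiebreaker idea, using standard glusterfs client options; the volfile id, server names, and mount point are placeholders:

```sh
# On the tie-breaker node: fetch the volfile from the first glusterd,
# falling back to the second if it is unreachable
glusterfs --volfile-id=myvol \
          --volfile-server=server1 \
          --volfile-server=server2 \
          /mnt/tiebreaker
```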

A clear document with all tested steps to revert a bad brick to a good brick under thin arbiter.

This is very important in this design: a whole brick, which may contain millions of files, can be marked as bad just because of one (or a few) files pending heal. In that case, we need a mechanism to override the code path by resetting the relevant flags.
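
For illustration only, a hypothetical sketch of such an override: assuming the thin arbiter records pending markers as AFR xattrs on a per-replica id file, an admin could inspect and clear them. The file path and xattr names here are assumptions, not the actual on-disk format:

```sh
# Inspect the pending markers on the thin arbiter's replica id file
# (path and xattr names are hypothetical)
getfattr -d -m 'trusted.afr.' -e hex /bricks/ta/replica-id-file

# Clear the marker that declared the brick bad, after an admin has
# verified that the brick's data is actually good
setfattr -x trusted.afr.myvol-client-0 /bricks/ta/replica-id-file
```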


Feedback welcome @pranithk @itisravi @karthik-us @aspandey

@gluster-ant
Collaborator

A patch https://review.gluster.org/19545 has been posted that references this issue.
Commit message: cluster/afr: Implement thin-arbiter translator

@gluster-ant
Collaborator

A patch https://review.gluster.org/19835 has been posted that references this issue.
Commit message: afr: thin arbiter changes

@pranithk
Member Author

I found a race bug in the earlier design where split-brains can happen. I just had a discussion with @aspandey @itisravi @karthik-us about the problem and solution. Ashish will be updating the design doc with the relevant changes. This is going to be fixed as a separate bug after the initial implementation.

@aspandey Please add a comment once you update the design document.

@aspandey
Member

I have updated the design document per the discussion on the issue raised by the race between the shd upcall and the transaction.
Please review it and provide your comments.

@pranithk
Member Author

@amarts @ShyamsundarR Last I checked, there was no consensus on the format of the document that would earn the doc/spec-approved flags. What do you think we should do to get the SpecApproved/DocApproved flags, considering that the google-doc already documents the work that is done? Should we send one more patch to glusterfs-specs?

@gluster-ant
Collaborator

A patch https://review.gluster.org/19940 has been posted that references this issue.
Commit message: cluster/afr: shd changes for thin arbiter.

@amarts added the SpecApproved and DocApproved labels on Apr 27, 2018
@amarts
Member

amarts commented Apr 27, 2018

Flags provided after revisiting the document and the CLI format discussions in glusterd2 issues (referenced above).

gluster-ant pushed a commit that referenced this issue Apr 27, 2018
Updates #352

Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
gluster-ant pushed a commit that referenced this issue Apr 30, 2018
1. Create thin arbiter index file during mount.
2. Set pending marker in thin arbiter id file in case of failure.

Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
updates: #352
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
gluster-ant pushed a commit that referenced this issue Apr 30, 2018
Updates #352

Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d
Signed-off-by: karthik-us <ksubrahm@redhat.com>
@hsafe

hsafe commented Jun 20, 2018

@pranithk
Can you please explain a bit what you mean when you say "2-way replication leads to split-brains"?
I am trying to understand: if I run a 2-node replica, each node with one brick, and one node fails and comes back live, and I have a fallback config on the clients, can the two nodes heal and sync with each other or not? Can you please clarify?

@pranithk
Member Author

@hsafe For the case you describe, it will be fine.
Let us say the setup has 2 bricks, b0 and b1, and consider the following sequence (a shell illustration follows the list):

  1. File 'a' is created while both bricks b0 and b1 are available.
  2. Now brick b0 goes down, and content 'abc' is written to 'a' (so it lands only on b1).
  3. b0 comes back up, but before heal/sync can happen, b1 goes down.
  4. If you now write some other content, say 'def', to 'a', it will succeed (landing only on b0).
  5. When b1 comes back up, you have 'a' on b0 with 'def' and 'a' on b1 with 'abc'. Both bricks think they have the correct copy, leading to a split-brain.
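
A hedged shell illustration of this timeline on a plain replica-2 volume; brick PIDs, paths, and the volume name are placeholders:

```sh
echo start > /mnt/myvol/a             # 1. both bricks up
kill <pid-of-b0-glusterfsd>           # 2. b0 goes down
echo abc > /mnt/myvol/a               #    'abc' lands only on b1
gluster volume start myvol force      # 3. b0 comes back up...
kill <pid-of-b1-glusterfsd>           #    ...but b1 dies before heal runs
echo def > /mnt/myvol/a               # 4. 'def' lands only on b0
gluster volume start myvol force      # 5. both bricks up again
gluster volume heal myvol info split-brain   # 'a' is now in split-brain
```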

@hsafe

hsafe commented Jun 20, 2018

@pranithk
Thanks, that explained everything... :)

amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates gluster#352

Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
1. Create thin arbiter index file during mount.
2. Set pending marker in thin arbiter id file in case of failure.

Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
updates: gluster#352
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates gluster#352

Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d
Signed-off-by: karthik-us <ksubrahm@redhat.com>
gluster-ant pushed a commit that referenced this issue Mar 11, 2019
Discussion on thin arbiter volume -
#352 (comment)

Main idea of having this rpm package is to deploy thin-arbiter
without glusterd and other commands on a node, and all we need
on that tie-breaker node is to run a single glusterfs command.
Also note that, no other glusterfs installation needs
thin-arbiter.so.

Make sure RPM contains sample vol file, which can work by default,
and a script to configure that volfile, along with translator image.

Change-Id: Ibace758373d8a991b6a19b2ecc60c93b2f8fc489
updates: bz#1674389
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
gluster-ant pushed a commit that referenced this issue Mar 13, 2019
Discussion on thin arbiter volume -
#352 (comment)

Main idea of having this rpm package is to deploy thin-arbiter
without glusterd and other commands on a node, and all we need
on that tie-breaker node is to run a single glusterfs command.
Also note that, no other glusterfs installation needs
thin-arbiter.so.

Make sure RPM contains sample vol file, which can work by default,
and a script to configure that volfile, along with translator image.

Change-Id: Ibace758373d8a991b6a19b2ecc60c93b2f8fc489
updates: bz#1672818
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
(cherry picked from commit ca9bef7)