
Thin arbiter quorum for 2-way replication #352

Closed
pranithk opened this issue Nov 8, 2017 · 15 comments
Labels: CB: afr, CB: arbiter, DocApproved, SpecApproved

Comments

@pranithk
Member

pranithk commented Nov 8, 2017

The Thin-arbiter design document describes a new kind of quorum for 2-way replication in AFR, which adds quorum at the brick level. One use case for this is the stretch-cluster scenario, but users can also adopt it if it fits their needs better than arbiter. Once this becomes stable, we should start looking at deprecating plain 2-way replication, IMHO.

Please add your comments to the document itself, so that the complete discussion is in the document.

@slesru

slesru commented Nov 8, 2017

It is not at all clear to me how this can deprecate 2-way replication; we use just 2 machines in our cluster with it.

@ShyamsundarR
Contributor

@pranithk What are the implications of this approach when upgrading a cluster? Today, the upgrade process is to upgrade one of the replica bricks and then wait for healing to catch up before upgrading the other brick, as described here.

So with the thin arbiter, does this change at all? If it does, how does the upgrade procedure change (including upgrading the arbiter in future releases)?

@pranithk
Member Author

pranithk commented Dec 7, 2017

@slesru Sorry for the delay in responding; I was on vacation for a while.
2-way replication leads to split-brains; with only 2 nodes there is no way to avoid them. Avoiding split-brains requires at least 3 nodes, so we now have 3 variants that avoid them (a CLI sketch follows this list):

  1. 3-way replication
  2. arbiter
  3. thin-arbiter
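
For concreteness, here is a hedged CLI sketch of how each variant is created; hostnames, brick paths, and the volume name are placeholders, and the thin-arbiter syntax shown is the one from the later upstream docs, so treat it as illustrative rather than final:

```sh
# 1. 3-way replication: three full data copies
gluster volume create myvol replica 3 h1:/bricks/b1 h2:/bricks/b2 h3:/bricks/b3

# 2. arbiter: two data copies plus one metadata-only arbiter brick
gluster volume create myvol replica 3 arbiter 1 h1:/bricks/b1 h2:/bricks/b2 h3:/bricks/arb

# 3. thin arbiter: two data copies plus a tie-breaker node that stores only
#    a per-replica id file (syntax as later documented upstream)
gluster volume create myvol replica 2 thin-arbiter 1 h1:/bricks/b1 h2:/bricks/b2 ta:/bricks/ta
```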

@pranithk
Member Author

pranithk commented Dec 7, 2017

@ShyamsundarR The upgrade procedure is the same; at least, I cannot see any conflicting cases that would prevent it at the moment. A rough sketch of the steps is below.
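
A minimal sketch of that flow, assuming the standard rolling-upgrade procedure referenced above; service names, the package manager, and the volume name are placeholders:

```sh
# On the first server: stop gluster, upgrade the packages, restart
systemctl stop glusterd && pkill glusterfsd
yum update glusterfs-server
systemctl start glusterd

# Wait for self-heal to catch up before upgrading the second server
gluster volume heal myvol info   # repeat until no entries are pending

# Then repeat the same steps on the second server (and, in future
# releases, on the thin-arbiter node)
```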

@amarts
Member

amarts commented Dec 12, 2017

Recording here, for a wider audience, some discussions I had with the team offline.

Challenge: management-layer latency

One of the main goals of thin arbiter is to allow the 3rd node to live in a remote/cloud setup, which can have higher latency. But our management layer would become very slow if we manage the thin arbiter through the glusterd (or gd2) process.

  • Possible solution: provide a separate, always-running command, say gluster-tiebreaker, which can be a script that runs glusterfs itself as a client process (connecting to one or more glusterd processes via backup volfile servers); a sketch of this idea follows below.

The volume create can then accept this arbiter node as a common option, which will ensure the same node is added to all the replica pairs.
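
A rough sketch of the gluster-tiebreaker idea, using standard glusterfs client options; the volfile id, server names, and mount point are placeholders:

```sh
# On the tie-breaker node: fetch the volfile from the first glusterd,
# falling back to the second if it is unreachable
glusterfs --volfile-id=myvol \
          --volfile-server=server1 \
          --volfile-server=server2 \
          /mnt/tiebreaker
```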

A clear document with all tested steps to revert a bad brick to a good brick under thin arbiter.

This is very important in this design: a whole brick, which may contain millions of files, can be marked as bad just because of one (or a few) files pending heal. In that case, we need a mechanism to override the code path by resetting the relevant flags.
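
For illustration only, a hypothetical sketch of such an override: assuming the thin arbiter records pending markers as AFR xattrs on a per-replica id file, an admin could inspect and clear them. The file path and xattr names here are assumptions, not the actual on-disk format:

```sh
# Inspect the pending markers on the thin arbiter's replica id file
# (path and xattr names are hypothetical)
getfattr -d -m 'trusted.afr.' -e hex /bricks/ta/replica-id-file

# Clear the marker that declared the brick bad, after an admin has
# verified that the brick's data is actually good
setfattr -x trusted.afr.myvol-client-0 /bricks/ta/replica-id-file
```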


Feedback welcome @pranithk @itisravi @karthik-us @aspandey

@gluster-ant
Collaborator

A patch https://review.gluster.org/19545 has been posted that references this issue.
Commit message: cluster/afr: Implement thin-arbiter translator

@gluster-ant
Collaborator

A patch https://review.gluster.org/19835 has been posted that references this issue.
Commit message: afr: thin arbiter changes

@pranithk
Member Author

I found a race bug in the earlier design where split-brains can happen. I just had a discussion with @aspandey @itisravi @karthik-us about the problem and solution. Ashish will be updating the design doc with the relevant changes. This is going to be fixed as a separate bug after the initial implementation.

@aspandey Please add a comment once you update the design document.

@aspandey
Member

I have updated the design document per the discussion on the issue raised by the race between the shd upcall and the transaction.
Please review it and provide your comments.

@pranithk
Member Author

@amarts @ShyamsundarR Last I checked, there was no consensus on the format of the document that would earn the doc/spec-approved flags. What do you think we should do to get the SpecApproved/DocApproved flags, considering that the google-doc already documents the work that is done? Should we send one more patch to glusterfs-specs?

@gluster-ant
Collaborator

A patch https://review.gluster.org/19940 has been posted that references this issue.
Commit message: cluster/afr: shd changes for thin arbiter.

@amarts added the SpecApproved and DocApproved labels on Apr 27, 2018
@amarts
Member

amarts commented Apr 27, 2018

Flags provided after revisiting the document and the CLI format discussions in glusterd2 issues (referenced above).

gluster-ant pushed a commit that referenced this issue Apr 27, 2018
Updates #352

Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
gluster-ant pushed a commit that referenced this issue Apr 30, 2018
1. Create thin arbiter index file during mount.
2. Set pending marker in thin arbiter id file in case of failure.

Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
updates: #352
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
gluster-ant pushed a commit that referenced this issue Apr 30, 2018
Updates #352

Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d
Signed-off-by: karthik-us <ksubrahm@redhat.com>
@hsafe

hsafe commented Jun 20, 2018

@pranithk
Can you please explain a bit what you mean when you say "2-way replication leads to split-brains"?
I am trying to understand: if I run a 2-node replica, each node with one brick, and one node fails and comes back live, and I have a fallback config on the clients, can the two nodes heal and sync with each other or not? Can you please clarify?

@pranithk
Member Author

@hsafe For the case you describe, it will be fine.
Let us say the setup has 2 bricks, b0 and b1, and consider the following sequence (a shell illustration follows the list):

  1. File 'a' is created while both bricks b0 and b1 are available.
  2. Now brick b0 goes down, and content 'abc' is written to 'a' (so it lands only on b1).
  3. b0 comes back up, but before heal/sync can happen, b1 goes down.
  4. If you now write some other content, say 'def', to 'a', it will succeed (landing only on b0).
  5. When b1 comes back up, you have 'a' on b0 with 'def' and 'a' on b1 with 'abc'. Both bricks think they have the correct copy, leading to a split-brain.
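
A hedged shell illustration of this timeline on a plain replica-2 volume; brick PIDs, paths, and the volume name are placeholders:

```sh
echo start > /mnt/myvol/a             # 1. both bricks up
kill <pid-of-b0-glusterfsd>           # 2. b0 goes down
echo abc > /mnt/myvol/a               #    'abc' lands only on b1
gluster volume start myvol force      # 3. b0 comes back up...
kill <pid-of-b1-glusterfsd>           #    ...but b1 dies before heal runs
echo def > /mnt/myvol/a               # 4. 'def' lands only on b0
gluster volume start myvol force      # 5. both bricks up again
gluster volume heal myvol info split-brain   # 'a' is now in split-brain
```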

@hsafe

hsafe commented Jun 20, 2018

@pranithk
Thanks, that explained everything... :)

amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates gluster#352

Change-Id: I3d8caa6479dc8e48bec62a09b056971bb061f0cf
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
1. Create thin arbiter index file during mount.
2. Set pending marker in thin arbiter id file in case of failure.

Change-Id: I269eb8d069f0323f1fc616175e5e5eb7b91d5f82
updates: gluster#352
Signed-off-by: Ravishankar N <ravishankar@redhat.com>
amarts pushed a commit to amarts/glusterfs_fork that referenced this issue Sep 11, 2018
Updates gluster#352

Change-Id: I1bbb3c652ba33cec6aa37f3700370674077fb17d
Signed-off-by: karthik-us <ksubrahm@redhat.com>
gluster-ant pushed a commit that referenced this issue Mar 11, 2019
Discussion on thin arbiter volume -
#352 (comment)

Main idea of having this rpm package is to deploy thin-arbiter
without glusterd and other commands on a node, and all we need
on that tie-breaker node is to run a single glusterfs command.
Also note that, no other glusterfs installation needs
thin-arbiter.so.

Make sure RPM contains sample vol file, which can work by default,
and a script to configure that volfile, along with translator image.

Change-Id: Ibace758373d8a991b6a19b2ecc60c93b2f8fc489
updates: bz#1674389
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
gluster-ant pushed a commit that referenced this issue Mar 13, 2019
Discussion on thin arbiter volume -
#352 (comment)

Main idea of having this rpm package is to deploy thin-arbiter
without glusterd and other commands on a node, and all we need
on that tie-breaker node is to run a single glusterfs command.
Also note that, no other glusterfs installation needs
thin-arbiter.so.

Make sure RPM contains sample vol file, which can work by default,
and a script to configure that volfile, along with translator image.

Change-Id: Ibace758373d8a991b6a19b2ecc60c93b2f8fc489
updates: bz#1672818
Signed-off-by: Amar Tumballi <amarts@redhat.com>
Signed-off-by: Ashish Pandey <aspandey@redhat.com>
(cherry picked from commit ca9bef7)