-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Fix move tablet Jepsen test failure #4496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… before the move.
…e after it has latest membership info from Zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 8 of 8 files at r1.
Reviewable status: all files reviewed, 5 unresolved discussions (waiting on @manishrjain)
protos/pb.proto, line 28 at r1 (raw file):
import "api.proto"; import "github.com/dgraph-io/badger/pb/pb.proto";
Are you fine making this change? Usually I revert it back because that's what Daniel suggested. I am fine either way since the protobuf compiler still doesn't understand this type of paths.
worker/groups.go, line 398 at r1 (raw file):
fmt
minor: use errors.Errorf for consistency with the rest of the codebase.
worker/groups.go, line 426 at r1 (raw file):
g.tablets[key] = out if out != nil && ts > 0 && ts < out.MoveTs { return 0, fmt.Errorf("StartTs: %d is from before MoveTs: %d for pred: %q",
minor: use errors.Errorf for consistency with the rest of the codebase.
worker/predicate_move.go, line 196 at r1 (raw file):
Quoted 4 lines of code…
// This loop ensures that we have seen the latest membership update, where this predicate now // belongs to another group. So, we won't serve any transaction asking for this predicate. // Without ensuring that we have seen this update, if we delete the tablet below, we'd end up // serving wrong data and cause Jepsen failures.
This sentence was a bit confusing at first. Maybe something like.
This loop ensures that we have seen a membership update following the move where this predicate now belongs to another group. Without this check, this group could end up serving transactions asking for this predicate, even after
this tablet has been deleted. This issue is known to have caused Jepsen failures.
worker/predicate_move.go, line 214 at r1 (raw file):
Quoted 5 lines of code…
if in.DestGid == 0 { glog.Infof("Was instructed to delete tablet: %v", in.Predicate) p := &pb.Proposal{CleanPredicate: in.Predicate} return &emptyPayload, groups().Node.proposeAndWait(ctx, p) }
Can this go before the part you added? I think it can since deletion implies we don't have to wait for another group to complete the move.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 6 unresolved discussions (waiting on @manishrjain)
protos/pb.proto, line 191 at r1 (raw file):
bool remove = 8; bool read_only = 9; // If true, do not ask zero to serve any tablets. uint64 move_ts = 10;
some weird indentation issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed the comments.
Reviewed 8 of 8 files at r1.
Reviewable status: all files reviewed, 4 unresolved discussions (waiting on @danielmai, @mangalaman93, and @martinmr)
protos/pb.proto, line 28 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
Are you fine making this change? Usually I revert it back because that's what Daniel suggested. I am fine either way since the protobuf compiler still doesn't understand this type of paths.
I talked to Daniel, he didn't see any objections to keeping it without the v2. He also was going to work on changing the regenerate script to do these fix ups in the generated Go code. CC: @danielmai Please do that before we forget.
protos/pb.proto, line 191 at r1 (raw file):
Previously, mangalaman93 (Aman Mangal) wrote…
some weird indentation issue here.
Fixed it. read_only had spaces.
worker/groups.go, line 398 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
fmtminor: use
errors.Errorffor consistency with the rest of the codebase.
Done.
worker/groups.go, line 426 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
minor: use
errors.Errorffor consistency with the rest of the codebase.
Done.
worker/predicate_move.go, line 196 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
// This loop ensures that we have seen the latest membership update, where this predicate now // belongs to another group. So, we won't serve any transaction asking for this predicate. // Without ensuring that we have seen this update, if we delete the tablet below, we'd end up // serving wrong data and cause Jepsen failures.This sentence was a bit confusing at first. Maybe something like.
This loop ensures that we have seen a membership update following the move where this predicate now belongs to another group. Without this check, this group could end up serving transactions asking for this predicate, even after
this tablet has been deleted. This issue is known to have caused Jepsen failures.
Done.
worker/predicate_move.go, line 214 at r1 (raw file):
Previously, martinmr (Martin Martinez Rivera) wrote…
if in.DestGid == 0 { glog.Infof("Was instructed to delete tablet: %v", in.Predicate) p := &pb.Proposal{CleanPredicate: in.Predicate} return &emptyPayload, groups().Node.proposeAndWait(ctx, p) }Can this go before the part you added? I think it can since deletion implies we don't have to wait for another group to complete the move.
If we delete before we have seen the update, then we would have the failure. The race cond would be: we'd delete the data first. A txn would serve the (non-existent) data. Then we'll get the membership update, saying don't serve the data.
* Track move ts of a pred move, so we can reject any reads which happen before the move. * Try with a timestamp jitter to update MaxAssigned. * Added an expected checksum, so Alpha group would only delete predicate after it has latest membership info from Zero. * Regenerate protos. Co-authored-by: Daniel Mai <daniel@dgraph.io> (cherry picked from commit ec44550)
|
It also sounds like this might explain why we saw spurious null values, because the tablets aren't present at all, but... why would we see read skew where an actual value was present? |
My guess is this:
I think not servicing such reads would fix the problem and so far the testing has been supportive of that. This PR fixed majority of the failures with bank/move-tablet. I'm chasing some of the edge cases around this fix right now (ensuring followers do the same actions as the leader in terms of waiting for membership update before deleting their data). |
That might be the different issue with posting list splits. In my branch, I've turned off the splits, and I haven't seen invalid values being served (only nil values, consistent with my above theory). |
In #4496, we introduced the concept of waiting for latest membership status, before the source Alpha proposes deletion of the tablet (/predicate). However, that fix only applied that check on the source Alpha group's leader. It should have been applied on each member of the Alpha group. This PR fixes that by making the expected membership checksum part of the `CleanPredicate` proposal. That way, each Alpha can block until they see the latest membership update which says that they no longer serve that tablet, before deleting it. This fixes a bunch of `bank/move-tablet` failures, but we still see very rare `bank/move-tablet` violations. Another fix this PR contains is to avoid doing posting list splits, which were causing `uid-set/move-tablet` failures, fixing #4538. Changes: * Martin's Split code has a bug. Let's not split posting lists. * Better messaging around turning off split * Make ExpectedChecksum part of the CleanPredicate proposal, so all followers also block deletion of the predicate until they have seen the latest membership update.
In #4496, we introduced the concept of waiting for latest membership status, before the source Alpha proposes deletion of the tablet (/predicate). However, that fix only applied that check on the source Alpha group's leader. It should have been applied on each member of the Alpha group. This PR fixes that by making the expected membership checksum part of the `CleanPredicate` proposal. That way, each Alpha can block until they see the latest membership update which says that they no longer serve that tablet, before deleting it. This fixes a bunch of `bank/move-tablet` failures, but we still see very rare `bank/move-tablet` violations. Another fix this PR contains is to avoid doing posting list splits, which were causing `uid-set/move-tablet` failures, fixing #4538. Changes: * Martin's Split code has a bug. Let's not split posting lists. * Better messaging around turning off split * Make ExpectedChecksum part of the CleanPredicate proposal, so all followers also block deletion of the predicate until they have seen the latest membership update.
In #4496, we introduced the concept of waiting for latest membership status, before the source Alpha proposes deletion of the tablet (/predicate). However, that fix only applied that check on the source Alpha group's leader. It should have been applied on each member of the Alpha group. This PR fixes that by making the expected membership checksum part of the `CleanPredicate` proposal. That way, each Alpha can block until they see the latest membership update which says that they no longer serve that tablet, before deleting it. This fixes a bunch of `bank/move-tablet` failures, but we still see very rare `bank/move-tablet` violations. Another fix this PR contains is to avoid doing posting list splits, which were causing `uid-set/move-tablet` failures, fixing #4538. Changes: * Martin's Split code has a bug. Let's not split posting lists. * Better messaging around turning off split * Make ExpectedChecksum part of the CleanPredicate proposal, so all followers also block deletion of the predicate until they have seen the latest membership update. (cherry picked from commit 2540a93)
Jepsen bank tests with "move-tablet" nemesis were failing due to:
For transactions which are doing writes, this wasn't an issue because commits require consultation with Zero. During commit, Zero ensures that the group which served the data is also the current owner of that predicate. Otherwise, it aborts the transaction. So, this was only an issue for read-only transactions where Zero isn't consulted.
This PR fixes the issue by doing two things:
With these changes, Jepsen tests with "move-tablet" nemesis are passing on my machine.
This change is