Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues related to Removed Segments in MergeTree #11678

Merged
merged 24 commits into from Sep 6, 2022

Conversation

anthony-murphy
Copy link
Contributor

@anthony-murphy anthony-murphy commented Aug 25, 2022

Handling removed segments in MergeTree was incorrect, specifically when to consider them in the case of conflict. This case was complicated by the fact that clients can have different removed segments depending on their lifetime. For instance, a new client loading from summary could have no removed segments, while long running clients could have many and different removed segments yet to be cleaned up by zamboni.

Previously there was logic added to treat the length of some deleted segments as undefined. This change was made as any segment of length 0 is considered when resolving a conflict, and this allowed the merge tree to differentiate removed segments from segments with sequence numbers greater than the reference sequence number of some incoming operation. This improved the situation, as it allowed us to exclude those segments from conflict resolution.

However, the heuristic to determine when to consider a removed segment for conflict resolution or not was incorrect, and that is what is improved upon here. Specifically, removed segments who's removed sequence number in within the collab window, are now consider for conflict resolution, and those outside the collab window are not. One key point is we do not consider the removed sequence number itself in conflict resolution, we only consider the sequences insert sequence number as segment order has always been determined by insert order which is an invariant we cannot break.

This works because all clients, regardless of lifetime must agree on the state of all segments in the collab window, as these segments are not yet finalized in the sense that they could be above or below the reference sequence number for other clients, so all clients must be able to reason over them.

This change also brings in some new unit tests that would previously fail, and does a bit of unit test refactoring to make them more consistent.

For now, i've put this new length calculation under and option, as it will regress some existing assumptions around undo redo, but to fix undo redo i want to be able to test with this change

fixes #9703
related #1009 AB#45 AB#841

@anthony-murphy anthony-murphy requested review from a team as code owners August 25, 2022 17:36
@github-actions github-actions bot added area: contributor experience area: dds Issues related to distributed data structures base: next PRs targeted against next branch labels Aug 25, 2022
@anthony-murphy
Copy link
Contributor Author

this is basically the same change that has been in draft for a while now, i just reduced to scope a bit, and put it under an option for easier rollout control.
#9834

@msfluid-bot
Copy link
Collaborator

msfluid-bot commented Aug 25, 2022

@fluid-example/bundle-size-tests: +17.29 KB
Metric NameBaseline SizeCompare SizeSize Diff
aqueduct.js 392.45 KB 394.31 KB +1.86 KB
connectionState.js 680 Bytes 680 Bytes No change
containerRuntime.js 191.92 KB 196.36 KB +4.44 KB
loader.js 151.12 KB 151.06 KB -57 Bytes
map.js 42.63 KB 47.38 KB +4.75 KB
matrix.js 131.63 KB 135.64 KB +4.02 KB
odspDriver.js 150.23 KB 150.11 KB -127 Bytes
odspPrefetchSnapshot.js 38.39 KB 38.35 KB -41 Bytes
sharedString.js 152.42 KB 154.86 KB +2.44 KB
Total Size 1.25 MB 1.27 MB +17.29 KB

Baseline commit: b7b2bd3

Generated by 🚫 dangerJS against 8f0e921

@@ -81,7 +81,7 @@ function runConflictFarmTests(opts: IConflictFarmConfig, extraSeed?: number): vo
}
mt.seedWithArray(seedArray);

const clients: TestClient[] = [new TestClient()];
const clients: TestClient[] = [new TestClient({ mergeTreeUseNewLengthCalculations: true })];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be concerned about losing coverage for the non-mergeTreeUseNewLengthCalculations case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't mix it, however the results of mixing it won't be worse that than the results of having it off, as without it in the rare cases this fixes, we will already have decoherence. I've turned it on here just to keep coverage, it is off for the replay tests which cover very similar cases, and on in only a couple specific regression cases in applyMsg.spec.ts.

Comment on lines +162 to +167
// ensure all clients have seen the same ops
assert.equal(
c.getCurrentSeq(),
this.clients[0].getCurrentSeq(),
`Client ${c.longClientId} current seq does not match client ${this.clients[0].longClientId}`,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any value in printing the seq numbers in the message in addition to the client ids?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't think so. this should only ever fire in test development, and signals that the test code the processes messages is incorrect. the sequence numbers themselves won't tell the developer much without debugging

Comment on lines +375 to +403
it("Local insert after acked local delete", () => {
const clients = createClientsAtInitialState(
{ initialState: "ZZ", options: { mergeTreeUseNewLengthCalculations: true } },
"A", "B", "C");

const logger = new TestClientLogger(clients.all);

let seq = 0;

const op1 = clients.C.makeOpMessage(clients.C.removeRangeLocal(0, 1), ++seq);
clients.C.applyMsg(op1);

const op2 = clients.B.makeOpMessage(clients.B.removeRangeLocal(1, 2), ++seq);

const op3 = clients.C.makeOpMessage(clients.C.insertTextLocal(0, "C"), ++seq);

const op4 = clients.B.makeOpMessage(clients.B.insertTextLocal(1, "B"), ++seq);

clients.A.applyMsg(op1);
clients.B.applyMsg(op1);

const messages = [op2, op3, op4];
while (messages.length > 0) {
const msg = messages.shift()!;
clients.all.forEach((c) => c.applyMsg(msg));
}

logger.validate();
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am I understanding correctly, that in these tests we don't check against a hardcoded expected final state/string? logger.validate() only seems to check that all clients have the same end state as the first one... but do we have any guarantees that that one is right in the first place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's a correct understanding. we have other test like the replay tests that capture the state and ensure it doesn't diverge over time.

it wouldn't be a bad idea to capture the end state in these cases as we can know it and validate against it. Locally i actually have a change to the logger that takes in a base text. I'll file a bug to get these tests updated once that capability is added.
AB#1803

@anthony-murphy anthony-murphy merged commit 384adc6 into microsoft:next Sep 6, 2022
@anthony-murphy anthony-murphy deleted the next-fix-mt-remove branch September 6, 2022 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: dds Issues related to distributed data structures base: next PRs targeted against next branch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants