Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix issues related to Removed Segments in MergeTree #9834

Closed

Conversation

anthony-murphy
Copy link
Contributor

@anthony-murphy anthony-murphy commented Apr 11, 2022

Handling removed segments in MergeTree was incorrect, specifically when to consider them in the case of conflict. This case was complicated by the fact that clients can have different removed segments depending on their lifetime. For instance, a new client loading from summary could have no removed segments, while long running clients could have many and different removed segments yet to be cleaned up by zamboni.

Previously there was logic added to treat the length of some deleted segments as undefined. This change was made as any segment of length 0 is considered when resolving a conflict, and this allowed the merge tree to differentiate removed segments from segments with sequence numbers greater than the reference sequence number of some incoming operation. This improved the situation, as it allowed us to exclude those segments from conflict resolution.

However, the heuristic to determine when to consider a removed segment for conflict resolution or not was incorrect, and that is what is improved upon here. Specifically, removed segments who's removed sequence number in within the collab window, are now consider for conflict resolution, and those outside the collab window are not. One key point is we do not consider the removed sequence number itself in conflict resolution, we only consider the sequences insert sequence number as segment order has always been determined by insert order which is an invariant we cannot break.

This works because all clients, regardless of lifetime must agree on the state of all segments in the collab window, as these segments are not yet finalized in the sense that they could be above or below the reference sequence number for other clients, so all clients must be able to reason over them.

In addition, i've unified the handling of UnassignedSequenceNumbers such this ensure breakTime, localNetLength, and nodeLength can share handling. this ensures consistency between the 3, which are all integral to the conflict resolution logic of the merge tree.

Lastly this change brings in some new unit tests that would previously fail, and does a bit of unit test refactoring to make them more consistent.

fixes #9703
related #1009 AB#45

@anthony-murphy anthony-murphy requested a review from a team as a code owner April 11, 2022 22:08
@github-actions github-actions bot added the area: dds Issues related to distributed data structures label Apr 11, 2022
@github-actions github-actions bot added the base: main PRs targeted against main branch label Apr 11, 2022
@anthony-murphy anthony-murphy requested a review from a team April 11, 2022 22:15
@anthony-murphy anthony-murphy requested a review from a team as a code owner April 11, 2022 22:40
@github-actions github-actions bot added the public api change Changes to a public API label Apr 11, 2022
if(removedSeq > this.collabWindow.minSeq) {
return 0;
}
return undefined;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment explaining why this is undefined outside the collab window?

@@ -153,6 +153,12 @@ export class TestClientLogger {
this.clients.forEach(
(c) => {
if (c === this.clients[0]) { return; }
// ensure all clients have see the same ops
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"seen"?

*/
function normalizeSequenceNumberIfUnassigned(...seqs: (number | undefined)[]): number[] {
return seqs.map((v,i)=>isUnassigned(v) ? Number.MAX_SAFE_INTEGER - i : v ?? 0);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less surgical, but possibly you could avoid normalization by using an incrementing UnassignedSequenceNumber when assigning?

Something like...

const FirstUnassignedSequenceNumber = Number.MAX_SAFE_INTEGER / 2;

private nextUnassignedSequenceNumber = FirstUnassignedSequenceNumber;

segment.seq = this.nextUnassignedSequenceNumber++;
segment.removedSeq = this.nextUnassignedSequenceNumber++;

(Just an idea, not a request...)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be a bit more readable if "normalize" took arguments in increasing order of seq# (e.g., [insertSeq, removeSeq]) rather than the other way around. (It took me a while to catch on that this is why '- i' worked.)

The last time I measured, destructuring arrays was slower than destructuring objects with named fields. I don't think it would be measurable, but will mention this alternative anyway in case it's of interest:

const { seq0: insertSeq, seq1: removeSeq } = normalize(/* seq0: / insertSeq, / seq1: */ removeSeq);

Copy link
Contributor

@DLehenbauer DLehenbauer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I only have a couple superficial comments of little importance.

@anthony-murphy anthony-murphy requested a review from a team as a code owner April 14, 2022 16:51
@anthony-murphy anthony-murphy marked this pull request as draft April 14, 2022 16:52
@anthony-murphy
Copy link
Contributor Author

moving to draft while i fix test issues

@anthony-murphy
Copy link
Contributor Author

anthony-murphy commented Apr 15, 2022

just a quick update here. I figured out the test issue, and it is directly related the order of deleted segment is the undo redo case, specifically inserts can't move past removes if they are above the min seq as they used to. Moving past deleted segments is an incorrect behavior that undo redo depended on. before this can go in we need to complete #1009

@Abe27342
Copy link
Contributor

Abe27342 commented May 27, 2022

FYI: I think I'm near a working branch with fuzz tests passing (at least for first many seeds... :P). One of the remaining failures is due to merge tree structure and somewhat related to this. We should chat about it once you're back in office, but here are some details.

First, the fuzz test JSON:

[
    {
        "type": "addText",
        "index": 0,
        "content": "as",
        "stringId": "A"
    },
    {
        "type": "removeRange",
        "start": 0,
        "end": 1,
        "stringId": "A"
    },
    {
        "type": "addText",
        "index": 0,
        "content": "zxcvbnmZX",
        "stringId": "C"
    },
    {
        "type": "addText",
        "index": 0,
        "content": "qwertyuiop",
        "stringId": "B"
    },
    {
        "type": "removeRange",
        "start": 6,
        "end": 8,
        "stringId": "B"
    },
    {
        "type": "removeRange",
        "start": 0,
        "end": 1,
        "stringId": "A"
    },
    {
        "type": "addText",
        "index": 5,
        "content": "12345",
        "stringId": "B"
    },
    {
        "type": "addInterval",
        "start": 9,
        "end": 12,
        "collectionName": "comments",
        "stringId": "B",
        "id": "91d2ec3c-c120-4c12-8bfe-d8bf645acfd0"
    },
    {
        "type": "removeRange",
        "start": 7,
        "end": 13,
        "stringId": "B"
    },
    {
        "type": "synchronize"
    }
]

High level, what's going wrong is that client B ends up with a different ordering between its removed segments done in the 2nd to last operation and the segment inserted by client C. Specifically, at the time the addInterval op is being processed, the three client states look like this:

image

(parentheses here indicate the segment is removed; 0,1,2 correspond to clients A,B,C).

This is a problem because the interval endpoint added to B is associated with the "op" segment, so when it slides on client B it ends up at the end of the "zxcvbnmZX" segment whereas when it slides on the others it ends up on the start.

This difference in structure happens because when B is processing C's addText op, the hierarchical structure of its merge tree with removed segments causes it to insert it at a different place w.r.t removed segments: at that time the segments it has are ['qwert', '12', '(345)', '(y)', '(ui)', '(op)', '(a)', 's']
and they're organized into two merge blocks with a break at the "y".

I should note that this is of course with a fair number of local changes on my branch. One relevant one is that I adjusted Ransom's approach to move intervals to have offset 0 locally pre-ack due to other observed issues, one of which is outlined here.

Edit: I think I have a reasonable fix. The nodeMap call within rightExcursion on the insert walk needs to consider locally removed segments to have a consistent conflict resolution scheme. Working on coming up with the cleanest way to make that happen.

Edit2: Fix in #10499

@anthony-murphy
Copy link
Contributor Author

anthony-murphy commented Sep 6, 2022

closed in favor of #11678

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: contributor experience area: dds Issues related to distributed data structures base: main PRs targeted against main branch public api change Changes to a public API
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Inconsistent shared string after pausing connection
4 participants