Normalize merge-tree's segment order on rebase of an insert op #11946
}),
);

const newOrder = [...consecutiveLocalSegments, ...adjacentRemovedSegs];
this seems off. i don't think removed segs will always just come after local segs
also, this doesn't seem idempotent: what happens if there are multiple local inserts?
would it make more sense to just rebase everything at once in some kind of fix-up pass before regenerate?
Pushed an approach more along these lines, which I think is nicer. I'm no longer seeing any fuzz failures on the reconnect tests for interval collection if I modify the operations to not include intervals (so it's just basic add/remove text). Previously there was a ~2.5% failure rate over 1k tests, with all of those failures fundamentally reducing to this issue of mismatched segment order causing inconsistent conflict resolution.
I was mistaken in standup the other day: fixing up the callback invocation on local references doesn't fix the interval collection issues. The core of that problem is that the reconnect strategy used for interval collection requires rebasing character positions from a previous localSeq, which normalizing the segment order upon reconnect completely messes up. I think a reasonable enough way to solve that might be for SharedSegmentSequence to give interval collection a chance to query the underlying merge-tree pre-normalization; IntervalCollection's pending state tracking can then store the character position + refSeq/localSeq combos it cares about.
I also like the idea of modifying the merge-tree reconnect farm to repro this issue; I'll do that in this PR before check-in.
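To illustrate why positions tied to an earlier localSeq would need to be captured before normalization, here is a minimal sketch on a toy model. The `Seg` shape, `visibleAt`, and `positionAt` are hypothetical names made up for this example and are not merge-tree APIs; the only point is that reordering a pending insert relative to a locally removed segment changes the character position computed for an older local view.

```typescript
// Hypothetical, simplified model; field names are illustrative, not merge-tree's.
interface Seg {
    id: string;
    length: number;
    insertedLocalSeq?: number; // undefined: insert already acked
    removedLocalSeq?: number; // undefined: not removed locally
}

// Was `s` visible to the local client as of local sequence number `localSeq`?
function visibleAt(s: Seg, localSeq: number): boolean {
    const inserted = s.insertedLocalSeq === undefined || s.insertedLocalSeq <= localSeq;
    const removed = s.removedLocalSeq !== undefined && s.removedLocalSeq <= localSeq;
    return inserted && !removed;
}

// Character offset of segment `id` in the local view as of `localSeq`.
function positionAt(segs: readonly Seg[], id: string, localSeq: number): number {
    let pos = 0;
    for (const s of segs) {
        if (s.id === id) {
            return pos;
        }
        if (visibleAt(s, localSeq)) {
            pos += s.length;
        }
    }
    throw new Error(`unknown segment: ${id}`);
}

const removedSeg: Seg = { id: "removed", length: 3, removedLocalSeq: 2 };
const insertedSeg: Seg = { id: "inserted", length: 2, insertedLocalSeq: 1 };

// Order before normalization, and the order after the pending insert is moved first.
const preNormalization: Seg[] = [removedSeg, insertedSeg];
const postNormalization: Seg[] = [insertedSeg, removedSeg];

// As of localSeq 1 the removed segment was still visible, so the two orders disagree.
const captured = positionAt(preNormalization, "inserted", 1);
const recomputed = positionAt(postNormalization, "inserted", 1);
```

In this toy model, a consumer that stores `captured` (together with the localSeq it refers to) before the reorder keeps the position it originally submitted, whereas recomputing afterwards yields a different offset.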
…to merge-tree-reconnect-normalize
a thought i had on this area, could we augment the reconnect farm to hit it? that test is pretty simplistic right now.
…, simplifying rebase strategy along the way. Also re-link tracking groups and check that in the test
Changing it to have both a client that always disconnects on application and a different one that only sometimes disconnects seems to suffice to make it hit the issue. There are still some failures on the long version of that test after doing so, but ~80% fewer (and not on new tests), so I'm reasonably comfortable checking this in.
⯅ @fluid-example/bundle-size-tests: +7.94 KB
Baseline commit: 151ffe3
let op: IMergeTreeOp | undefined;
if (len === 0 || len < minLength) {
const text = client.longClientId!.repeat(random.integer(1, 3)(mt));
if (len === 0 || len < minLength || (len < minLength * 3 && random.bool(1 / 3))) {
why add the insert condition? can't the test just add an insert operation if it wants it?
this is just meant to be a fallback case, that handles empty/min len, as we usually want data in the tree.
yeah, that seems better. I'll convert tests that seem like they should have an insert op to inject one in config. The lack of inserts was one reason the reconnect farm didn't repro this issue (text was only inserted when the string dipped below min length, so it was pretty unlikely to have a lot of adjacent surviving segments that would end up in a different order on rebase)
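A rough sketch of the split discussed in this thread: the generator keeps insert only as a fallback for an empty or too-short string, and a test that wants regular inserts lists one in its config. `FarmConfig`, `OpKind`, `chooseOp`, and `pick` are hypothetical names for illustration and do not reflect the reconnect farm's actual config shape.

```typescript
// Hypothetical names throughout; this is not the reconnect farm's real config shape.
type OpKind = "insert" | "remove" | "annotate";

interface FarmConfig {
    minLength: number;
    // Operations a test opts into; include "insert" here to get regular inserts.
    operations: readonly OpKind[];
}

// `pick(n)` is assumed to return an integer in [0, n), e.g. from a seeded RNG.
function chooseOp(len: number, config: FarmConfig, pick: (n: number) => number): OpKind {
    // Fallback only: keep some data in the tree when it is empty or too short.
    if (len === 0 || len < config.minLength) {
        return "insert";
    }
    const op = config.operations[pick(config.operations.length)];
    if (op === undefined) {
        throw new Error("config.operations must not be empty");
    }
    return op;
}
```

With this shape, a test that needs many adjacent surviving segments would add `"insert"` to `operations` rather than relying on the length check to trigger inserts.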
anthony-murphy left a comment
a couple small comments, but otherwise looks good.
Description
This fixes an eventual consistency issue exposed by fuzz testing.
The core problem is that clients that are reconnecting may end up in a state in which they don't resolve tiebreaks of inserted segments the same way as other clients.
This is because the tiebreaking logic makes assumptions about the relative ordering of inserted/removed segments in the collab window.
Specifically, in the absence of reconnect it's never the case that a segment inserted at seq A appears immediately after a segment removed at seq B, where:
(In this case, the insertion logic would have placed the inserted segment before the removed segment on all clients).
However, this invariant was no longer true with reconnect due to rebasing ops.
The regression test demonstrates a case where this happens.
The fix implemented in this PR is for the merge-tree, whenever one of its pending segment groups is rebased, to update its internal state so that it is expressed in terms of the current sequence number.
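As a rough illustration of what such a normalization could look like, here is a minimal sketch on a toy segment list. It assumes a simplified rule (within each run of adjacent pending-insert and removed segments, pending local inserts are moved ahead of removed segments) and does not encode the exact sequence-number conditions; `Seg`, `UNASSIGNED`, and `normalizeOnRebase` are hypothetical names, not the merge-tree's implementation.

```typescript
// Hypothetical, simplified segment model; not the merge-tree's real ISegment.
interface Seg {
    text: string;
    seq: number; // UNASSIGNED while the insert is still pending locally
    removedSeq?: number; // set once the segment has been removed
}

const UNASSIGNED = -1;

const isPendingLocalInsert = (s: Seg): boolean =>
    s.seq === UNASSIGNED && s.removedSeq === undefined;
const isRemoved = (s: Seg): boolean => s.removedSeq !== undefined;

// Assumed rule: inside each run of adjacent pending/removed segments, place
// pending local inserts before removed segments, preserving relative order.
function normalizeOnRebase(segs: readonly Seg[]): Seg[] {
    const out: Seg[] = [];
    let run: Seg[] = [];
    const flush = (): void => {
        // Stable partition of the current run.
        out.push(...run.filter(isPendingLocalInsert), ...run.filter(isRemoved));
        run = [];
    };
    for (const s of segs) {
        if (isPendingLocalInsert(s) || isRemoved(s)) {
            run.push(s);
        } else {
            // A live, acked segment ends the run and stays where it is.
            flush();
            out.push(s);
        }
    }
    flush();
    return out;
}
```

Because the pass walks the whole list and uses a stable partition, running it a second time leaves the order unchanged, and several pending inserts in the same run keep their relative order.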
AB#1675