How to make Peritext agnostic to the underlying plain text CRDT #31

zxch3n · 2022-11-04T10:03:49Z

In the current implementation, Peritext needs a special behavior to fix an issue related to tombstones, as mentioned in the article. It requires the plain text CRDT to insert text after the tombstone with a special property (a tombstone that has an “after” anchor on it) to make the span expansion behavior intuitive.

peritext/src/micromerge.ts

Lines 775 to 800 in 89c162d

    
           if (options?.lookAfterTombstones) { 
        
               // Normally in Automerge we insert new characters before any tombstones at the insertion position. 
        
               // But when formatting is involved, we sometimes want to insert after some of the tombstones. 
        
               // We peek ahead and see if there are any tombstones that have a nonempty markOpsAfter; 
        
               // If there are, we want to put this new character after the last such tombstone. 
        
               // This ensures that if there are non-growing marks which end at this insertion position, 
        
               // this new character is inserted after the span-end. 
        
               // See the test case labeled "handles growth behavior for spans where the boundary is a tombstone" 
        
               // for a motivating exapmle of why this behavior is needed. 
        
               let elemIndex = metaIndex 
        
               let peekIndex = metaIndex + 1 
        
               let latestIndexAfterTombstone: number | undefined 
        
               while (meta[peekIndex] && meta[peekIndex].deleted) { 
        
                   if (meta[peekIndex].markOpsAfter !== undefined) { 
        
                       latestIndexAfterTombstone = peekIndex 
        
                   } 
        
                   peekIndex++ 
        
               } 
        
               if (latestIndexAfterTombstone) { 
        
                   elemIndex = latestIndexAfterTombstone 
        
               } 
        
               return meta[elemIndex].elemId 
        
           } else { 
        
               return element.elemId 
        
           }

And there still are cases that this approach cannot fix. Normally, if we have a rich text span like

<bold><link>a</link></bold>

and insert x after it, the intuitive outcome would be

<bold><link>a</link>x</bold>

However, it’s not always the case if there are tombstones. Imagine this case

<link><bold>1</bold>234</link>5

then we delete 234 and insert an x character after 1.

The image below illustrates the dilemma: no matter where we insert x, we cannot meet the requirements of making x bold and non-link.

The current Peritext implementation chooses the most intuitive position: after 4, which makes x non-bold and non-link. While in situations without tombstones, it would be bold and non-link. Thus the same view and operation to the user might produce different results in the current Peritext implementation.

The text was updated successfully, but these errors were encountered:

zxch3n · 2022-11-04T10:09:16Z

A possible solution

It still assumes the associated plain text CRDT has OpId for each character in the document

We need to fix the above issue without relying on the behavior of the underlying plain text CRDT. Instead, we may resort to style patches. The patches have the types of

interface Patch {
    type: "extend" | "shrink"
    from: OpId
    to: OpId
    targetSpan: OpId
}

If we are agnostic to the text CRDT, when the user performs insert op among a bunch of tombstones, we will lose control of where the new character ends up.

However, the above dilemma example may be fixed by some new styling patches. For example, if client A inserts x before 2, A can generate a shrink patch to make the outcome intuitive.

If client A inserts x after 4, A can generate a extend style patch to make the outcome intuitive.

Downsides

The styles of concurrent inserts inside the new op range are affected

In the below example, we generate a new remove link op. However, there might be a concurrent insert before character 4, which should have the link style. But the new delete link op will remove its style unintentionally.

So to avoid too many unintentional behaviors, we must associate the patch with a target id.

Overhead

Suppose there are n inserts, m deletes, and k styling operations. In the worst case, there will be min(n, m) * k more operations, where the k styling operations can coexist in the same position, and the user keeps tricking the system. But in a real-world scenario, the overhead should be acceptable. Because we only generate patches for the alive style ranges. And inside a series of consecutive tombstones, without new deletion, in the worst case, the algorithm will only generate one patch for each anchor inside those tombstones.

ept · 2022-11-07T11:58:19Z

Hi @zxch3n, yes, this is a known issue, and your proposal seems like a reasonable way of resolving it. We decided to accept this issue in Peritext because it's a fairly unusual edge case, it does not affect convergence, and it can easily be corrected by the user if necessary (e.g. if the user types a character at the end of a bold+link span, and the new character is neither bold nor linked, the user can easily fix up the character by making it bold manually).

Also, when I tested this case in various text editors, Google Docs and Apple Pages actually didn't grow a bold span when a bold and link span ended at the same position, even though they do grow a bold span if it doesn't coincide with an end of link. The fact that popular products do this suggests that users can live with a little bit of inconsistent behaviour.

zxch3n · 2022-11-07T14:17:11Z

@ept Thanks for your reply. You're right, the inconsistency is not an important issue. But I think if the solution works, it can offer two nice properties to the algorithm

We can use Peritext to enhance any plain text CRDT now
When applying a local operation, we can update the state immediately without consulting CRDT

zxch3n · 2023-05-15T10:25:11Z

I've implemented this idea, but its integration turns out to be too complicated to be practical.

https://github.com/loro-dev/crdt-richtext/blob/b432aa42762acce42665cc82a6d756fb28ddff95/src/legacy/README.md

josephg · 2023-05-21T20:36:13Z

Unfortunately, this issue makes Peritext as it stands unsuitable for use in diamond types. I'd also love a workable answer to this problem.

zxch3n mentioned this issue Nov 7, 2022

[LORO-139] Peritext loro-dev/loro#10

Closed

zxch3n closed this as completed May 15, 2023

guevara mentioned this issue Jan 25, 2024

Loro: Reimagine State Management with CRDTs guevara/read-it-later#10701

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to make Peritext agnostic to the underlying plain text CRDT #31

How to make Peritext agnostic to the underlying plain text CRDT #31

zxch3n commented Nov 4, 2022 •

edited

zxch3n commented Nov 4, 2022 •

edited

ept commented Nov 7, 2022

zxch3n commented Nov 7, 2022

zxch3n commented May 15, 2023

josephg commented May 21, 2023

How to make Peritext agnostic to the underlying plain text CRDT #31

How to make Peritext agnostic to the underlying plain text CRDT #31

Comments

zxch3n commented Nov 4, 2022 • edited

zxch3n commented Nov 4, 2022 • edited

A possible solution

Downsides

The styles of concurrent inserts inside the new op range are affected

Overhead

ept commented Nov 7, 2022

zxch3n commented Nov 7, 2022

zxch3n commented May 15, 2023

josephg commented May 21, 2023

zxch3n commented Nov 4, 2022 •

edited

zxch3n commented Nov 4, 2022 •

edited