Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

How to make Peritext agnostic to the underlying plain text CRDT #31

Closed
zxch3n opened this issue Nov 4, 2022 · 5 comments
Closed

How to make Peritext agnostic to the underlying plain text CRDT #31

zxch3n opened this issue Nov 4, 2022 · 5 comments

Comments

@zxch3n
Copy link

zxch3n commented Nov 4, 2022

In the current implementation, Peritext needs a special behavior to fix an issue related to tombstones, as mentioned in the article. It requires the plain text CRDT to insert text after the tombstone with a special property (a tombstone that has an “after” anchor on it) to make the span expansion behavior intuitive.

peritext/src/micromerge.ts

Lines 775 to 800 in 89c162d

if (options?.lookAfterTombstones) {
// Normally in Automerge we insert new characters before any tombstones at the insertion position.
// But when formatting is involved, we sometimes want to insert after some of the tombstones.
// We peek ahead and see if there are any tombstones that have a nonempty markOpsAfter;
// If there are, we want to put this new character after the last such tombstone.
// This ensures that if there are non-growing marks which end at this insertion position,
// this new character is inserted after the span-end.
// See the test case labeled "handles growth behavior for spans where the boundary is a tombstone"
// for a motivating exapmle of why this behavior is needed.
let elemIndex = metaIndex
let peekIndex = metaIndex + 1
let latestIndexAfterTombstone: number | undefined
while (meta[peekIndex] && meta[peekIndex].deleted) {
if (meta[peekIndex].markOpsAfter !== undefined) {
latestIndexAfterTombstone = peekIndex
}
peekIndex++
}
if (latestIndexAfterTombstone) {
elemIndex = latestIndexAfterTombstone
}
return meta[elemIndex].elemId
} else {
return element.elemId
}

And there still are cases that this approach cannot fix. Normally, if we have a rich text span like

<bold><link>a</link></bold>

and insert x after it, the intuitive outcome would be

<bold><link>a</link>x</bold>

However, it’s not always the case if there are tombstones. Imagine this case

<link><bold>1</bold>234</link>5

then we delete 234 and insert an x character after 1.

The image below illustrates the dilemma: no matter where we insert x, we cannot meet the requirements of making x bold and non-link.

image

The current Peritext implementation chooses the most intuitive position: after 4, which makes x non-bold and non-link. While in situations without tombstones, it would be bold and non-link. Thus the same view and operation to the user might produce different results in the current Peritext implementation.

@zxch3n
Copy link
Author

zxch3n commented Nov 4, 2022

A possible solution

It still assumes the associated plain text CRDT has OpId for each character in the document

We need to fix the above issue without relying on the behavior of the underlying plain text CRDT. Instead, we may resort to style patches. The patches have the types of

interface Patch {
    type: "extend" | "shrink"
    from: OpId
    to: OpId
    targetSpan: OpId
}

If we are agnostic to the text CRDT, when the user performs insert op among a bunch of tombstones, we will lose control of where the new character ends up.

image

However, the above dilemma example may be fixed by some new styling patches. For example, if client A inserts x before 2, A can generate a shrink patch to make the outcome intuitive.

image

If client A inserts x after 4, A can generate a extend style patch to make the outcome intuitive.

image

Downsides

The styles of concurrent inserts inside the new op range are affected

In the below example, we generate a new remove link op. However, there might be a concurrent insert before character 4, which should have the link style. But the new delete link op will remove its style unintentionally.

image

image

So to avoid too many unintentional behaviors, we must associate the patch with a target id.

Overhead

Suppose there are n inserts, m deletes, and k styling operations. In the worst case, there will be min(n, m) * k more operations, where the k styling operations can coexist in the same position, and the user keeps tricking the system. But in a real-world scenario, the overhead should be acceptable. Because we only generate patches for the alive style ranges. And inside a series of consecutive tombstones, without new deletion, in the worst case, the algorithm will only generate one patch for each anchor inside those tombstones.

@ept
Copy link
Collaborator

ept commented Nov 7, 2022

Hi @zxch3n, yes, this is a known issue, and your proposal seems like a reasonable way of resolving it. We decided to accept this issue in Peritext because it's a fairly unusual edge case, it does not affect convergence, and it can easily be corrected by the user if necessary (e.g. if the user types a character at the end of a bold+link span, and the new character is neither bold nor linked, the user can easily fix up the character by making it bold manually).

Also, when I tested this case in various text editors, Google Docs and Apple Pages actually didn't grow a bold span when a bold and link span ended at the same position, even though they do grow a bold span if it doesn't coincide with an end of link. The fact that popular products do this suggests that users can live with a little bit of inconsistent behaviour.

@zxch3n
Copy link
Author

zxch3n commented Nov 7, 2022

@ept Thanks for your reply. You're right, the inconsistency is not an important issue. But I think if the solution works, it can offer two nice properties to the algorithm

  1. We can use Peritext to enhance any plain text CRDT now
  2. When applying a local operation, we can update the state immediately without consulting CRDT

@zxch3n
Copy link
Author

zxch3n commented May 15, 2023

I've implemented this idea, but its integration turns out to be too complicated to be practical.

https://github.com/loro-dev/crdt-richtext/blob/b432aa42762acce42665cc82a6d756fb28ddff95/src/legacy/README.md

@zxch3n zxch3n closed this as completed May 15, 2023
@josephg
Copy link

josephg commented May 21, 2023

Unfortunately, this issue makes Peritext as it stands unsuitable for use in diamond types. I'd also love a workable answer to this problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants