Full index rebuild on first pull after index creation #601

Closed
aboodman opened this issue Sep 30, 2021 · 0 comments · Fixed by #624

aboodman commented Sep 30, 2021

The first time we pull after creating an index, that index will be fully rebuilt. The relevant code is here:

https://github.com/rocicorp/replicache/blob/main/src/sync/pull.ts#L162

The reason is somewhat subtle. Imagine the situation in which we have just created an index, then pull:

S1(lmid=0) - M1 - CI1 - main
    \ S2(lmid=1) - sync

If we simply apply the patch from S2 to S1 and commit, then the index created by CI1 will be lost completely. It's not defined yet at S1.

However, we also cannot just take the index state from CI1 and apply the patch from S2, because the patch at S2 is with respect to S1, not CI1. In the simple case where S2 includes only the changes from M1, this might appear to work, because applying the patch from S2 to the index state at CI1 results in no visible change. However, imagine the case where the authoritative db computed a different result for M1 (for example, maybe it ignored the change). In that case, the index state at CI1 reflects M1 but the patch from S2 does not. If we reuse the index state from CI1, we will exit this pull with the index containing entries that the main map does not.
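To make the failure mode concrete, here is a minimal sketch. It does not use Replicache's real data structures: `KV`, `PatchOp`, `buildIndex`, `applyPatch` and `staleEntries` are all hypothetical stand-ins, the index is assumed to be a plain set of `value\0key` entries, and keys and values are assumed not to contain `\0`.

```typescript
// Hypothetical stand-ins for illustration only, not Replicache's API.
type KV = Map<string, string>;
type PatchOp =
  | {op: 'put'; key: string; value: string}
  | {op: 'del'; key: string};

// One index entry per (value, primary key) pair.
const entry = (value: string, key: string): string => `${value}\u0000${key}`;

function buildIndex(map: KV): Set<string> {
  const index = new Set<string>();
  map.forEach((value, key) => index.add(entry(value, key)));
  return index;
}

// Applies a patch to the map and keeps the given index in step with it.
function applyPatch(map: KV, index: Set<string>, patch: PatchOp[]): void {
  for (const p of patch) {
    const old = map.get(p.key);
    if (old !== undefined) index.delete(entry(old, p.key));
    if (p.op === 'put') {
      map.set(p.key, p.value);
      index.add(entry(p.value, p.key));
    } else {
      map.delete(p.key);
    }
  }
}

// Index entries that no longer correspond to anything in the map.
function staleEntries(map: KV, index: Set<string>): string[] {
  return Array.from(index).filter(e => {
    const [value, key] = e.split('\u0000');
    return map.get(key) !== value;
  });
}

const s1: KV = new Map([['a', '1']]);
const main: KV = new Map(s1);
main.set('b', '2'); // M1, applied locally only
const indexAtCI1 = buildIndex(main); // CI1 indexes the state that includes M1
const patchFromS2: PatchOp[] = []; // the server ignored M1, so nothing changed relative to S1

// Reusing the index from CI1: the entry for 'b' survives the pull.
const naiveMap: KV = new Map(s1);
const naiveIndex = new Set(indexAtCI1);
applyPatch(naiveMap, naiveIndex, patchFromS2);
const naiveStale = staleEntries(naiveMap, naiveIndex);

// Rebuilding on top of S1 first: the index matches the map.
const rebuiltMap: KV = new Map(s1);
const rebuiltIndex = buildIndex(rebuiltMap);
applyPatch(rebuiltMap, rebuiltIndex, patchFromS2);
const rebuiltStale = staleEntries(rebuiltMap, rebuiltIndex);
```

In this toy setup `naiveStale` holds the single entry for `b`, which exists in the index but not in the map, while `rebuiltStale` is empty.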

To avoid this issue, what we do now is simply rebuild all the indexes defined at main on top of S1 before applying the patch from S2. Our index creation code ignores duplicate index definitions (https://github.com/rocicorp/replicache/blob/main/src/db/write.ts#L209), so in practice this means we build the indexes introduced between S1 and main, but not other indexes.
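A rough sketch of why the duplicate check limits the work to new indexes. `SketchWrite`, `IndexDef` and `rebuildIndexes` are hypothetical names invented here, and throwing on a same-name index with a different definition is an assumption, not something confirmed against `write.ts`.

```typescript
// Hypothetical model, not the real code in src/db/write.ts.
interface IndexDef {
  name: string;
  keyPrefix: string;
}

class SketchWrite {
  readonly map: Map<string, string>;
  readonly indexes = new Map<string, {def: IndexDef; entries: Set<string>}>();

  constructor(map: Map<string, string>) {
    this.map = map;
  }

  // Returns true if the index was built, false if an identical
  // definition already existed and the call was ignored.
  createIndex(def: IndexDef): boolean {
    const existing = this.indexes.get(def.name);
    if (existing) {
      if (existing.def.keyPrefix === def.keyPrefix) return false;
      // Assumed behavior for a conflicting definition.
      throw new Error(`Index ${def.name} exists with a different definition`);
    }
    // Full scan of the map: this is the expensive part.
    const entries = new Set<string>();
    this.map.forEach((value, key) => {
      if (key.startsWith(def.keyPrefix)) entries.add(`${value}\u0000${key}`);
    });
    this.indexes.set(def.name, {def, entries});
    return true;
  }
}

// Re-create every index known at main on top of the sync base (S1).
function rebuildIndexes(base: SketchWrite, defsAtMain: IndexDef[]): string[] {
  const built: string[] = [];
  for (const def of defsAtMain) {
    if (base.createIndex(def)) built.push(def.name);
  }
  return built;
}

// S1 already has 'todosByValue'; main additionally has 'usersByValue'.
const syncBase = new SketchWrite(
  new Map([
    ['todo/1', 'x'],
    ['user/1', 'y'],
  ]),
);
syncBase.createIndex({name: 'todosByValue', keyPrefix: 'todo/'});
const builtNames = rebuildIndexes(syncBase, [
  {name: 'todosByValue', keyPrefix: 'todo/'},
  {name: 'usersByValue', keyPrefix: 'user/'},
]);
```

Here only `usersByValue` is scanned and built. The index that already existed at S1 is skipped, but any index created after S1 still pays for a full scan.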

Anyway, this is kind of a bummer. I believe that we can now fix this by:

  1. Start with the index state at main.
  2. Compute a diff from main to snapshot.
  3. Modify the index state to reflect each difference.
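The three steps above could look roughly like the sketch below. Everything in it is a hypothetical simplification: `diffMaps` is a naive full comparison standing in for the real diff, and the index is again assumed to be a set of `value\0key` entries.

```typescript
// Hypothetical sketch of the proposed incremental update.
type DiffOp = {
  key: string;
  oldValue: string | undefined;
  newValue: string | undefined;
};

const indexEntry = (value: string, key: string): string => `${value}\u0000${key}`;

// Reference implementation: index built by scanning the whole map.
function indexFromScratch(map: Map<string, string>): Set<string> {
  const index = new Set<string>();
  map.forEach((value, key) => index.add(indexEntry(value, key)));
  return index;
}

// Step 2: compute the differences going from `from` to `to`.
function diffMaps(from: Map<string, string>, to: Map<string, string>): DiffOp[] {
  const ops: DiffOp[] = [];
  from.forEach((oldValue, key) => {
    const newValue = to.get(key);
    if (newValue !== oldValue) ops.push({key, oldValue, newValue});
  });
  to.forEach((newValue, key) => {
    if (!from.has(key)) ops.push({key, oldValue: undefined, newValue});
  });
  return ops;
}

// Step 3: touch only the index entries affected by each difference.
function applyDiffToIndex(index: Set<string>, diff: DiffOp[]): void {
  for (const d of diff) {
    if (d.oldValue !== undefined) index.delete(indexEntry(d.oldValue, d.key));
    if (d.newValue !== undefined) index.add(indexEntry(d.newValue, d.key));
  }
}

// main has the locally applied M1 ('b'); the snapshot ignored it and added 'c'.
const mainMap = new Map([
  ['a', '1'],
  ['b', '2'],
]);
const snapshotMap = new Map([
  ['a', '1'],
  ['c', '3'],
]);

const updatedIndex = indexFromScratch(mainMap); // step 1: index state at main
const mainToSnapshot = diffMaps(mainMap, snapshotMap);
applyDiffToIndex(updatedIndex, mainToSnapshot);
```

With these inputs the diff has two operations (remove `b`, add `c`), and `updatedIndex` ends up with the same entries as `indexFromScratch(snapshotMap)` without rescanning the unchanged key `a`.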

Our diff computation is currently slow, but this should still be faster than recreating the index from scratch, and once #596 is implemented it will get even faster.
