Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

watch() #839

Closed
aboodman opened this issue Feb 13, 2022 · 24 comments
Closed

watch() #839

aboodman opened this issue Feb 13, 2022 · 24 comments
Milestone

Comments

@aboodman
Copy link
Contributor

aboodman commented Feb 13, 2022

One of our users maintains state outside of Replicache (in a separate indexeddb instance) similar to #497 proposes. This is done to implement full text search. However it's not possible to maintain this state efficiently because we hide the raw change notification (key-added, key-modified, key-removed) from users and only make the results of subscribe available.

@aboodman
Copy link
Contributor Author

@arv
Copy link
Contributor

arv commented Feb 14, 2022

Seems easy enough to implement. The only question is if we want to expose a single callback that takes all the changes or if we want to call the callback mutliple times with a single change each time. Gut feeling is to call with all the changes as an array.

@aboodman
Copy link
Contributor Author

I guess a single callback would be best for performance reasons. Maybe resurrect onChange and have it receive a Patch ?

@aboodman
Copy link
Contributor Author

aboodman commented Mar 7, 2022

I think it is going to be important for correctness reasons for this callback to receive as a parameter the previous rootHash and the new rootHash. Users that are trying to track Replicache state in some other store (e.g., another idb) are going to need to store the newRootHash in their store (transactionally) then compare the incoming (prevRootHash) to the one they have stored to ensure they are applying the change against the correct state.

@aboodman
Copy link
Contributor Author

aboodman commented Mar 7, 2022

So conceptually, what this API looks like to me is:

function onChange(from: Hash, to: Hash, patch: Patch) {

}

I am not sure if this API should be synchronous or asynchronous with respect to the causing Replicache transaction. I can imagine pros and cons both ways.

@aboodman aboodman added this to the R3: Replicache Ready Release milestone Mar 7, 2022
@tmcw
Copy link
Contributor

tmcw commented Apr 4, 2022

Frankly I'd be interested in this as an alternative to scan(). If I'm running a single scan on startup then updating that datastructure in response to change notifications, it seems feasible to get performance relative to the amount of data changing, rather than the total amount of data. As it stands, large datasets (40k features, about 250 bytes each) don't scan quickly enough to be updated interactively.

@aboodman
Copy link
Contributor Author

aboodman commented Apr 6, 2022

@tmcw's use case is similar except that caller is really only interested in a subset of the keyspace.

Also, I think that since there will eventually need to be a variant of this API that runs inside the transaction, so I've renamed it to make usage more clear (analagous to onbeforeulnoad vs onunload).

Updated API strawperson:

class Replicache {
  onAfterCommit(prefix: string, handler AfterCommitHandler): Unwatcher;
  // future: onBeforeCommit(prefix: string, handler: BeforeCommitHandler): Unwatcher;
}

// from and to can be used to keep separate storage in sync
type AfterCommitHandler = (patch: Patch, from: Hash, to: Hash) => void;
type Unwatcher = () => void;

// future:
type BeforeCommitHandler = (tx: WriteTransaction, patch: Patch, from: Hash, to: Hash) => Promise<void>;

Some thoughts:

  1. This must fire after a transaction has committed to memdb. We can't have this fire, have apps update in-memory state, then have tx fail.
  2. I think in the future there will also be need for a similar API but that runs before transaction commits, for maintaining user-defined indexes. That one we could name onBeforeCommit, and it would also receive a tx.
  3. I have the patch entirely in-memory rather than being an async iterator. I figure that commits big enough to be a problem for memory should probably be split into separate commits anyway. Maybe that constraint is not realistic. Unsure.
  4. Higher-level things like useScan() that return an array can be built on top of this.

@ChadBurggraf
Copy link

Is there any potential for issues if AfterCommitHandler is not reentrant and is async? Probably not because the environment is single-threaded, but just pointing it out since I expect most implementations to actually be (patch: Patch) => Promise<void>.

@aboodman
Copy link
Contributor Author

aboodman commented Apr 6, 2022

Thanks for chiming in chad,

Is there any potential for issues if AfterCommitHandler is not reentrant and is async

Yes, in that case the implementation will have to enforce mutual exclusion of AfterCommitHandler. Perhaps using https://github.com/rocicorp/lock.

I expect most implementations to actually be (patch: Patch) => Promise.

I get why your implementation will be async (because you're trying to maintain some separate persistent state), but I think something like @tmcw's use case would actually be more common use of this API.

For maintaining computed persistent data like you want to, it's going to be difficult to do correctly outside of Replicache because of lack of transactionality. That's why I originally had the hashes in the API in #839 (comment). (I'll add them to the latest proposal now). Because that way you can record in your separate storage the hash the change is based on. If you get an update from some other hash, you'll know you missed something and should rebuild.

I imagine the right way to do what you are trying to do is really with a combination of #497 and the variant of this API I referred to in #839 (comment):

I think in the future there will also be need for a similar API but that runs before transaction commits, for maintaining user-defined indexes. That one we could name onBeforeCommit, and it would also receive a tx.

With these two APIs together you could maintain your computed index transactionally within Replicache itself. Used correctly this would ensure your computed state always moved transactionally with the rest of Replicache.

@aboodman
Copy link
Contributor Author

aboodman commented Apr 6, 2022

Sorry for the edit storm on this bug -- I had forgotten the history. I've edited the initial bug description as well as my latest comments and I think they should make more sense now.

@KeKs0r
Copy link

KeKs0r commented Apr 11, 2022

I might misread the API, but it seems it does not expose the previous value in case of a change event.
Or is that something that can be retrieved via the hash?
Background is the search index. The library I am using can only remove an element from the index with all the right vaues. So in case of a change, I would need to remove the document with all old values and then add the new document with the correct values.

Not 100% sure how that would be achieved with this api.

@aboodman
Copy link
Contributor Author

The Patch type includes the previous value in the case of deletions and modifications.

arv added a commit that referenced this issue Apr 20, 2022
Instead of having a `string[]` with the changed keys we now use
`DiffResult<ReadonlyJSONValue>[]` all the way.

TODO: This does not rename things yet.

Towards #839
arv added a commit that referenced this issue Apr 20, 2022
Instead of having a `string[]` with the changed keys we now use
`DiffResult<ReadonlyJSONValue>[]` all the way.

TODO: This does not rename things yet.

Towards #839
@arv
Copy link
Contributor

arv commented Apr 20, 2022

The Patch type includes the previous value in the case of deletions and modifications.

We need a new type here. Patch is for describing how to change the state. We are describing how the state changed.

Strawman:

type DiffOperation = {
  readonly op: 'add';
  readonly key: string;
  readonly newValue: ReadonlyJSONValue;
} |
  readonly op: 'del';
  readonly key: string;
} |
  readonly op: 'change';
  readonly key: string;
  readonly newValue: ReadonlyJSONValue;
  readonly oldValue: ReadonlyJSONValue;
};
type Diff = readonly DiffOperation[];

arv added a commit that referenced this issue Apr 20, 2022
arv added a commit that referenced this issue Apr 20, 2022
arv added a commit that referenced this issue Apr 20, 2022
This is a first implementation of onAfterCommit. It takes a shortcut and does
handle prefixes at all. It is up to the consumer to deal with those.

Towards #839
@arv
Copy link
Contributor

arv commented Apr 20, 2022

class Replicache {
  onAfterCommit(prefix: string, handler AfterCommitHandler): Unwatcher;
  // future: onBeforeCommit(prefix: string, handler: BeforeCommitHandler): Unwatcher;
}

onAfterCommit is not the right name. We should have a verb, similarly to subscribe. onX implies that you should assign a handler:

x.onFoo = () => { ... };

@aboodman
Copy link
Contributor Author

aboodman commented Apr 20, 2022

OK let's just go with watch then. I really like that name, it's a verb, it's simple and snappy. The only downside is that when we eventually add the ability to watch before commit, there's a chance of developers confusing the two cases. But we'll cross that bridge when we get there (and maybe just fix with docs).

@aboodman aboodman changed the title RFE: Expose raw change notifications (key-added, key-changed, key-removed) watch() Apr 20, 2022
arv added a commit that referenced this issue Apr 21, 2022
Now you can get called after a commit is complete and the arguments
passed to the watch function includes a diff of the changes.

Closes #839
arv added a commit that referenced this issue Apr 21, 2022
Now you can get called after a commit is complete and the arguments
passed to the watch function includes a diff of the changes.

Closes #839
arv added a commit that referenced this issue Apr 21, 2022
This is a first implementation of onAfterCommit. It takes a shortcut and does
handle prefixes at all. It is up to the consumer to deal with those.

Towards #839
arv added a commit that referenced this issue Apr 21, 2022
Now you can get called after a commit is complete and the arguments
passed to the watch function includes a diff of the changes.

Closes #839
arv added a commit that referenced this issue Apr 21, 2022
Now you can get called after a commit is complete and the arguments
passed to the watch function includes a diff of the changes.

Closes #839
@arv arv closed this as completed Apr 28, 2022
@tslocke
Copy link

tslocke commented May 9, 2022

Will there be a way for the watch callback to know if the changes are entirely local, vs. include pulled changes?

The reason this would be useful for me - I'm using SolidJS, and one of the key benefits, is that reactivity happens in a very fine-grained way. A change to a sub-field, or even a sub-sub-sub-field can flow from UI action, to the reactive store, to a single DOM property (e.g. a text value or element class name) being directly updated, with no diffing or rebuilding of other dom (or vdom) nodes.

With Replicache, real-time reactivity is at the granularity of top-level key/values. If the values are objects, you don't know which sub-fields changed (you could deep-diff, but that's not so cheap). This is fine for reactivity over the network, but I wouldn't want to lose the very focussed updates for entirely local activity. Hence I'd like to 'watch' for network changes, but ignore local-only changes.

I understand there is a complication: when a pull happens, Replicache will immediately replay any non-pushed mutations on top of the new data, so the changes passed to watch are not simply "remote" or "local". But it seems like it would be straightforward to pass an argument that indicates "includes pulled changes" vs "entirely local changes".

I guess I could maintain a flag like that myself: e.g.didPull = true in a custom puller, and then reset to false in my watch callback. But I think this would be a nice addition to the API, if it's not available already.

@aboodman
Copy link
Contributor Author

aboodman commented May 18, 2022

Sorry we forgot to respond do this.

I understand the benefit of SolidJS, but still haven't had time to play with it extensively so I don't have it in my head as intuitively as I would like.

However, ignoring an update from watch violates a key assumption of Replicache: that the UI always reflects the client view. Replicache is a reactive system with a unidirectional dataflow. The idea is you make changes by executing a mutation which modifies the client view which fires a change event which directly renders the client view. By shortcutting the client view and modifying the UI you create a situation where the UI and client view can diverge indefinitely. This doesn't seem like a good thing to me.

I get the desire for fine-grained updates. As Replicache is right now, the only way to achieve that is by diffing the values. However object identity is typically preserved in child fields so you can do a quick diff that way.

There is probably a way we could extend Replicache in the future to provide fine-grained diffs, even over pull, for systems such as Solid. For example Firebase and Mongo have APIs analagous to set() that take subtree paths to update. Something like that could be used to track fine-grained changes. But that's a different bug.

@tslocke
Copy link

tslocke commented May 18, 2022

I don't think my app breaks this assumption. After a mutation there will be a poke/pull and the Solid store will be patched to align with the Replicache client view. I agree they could diverge for the duration of the mutate/poke/pull cycle, but the same is true in a normal Replicache app if the client and server implementations of a mutation differ.

Yes I have to get the code to patch the Solid store right, but only once - it's not code that grows with app complexity. For this extra effort I get two big benefits: very fast/fine-grained local updates, and synchronous access to the UI state. (I also consume more RAM of course).

I think this need for more than one layer of state is hard to avoid. In the Discord I see people are debouncing keystrokes, and using things like TipTap/ProseMirror, which are inherently stateful.

Unidirectional is kind of a fiction - in reality there is memory, IndexedDB, the server and the database. Yes, if this fiction is maintained by very reliable library code, then it becomes reality from the perspective of my application code, which is a huge win. I definitely get that. But if one layer of my app breaks the rules and then restores the fiction, I can still think unidirectionally in the higher layers. My current take is that the extra burden of correctness is worth it, for the mentioned benefits, but I could well be wrong!

@aboodman
Copy link
Contributor Author

Yes I have to get the code to patch the Solid store right, but only once - it's not code that grows with app complexity.

This is the part I don't understand. I'm not even sure it is possible to patch the UI state correctly in the general case.

Consider a component containing an unchecked checkbox. I tap the checkbox. Solid state updates and the component re-renders with the checkbox checked. Asynchronously, your app's code runs a mutation that flips the bit in replicache. Replicache sends the mutation to the server. When the mutation runs against the server, say it fails. Perhaps the database is down or, maybe there's a bug on the server, or maybe there's business logic that says that actually this check change is not valid.

When Replicache later pulls, it will get a diff that contains zero changes, because there have been zero changes since the last pull. Replicache will fork from the last snapshot, apply the diff that has no changes, and discard the optimistic mutation that flipped the checkbox.

Replicache now has the correct state -- the checkbox is unchecked, which matches the server. But in the UI the checkbox is still checked. What causes the checkbox in the UI to get fixed up?

@aboodman
Copy link
Contributor Author

What causes the checkbox in the UI to get fixed up?

Nevermind, this part was incorrect. Replicache will fire a change because of the local state change from checked to unchecked.

This idea still makes me nervous but let me see if I can come up with a good example why.

@aboodman
Copy link
Contributor Author

aboodman commented May 20, 2022

So the assumption here is that the solid state always gets updated immediately in the same way that the mutation firing the subscription/watch would later cause the UI to get updated. Any violation to this assumption will cause problems. Here's one:

You have an <input type="number"> in your app set to the value "0". The user enters "1". The solid state changes to "1" and the component renders. The UI initiates a mutation in Replicache to change the value to "1", but before it begins, a sync from the server comes in that sets the value to "2". The pull response is processed, the client view updates to 2, an event is fired, the solid state is changed to 2, and the UI re-renders at "2". Now the original mutation runs and sets the client view to "1". An event is fired with the "localChangesOnly" bit so the diff is ignored. Now the UI has the value "2" but the client view has the value "1". This discrepancy continues until that UI component is destroyed. This can happen today in Replicache because mutations are async and although they typically run immediately, they can yield (e.g., to read from IDB).

@aboodman
Copy link
Contributor Author

I know you really want those sweet fine-grained solid updates. I get it and want that too, but I think for correctness likely needs to be designed more rigorously and probably implemented inside Replicache.

You can create more granular updates on a case-by-case basis for now by breaking down entities into multiple keys. But I don't recommend going crazy and making every field its own key.

@tslocke
Copy link

tslocke commented May 20, 2022

An event is fired with the "localChangesOnly" bit so the diff is ignored. Now the UI has the value "2" but the client view has the value "1". This discrepancy continues until that UI component is destroyed.

The mutation that got interrupted (text = "1") completes after the pull with "2". So the poke/pull triggered by it also comes after. A watch event with localChangesOnly false happens, it differs from what Solid has ("2") so the UI is set back to "1". Right?

@tslocke
Copy link

tslocke commented May 20, 2022

Ahh hang on, I'm wrong

A watch event with localChangesOnly false happens

No it doesn't, because the client cache has "1" and the pull has "1", so no change.

OK so for the purposes of this feature-request, I agree: the example you give shows how the localChangesOnly flag is a bad idea.

However my current implementation uses the puller hook, not watch, so it's OK I think. Well, with this example at least. After every pull, for each key touched by the pull I compare what's in replicache to what's in the solid store, and apply any differences to solid.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants