-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Benchmarks and comparison to SyncedStore discussion #1
Comments
Hi @bogobogo! Thanks for your detailed feedback! 👍 I was too thinking of using fluid framework a while ago, but last time I saw their communication protocol is much more complicated than yjs. I agreed we should stick with raw js API interfaces whenever possible as it will generally last longer than a specific crdt implementation. I have known about SyncedStore (which is a great project anyway). The main reason I created this library is because I have not much confidence in the mutation-based API used in large react projects. From the experience of using valtio (a mutation-based state library) from my previous project, everything is great and clean at the beginning when the project is small, but once I have to pass down the sub-state proxies to sub-compoents deep in the tree, I quickly get lost what is mutable and what isn't. Also mutation-based API cannot guarantee the atomicity of changes (in valtio it automatically batch updates in a black box way, which may lead to conflicts with react rendering, I am not sure what SyncedStore does regarding this). I feel like mutations should be controlled in a manageable maner, in the corner where they are really needed and stay just there, not by default be exposed to the outer world, at least in the react land this should be more compatible. About the array set operation, that's a good catch 👍. I don't really want to ban the usage of replacing array element by index as it is a common operation. Configuration is a doable option, another option will be to print a warning message in dev environment when it detects such a usage (but that's annoying too). Another reason I want to allow this even it has strange merge behavior is that I don't think conflict can be avoided in the state API level anyway. E.g. we could easily have stale state just using insert & delete by index, if other users are also inserting/deleting by the same index concurrently. I'm not sure, the best way would probably be solving it in yjs itself. Regarding the sharing granularity, I understand what you mean, I am just not sure what the API would be like, and how the nested object would be stored in yjs if it is not itself a Y.AbstractType. For example if I store a plain object I feel like this could be solved by a hook API, which allows the user to hook into the YEvents & immer patches handling process and do whatever they want. Then if needed, a schema API could be implemented on top of the hook API, providing sensible defaults for untracked nested object type, Y.Text, Y.XmlElement, etc,. I am absolutely open to PR, but I would like to discuss the API first, what do you think? About performance, the mutation side should be in par with immer itself, plus an extra round of iterating the patches generated by immer. The reading side should be in par with plain object, since the snapshot is just a plain object after all. But that's just my rough guesses, we definitely need tests on that as well. I am glad to hear your feedback! |
Thanks @sep2 for the quick and detailed response. Mutability and ImmutabilityI see what you are saying regarding mutability and react. I have personally had positive experiences with mobx based state solutions. That being said, It does lead to confusions. Especially in hook dependencies, where sometimes two objects are considered the same even though in an immutable solution they won't. ArrayRegarding array set, I totally agree with you that this should be solved in yjs. Schema and Granularity
The answer is yes. While yjs supports nesting CRDT's, you don't have to. you can have a simple yjs map at the root and huge plain nested objects as values (and in some cases that would actually make sense). API DiscussionI agree that starting with a hook makes sense. Let's take a simple example const todosById = {
someTodoId: {
title: "groceries",
completed: false
}
} We have a map of todos. Each todo is itself a map. So todosById should be a shared map, but each todo should be a plain object. In the current solution, both will be shared, which may lead to bad UX (and in some cases, corrupt states). API Ideas
Option 1The simplest idea is to add a configuration object to bind(initialDoc, { applyPatch: (target, patch) => boolean }
While this will enable what we want, this feels like a feature that is at the core of the idea behind immer-yjs. It will also be quite complex to implement for the developer. Option 2Define schema - figure out where each change falls on the schema, and then update based on that. This is my dream solution as a developer. import bind, {SharedMap, SharedArray} from 'immer-yjs'
const schema = {
todosById: new SharedMap(),
todosOrder: new SharedArray()
}
bind(initialDoc, schema} Both And that's it. immer-yjs will figure out what is the relevant root CRDT for each patch, and update that behind the scenes. I believe this can be achieved using the Would love to hear your thoughts! |
Update 2: After reconsidered the problem, I am convinced that Option 2 should be the way to go. Update 1: Option 1 is already implemented with a slightly variated version, please see readme for detail. Thanks for the clarification of storing plain object in y.js. API DiscussionIt seems to me that option 1 would be trivially enough to implement. Once this is implemented, the option 2 could be implemented as follow: // with-schema.ts
function withSchema(schema, options) {
const applyPatch = (target, patch) => {
// apply the patch with schema
}
return {
...options,
applyPatch
}
}
// usage:
bind(initialDoc, withSchema(/* schema */)) The benefits of this approach
Schema formatAbout the schema format, I would propose a rule-out schema: const schema = [
'todosById.*'
'someKey.nestedKey.otherkey'
]
bind(initialDoc, withSchema(schema))
The pros
The cons
This is just my initial thought, it contains obvious flaws. ConclusionI would like to add option 1 to the next release, this should unblock the impementation of a I'm sorry due to the complexity of schema parsing & validating & testing & maintainance burden, I probaby won't add this function to the library itself unless there are more users requesting it so that it outweighs the efforts. That being said, later I (or others) could probably share a sample implementation of Will this solve your needs? Please don't hesitate to share comments. |
@bogobogo After reconsidered the problem, I am convinced that Option 2 should be the way to go. If you would like to submit a PR to implement Option 2, I will be happy to accept it. My only concern is that the schema should be serializable, so it can be easily integrated into the sync protocol. Option 1 is too lower-level that developers should not concern about, it should probably be removed from the API. And the rule-out schema proposal I posted above is way more counterintuitive to use. |
Hey @sep2, I will try to implement Option 2 at some point in the future but it might take some time since I have quite a lot on my plate right now. I agree with making schemas serialisable, and I don't think it will be an issue. An explicit schema as I defined above could be very easily described in a json like object. One option is that the In any case, I believe there is an opportunity here to create a solid, production grade solution. |
A third option might be to do data marshalling, ie embedding the type information in the immer data. For example DynamoDB has a special "DynamoDB JSON" which describes Maps, Lists, Strings, Numbers etc.
The pros would be that it's schemaless, cons that drilling down into an object is a bit more verbose (eg edit: This seems to make adding optional schemas even easier too. It should solve #4 out of the box. I will continue down this route, here is my WIP: https://codesandbox.io/s/immer-yjs-marshalled-t4bids |
For the second option, would it be best to use a subset of JSON schema to describe the data shape? It would be great to be able to use existing data valadation libs and typescript-to-json-schema tools. data = {
"plainStringField": "I'm a plain string",
"sharedTextField": "I'm a CRDT string"
}
schema = {
"type": "object",
"properties": {
"plainStringField": { "type": "string" },
"sharedTextField": { "yType": "YText" }
}
} edit: This seems to work OK with some tweaking, I have a WIP test of schema parsing here: https://codesandbox.io/s/immer-yjs-schema-demo-oblgnh |
I have done a bit of work on a marshalled data fork and now both reading and writing work properly, note yarray/ytext/yxml aren't implemented yet. Since type information is now being stored in immer, it should be possible to automatically generate a schema (as long as type information is provided when data is first added). I am excited about this! |
This is so cool! Sorry, I'm too busy with other things to catch up on this issue. In fact, I encourage forking this library and modifying it as needed, since this is really a small codebase. |
|
Hey @sep2, Thanks for making immer-yjs!
I have been thinking about making something similar and I am glad someone is already doing that :) (in my ideal world, one could even swap yjs with fluid framework or whatever, but thats for another time)
I have been playing with SyncedStore and I wonder how the two solutions compare in developer experience and performance. Some of my initial thoughts:
Developer Experience Advantages
immer-yjs has a plain object snapshot and translates immer patches to yjs, while syncedstore uses yjs as the store, and creates proxies around them. This means that reading the store is easy, no annoying proxies and cloning issues.
immer-yjs is also opinionated in the sense that an array set calls the yjs splice method by default - which might be not ideal in a collaborative setting. See my discussion here for reference. In any case, that is something that could potentially be configured in the future.
Wrapping everything around a produce plays nicely with transactions, which could be useful with the yjs support for transaction based undo/redo.
Sharing Granularity
The main issue I currently see with the "magic solution" is the crdt granularity. In most cases, you don't want the whole nested object to be composed of nested crdts, as this might create undesirable states. This is solved in SyncedStore with the Boxed values. This works but kind of annoying - you need to constantly wrap objects with a Box and then call
.value
on everything.The ideal solution, in my opinion, is defining a schema and then updating the crdt's based on the schema. Are you open to a PR that adds support for something like that?
Performance
How does immer patches and maintaining the snapshot compare to the proxy based solution of synced store? I wonder what would be a good test for that.
Would love to hear your opinion about this, and if you are accepting PR's
The text was updated successfully, but these errors were encountered: