Write up Sync details #5
Comments
In the current documentation you mention using a sync engine with the current synchronization primitives. Did you have specific ones in mind? I'd like to see if I can get something working. |
@luisgregson that might be difficult for you to figure out before we push documentation and some helper functions for it, but just in short:
Again, there will be a few helper functions so that it's straightforward. |
@radex As long as the documentation is not there I'd abuse this thread to ask a few additional questions:
|
Yes, it tracks which columns were changed since the last sync. If you do: record.update(() => {
record.text = 'abc'
record.fooBar = 10 // the @field does `record._setRaw('foo_bar', 10)` under the hood
}) Watermelon will update This is so that if you have a conflict (server says that it has an updated record R1, but local R1's status is also updated), you can take the server's version and only replace the columns that were changed locally. After sync (after you send the local versions of records to the server), you're supposed to set the status to 'synced' and reset '_changed' to ''. If the status is 'created', then changes are not tracked (since the whole record is going to be new to the database).
Using sanitizedRaw in sync code is fine! Agreed, a bunch of helper functions would be useful (for finding all the records to sync, resolving the conflicts automatically, etc.)
I think we're doing the same thing. I agree it's not ideal. But I don't know all the details. I think this is partly an artifact of the backend system we use. @Stanley knows more about this |
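To make the bookkeeping described above concrete, here is a minimal in-memory sketch of how `_status` and `_changed` could evolve. It is not WatermelonDB's implementation: `makeRecord`, `setRaw` and `markAsSynced` are hypothetical helpers, and the comma-separated format of `_changed` is an assumption.

```javascript
// Illustrative only: a tiny in-memory record that mimics the bookkeeping
// described above. This is NOT WatermelonDB's implementation.
function makeRecord(raw, status) {
  return { raw: { ...raw, _status: status, _changed: '' } }
}

// Hypothetical stand-in for what a @field setter does via _setRaw.
function setRaw(record, column, value) {
  const raw = record.raw
  raw[column] = value
  // Newly created records are sent whole, so per-column changes are not tracked.
  if (raw._status === 'created') return
  raw._status = 'updated'
  const changed = raw._changed ? raw._changed.split(',') : []
  if (!changed.includes(column)) changed.push(column)
  raw._changed = changed.join(',') // assumed format: comma-separated column names
}

// Call this once the local version has been sent to the server.
function markAsSynced(record) {
  record.raw._status = 'synced'
  record.raw._changed = ''
}

const record = makeRecord({ id: 'r1', text: '', foo_bar: 0 }, 'synced')
setRaw(record, 'text', 'abc')
setRaw(record, 'foo_bar', 10)
console.log(record.raw._status, record.raw._changed) // updated text,foo_bar
markAsSynced(record)
```

After `markAsSynced`, the record is back to `'synced'` with an empty `_changed`, so the next local edit starts tracking from scratch.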
Hi, @radex , thanks a lot for the project! Very timely. I will also abuse this thread for some clarification questions. =) Maybe you could help clarify the following: Thanks a lot for your help. |
Sorry for the confusion: the syncStatus property is a |
Thanks a lot. |
Do you guys have sample implementations? I'd prefer to see live code if possible :) |
Not yet from me. @theVasyl, @fletling did you get the sync to work on your side? Do you have code snippets to share maybe? :) |
Working with @fletling and @theVasyl here. We're currently struggling to get all deleted records from a specific table. Is it possible to include deleted items in the result when querying a specific table?
does not work because _status != deleted is always added to the underlying query. |
#5 (comment) — you can only get the IDs of the deleted records. That should be enough, since… well… they're deleted, so you only want to delete them on the server, correct? Or is there a specific reason why you need the full details of the record marked as deleted? |
@radex Thank you for your help! Adapted our prototype sync workflow to use |
I guess this is just used for sync, so I didn't think that it's necessary to expose it to Collection. And if you're asking why it just returns IDs — we figured this is enough, will be more performant, and there were some concerns about caching/consistency, but I don't remember the details. I have to dig through internal documentation and open-source that too |
Yes, we want to use it for sync as well. We're currently looking into building a generic sync mechanism for all our models, and into how to pass the model names. That's the reason for the question. And yes, I understood the reasoning behind returning only IDs from the previous responses. |
The current blocker is how to Any ideas? |
Why do you need that? |
The current setup is a local WatermelonDB containing all the projects for a given user. Syncing all projects at once would take too long and use too much data, so we're trying to make sync work project by project. This seems to work except for the deletion part. |
I can't promise this is going to work correctly all of the time, as it really wasn't designed to do so, but I think this workaround might work:
const query = collection.query(Q.where(xxx), Q.where('_status', 'deleted'))
query.description = query._rawDescription
const records = await query.fetch()
Let me know if this works. It might be easier if you sync deletions all at once, or if you fetch data for sync all at once but just send it in batches... |
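The "fetch everything at once, but send it in batches" suggestion could look roughly like this. It is a sketch under assumptions: `chunk` and `sendInBatches` are hypothetical helpers, and `send` stands in for whatever function delivers one batch to your server.

```javascript
// Split a list into consecutive batches of at most `size` items.
function chunk(items, size) {
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Send the batches one after another; `send` is a hypothetical async function
// that delivers a single batch to the server.
async function sendInBatches(items, size, send) {
  let sent = 0
  for (const batch of chunk(items, size)) {
    await send(batch) // if this throws, later batches are not sent
    sent += batch.length
  }
  return sent
}

const deletedIds = ['a', 'b', 'c', 'd', 'e']
console.log(chunk(deletedIds, 2)) // [ [ 'a', 'b' ], [ 'c', 'd' ], [ 'e' ] ]
```

Sending sequentially keeps the order predictable and means a failed batch stops the run, so the remaining items can be retried on the next sync.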
Thanks! |
Right! But this is because |
In the case of a conflict between server and client, do I correctly infer from what is above that my code will be able to determine what to do? (For contrast, and to explain why I ask, GunDB—last time I looked at it—seemed more concerned with having a perfectly deterministic resolution algorithm than with allowing the developer to control the resolution and determine which data is worth preserving.) For example, if I wanted to display both sets of data to the end user and ask them to choose what to keep (or combine data into a third option), would that be possible? |
Yes, it's up to you. The information you have during sync is:
So you could hook up UI to show both versions somehow and let the user decide. The easiest resolution scheme is to use the server version, and apply the fields changed locally since the last sync from the local version. |
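A minimal sketch of that easiest scheme (server version wins, except for columns changed locally) follows. `resolveConflict` is a hypothetical helper operating on plain raw objects, and it assumes `_changed` is a comma-separated list of column names.

```javascript
// Sketch of "server wins, except for columns changed locally".
// `local` and `remote` are plain raw objects; `_changed` is assumed to be a
// comma-separated list of locally modified column names.
function resolveConflict(local, remote) {
  const changedColumns = local._changed ? local._changed.split(',') : []
  // Start from the server version, but keep the local sync bookkeeping,
  // since the local changes still have to be pushed.
  const resolved = { ...remote, _status: local._status, _changed: local._changed }
  for (const column of changedColumns) {
    resolved[column] = local[column]
  }
  return resolved
}

const local = { id: 'r1', text: 'local text', foo_bar: 1, _status: 'updated', _changed: 'text' }
const remote = { id: 'r1', text: 'server text', foo_bar: 99 }
const merged = resolveConflict(local, remote)
console.log(merged.text, merged.foo_bar) // local text 99
```

A UI-driven resolution would replace the loop with a prompt that shows `local` and `remote` side by side and lets the user pick a value per column.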
@fletling @theVasyl @sebastian-schlecht @brandondrew et al: I'm beginning the work of rewriting our sync implementation and open sourcing it as part of 🍉, so that it serves as an essentially self-documenting sample implementation for others to follow. Check it out here: #142 . This is very early, with almost no code there, but you can follow along, see the proposed API, and the rough procedure of sync. I'd be happy to hear your comments! |
Alright, during the past weeks we (@theVasyl , @sebastian-schlecht, @fletling and others) have built a sync engine for our app on top of WatermelonDB and we’re very close to releasing it to our production app. In the following I’m going to share some insights into how we did it and what we learned.
Our app: Pave is an issue tracking app for construction site managers and architects. In terms of the data model it is basically a todo app.
The problem:
How we envision the architecture of a sync engine:
What we currently do at Pave:
class Ticket extends SyncableModel {
  static table = Tables.tickets;
  static syncConfig = {
    read: {
      operationName: 'tickets',
      query: `// GraphQL query string goes here`,
      // creates input variables object for query
      variables: async ({
        additionalVariables,
        since,
        convertLocalToRemoteId,
      }: ReadVariablesInput) => ({
        input: {
          projectId: await convertLocalToRemoteId(additionalVariables.projectId),
          withDeleted: true,
          since,
        },
      }),
    },
    create: {
      operationName: 'createTicket',
      mutation: `// GraphQL mutation goes here`,
      // creates input variables object for create mutation
      // Here we transform the local record to the remote representation
      variables: async ({
        record,
        uncleanRecord,
        projectId,
        convertLocalToRemoteId,
      }: CreateVariablesInput<SyncableModel>) => {
        // [...]
      },
    },
    // ... same also for update / delete mutations
    mapRemoteToRawLocal: async (...) => {
      // [...]
      // Transform remote representation to raw local record
    },
    // ...
  };
}
const unsyncedRecords = !!additionalQuery
  ? await this.query(
      Q.where(columnName('_status'), Q.notEq('synced')),
      additionalQuery,
    ).fetch()
  : await this.query(
      Q.where(columnName('_status'), Q.notEq('synced')),
    ).fetch();
// this is a hack from Radek see https://github.com/Nozbe/WatermelonDB/issues/5
// the !!additionalQuery? is necessary because with the hack watermelondb throws an error when no additionalQuery is provided
const queryForDeleted = !!additionalQuery
  ? // $FlowFixMe
    this.query(Q.where(columnName('_status'), 'deleted'), additionalQuery)
  : this.query(Q.where(columnName('_status'), 'deleted'));
queryForDeleted.description = queryForDeleted._rawDescription;
const deletedRecords = await queryForDeleted.fetch();
And then the signature of the sync function is
class Attachment extends SyncableModel {
  static syncConfig = {
    // [...]
    mapRemoteToRawLocal: async ({
      cleanRemote,
      convertRemoteToLocalId,
      projectId,
    }: MapRemoteToRawLocalInput<SyncableModel>) => {
      // [...]
      rawLocal.file = await fileSync.sync(cleanRemote.file.url, projectId)
      // [...]
    },
  };
}
The rest of the sync engine doesn't even need to be aware of the notion of files. Some resources that we found quite helpful while building this:
DX of WatermelonDB in particular with regards to syncing:
So, this was quite a long post. I hope it helps. I’m available for questions, and of course I’d be most interested in your feedback on the concept we came up with and how to improve it further. @radex I haven’t looked at the pull request you mentioned above yet, but I will at some point in the next few days. |
@fletling whoa, nice post. It will take me a while to digest all this and respond, since your needs are a lot different than ours (for us, no GraphQL, no existing stuff to support, and a preference for full-database, not per-collection sync). But this will help, so thank you. |
@fletling will you be able to find some time this week to take a look at #142 and give us feedback about the proposed API and sketch of the sync algorithm? I can already see that your case is specific enough that you might not be able to use the "standard 🍉 sync", but that's also why I could use your feedback on a) why it isn't possible for you to use per-database sync, or at least something closer to it but with this GraphQL abstraction, and b) how we can structure the standard sync adapter implementation so that it's easier for people with special cases to reuse as much standard 🍉 code as possible |
also, before I fully reply to your note from last week, I have one question:
Why? Why can't you use the same IDs locally as on the server? This is what we've always been doing, and I assumed there's no reason to treat local IDs separately |
Sure! I left some comments in particular regarding the two questions a) and b) in here:
Theoretically it should be possible but let me describe some edge cases: See for instance https://softwareengineering.stackexchange.com/questions/287163/generation-of-ids-in-offline-online-application and https://softwareengineering.stackexchange.com/questions/236309/strategy-for-generating-unique-and-secure-identifiers-for-use-in-a-sometimes-of for some discussion around this topic. This is also helpful: https://tech.trello.com/sync-two-id-problem/ Apart from that, there are also security considerations about generating IDs client side. I'm not an expert on those, so I'd again refer to some discussion threads: https://stackoverflow.com/questions/105034/create-guid-uuid-in-javascript?noredirect=1&lq=1 and https://stackoverflow.com/questions/1296234/is-there-any-danger-to-creating-uuid-in-javascript-client-side |
Why would you make that assumption? It seems to me that:
If I'm wrong about 2, then 3 disappears. But isn't #1 good enough? Aren't the odds higher that all your servers will be simultaneously stolen by criminals or hit by lightning? I might be missing an important point. If so, please clarify for me. Thanks! |
I read through the links you posted and I remain unconvinced that this is necessary. At Nozbe, we've been using 16-character client-generated random IDs with sync for 10 years, and we haven't ever detected conflicting IDs. Currently, 🍉 generates IDs that have 8e24 possibilities. So if you have 10M teams, each of which has 100K records (1T = 1e12), the probability of one collision (one in a trillion!), which would cause a single sync error or a single record data loss is (if I'm calculating correctly)… about 6%. OK, that's actually higher than I expected, but we're talking one in a freaking trillion here. And if that's not enough:
TL;DR: Unless you're building a nuclear power facility, the extremely tiny improvement in data safety from eliminating the possibility of ID conflicts is completely outweighed by the risk of a lot of additional complexity. PS. Regarding safety, I don't see this as relevant to sync. You always have to check on the server whether a user is allowed to create a given type of record. IDs are no different here |
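The "about 6%" figure above can be reproduced with the standard birthday-problem approximation, P ≈ 1 − exp(−n² / 2N), where n is the number of records and N the number of possible IDs. This sketch assumes IDs are drawn uniformly at random; `collisionProbability` is just an illustrative helper.

```javascript
// Birthday-problem approximation: P(at least one collision) ≈ 1 - exp(-n² / (2N))
function collisionProbability(recordCount, idSpace) {
  return 1 - Math.exp(-(recordCount * recordCount) / (2 * idSpace))
}

const idSpace = 8e24 // number of possible IDs quoted above
const records = 1e7 * 1e5 // 10M teams × 100K records each = 1e12 records
console.log(collisionProbability(records, idSpace).toFixed(3)) // 0.061
```

Here n² / 2N = 1e24 / 1.6e25 = 0.0625, so the probability of at least one collision across the whole data set is roughly 6%.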
Hey guys, you're doing an amazing job! I can see the Sync feature is about to be released, and some PRs are already merged. Some docs related to Sync have been written (in a PR), but I would ask you to make them more extensive and comprehensive, and more examples would be nice to have. Looking forward to the final Sync release. |
@servocoder have you seen the docs in this PR: https://github.com/Nozbe/WatermelonDB/blob/cafe5421981693a0d9b5ca051b3eb271bfc41711/docs/Advanced/Sync.md ? Those are a little bit more written up. If you have specific suggestions — please comment on that PR. Otherwise, this is more or less what we'll ship and we expect contributions to improve it more! |
Why the
This allows the sync to work, but when I try to create a new entity in a manner
If this is by design then I am curious what is correct way to access/invoke model action directly? |
I would expect to invoke model action like this:
or
or any other way. At this point I can't figure out how to access a model action |
Actions are necessary to ensure safety. Because of asynchronicity, if you have database writes depending on database reads, and something else is happening simultaneously, Bad Things™ can happen. TL;DR: Only one write action may run at a time.
Will be written up in more detail shortly. But:
or
|
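The "only one write action at a time" rule can be illustrated with a small promise queue. This is only a sketch of the idea, not how WatermelonDB implements actions; `ActionQueue` and `enqueue` are hypothetical names.

```javascript
// Minimal serial queue: each action starts only after the previous one settles.
// Illustrative only; not how WatermelonDB implements actions.
class ActionQueue {
  constructor() {
    this._tail = Promise.resolve()
  }

  enqueue(work) {
    const result = this._tail.then(() => work())
    // Keep the chain alive even if `work` rejects.
    this._tail = result.catch(() => {})
    return result
  }
}

const queue = new ActionQueue()
const log = []
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms))

// Action A reads, waits, then writes; B is queued while A is still running.
const first = queue.enqueue(async () => {
  log.push('A start')
  await sleep(20)
  log.push('A end')
})
const second = queue.enqueue(async () => {
  log.push('B start')
  log.push('B end')
})
const done = Promise.all([first, second]).then(() => log)
```

Although B is enqueued immediately, it cannot interleave with A: the log ends up as A start, A end, B start, B end, so a write that depends on an earlier read never sees state changed by another action in between.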
Am I correct in assuming this sync strategy will only work if you were to pull "all" of the data available to the user in an app? I've been researching solutions for a chat-based app I'm building that needs to function offline but work in real time when connected. For the majority of apps discussed using WatermelonDB it seems feasible that when an existing user first logs into the app you could fetch all data for the user at launch, but for a chat app that data could potentially be hundreds of threads and thousands upon thousands of messages over a user's lifetime. Are there any alternative ideas floating around that don't involve a flow like this where the first sync involves fetching "everything"? |
@radex this works like a charm:
Concerning the second option that uses
But this looks odd in case I would like to define all my C(R)UD operations inside of model actions. In order to create a new user I have to get another User model instance first. Am I missing something? |
@radex What is the correct way to break/invalidate the push process, say, when the server is down? Otherwise all changes will be considered resolved and never be synced. I can throw an error, but is there a more graceful way than
|
Correct, all data is pulled.
Right. A cache for a real-time chat app is not the primary use case for Watermelon. We're not planning to develop a partial synchronization/caching scheme ourselves, but:
|
@servocoder
Correct, this is meant to be used on Model instance methods. If you have global actions (such as creating users — if users don't belong to any other record), you can define functions any way you like and use
This is correct. Throwing an error in an async function is the same as rejecting a promise. If |
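A small sketch of the point about async functions: throwing inside one rejects the promise it returns, so a caller can treat the push as failed and leave local changes pending. `pushChanges` and `syncOnce` here are hypothetical stand-ins written for this example, not WatermelonDB's own functions.

```javascript
// Hypothetical push function: throws when the server cannot be reached.
async function pushChanges(changes, serverIsUp) {
  if (!serverIsUp) {
    throw new Error('Server unavailable') // same effect as returning Promise.reject(...)
  }
  return { pushed: changes.length }
}

// Hypothetical sync driver: changes count as synced only after a successful push.
async function syncOnce(localChanges, serverIsUp) {
  try {
    await pushChanges(localChanges, serverIsUp)
    return { ok: true, stillPending: [] }
  } catch (error) {
    // Nothing was marked as synced, so the same changes can be retried later.
    return { ok: false, stillPending: localChanges, reason: error.message }
  }
}

syncOnce(['ticket-1'], false).then(result => {
  console.log(result.ok, result.reason) // false Server unavailable
})
```

Because the failure surfaces as a rejected promise, no extra "cancel" mechanism is needed in this sketch; the driver simply skips the step that would mark records as synced.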