
Race condition in the replication algorithm #986

Closed
bajtos opened this issue Jan 9, 2015 · 8 comments · Fixed by #1214

bajtos (Member) commented Jan 9, 2015

When two clients are performing replication at the same time and the same object was changed in both clients, some of the changes may get lost. The replication algorithm should check the data before the update to verify it was not changed by another party during the replication process. This will most likely require LoopBack to support transactions for SQL datasources and [Update if Current](http://docs.mongodb.org/manual/tutorial/isolate-sequence-of-operations/#update-if-current) for MongoDB.
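
For reference, the MongoDB pattern amounts to making the expected revision part of the update filter. A rough sketch with the MongoDB Node.js driver follows; the rev property and the surrounding variable names are illustrative placeholders, not existing LoopBack APIs:

collection.update(
  { _id: id, rev: baseRev },   // match only if the revision we read earlier is still current
  { $set: { rev: newRev /* plus the changed properties */ } },
  function(err, result) {
    if (err) return cb(err);
    // if no document matched the filter, another writer changed the record
    // in the meantime -- treat it as a conflict instead of overwriting
  }
);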

Another example is external changes. At the moment, there is a timer-based hook to detect any changes made by non-loopback applications and update the change-tracking records. If the sync algorithm is started before this timed check has run, changes made by the external writer may get discarded.

@bajtos bajtos self-assigned this Jan 9, 2015
@bajtos bajtos added this to the #Epic: Offline Sync V1 milestone Jan 9, 2015
umesh0492 commented

Can't we handle this through a remote hook?

bajtos (Member Author) commented Jan 27, 2015

> Can't we handle this through a remote hook?

No.

Consider the following sequence of execution of two parallel requests updating the same record (model instance):

1. Request R1 arrives
2. R1's remote hook performs the check, all is OK
3. Request R2 arrives
4. R2's remote hook performs the check, all is OK
5. R1 runs the update operation
6. R1 completes
7. R2 runs the update operation, R1's changes are discarded
8. R2 completes
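
In other words, the remote hook can only express a non-atomic check-then-write. Here is a schematic sketch of what each request effectively does; hasConflict() is a hypothetical helper standing in for whatever validation the hook performs:

function handleUpdate(Model, id, data, cb) {
  // the check (steps 2 and 4 above)
  Model.findById(id, function(err, current) {
    if (err) return cb(err);
    if (!current) return cb(new Error('not found'));
    if (hasConflict(current, data)) return cb(new Error('conflict'));
    // <-- the other request can run its own check right here,
    //     before this request has written anything
    // the write (steps 5 and 7 above)
    current.updateAttributes(data, cb);
  });
}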

violet-day commented

@bajtos How about a version key?
Each row has a version key; every update increments it. Before saving to the database, we check the key and throw an exception if it does not match.
It's implemented in mongoose.
versionKey
mongoose-v3-part-1-versioning

bajtos (Member Author) commented Jan 28, 2015

@violet-day Yes, I was thinking about implementing something similar. We already have a checksum of the data; this property is even better than a version number.

The trick is to design and implement a database-agnostic API in juggler that can be mapped to an atomic operation in all SQL and NoSQL databases (connectors).

A possible solution is to use updateAll, assuming that all connectors are fixed to correctly report the number of updated rows and that we can embed the Change object inside the Model data (*).

Model.updateAll(
  { id: modelId, rev: baseRev }, // update only if the rev (checksum) was not changed
  { id: modelId, rev: newRev, /* the properties to update */ },
  function(err, updatedCount) {
    if (err) return cb(err);

    if (updatedCount === 1) return cb(); // success

    // conflict, the database was updated under our hands
    // restart the replication algorithm
  }
);

This is still not a complete solution. If a non-loopback client modifies the database after we calculate baseRev and before we run the query above, then that change will be lost.


(*) Embedding the Change object inside the Model data may not be possible when the user wants to enable change tracking on top of an existing (legacy) database where they cannot alter the existing tables. I guess we don't have to support this scenario in the first version.

bajtos (Member Author) commented Mar 4, 2015

For v1.0, I would like to implement the following solution, which should be bullet-proof and support all existing replication use-cases, although at the cost of lower performance. Later, we can implement optimised variants.

  • The diff step will return the current revisions of the data in the database (what we think we have) for each delta.
  • The instructions for "bulkUpdate" will include this "baseline" revision.
  • The implementation of "bulkUpdate" will update a record only if it has the same revision as sent in the instructions.

The initial implementation of the last step (unoptimised but generic) for a single model instance:

  1. Query the database to get the current instance "data" via Model.findById(id)
  2. Compute the current revision of the model instance from "data"
  3. If the computed revision does not match the revision from bulkUpdate, then fail with a conflict.
  4. Run "updateAll", using "data" as the where-filter and the values from bulkUpdate instructions as the values to set.
  5. If updateAll modified one row, then we are done. Otherwise fail with a conflict, because somebody has updated the instance under our hands.
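
A rough sketch of these five steps for a single instance, in the style of the updateAll snippet above; revisionOf(), conflictError() and the shape of the "instr" instruction object are hypothetical placeholders, not existing APIs:

function applyOneUpdate(Model, instr, cb) {
  // 1. query the current data
  Model.findById(instr.id, function(err, inst) {
    if (err) return cb(err);
    var data = inst && inst.toObject();
    // 2.-3. recompute the revision and compare it with the one sent in the instructions
    if (!data || revisionOf(data) !== instr.rev)
      return cb(conflictError(instr.id));
    // 4. use the full current data as the where-filter
    Model.updateAll(data, instr.data, function(err, updatedCount) {
      if (err) return cb(err);
      // 5. exactly one modified row means success...
      if (updatedCount === 1) return cb();
      // ...anything else means somebody updated the instance under our hands
      cb(conflictError(instr.id));
    });
  });
}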

When a conflict is detected, we should probably run "rectify change" for the conflicted model.

In many cases, it may be possible to automatically repeat the bulkUpdate() and/or replicate() calls until they pass. However, this will probably be omitted from the initial implementation for v1.0.
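
If we do add it later, a bounded retry wrapper could look roughly like this; the attempt counting is hypothetical and the replicate() signature is only approximate:

function replicateWithRetry(sourceModel, targetModel, since, attemptsLeft, cb) {
  sourceModel.replicate(since, targetModel, function(err, conflicts) {
    if (err) return cb(err);
    if (!conflicts.length || attemptsLeft <= 0)
      return cb(null, conflicts); // done, or give up and report the remaining conflicts
    // somebody raced with us -- run another replication round
    replicateWithRetry(sourceModel, targetModel, since, attemptsLeft - 1, cb);
  });
}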

@ritch @raymondfeng Thoughts?

I also have one meta question for you. Should we run Operation hooks for the DAO calls made from bulkUpdate? I am inclined to disable these hooks, or at least the "before save" hook, because if they modify the data being saved (typically setting the "modified" property to the current date and time), then the replicated data will be different from the original.

Another question is how to deal with ACLs, I have opened a new issue for that: #1166

bajtos (Member Author) commented Mar 4, 2015

The implementation outlined in my previous comment depends on #1167.

bajtos (Member Author) commented Mar 9, 2015

I'm timeboxing the story to 5 points; I'll focus on fixing the race condition and leave the UX/ease-of-use work for a follow-up story if I run out of time.

bajtos (Member Author) commented Mar 12, 2015

Run "updateAll", using "data" as the where-filter and the values from bulkUpdate instructions as the values to set.

This still leaves a small hole: the "data" where-filter checks only that no existing properties were modified; it cannot ensure that no new properties were added. This issue should hopefully apply only to non-strict mode, where users can add arbitrary dynamic properties to model instances. Depending on how the MongoDB connector persists and queries undefined properties, it may be affected too.
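
A contrived example of the hole, with hypothetical property values:

// replication reads the instance as:  { id: 1, name: 'a' }
// an external writer then saves:      { id: 1, name: 'a', status: 'archived' }
Model.updateAll(
  { id: 1, name: 'a' },              // where-filter built from the old data still matches
  { id: 1, name: 'b', rev: newRev }, // the values to set
  cb
);
// the update succeeds and silently drops the added "status" property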
