Only fetch requested ops for `getOps` #68

alecgibson · 2018-08-15T15:46:24Z

Introduction

At the moment, when calling getOps, sharedb-mongo actually fetches
all the ops from from to the current version.

That means that if I have a document with 1,000 versions, and I only
ask for ops 0-10, we still fetch all 1,000 ops. This is clearly
inefficient, and on documents with large numbers of ops, this fetch can
take a long time. Anecdotally, it takes ~2s to fetch ops 0-10 of
100,000 ops.

This change has an anecdotal performance increase of ~100x, now taking
~0.01s to fetch ops 0-10 of 100,000 ops.

Background

We fetch all the ops from the current version, because we try to fetch a
"valid" chain of operations. Consider this case:

I have a doc v1
I submit an op v2
I concurrently submit another op v2
Both concurrent ops are committed to the database before the
snapshot
One of the commits successfully writes a snapshot
The other snapshot is rejected, but its op is still committed
I now try to get all ops. A naive find operation would return two
ops for v2.

In theory, this case should actually already be cleaned up, so I
can only assume that we are:

guarding against cases where the op cleanup fails
guarding against unknown inconsistencies in ops
guarding against stupid consumers mucking about in the db themselves
being very defensive

In order to therefore fetch a valid set of ops, we currently therefore
fetch the current snapshot, which has a reference stored to the op that
created it. By looking up that op, we can then check the op that
preceded that, and so on, forming a valid chain of ops that point to
one another.

The obvious downside of this is that we need to start at the current
snapshot and work backwards, necessitating a fetch of all the ops up to
the current version.

Changes

This change attempts to allow us to only fetch the bare minimum number
of ops, whilst still maintaining integrity and discarding erroneous ops.

This is achieved by attempting to find the first op after our to op
that has a unique version. The assumption here is that any op with a
unique version is inherently valid, because it had no collisions.

For example, consider the case where I have a document with some ops:

v1: unique
v2: unique
v3: collision 3
v3: collision 3
v4: collision 4
v4: collision 4
v5: unique
v6: unique
...
v1000: unique

If I want to fetch ops v1-v3, then we:

look up v4
find that v4 is not unique
look up v5
see that v5 is unique and therefore assumed valid
look backwards from v5 for a chain of valid ops

This way we don't need to fetch all the ops from v5 to the current
version.

In the case where a valid op cannot be determined, we still fall back to
fetching all ops and working backwards from the current version.

Further work

Note that this change only affects the getOps method, notably not
touching either getOpsToSnapshot or getOpsBulk.

coveralls · 2018-08-15T15:52:14Z

Coverage increased (+0.4%) to 93.458% when pulling 31004ec on reedsy:get-fewer-ops into 87d76b3 on share:master.

gkubisa · 2018-08-15T18:56:15Z

The idea and implementation look good to me, however, I'd add some more tests to cover all the edge cases supported by _getOpLink.

alecgibson · 2018-08-16T16:18:21Z

@gkubisa good call - caught a bug by expanding test coverage. I've also added some spies to check that our underlying methods are indeed only attempting to fetch a subset of ops.

# Introduction At the moment, when calling `getOps`, `sharedb-mongo` actually fetches all the ops from `from` to the current version. That means that if I have a document with 1,000 versions, and I only ask for ops 0-10, we still fetch all 1,000 ops. This is clearly inefficient, and on documents with large numbers of ops, this fetch can take a long time. Anecdotally, it takes ~2s to fetch ops 0-10 of 100,000 ops. This change has an anecdotal performance increase of ~100x, now taking ~0.01s to fetch ops 0-10 of 100,000 ops. # Background We fetch all the ops from the current version, because we try to fetch a "valid" chain of operations. Consider this case: 1. I have a doc v1 2. I submit an op v2 3. I concurrently submit another op v2 4. Both concurrent ops are committed to the database before the snapshot 5. One of the commits successfully writes a snapshot 6. The other snapshot is rejected, but its op is still committed 7. I now try to get all ops. A naive `find` operation would return two ops for v2. In theory, this case should actually already be [cleaned up][1], so I can only assume that we are: - guarding against cases where the op cleanup fails - guarding against unknown inconsistencies in ops - guarding against stupid consumers mucking about in the db themselves - being very defensive In order to therefore fetch a valid set of ops, we currently therefore fetch the current snapshot, which has a reference stored to the op that created it. By looking up that op, we can then check the op that preceded _that_, and so on, forming a valid chain of ops that point to one another. The obvious downside of this is that we need to start at the current snapshot and work backwards, necessitating a fetch of all the ops up to the current version. # Changes This change attempts to allow us to only fetch the bare minimum number of ops, whilst still maintaining integrity and discarding erroneous ops. This is achieved by attempting to find the first op after our `to` op that has a unique version. The assumption here is that any op with a unique version is inherently valid, because it had no collisions. For example, consider the case where I have a document with some ops: - v1: unique - v2: unique - v3: collision 3 - v3: collision 3 - v4: collision 4 - v4: collision 4 - v5: unique - v6: unique - ... - v1000: unique If I want to fetch ops v1-v3, then we: - look up v4 - find that v4 is not unique - look up v5 - see that v5 is unique and therefore assumed valid - look backwards from v5 for a chain of valid ops This way we don't need to fetch all the ops from v5 to the current version. In the case where a valid op cannot be determined, we still fall back to fetching all ops and working backwards from the current version. # Further work Note that this change *only* affects the `getOps` method, notably not touching either `getOpsToSnapshot` or `getOpsBulk`. [1]: https://github.com/share/sharedb-mongo/blob/2d579ddb80781e987707076e932cd4e01ca066ef/index.js#L189-L193

ericyhwang · 2018-09-05T16:59:02Z

Nate's comments from the PR review meeting:

He has a concern with an edge case around Mongo's (lack of?) guarantees about cross-collection consistency and about ops potentially getting removed by TTLs
Flagging it and having it off by default for now is fine for now

We'll still need to take a look at the PR itself.

The new, faster `getOps` behaviour should be fine, but there may be unforeseen corner cases where the underlying assumptions break down (eg if consumers have been meddling with ops or snapshots outside of ShareDB). In order to remain conservative, and maintain data consistency in all mainline cases, this change hides the faster behaviour behind a flag, which defaults to off. In order to make sure `sharedb-mongo` still acts correctly with this flag enabled, we re-run all the tests again with the flag enabled.

alecgibson · 2018-09-06T08:42:40Z

@ericyhwang I've hidden this feature behind a flag. Please let me know when you've had a chance to look through the code itself.

alecgibson · 2018-09-12T16:33:08Z

README.md

+performance impact on fetching ops for documents with a large number of ops.
+
+If you need faster performance of `getOps`, you can initialise `sharedb-mongo`
+with the option `fasterGetOps: true`.


getOpsWithoutStrictLinking: true

Then also document what "strict" linking is (ie linking back from the current snapshot).

"The default behaviour of getOps is...,

"whereas setting this flag will do 1., 2., 3..."

alecgibson · 2018-09-12T16:38:08Z

index.js

  // data in the mongo database.
  this.allowAggregateQueries = options.allowAllQueries || options.allowAggregateQueries || false;

+  // By default, when calling fetchOps, sharedb-mongo fetchas *all* ops, even


fetchas -> fetches

(also doesn't fetch "all" ops - still honours "from", but ignores "to")

alecgibson · 2018-09-12T16:49:35Z

README.md

+
+Setting this flag will use an alternative method that is much faster than the
+default method, but may behave incorrectly in corner cases where ops or snapshots
+have been manipulated outside of ShareDB (eg by setting a TTL on ops, or manually


We can just throw our hands up here and say that we're not entirely sure what may cause a failure here, but because we're playing around with op integrity we should warn that Bad Things might happen by using this mode, especially if people have changed ops themselves.

We should define "behave incorrectly" and explain that we're concerned about returning ops that were not correctly linked to the canonical version. We need to guard against that because ShareDB sometimes writes more than one op (of the same version) when performing optimistic locking, so if eg if the valid op has been deleted, then we may return an invalid op.

If we incorrectly assume an op is canonical, then it may return an invalid chain.

alecgibson · 2018-09-12T16:58:34Z

index.js

+// To avoid this, we try to fetch the first op after 'to' which has a unique 'v', and then we
+// work backwards from that op using the linked op 'o' field to get a valid chain of ops.
+function getFirstOpWithUniqueVersion(cursor, ops, callback) {
+  if (typeof ops === 'function') {


Don't bother with this requirements check. Just make the first called pass in an empty array.

alecgibson · 2018-09-12T17:00:24Z

index.js

+    && previousVersion !== currentVersion
+    && previousVersion !== oneBeforePreviousVersion;
+
+  if (previousVersionWasUnique) {


Turn this check into a function to shorten this function.

Alternatively, we should pull this out into its own class, who has only these three objects (so that we don't keep the rest of the array of ops in memory that we don't need).

alecgibson · 2018-09-12T17:00:39Z

index.js

+    } else {
+      // If there's no next op to fetch, then we now know the current version is unique so
+      // long as it doesn't match the previous version.
+      var currentVersionIsUnique = typeof currentVersion === 'number'


Could also extract this into its own function for brevity.

alecgibson · 2018-09-14T15:51:03Z

@ericyhwang I've addressed the comments we discussed in the pull review meeting. Namely:

extract complicated unique op checking into its own, tested class
rename the feature flag to getOpsWithoutStrictLinking
update documentation to be more detailed about behaviour with the
flag enabled and disabled

Could you please re-review?

This change addresses some review comments. Namely it: - extracts complicated unique op checking into its own, tested class - renames the feature flag to `getOpsWithoutStrictLinking` - updates documentation to be more detailed about behaviour with the flag enabled and disabled

ericyhwang · 2018-09-19T16:12:21Z

op-link-validator.js

+};
+
+OpLinkValidator.prototype._previousVersionWasUnique = function () {
+  return typeof this._previousVersion() === 'number'


Minor thing, might be worth calling this._previousVersion() once and storing it.

ericyhwang · 2018-09-19T16:52:22Z

sharedb-mongo@1.0.0-beta.7

alecgibson force-pushed the get-fewer-ops branch from 461b3bf to acbc1cb Compare August 15, 2018 15:51

alecgibson force-pushed the get-fewer-ops branch from acbc1cb to cdbe577 Compare August 16, 2018 16:17

alecgibson force-pushed the get-fewer-ops branch 3 times, most recently from db5506c to b57b0dc Compare August 17, 2018 09:26

alecgibson force-pushed the get-fewer-ops branch from b57b0dc to 3df1cf5 Compare August 17, 2018 09:47

alecgibson commented Sep 12, 2018

View reviewed changes

.gitignore .vscode

08c8b39

alecgibson force-pushed the get-fewer-ops branch from 87d54e6 to b9b7f5f Compare September 19, 2018 11:08

alecgibson force-pushed the get-fewer-ops branch from b9b7f5f to e76d4af Compare September 19, 2018 11:11

ericyhwang approved these changes Sep 19, 2018

View reviewed changes

Call _previousVersion() once when checking uniqueness

31004ec

ericyhwang merged commit 1c91b46 into share:master Sep 19, 2018

alecgibson mentioned this pull request May 18, 2021

Optimization opportunity: use "to" in getOps query #61

Closed

Only fetch requested ops for getOps #68

Only fetch requested ops for getOps #68

Uh oh!

Conversation

alecgibson commented Aug 15, 2018

Introduction

Background

Changes

Further work

Uh oh!

coveralls commented Aug 15, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gkubisa commented Aug 15, 2018

Uh oh!

alecgibson commented Aug 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ericyhwang commented Sep 5, 2018

Uh oh!

alecgibson commented Sep 6, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

alecgibson commented Sep 14, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ericyhwang commented Sep 19, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Only fetch requested ops for `getOps` #68

Only fetch requested ops for `getOps` #68

coveralls commented Aug 15, 2018 •

edited

Loading

alecgibson commented Aug 16, 2018 •

edited

Loading