-
Notifications
You must be signed in to change notification settings - Fork 64
Only fetch requested ops for getOps
#68
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
461b3bf
to
acbc1cb
Compare
The idea and implementation look good to me, however, I'd add some more tests to cover all the edge cases supported by |
acbc1cb
to
cdbe577
Compare
@gkubisa good call - caught a bug by expanding test coverage. I've also added some spies to check that our underlying methods are indeed only attempting to fetch a subset of ops. |
db5506c
to
b57b0dc
Compare
# Introduction At the moment, when calling `getOps`, `sharedb-mongo` actually fetches all the ops from `from` to the current version. That means that if I have a document with 1,000 versions, and I only ask for ops 0-10, we still fetch all 1,000 ops. This is clearly inefficient, and on documents with large numbers of ops, this fetch can take a long time. Anecdotally, it takes ~2s to fetch ops 0-10 of 100,000 ops. This change has an anecdotal performance increase of ~100x, now taking ~0.01s to fetch ops 0-10 of 100,000 ops. # Background We fetch all the ops from the current version, because we try to fetch a "valid" chain of operations. Consider this case: 1. I have a doc v1 2. I submit an op v2 3. I concurrently submit another op v2 4. Both concurrent ops are committed to the database before the snapshot 5. One of the commits successfully writes a snapshot 6. The other snapshot is rejected, but its op is still committed 7. I now try to get all ops. A naive `find` operation would return two ops for v2. In theory, this case should actually already be [cleaned up][1], so I can only assume that we are: - guarding against cases where the op cleanup fails - guarding against unknown inconsistencies in ops - guarding against stupid consumers mucking about in the db themselves - being very defensive In order to therefore fetch a valid set of ops, we currently therefore fetch the current snapshot, which has a reference stored to the op that created it. By looking up that op, we can then check the op that preceded _that_, and so on, forming a valid chain of ops that point to one another. The obvious downside of this is that we need to start at the current snapshot and work backwards, necessitating a fetch of all the ops up to the current version. # Changes This change attempts to allow us to only fetch the bare minimum number of ops, whilst still maintaining integrity and discarding erroneous ops. This is achieved by attempting to find the first op after our `to` op that has a unique version. The assumption here is that any op with a unique version is inherently valid, because it had no collisions. For example, consider the case where I have a document with some ops: - v1: unique - v2: unique - v3: collision 3 - v3: collision 3 - v4: collision 4 - v4: collision 4 - v5: unique - v6: unique - ... - v1000: unique If I want to fetch ops v1-v3, then we: - look up v4 - find that v4 is not unique - look up v5 - see that v5 is unique and therefore assumed valid - look backwards from v5 for a chain of valid ops This way we don't need to fetch all the ops from v5 to the current version. In the case where a valid op cannot be determined, we still fall back to fetching all ops and working backwards from the current version. # Further work Note that this change *only* affects the `getOps` method, notably not touching either `getOpsToSnapshot` or `getOpsBulk`. [1]: https://github.com/share/sharedb-mongo/blob/2d579ddb80781e987707076e932cd4e01ca066ef/index.js#L189-L193
b57b0dc
to
3df1cf5
Compare
Nate's comments from the PR review meeting:
We'll still need to take a look at the PR itself. |
The new, faster `getOps` behaviour should be fine, but there may be unforeseen corner cases where the underlying assumptions break down (eg if consumers have been meddling with ops or snapshots outside of ShareDB). In order to remain conservative, and maintain data consistency in all mainline cases, this change hides the faster behaviour behind a flag, which defaults to off. In order to make sure `sharedb-mongo` still acts correctly with this flag enabled, we re-run all the tests again with the flag enabled.
@ericyhwang I've hidden this feature behind a flag. Please let me know when you've had a chance to look through the code itself. |
README.md
Outdated
performance impact on fetching ops for documents with a large number of ops. | ||
|
||
If you need faster performance of `getOps`, you can initialise `sharedb-mongo` | ||
with the option `fasterGetOps: true`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getOpsWithoutStrictLinking: true
Then also document what "strict" linking is (ie linking back from the current snapshot).
"The default behaviour of getOps
is...,
"whereas setting this flag will do 1., 2., 3..."
index.js
Outdated
// data in the mongo database. | ||
this.allowAggregateQueries = options.allowAllQueries || options.allowAggregateQueries || false; | ||
|
||
// By default, when calling fetchOps, sharedb-mongo fetchas *all* ops, even |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fetchas -> fetches
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(also doesn't fetch "all" ops - still honours "from", but ignores "to")
README.md
Outdated
|
||
Setting this flag will use an alternative method that is much faster than the | ||
default method, but may behave incorrectly in corner cases where ops or snapshots | ||
have been manipulated outside of ShareDB (eg by setting a TTL on ops, or manually |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can just throw our hands up here and say that we're not entirely sure what may cause a failure here, but because we're playing around with op integrity we should warn that Bad Things might happen by using this mode, especially if people have changed ops themselves.
We should define "behave incorrectly" and explain that we're concerned about returning ops that were not correctly linked to the canonical version. We need to guard against that because ShareDB sometimes writes more than one op (of the same version) when performing optimistic locking, so if eg if the valid op has been deleted, then we may return an invalid op.
If we incorrectly assume an op is canonical, then it may return an invalid chain.
index.js
Outdated
// To avoid this, we try to fetch the first op after 'to' which has a unique 'v', and then we | ||
// work backwards from that op using the linked op 'o' field to get a valid chain of ops. | ||
function getFirstOpWithUniqueVersion(cursor, ops, callback) { | ||
if (typeof ops === 'function') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't bother with this requirements check. Just make the first called pass in an empty array.
index.js
Outdated
&& previousVersion !== currentVersion | ||
&& previousVersion !== oneBeforePreviousVersion; | ||
|
||
if (previousVersionWasUnique) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turn this check into a function to shorten this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, we should pull this out into its own class, who has only these three objects (so that we don't keep the rest of the array of ops in memory that we don't need).
index.js
Outdated
} else { | ||
// If there's no next op to fetch, then we now know the current version is unique so | ||
// long as it doesn't match the previous version. | ||
var currentVersionIsUnique = typeof currentVersion === 'number' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also extract this into its own function for brevity.
@ericyhwang I've addressed the comments we discussed in the pull review meeting. Namely:
Could you please re-review? |
87d54e6
to
b9b7f5f
Compare
This change addresses some review comments. Namely it: - extracts complicated unique op checking into its own, tested class - renames the feature flag to `getOpsWithoutStrictLinking` - updates documentation to be more detailed about behaviour with the flag enabled and disabled
b9b7f5f
to
e76d4af
Compare
op-link-validator.js
Outdated
}; | ||
|
||
OpLinkValidator.prototype._previousVersionWasUnique = function () { | ||
return typeof this._previousVersion() === 'number' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor thing, might be worth calling this._previousVersion()
once and storing it.
Introduction
At the moment, when calling
getOps
,sharedb-mongo
actually fetchesall the ops from
from
to the current version.That means that if I have a document with 1,000 versions, and I only
ask for ops 0-10, we still fetch all 1,000 ops. This is clearly
inefficient, and on documents with large numbers of ops, this fetch can
take a long time. Anecdotally, it takes ~2s to fetch ops 0-10 of
100,000 ops.
This change has an anecdotal performance increase of ~100x, now taking
~0.01s to fetch ops 0-10 of 100,000 ops.
Background
We fetch all the ops from the current version, because we try to fetch a
"valid" chain of operations. Consider this case:
snapshot
find
operation would return twoops for v2.
In theory, this case should actually already be cleaned up, so I
can only assume that we are:
In order to therefore fetch a valid set of ops, we currently therefore
fetch the current snapshot, which has a reference stored to the op that
created it. By looking up that op, we can then check the op that
preceded that, and so on, forming a valid chain of ops that point to
one another.
The obvious downside of this is that we need to start at the current
snapshot and work backwards, necessitating a fetch of all the ops up to
the current version.
Changes
This change attempts to allow us to only fetch the bare minimum number
of ops, whilst still maintaining integrity and discarding erroneous ops.
This is achieved by attempting to find the first op after our
to
opthat has a unique version. The assumption here is that any op with a
unique version is inherently valid, because it had no collisions.
For example, consider the case where I have a document with some ops:
If I want to fetch ops v1-v3, then we:
This way we don't need to fetch all the ops from v5 to the current
version.
In the case where a valid op cannot be determined, we still fall back to
fetching all ops and working backwards from the current version.
Further work
Note that this change only affects the
getOps
method, notably nottouching either
getOpsToSnapshot
orgetOpsBulk
.