Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Doc.prototype.destroy #204

Merged
merged 8 commits into from
Jul 23, 2018
Merged

Fix Doc.prototype.destroy #204

merged 8 commits into from
Jul 23, 2018

Conversation

gkubisa
Copy link
Contributor

@gkubisa gkubisa commented Apr 18, 2018

It fixes the issue where a document is re-added to a collection after calling destroy, causing a memory leak.

It should also fix #161.

Re not waiting for the unsubscribe callback in destroy, I think it is not necessary because:

  • unsubscribe cannot fail on the server side (as far as I see),
  • unsubscribe is performed automatically on disconnect,
  • the doc is marked as unsubscribed immediately on the client side.

The problem was that unsubscribe re-added the doc to the connection.
Now the doc is removed from the connection after unsubscribe.
Additionally, we're no longer waiting for the unsubscribe response
before executing the callback. It is consistent with Query,
unsubscribe can't fail anyway and the subscribed state is updated
synchronously on the client side.
@coveralls
Copy link

coveralls commented Apr 18, 2018

Coverage Status

Coverage decreased (-0.2%) to 96.274% when pulling 5e009d1 on Teamwork:fix-doc-destroy into 68bde00 on share:master.

@@ -104,10 +104,8 @@ emitter.mixin(Doc);
Doc.prototype.destroy = function(callback) {
var doc = this;
doc.whenNothingPending(function() {
if (doc.wantSubscribe) doc.unsubscribe();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO the curly brace style is more clear, but it's a rather pedantic point.

Any way to add tests for this?

@gkubisa
Copy link
Contributor Author

gkubisa commented Apr 19, 2018

Thanks for the review @curran.

I added the braces as you suggested - I also prefer this style but tried to be consistent with the rest of the code base, which very often omits braces.

I also added an assertion to an existing test which failed before my changes and now it passes.

@gkubisa
Copy link
Contributor Author

gkubisa commented Jun 11, 2018

@nateps @ericyhwang , could you provide some feedback on this PR, or merge it, if it's all good, please?

@ericyhwang
Copy link
Contributor

We've got our ~monthly Share PR review meeting tomorrow, so we'll take a look at this and the other couple PRs during the meeting!

(I'll also ask about perhaps switching to shorter, more frequent review sessions, since that'll be easier all around if Nate's schedule can now accommodate it.)

@nateps
Copy link
Contributor

nateps commented Jun 19, 2018

Great catch!

I agree that it isn't 100% clear whether we need to wait for the unsubscribe to happen, and we might get away with calling the callback before the unsubscribe is fully effective. As you mentioned, the client calls back when it is disconnected.

Here is the reasoning behind that: Calling back on disconnect is needed to ensure that the callback to unsubscribe is always called. (The unsubscribe callback may be called after acknowledgement from the sever if we are connected, in a nextTick if we are disconnected, or at the time of disconnection.) If the client is disconnected, the server won't be able to send response messages and it will clean up the server-side agent responsible for the client. The client will get a new agent if it reconnects. So from the perspective of the client, the unsubscribe is effective immediately after a disconnection.

It is different if the client is connected, since if we call destroy() on a subscribed document, we could continue to get ops from the server until it acknowledges the unsubscribe. Now, if the document has been destroyed, the connection will ignore these ops. But if the document is then created or fetched again, it could get complicated. I'm not 100% sure I know where this would go wrong, but it seems like we should just be conservative and delay the callback until after the unsubscribe is complete to avoid any possibility of complexity arising from not waiting long enough to call back. In every case where I've assumed something should be safe to do and couldn't prove it, that later bit me. ;-)

Roughly, I could imagine something like the following being an issue:

  1. Connection A is subscribed to a doc
  2. Connection B performs an op on a doc. The server commits it
  3. Connection A deletes the doc, then calls destroy immediately
  4. As soon as the delete op is committed, the Connection A sends an unsubscribe command
  5. In the callback to destroy, Connection A creates the doc
  6. Connection B's op is sent to Connection A before the unsubscribe or create messages are received by the server
  7. Connection A receives the op from Connection B thinking that it shouldn't be getting any more ops, because it called destroy
  8. Connection A might not be able to OT the op from Connection B against its local data, because the document was created anew. If we had waited until the unsubscribe were complete, then we wouldn't have this race condition.

(This would be an issue with existing ShareDB code as well. Just clarifying why I think we should wait until unsubscribe calls back.)

The above is really complicated, but I think it might be an issue, and there is an easy way to avoid testing fate in this case. Knowing that unsubscribe will always call back and destroy waits until pending operations are complete, I think it is best if we just wait until unsubscribe calls back in all cases before calling the destroy callback. I'm a lot more confident we won't run into any race conditions if we do it that way.

Thus, I recommend the following:

Doc.prototype.destroy = function(callback) {
  var doc = this;
  doc.whenNothingPending(function() {
    if (doc.wantSubscribe) {
      doc.unsubscribe(function() {
        doc.connection._destroyDoc(doc);
        callback();
      });
    } else {
      doc.connection._destroyDoc(doc);
      if (callback) callback();
    }
  });
};

@gkubisa
Copy link
Contributor Author

gkubisa commented Jun 19, 2018

I think it's ok to wait for unsubscribe in destroy, however, I'm not convinced that it's necessary and it doesn't solve a bigger problem I'll describe below.

Waiting for unsubscribe

First of all, here's a slightly improved version of destroy which handles unsubscribe errors, ensures callback is always called asynchronously and always checks, if callback is specified.

Doc.prototype.destroy = function(callback) {
  var doc = this;
  var sync = true; // indicates if whenNothingPending's callback was executed synchronously
  doc.whenNothingPending(function() {
    if (doc.wantSubscribe) {
      doc.unsubscribe(function(err) {
        if (!err) doc.connection._destroyDoc(doc);
        if (callback) return callback(err);
        if (err) this.emit('error', err); 
      });
    } else {
      doc.connection._destroyDoc(doc);
      if (callback) {
        if (sync) process.nextTick(callback);
        else callback();
      }
    }
  });
  sync = false;
};

Why I think it's not necessary

The scenario you outlined above looks like one OT is designed to handle. From reading the source code, it looks like ShareDB can handle ops coming in wrong order, targeted at an older or newer version of the snapshot, with conflicting version, duplicated, etc. So, receiving any extraneous or conflicting operations, or missing some operations, is not a problem.

It also looks like extraneous calls to _handleUnsubscribe (resulting from unsubscribe we did not wait for) would not affect the correctness of a new Doc.

Bigger problem

Docs are shared freely between queries and can be retrieved individually. This is great for performance and efficient, however, it might lead to unpredictable behaviour. For example, an application may have several independent components using ShareDB. If any of the components calls unsubscribe or destroy on their Docs, they might affect other components using the same Docs. If components never call unsubscribe, then unused Docs would still use up resources unnecessarily to process updates. If components never call destroy, then unused Docs will still be referenced by Connection and never garbage collected.

Here's a simple scenario in which destroy leads to problems, regardless of waiting for the callback:

  1. Component A: Get Doc 1
  2. Component B: Get Doc 1 (shared with Component A)
  3. Component A: Destroy Doc 1 (Component B is not aware of it)
  4. Component A: Get Doc 1 (new instance, not shared with Component B)

Components A and B now have 2 separate Doc instances but Connection can support only one. As the components use their Docs, they'll surely get some incorrect behaviour.

I'm not sure what's the best way to solve it... perhaps ref counting to know when Doc is no longer used and can be safely removed from Connection. For subscriptions we could increment wantSubscribe on subscribe and decrement it on unsubscribe, so that wantSubscribe > 0 => subscribed and wantSubscribe === 0 => unsubscribed.

@ericyhwang
Copy link
Contributor

ericyhwang commented Jul 11, 2018

Nate's comments from the PR review meeting:

  • Philosophy: In Share, server shouldn't be responsible for maintaining client state, that should be all handled by the client.
  • Not going to do promises any time soon. Code should always be async (e.g. process.nextTick) or always be sync, not both.
  • Waiting on the callback longer likely won't be an issue, since you generally won't be blocking on destroy. I'm leaning towards doing what is safe.
  • For this PR: Remove the sync workaround, make a separate PR for making whenNothingPending async consistently.

@gkubisa
Copy link
Contributor Author

gkubisa commented Jul 12, 2018

I made the requested changes and created a new PR to fix whenNothingPending.

Copy link
Contributor

@nateps nateps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution! Definitely good fix to make sure we are cleaning up memory properly. 💥

doc.unsubscribe(function(err) {
if (err) {
if (callback) callback(err);
else this.emit('error', err);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be doc. I'll just go ahead and merge this change and make the fix, since we're close.

@nateps nateps merged commit d255048 into share:master Jul 23, 2018
@gkubisa gkubisa deleted the fix-doc-destroy branch July 24, 2018 07:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Looks like server ops to a doc are queued up and not applied after a client doc is destroyed
5 participants