Consume: prevent loss of data in write buffers / pending batches #16

Open
gavin-norman-sociomantic opened this issue Nov 23, 2017 · 4 comments

Comments

@gavin-norman-sociomantic

gavin-norman-sociomantic commented Nov 23, 2017

The neo Consume request provides a facility for the client to initiate a clean shutdown, ensuring that all records already popped from the channel are flushed to the client before the request ends.

There are, however, other situations where records can be lost:

  • When the node is shut down. Currently there is no support for waiting for active requests to end cleanly.
  • When a connection error occurs. Currently, this just ends all requests on that connection.

We should consider ways to fix these cases.

(Created here, not in dmqnode, as the fix will likely require both changes to the storage engine and accompanying protocol changes.)

@gavin-norman-sociomantic
Author

gavin-norman-sociomantic commented Nov 23, 2017

One idea would be to modify the process of popping a record from the storage engine. Rather than a destructive pop, it could be a two-step process:

  1. "Reserve" the record, so that no other requests can access it.
  2. When finished, tell the storage engine to either remove the record from the channel (if the request succeeded) or mark it as unreserved (if the request failed).

That way, the ordering of records in the channel would not be altered.
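
A rough sketch of what such a reserve/commit interface on the storage engine could look like (all names here are illustrative assumptions, not taken from the actual dmqnode code):

```d
/// Handle to a record that has been reserved but not yet removed.
/// (Hypothetical type, for illustration only.)
struct ReservedRecord
{
    ulong id;             // position of the record in the channel
    const(void)[] value;  // the record's payload
}

/// Hypothetical two-step pop API for a storage channel.
interface StorageChannel
{
    // Step 1: reserve the next record so that no other request can access
    // it. Returns false if the channel is empty.
    bool reserve ( out ReservedRecord record );

    // Step 2a: the request succeeded -- remove the record from the channel
    // for good.
    void commit ( ref ReservedRecord record );

    // Step 2b: the request failed -- mark the record as unreserved again,
    // keeping it in its original position in the channel.
    void release ( ref ReservedRecord record );
}
```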

@david-eckardt-sociomantic
Contributor

the fix will likely require both changes to the storage engine and accompanying protocol changes

To end Consume requests when the node is shut down, a protocol change is not necessary -- flushing the pending batch and sending Stopped is sufficient; the client will handle that accordingly.

Currently there is no support for waiting for active requests to end cleanly.

An easier way of accomplishing this could be to use a timeout rather than waiting for requests to report that they have ended. On SIGINT or SIGTERM, active requests could be notified somehow; the node then sleeps for, say, a second and finally proceeds with shutting down.
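
As a sketch, that shutdown sequence might look roughly like this (the two delegates stand in for whatever notification and shutdown hooks the node actually has; they are assumptions, not existing APIs):

```d
import core.thread : Thread;
import core.time : seconds;

/// Sketch of a timeout-based shutdown, run when SIGINT/SIGTERM is received.
/// The delegates are placeholders for the node's real hooks.
void onShutdownSignal ( void delegate ( ) notifyActiveRequests,
    void delegate ( ) shutdownNode )
{
    // Ask every active Consume request to flush its pending batch and send
    // Stopped to its client.
    notifyActiveRequests();

    // Grace period: give the requests time to get their data onto the wire,
    // rather than waiting for each of them to report that it has ended.
    Thread.sleep(1.seconds);

    // Continue with the normal node shutdown.
    shutdownNode();
}
```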

"Reserve" the record, so that no other requests can access it.

This could be done at the storage engine level, such that on shutdown the storage engine creates the batch and writes it to a dump file on disk if it hasn't been sent. On the other hand, in many cases (that is, if there are sufficiently many records in the storage) the batch is removed from the storage immediately before it is sent, so that wouldn't help if the shutdown intercepts the send. There is also no feedback from MessageSender about when the data have actually been sent -- they can be buffered. It sounds like a very tricky feature to implement.

@david-eckardt-sociomantic
Contributor

active requests could be notified somehow

That could be a listener event, raised by the storage engine when it shuts down (Nemanja's idea).
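
For illustration, that could amount to adding a shutdown code to the set of events used to wake up suspended Consume requests. The enum below is purely a sketch; the names are hypothetical and not the actual listener codes in the node:

```d
/// Illustrative event codes for waking up suspended Consume requests.
/// (Hypothetical names, for illustration only.)
enum ListenerEvent
{
    DataReady, // a record was pushed to the channel
    Flush,     // periodic flush of the pending batch
    Shutdown   // new: the storage engine is shutting down; flush the pending
               // batch and send Stopped to the client before ending
}
```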

@gavin-norman-sociomantic
Author

This could be done at the storage engine level, such that on shutdown the storage engine creates the batch and writes it to a dump file on disk if it hasn't been sent. On the other hand, in many cases (that is, if there are sufficiently many records in the storage) the batch is removed from the storage immediately before it is sent, so that wouldn't help if the shutdown intercepts the send. There is also no feedback from MessageSender about when the data have actually been sent -- they can be buffered. It sounds like a very tricky feature to implement.

Ahh yes, I was only thinking about the queue in memory. Having to also handle the disk queue would indeed make this more complicated.
