Manage stomp heartbeat in a separate thread. #65
Hopefully, this will resolve all kinds of issues we've been seeing with stomp heartbeat management. With the call to `callLater`, the same thread is used but unfortunately the call is often made *after* the `interval` elapses, even when the queue is not that busy. By putting heartbeat events in a separate thread, hopefully the hub can keep up with the promise it made to the broker.
👍 makes sense
Will wait to merge until I get some feedback from Wai and Luiz. They're testing it out for real in the dev instances of their apps.
Ran a few rounds of tests sending a bunch of messages to the dev host with the patch. It didn't seem to miss any messages, and I don't see any logs about reconnects. We had problems with stale clients being created last time, before we turned off stomp heartbeat, though, so I'd let it run at least overnight to confirm it's not a problem.
@@ -155,12 +156,13 @@ def failover(self):

    def start_heartbeat(self, interval):
        self._heartbeat_enabled = True
        reactor.callLater(interval / 1000.0, self.heartbeat, interval)
        reactor.callInThread(self.heartbeat, interval)
At a glance, this doesn't look right. Nothing in `self.heartbeat` blocks, so deferring it to a thread is not only unnecessary, it's unsafe, because `self.proto` is (presumably) a Twisted protocol, which you should never call outside the reactor thread. If you're finding that `callLater` isn't called around when you want it to be, it's because something is running blocking code in the reactor thread, and that blocking code is what you should use this API for.
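To illustrate the point above about blocking code delaying a timer, here is a minimal sketch. It deliberately uses the standard library's asyncio rather than Twisted, so `loop.call_later` stands in for `reactor.callLater` and `loop.run_in_executor` stands in for pushing blocking work to a thread. The names `blocking_work`, `measure_timer_delay` and `run`, and the chosen durations, are assumptions made for this example and are not taken from the hub's code.

```python
import asyncio
import time


def blocking_work(seconds):
    # Hypothetical stand-in for a consumer that blocks (I/O, heavy computation).
    time.sleep(seconds)


async def measure_timer_delay(interval, block_for, offload):
    """Return how long a timer scheduled for `interval` seconds really took to fire."""
    loop = asyncio.get_running_loop()
    fired = loop.create_future()
    start = loop.time()

    # Analogous to reactor.callLater(interval, ...): runs on the event-loop thread.
    loop.call_later(interval, lambda: fired.set_result(loop.time() - start))

    if offload:
        # Blocking work runs in a worker thread; the loop stays free to fire timers.
        await loop.run_in_executor(None, blocking_work, block_for)
    else:
        # Blocking work runs on the loop thread; no timer can fire until it returns.
        blocking_work(block_for)

    return await fired


def run(offload):
    # Timer requested after 0.05 s while 0.3 s of blocking work is going on.
    return asyncio.run(measure_timer_delay(0.05, 0.3, offload))


if __name__ == "__main__":
    print("blocking on the loop thread:", run(offload=False))
    print("blocking moved to a thread: ", run(offload=True))
```

With the blocking work on the loop thread, the timer cannot fire until that work returns, however short the requested interval is. Moving the blocking work off the loop thread, rather than moving the timer callback, is what lets the timer fire near its deadline.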
On the first point: see that a new (disappointing) `time.sleep` call is added in `self.heartbeat`.
On the second point: I have the same worry.
Well, you can't rely on interacting with the protocol object outside the Twisted thread; it's not thread-safe. You could call back in with deferFromThread (or whatever it's called), but you're going to run into the same scheduling issue you're trying to avoid. Whatever is resulting in blocking calls (the consumers?) needs to run in its own thread.
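The "call back in" pattern mentioned above can be sketched as follows. In Twisted, the call for scheduling a function onto the reactor thread from another thread is `reactor.callFromThread`. This sketch uses asyncio's equivalent, `loop.call_soon_threadsafe`, so that it runs with only the standard library. `FakeProtocol`, `send_heartbeat`, `heartbeat_from_worker` and `demo` are hypothetical names invented for the illustration.

```python
import asyncio
import threading


class FakeProtocol:
    """Hypothetical stand-in for a protocol object that is not thread-safe."""

    def __init__(self):
        self.sent_from = []

    def send_heartbeat(self):
        # Record which thread actually touched the protocol.
        self.sent_from.append(threading.get_ident())


async def heartbeat_from_worker(proto):
    loop = asyncio.get_running_loop()
    done = loop.create_future()

    def on_loop_thread():
        # Runs on the event-loop thread, so touching the protocol is safe here.
        proto.send_heartbeat()
        done.set_result(None)

    def worker():
        # Runs in a separate thread: it must not touch `proto` directly.
        # It only asks the loop to run the callback on the loop's own thread.
        loop.call_soon_threadsafe(on_loop_thread)

    thread = threading.Thread(target=worker)
    thread.start()
    await done
    thread.join()
    return threading.get_ident()  # identifier of the event-loop thread


def demo():
    proto = FakeProtocol()
    loop_thread = asyncio.run(heartbeat_from_worker(proto))
    # True when the protocol was only ever touched from the loop thread.
    return proto.sent_from == [loop_thread]


if __name__ == "__main__":
    print(demo())
```

As the comment notes, this keeps the protocol safe but does not fix late heartbeats: the scheduled callback still has to wait until the loop thread is free.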
OK - I think @jeremycline's observations mean this PR is a dead-end. We do already have consumers running in their own threads, but only if the hub is not running in blocking mode... and in that case, the hub will ack messages as soon as they arrive - not after they are processed. We're going to need:
After those two are complete, our preferred configuration for services should be:
+1 I think the approach outlined by @ralphbean makes a lot of sense.
Hm. @wcheang and @lcarva both report that this patch resolves an existing problem for them (in two different applications) and that it doesn't generate any other obvious side effects (like unsafe manipulation of that protocol object). I'm going to not merge this, but I will include it temporarily in a release of python-moksha-hub in fedora. I'll let that sit in updates-testing indefinitely, and not plan to ship it to stable. It will be eventually superseded by a different solution (described above).