Manage stomp heartbeat in a separate thread. #65

ralphbean · 2018-09-20T19:16:51Z

Hopefully, this will resolve all kinds of issues we've been seeing with stomp
heartbeat management. With the call to callLater, the same thread is used
but unfortunately the call is often made after the interval elapses, even
when the queue is not that busy.

By putting heartbeat events in a separate thread, hopefully the hub can keep up
with the promise it made to the broker.

Hopefully, this will resolve all kinds of issues we've been seeing with stomp heartbeat management. With the call to `callLater`, the same thread is used but unfortunately the call is often made *after* the `interval` elapses, even when the queue is not that busy. By putting heartbeat events in a separate thread, hopefully the hub can keep up with the promise it made to the broker.

mprahl · 2018-09-20T19:58:25Z

👍 makes sense

ralphbean · 2018-09-20T20:00:19Z

Will wait to merge until I get some feedback from Wai and Luiz. They're testing it out for real in the dev instances of their apps.

wcheang · 2018-09-20T21:16:28Z

Ran a few rounds of tests sending a bunch of messages to the dev host with the patch, it didn't seem to miss any messages and I don't see any logs about reconnect. We had problems with stale clients being created last time before we turned off stomp heartbeat though, so I'd let it run for at least overnight to confirm it's not a problem.

jeremycline · 2018-09-21T18:39:29Z

moksha.hub/moksha/hub/stomp/stomp.py

@@ -155,12 +156,13 @@ def failover(self):

    def start_heartbeat(self, interval):
        self._heartbeat_enabled = True
-        reactor.callLater(interval / 1000.0, self.heartbeat, interval)
+        reactor.callInThread(self.heartbeat, interval)


At a glance, this doesn't look right. Nothing in self.heartbeat blocks, so deferring it to a thread is not only unnecessary, it's unsafe because self.proto is (presumably) a Twisted protocol which you should never call outside the reactor thread. If you're finding that callLater isn't called around when you want it to, it's because something is running blocking code in the reactor thread which is when you should use this API.

On the first point: see that a new (disappointing) time.sleep call is added in self.heartbeat.

On the second point: I have the same worry.

Well, you can't rely on interacting with the protocol object outside the twisted thread, it's not thread-safe. You could call back in with deferFromThread (or whatever it's called), but you're going to run into the same scheduling issue you're trying to avoid. Whatever is resulting in blocking calls (the consumers?) needs to run it its own thread.

ralphbean · 2018-09-24T17:01:56Z

OK - I think @jeremycline's observations mean this PR is a dead-end.

We do already have consumers running in their own threads, but only if the hub is not running in blocking mode... and in that case, the hub will ack messages as soon as they arrive - not after they are processed.

We're going to need:

A patch in moksha that (when blocking mode is set to false) will defer ACKing until after the message is processed by its consumers. Today, when blocking mode is set to false, messages are acked as soon as they are added to the internal queue.
Durable queues in the new activemq upgrade.

After those two are complete, our preferred configuration for services should be:

Set blocking mode to false.
Set ack mode to client-individual.
Turn stomp_heartbeat on to some value (any value).

mikebonnet · 2018-09-24T17:25:40Z

+1 I think the approach outlined by @ralphbean makes a lot of sense.

ralphbean · 2018-09-24T18:13:55Z

Hm. @wcheang and @lcarva both report that this patch resolves an existing problem for them (in two different applications) and that it doesn't generate any other obvious side effects (like unsafe manipulation of that protocol object).

I'm going to not merge this, but I will include it temporarily in a release of python-moksha-hub in fedora. I'll let that sit in updates-testing indefinitely, and not plan to ship it to stable. It will be eventually superseded by a different solution (described above).

ralphbean · 2018-09-24T20:00:10Z

Created at https://bodhi.fedoraproject.org/updates/python-moksha-hub-1.5.13-2.0.1cb025525.el7

jeremycline reviewed Sep 21, 2018

View reviewed changes

ralphbean closed this Sep 24, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Manage stomp heartbeat in a separate thread. #65

Manage stomp heartbeat in a separate thread. #65

ralphbean commented Sep 20, 2018

mprahl commented Sep 20, 2018

ralphbean commented Sep 20, 2018

wcheang commented Sep 20, 2018

jeremycline Sep 21, 2018

ralphbean Sep 21, 2018

jeremycline Sep 21, 2018

ralphbean commented Sep 24, 2018 •

edited

mikebonnet commented Sep 24, 2018

ralphbean commented Sep 24, 2018

ralphbean commented Sep 24, 2018

Manage stomp heartbeat in a separate thread. #65

Manage stomp heartbeat in a separate thread. #65

Conversation

ralphbean commented Sep 20, 2018

mprahl commented Sep 20, 2018

ralphbean commented Sep 20, 2018

wcheang commented Sep 20, 2018

jeremycline Sep 21, 2018

Choose a reason for hiding this comment

ralphbean Sep 21, 2018

Choose a reason for hiding this comment

jeremycline Sep 21, 2018

Choose a reason for hiding this comment

ralphbean commented Sep 24, 2018 • edited

mikebonnet commented Sep 24, 2018

ralphbean commented Sep 24, 2018

ralphbean commented Sep 24, 2018

ralphbean commented Sep 24, 2018 •

edited