For code that uses ZMQStreams for the heartbeat (notebook, hub), server load can disrupt the heartbeat. I'm not sure whether this actually happens in practice, but it is a real possibility. I expect the most likely case is the Hub, which can make very long blocking calls when using the mongodb/sqlite task stores.
The relevant aspects of the code:
I'm not sure which of the solutions would be better. 3 seems like the simplest option, but 2 seems almost equivalent. 1 sounds a bit more complex, but perhaps it targets the underlying problem more directly. I don't see any reason why any of them would not work, though.
I did 2 in PR #1312. The case against 3 is that it's sort of a private API call, whereas 2 is the ~official way to send immediately. I honestly don't know which is better: 2 is more 'official', while 3 is more direct.
By the way, I was able to induce this heart failure with artificial load in the notebook server (sleep longer than heartbeat in PeriodicCallback), and PR #1312 successfully prevented this from causing a heart failure, even with heartbeats of 0.1s.
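For anyone who wants to see the failure mode in isolation, here is a rough, stdlib-only sketch. It is not the notebook or hub code: it uses `asyncio` as a stand-in for the tornado IOLoop, there is no ZMQ involved, and the names (`max_heartbeat_gap`, `_blocking_load`) are made up for illustration. It only shows that one blocking call on a single-threaded event loop stretches the gap between heartbeats well past the configured period.

```python
import asyncio
import time


async def _heartbeat(period, beats, stamps):
    # Record a timestamp every `period` seconds, like a periodic heartbeat.
    for _ in range(beats):
        stamps.append(time.monotonic())
        await asyncio.sleep(period)


async def _blocking_load(delay, block_for):
    # Simulated server load: time.sleep() does not yield to the event loop,
    # so nothing else (including the heartbeat) can run until it returns.
    await asyncio.sleep(delay)
    time.sleep(block_for)


async def _run(period, beats, block_for):
    stamps = []
    tasks = [_heartbeat(period, beats, stamps)]
    if block_for > 0:
        tasks.append(_blocking_load(period / 2, block_for))
    await asyncio.gather(*tasks)
    # Largest interval between two consecutive heartbeats.
    return max(b - a for a, b in zip(stamps, stamps[1:]))


def max_heartbeat_gap(period=0.05, beats=5, block_for=0.0):
    """Return the longest observed gap (seconds) between heartbeats."""
    return asyncio.run(_run(period, beats, block_for))


if __name__ == "__main__":
    print("idle loop, max gap:   ", max_heartbeat_gap())
    print("blocked loop, max gap:", max_heartbeat_gap(block_for=0.3))
```

With a 0.05s period and a 0.3s blocking call, the longest gap should come out at roughly the length of the blocking call rather than the period, which is the same shape of problem as sleeping longer than the heartbeat inside a PeriodicCallback.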