Code that uses ZMQStreams for the heartbeat (notebook, Hub) can have its heartbeat disrupted by server load. I'm not sure whether this happens in practice, but it is a real possibility. I expect the most likely case is the Hub, which can make very long blocking calls when using the MongoDB/SQLite task stores.
The relevant aspects of the code:
I don't have a good sense of which of the solutions would be best. 3 seems like the simplest option, but 2 seems almost equivalent. 1 sounds a bit more complex, but it perhaps targets the underlying problem more directly. I don't see any reason why any of these would not work, though.
I did 2 in PR #1312. The case against 3 is that it's sort of a private API call, whereas 2 is the more-or-less official way to send immediately. I honestly don't know which is better: 2 is more 'official', while 3 is more direct.
By the way, I was able to induce this heart failure with artificial load in the notebook server (a sleep longer than the heartbeat period inside a PeriodicCallback), and PR #1312 prevented that load from causing a heart failure, even with heartbeats of 0.1 s.
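
For anyone who wants to see the effect without a running notebook server, here is a rough sketch of the same kind of artificial load. It is not the notebook or Hub code: it uses the standard-library asyncio loop as a stand-in for the tornado IOLoop, there is no zmq involved, and `measure_max_gap`, `heartbeat`, and `blocking_load` are hypothetical names made up for the illustration. It only shows that a blocking call on the loop that is longer than the heartbeat period delays the heartbeat by about that long; it does not show the change in PR #1312.

```python
import asyncio
import time


def measure_max_gap(period, block_for, duration):
    """Return the largest gap (seconds) between consecutive heartbeat ticks.

    Hypothetical helper: asyncio stands in for the tornado IOLoop here.
    """
    ticks = []

    async def heartbeat():
        # Stand-in for a heartbeat handler that shares the event loop
        # with everything else the server does.
        while True:
            ticks.append(time.monotonic())
            await asyncio.sleep(period)

    async def blocking_load():
        # Artificial load: a single blocking call on the loop thread,
        # analogous to sleeping inside a periodic callback.
        await asyncio.sleep(period)
        time.sleep(block_for)

    async def main():
        hb = asyncio.ensure_future(heartbeat())
        await blocking_load()
        await asyncio.sleep(duration)  # let the heartbeat resume
        hb.cancel()
        try:
            await hb
        except asyncio.CancelledError:
            pass

    asyncio.run(main())
    return max(later - earlier for earlier, later in zip(ticks, ticks[1:]))


if __name__ == "__main__":
    # Heartbeat every 0.1 s, with and without a 0.5 s blocking call.
    print("max gap, no load:   %.3f s" % measure_max_gap(0.1, 0.0, 0.5))
    print("max gap, with load: %.3f s" % measure_max_gap(0.1, 0.5, 0.5))
```

With the blocking call in place, the largest gap between ticks should come out at roughly the length of the block rather than the heartbeat period, which is the condition under which a monitor expecting a reply every period would declare a heart failure.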