Skip to content
This repository has been archived by the owner on Aug 31, 2021. It is now read-only.

heartbeat timeout cause crash #70

Closed
jacktuck opened this issue Jan 22, 2018 · 6 comments
Closed

heartbeat timeout cause crash #70

jacktuck opened this issue Jan 22, 2018 · 6 comments

Comments

@jacktuck
Copy link
Collaborator

jacktuck commented Jan 22, 2018

This would seem to have happened on remit 2 but not certain because it's happened on a now dead docker container.

events.js:182
      throw er; // Unhandled 'error' event
      ^

Error: Heartbeat timeout
    at Heart.<anonymous> (/app/node_modules/amqplib/lib/connection.js:425:19)
    at emitNone (events.js:105:13)
    at Heart.emit (events.js:207:7)
    at Heart.runHeartbeat (/app/node_modules/amqplib/lib/heartbeat.js:88:17)
    at ontimeout (timers.js:469:11)
    at tryOnTimeout (timers.js:304:5)
    at Timer.listOnTimeout (timers.js:264:5)

but this error was not caught and caused a crash

@jacktuck
Copy link
Collaborator Author

jacktuck commented Jan 22, 2018

nothing in rabbitmq logs, so assuming its a client error that's just not handled as of yet.

https://github.com/squaremo/amqp.node/blob/31bb2a9c40219ce2fc8c4fede15a84a9f54b36a1/lib/connection.js#L426

@jpwilliams
Copy link
Owner

Ooh looks fun!

That'd usually indicate that the client's connection dropped in which case the crash (and therefore process exit) is expected. For anything where the connection dies, the immediate action is to crash.

It could throw a handleable error, but then you'd have a running container that wasn't actually working as expected.

Is that correct or have I found the wrong end of the stick?

@jacktuck
Copy link
Collaborator Author

jacktuck commented Jan 22, 2018

No you've got the gist.

Just so happens that it's running on... nodemon... so if we were not it would have restarted and been fine; instead it waits for changes.

But maybe it'd be better to handle reconnection without crashing?

@jpwilliams
Copy link
Owner

We could look in to handling reconnect attempts, yeah, but would be a pretty large change. That said, if I remember correctly there are a few libraries floating around that wrap amqplib to support reconnection logic, so then we'd just need to add logic in to Remit to recreate endpoints etc on reconnection.

Possible, but needs some looking in to.

For now, switch that mofo to pm2 I guess. :D

@jacktuck
Copy link
Collaborator Author

jacktuck commented Jan 22, 2018

yeah that works, definitely shouldn't have been on nodemon, just an observation i noticed since it had been using it.

i guess though if you can imagine a split brain situation where an api server can no longer reach an remote rabbitmq server. If the heartbeat causes a crash everything would be reconnecting at the same time and potentially bring down rabbitmq?

I guess even if we did not crash and tried to reconnect the same would apply, unless the library which wraps amqplib has an option for some randomness.

For example, SocketCluster has the following config:

autoReconnectOptions: {
  delay: 4000, // fixed delay in milliseconds
  randomness: 1000 // randomness to add on top in milliseconds
}

That's not a possibility in our use as we have relatively few services, but yeah maybe migrating to a wrapper could be a good shout especially if it's completely compatible with our current one :) essentially would be reconnection logic for free

@jpwilliams
Copy link
Owner

Sure that makes sense. I'll close this - as technically it's expected behaviour - but open up another issue (#71) for a feature request for reconnection logic.

I'll try have a look at the listed packages tonight and see what's viable.

@jpwilliams jpwilliams added this to Done in Test Project Jul 6, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
No open projects
Test Project
  
Done
Development

No branches or pull requests

2 participants