Heartbeat? #48

JanKoppe · 2016-12-05T12:03:27Z

We've just had an instance of pyCA simply stop, and we didn't notice until it was scheduled to record. Our process monitoring did not alert us, and the process did not fail or quit, which would have led to systemd restarting the unit.

I'm wondering if it would be a good idea to implement some kind of heartbeat to continuously monitor the health of the instance. With the upcoming switch to multiple threads for different jobs, a proper implementation would need internal health checks for each of those threads. We could then offer the option to expose an HTTP API endpoint (.../status.json), touch a local file, or periodically access a predefined URL (active monitoring).
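To make the idea concrete, here is a minimal sketch of what a per-thread heartbeat could look like. It is not pyCA code: the class `HeartbeatRegistry`, its methods, the 60-second default, and the worker names in the usage note are all assumptions made up for illustration. Each worker thread would report in from its main loop, and the aggregated state could then back a status.json response or decide whether a heartbeat file gets touched.

```python
import json
import threading
import time
from pathlib import Path


class HeartbeatRegistry:
    """Hypothetical helper: tracks when each worker thread last reported in."""

    def __init__(self, max_age_seconds=60.0, clock=time.monotonic):
        self._max_age = max_age_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._lock = threading.Lock()
        self._last_beat = {}

    def beat(self, worker_name):
        """Called by a worker thread on every pass through its main loop."""
        with self._lock:
            self._last_beat[worker_name] = self._clock()

    def status(self):
        """Return a plain dict describing the health of every known worker."""
        now = self._clock()
        with self._lock:
            workers = {
                name: {
                    "age_seconds": round(now - stamp, 3),
                    "healthy": (now - stamp) <= self._max_age,
                }
                for name, stamp in self._last_beat.items()
            }
        # With no registered workers there is nothing to vouch for, so report unhealthy.
        healthy = bool(workers) and all(w["healthy"] for w in workers.values())
        return {"healthy": healthy, "workers": workers}

    def status_json(self):
        """Serialize the status, e.g. as the body of a status.json response."""
        return json.dumps(self.status())

    def touch_if_healthy(self, path):
        """Update the mtime of `path` only while every worker is healthy."""
        if self.status()["healthy"]:
            Path(path).touch()
            return True
        return False
```

A worker thread would call something like `registry.beat("capture")` inside its loop. An external monitor could then alert when the heartbeat file's modification time gets too old, which would also catch the case described above where the process is still running but has stopped doing work.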

On the other hand, we could just say that this is way too much overhead and that monitoring should be done outside of the application, e.g. by watching the log files.

I'm really not sure what the best way would be here. Any thoughts?


JanKoppe · 2017-03-06T16:24:18Z

See #64 and #66, which cover this.

JanKoppe added the enhancement label Dec 5, 2016

JanKoppe mentioned this issue Feb 28, 2017

Process and agent state management #64

Closed

JanKoppe closed this as completed Mar 6, 2017
