We've just had an instance of pyCA simply stop working, and we didn't notice until it was scheduled to record. Our process monitoring did not alert us, and the process did not fail or quit, which would have led systemd to restart the unit.
I'm wondering whether it would be a good idea to implement some kind of heartbeat to continuously monitor the health of the instance. Of course, this would need a proper implementation given the upcoming switch to multiple threads for different jobs, which would require an internal health check for each of those threads. We could then offer several options: providing an HTTP API endpoint (`.../status.json`), touching a local file, or periodically accessing a predefined URL (active monitoring).
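A minimal sketch of how such a per-thread heartbeat could look, assuming each worker thread calls `beat()` once per iteration of its main loop. `HeartbeatRegistry`, `touch_if_healthy`, `ping_if_healthy`, `make_status_server`, the 60-second default and port 8000 are hypothetical names and values, not existing pyCA code, and the endpoint is served at `/status.json` purely for illustration. The three helpers map onto the three reporting options: local file, active URL ping and HTTP endpoint.

```python
import json
import os
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class HeartbeatRegistry:
    """Records when each worker thread last reported that it is alive."""

    def __init__(self, max_age=60.0, clock=time.monotonic):
        self.max_age = max_age  # seconds a worker may stay silent (hypothetical default)
        self._clock = clock
        self._last_beat = {}
        self._lock = threading.Lock()

    def beat(self, name):
        """Call from inside each worker's main loop on every iteration."""
        with self._lock:
            self._last_beat[name] = self._clock()

    def status(self):
        """JSON-serialisable report; healthy only if every known worker beat recently."""
        now = self._clock()
        with self._lock:
            workers = {
                name: {"age": round(now - ts, 1), "healthy": now - ts <= self.max_age}
                for name, ts in self._last_beat.items()
            }
        # An empty registry (no worker has beaten yet) counts as unhealthy.
        healthy = bool(workers) and all(w["healthy"] for w in workers.values())
        return {"healthy": healthy, "workers": workers}


def touch_if_healthy(registry, path):
    """Local-file option: refresh the file's mtime only while all workers are alive."""
    if not registry.status()["healthy"]:
        return False
    with open(path, "a"):
        pass  # create the file if it does not exist yet
    os.utime(path, None)  # set mtime to the current time
    return True


def ping_if_healthy(registry, url, timeout=10):
    """Active option: request a predefined monitoring URL only while healthy."""
    if not registry.status()["healthy"]:
        return False
    with urllib.request.urlopen(url, timeout=timeout):
        pass
    return True


def make_status_server(registry, host="127.0.0.1", port=8000):
    """HTTP option: serve the report at /status.json (hypothetical path and port)."""

    class StatusHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/status.json":
                self.send_error(404)
                return
            report = registry.status()
            body = json.dumps(report).encode()
            # 503 lets plain HTTP checks alert without parsing the JSON body.
            self.send_response(200 if report["healthy"] else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep monitoring requests out of the application log

    return ThreadingHTTPServer((host, port), StatusHandler)
```

In the application, the server would run `serve_forever()` in a daemon thread, and `touch_if_healthy` or `ping_if_healthy` would be called from a timer. Note that only workers that have beaten at least once are tracked, so a thread that never starts would go unnoticed unless the expected names were registered up front.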
On the other hand, we could just say that's way too much overhead and that monitoring should be done outside of the application, e.g. by watching the log files.
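For the external approach, a small check in the style of a Nagios plugin could raise an alert when the log file stops being written. This is a sketch that assumes pyCA writes to its log regularly even while idle; if it does not, a quiet but healthy instance would trigger a false alarm. The script, its arguments and thresholds are hypothetical. It uses the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
import os
import sys
import time


def check_log_freshness(path, max_age, now=None):
    """Return (exit_code, message) following the Nagios plugin convention."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(path)
    except OSError as exc:
        return 3, f"UNKNOWN - cannot stat {path}: {exc}"
    if age > max_age:
        return 2, f"CRITICAL - {path} not written for {age:.0f}s"
    return 0, f"OK - {path} written {age:.0f}s ago"


if __name__ == "__main__":
    # Hypothetical usage: check_log_freshness.py LOGFILE MAX_AGE_SECONDS
    if len(sys.argv) != 3:
        print("usage: check_log_freshness.py LOGFILE MAX_AGE_SECONDS")
        sys.exit(3)
    code, message = check_log_freshness(sys.argv[1], float(sys.argv[2]))
    print(message)
    sys.exit(code)
```

The same check works for any file the application is expected to update periodically, which keeps all alerting logic outside pyCA itself.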
I'm really not sure what the best way would be here. Any thoughts?