Is your feature request related to a problem?
There is no system set up to notify us when a master node has stopped processing snapshots or has gone down.
Describe the solution you'd like
The /health endpoint of the core-api should periodically query the protocol state contract for the current epoch and compare its epochId against the last finalized epoch of each active project, to detect how many epochs have passed since the last finalization. If any sampled project has not finalized within a reasonable window (20-30 epochs?), pooler should send a Slack alert reporting the failure to finalize.
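As a rough sketch of the comparison described above (not the actual pooler implementation), the check could look something like the following. All names here (`find_stalled_projects`, `build_slack_payload`, `MAX_EPOCH_LAG`) are hypothetical, the threshold of 30 epochs is just the upper end of the range suggested above, and fetching the current epoch and per-project finalization data from the protocol state contract is assumed to happen elsewhere.

```python
from typing import Dict, Optional

# Assumed threshold: upper end of the "20-30 epochs?" range from this issue.
MAX_EPOCH_LAG = 30


def find_stalled_projects(
    current_epoch_id: int,
    last_finalized: Dict[str, Optional[int]],
    max_lag: int = MAX_EPOCH_LAG,
) -> Dict[str, Optional[int]]:
    """Return {project_id: epochs_behind} for projects that exceed max_lag.

    `last_finalized` maps each sampled project ID to the epochId of its
    last finalized snapshot, or None if it has never finalized. Projects
    that have never finalized are reported with a lag of None.
    """
    stalled: Dict[str, Optional[int]] = {}
    for project_id, epoch_id in last_finalized.items():
        if epoch_id is None:
            stalled[project_id] = None
            continue
        lag = current_epoch_id - epoch_id
        if lag > max_lag:
            stalled[project_id] = lag
    return stalled


def build_slack_payload(
    stalled: Dict[str, Optional[int]], current_epoch_id: int
) -> Dict[str, str]:
    """Build a simple {"text": ...} message body for a Slack alert."""
    lines = [f"Failure to finalize detected at epoch {current_epoch_id}:"]
    for project_id, lag in sorted(stalled.items()):
        detail = "never finalized" if lag is None else f"{lag} epochs behind"
        lines.append(f"- {project_id}: {detail}")
    return {"text": "\n".join(lines)}
```

The /health handler would call `find_stalled_projects` on each periodic check and only send the payload when the result is non-empty. Keeping the comparison separate from the contract query and the Slack call makes the threshold logic easy to unit test.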
Describe alternatives you've considered
It might be worth exploring an external monitoring service that tracks the status independently, since finalization requires submissions from multiple master nodes. However, monitoring internally would make it possible to get additional information about specific nodes.
I have reviewed the associated PR. Ideally, it needs another review from one of the @PowerLoom/backend-engineering team members before merging. Moving the completion date to Tuesday EOD this week.