Add alerts for current leading indicator of slow ingest #2207

Open
masih opened this issue Aug 3, 2023 · 1 comment

@masih
Member

masih commented Aug 3, 2023

Add alerts, integrated into Slack and OpsGenie, that trigger when the ingest rate slows down and the provider lag grows. We already have an alert for ingest stopping for more than an hour, but it is not catching these gaps in ingest.

We should look at existing alternative leading indicators to alert on. Namely:

  • Probelab providers, which check lookup success for CIDs within 5 minutes of their publication
  • Lag value reported for providers at the /provider backends. In both recent incidents, the NFT.Storage lag on the /provider backends grew consistently. The lag for this particular provider should typically remain below 20.
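
As a rough illustration of the lag-based indicator, here is a minimal sketch of the check an alert rule might perform. The threshold of 20 comes from the note above; the function name, the `LagAlert` type, the "growing" heuristic, and the idea of passing in a list of recent lag samples are all assumptions for illustration, not part of any existing service.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# From the issue: lag for this provider "should typically remain below 20".
LAG_THRESHOLD = 20


@dataclass
class LagAlert:
    """Hypothetical alert payload; a real one would be routed to Slack/OpsGenie."""
    provider: str
    lag: int
    reason: str


def check_lag(
    provider: str,
    samples: Sequence[int],
    threshold: int = LAG_THRESHOLD,
    growth_window: int = 3,
) -> Optional[LagAlert]:
    """Return an alert if the latest lag sample is at or above the threshold,
    or if lag strictly increased across the last `growth_window` samples."""
    if growth_window < 2:
        raise ValueError("growth_window must be at least 2")
    if not samples:
        return None

    latest = samples[-1]
    if latest >= threshold:
        return LagAlert(provider, latest, "lag at or above threshold")

    recent = list(samples[-growth_window:])
    growing = len(recent) == growth_window and all(
        earlier < later for earlier, later in zip(recent, recent[1:])
    )
    if growing:
        return LagAlert(provider, latest, "lag growing")
    return None
```

The growth check is meant to act as the leading indicator: it can fire while lag is still below the threshold, which is the situation the existing "ingest stopped for an hour" alert misses.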
@masih masih added the P0 label Aug 3, 2023
@masih masih added P1 and removed P0 labels Aug 3, 2023
@gammazero
Collaborator

Added alerts based on metrics collected by the telemetry service. Probelab data probably does not apply anymore.

The telemetry service can poll the head advertisement from NFT.Storage, get some multihashes from it, and then look up those multihashes. An alert can be generated if the multihashes cannot be looked up after some amount of time. Alternatively, the NFT.Storage provider distance can be tracked, and an alert generated if the distance grows too large.
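
The lookup-based check could be sketched roughly as follows. This is only an outline under assumptions: `unresolved_after` and its parameters are hypothetical names, fetching the head advertisement and sampling multihashes from it is left to the caller, and `lookup` stands in for whatever find call the telemetry service would actually make.

```python
import time
from typing import Callable, Iterable, List


def unresolved_after(
    multihashes: Iterable[str],
    lookup: Callable[[str], bool],
    deadline_s: float,
    poll_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll `lookup` for each multihash until all resolve or `deadline_s`
    elapses. Returns the multihashes that still could not be looked up."""
    pending = list(multihashes)
    start = clock()
    while pending:
        # Keep only the multihashes that still fail to resolve.
        pending = [mh for mh in pending if not lookup(mh)]
        if not pending or clock() - start >= deadline_s:
            break
        sleep(poll_s)
    return pending
```

A non-empty result after the deadline would be the condition that generates the alert. The clock and sleep functions are injectable so the check can be exercised without waiting in real time.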

@gammazero gammazero removed the P1 label Nov 15, 2023