services/horizon: add metrics for ingestion failures and alert on them #5256

Open
mollykarcher opened this issue Mar 21, 2024 · 0 comments

What problem does your feature solve?

There have been two incidents in the past year where ingestion in Horizon halted (for different reasons).

During the resolution process, many general health-metric alerts were firing, but it was not clear that the underlying problem was an ingestion halt until we looked at the error logs.

What would you like to see?

An explicit metric that tracks ingestion failures. A single occurrence should trigger a critical alert, so that the responding engineer can immediately see the root cause of the issue.

Status: In Progress