Alert history #3180

Closed
alexandre-allard opened this issue Mar 9, 2021 · 0 comments · Fixed by #3191
Labels
complexity:medium (Something that requires one or few days to fix)
kind:enhancement (New feature or request)

Component: alerting, logging

Why this is needed:
In the NextGen UI, we are introducing the Global Health Component, which shows real-time entity health and is also intended to show entity health over the last X days.
For now, we have no way of persisting alerts, but this component needs to retrieve past alerts that are no longer stored in Alertmanager.

See https://github.com/scality/ringx/blob/development/1.0/docs/developer/architecture/alert_history.rst for details

What should be done:
Implement a mechanism that persists all alerts in a database, and provide an API to query this database and retrieve them.

Implementation proposal (strongly recommended):
We will use an Alertmanager webhook that logs every alert it receives to its standard output. This way, fluent-bit can collect these logs and forward them to Loki.
The alerts can then be retrieved by querying Loki, like any other log.

See https://github.com/scality/ringx/pull/44 for details
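
To make the proposal concrete, here is a minimal sketch of such a webhook receiver. It is not the project's actual implementation: the names `format_alert_payload`, `AlertLoggerHandler` and `serve` are hypothetical, and the choice of one compact JSON line per request is an assumption. Only the default port (19094) and the "log the request content to stdout" behaviour come from this issue.

```python
import json
import sys
from http.server import BaseHTTPRequestHandler, HTTPServer

DEFAULT_PORT = 19094  # default port mentioned in this issue


def format_alert_payload(body: bytes) -> str:
    """Return the webhook payload as a single compact JSON line.

    Assumption: one line per request, so that a log collector can treat
    each line as one self-contained JSON document.
    """
    payload = json.loads(body)
    return json.dumps(payload, separators=(",", ":"), sort_keys=True)


class AlertLoggerHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: logs every POSTed payload to stdout."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            line = format_alert_payload(body)
        except ValueError:
            # Not valid JSON: reject the request instead of logging garbage.
            self.send_response(400)
            self.end_headers()
            return
        sys.stdout.write(line + "\n")
        sys.stdout.flush()
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        # Drop the default per-request access log so that the output
        # contains alert payloads only.
        pass


def serve(port: int = DEFAULT_PORT) -> None:
    """Listen on all interfaces and handle requests until interrupted."""
    HTTPServer(("", port), AlertLoggerHandler).serve_forever()
```

Calling `serve()` starts the listener. Alertmanager would then need a webhook receiver whose URL points at this port.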

Test plan:
Add a post-install test checking that we can retrieve alerts via the Loki API (we should at least have the Watchdog alert).
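
The check in the test plan could be sketched as follows. This is only an illustration under stated assumptions: the stream selector `{app="alert-logger"}` and the base URL are hypothetical, and each log line is assumed to hold one full Alertmanager webhook payload (a JSON object with an `alerts` list whose entries carry `labels.alertname`). The `/loki/api/v1/query_range` endpoint and the `data.result[].values` response layout are those of the Loki HTTP API.

```python
import json
from urllib.parse import urlencode


def build_query_range_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a Loki range-query URL (timestamps in nanoseconds since epoch)."""
    params = urlencode(
        {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    )
    return f"{base_url}/loki/api/v1/query_range?{params}"


def alert_names(loki_response: dict) -> set:
    """Collect the alert names found in a Loki 'streams' response.

    Assumption: every log line is a JSON webhook payload.
    """
    names = set()
    for stream in loki_response["data"]["result"]:
        for _timestamp, line in stream["values"]:
            payload = json.loads(line)
            for alert in payload.get("alerts", []):
                names.add(alert["labels"]["alertname"])
    return names


# Hypothetical response, shaped like a Loki query_range result.
sample_response = {
    "status": "success",
    "data": {
        "resultType": "streams",
        "result": [
            {
                "stream": {"app": "alert-logger"},
                "values": [
                    [
                        "1615280400000000000",
                        '{"status":"firing","alerts":'
                        '[{"labels":{"alertname":"Watchdog"}}]}',
                    ]
                ],
            }
        ],
    },
}

url = build_query_range_url(
    "http://loki:3100", '{app="alert-logger"}', 1615280000000000000, 1615290000000000000
)
found = alert_names(sample_response)
```

A post-install test would fetch `url` with an HTTP client and assert that `"Watchdog"` is in the returned set.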

alexandre-allard added a commit that referenced this issue Mar 9, 2021
This is a simple HTTP server that listens
on a port (19094 by default) and waits for
HTTP POST requests from Alertmanager.
It then logs the content of these requests
to stdout.

Refs: #3180
alexandre-allard added a commit that referenced this issue Mar 9, 2021
This container image is used for alert history.

Refs: #3180
alexandre-allard added a commit that referenced this issue Mar 9, 2021
Remove the date from the logs once they are
parsed by fluent-bit: we no longer need it,
and dropping it improves log readability.
It is also needed by the alert history
feature: we end up with only a JSON-formatted
alert in the logs, which is easier for other
components to parse and use.

Refs: #3180
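
As an illustration of the effect described in this commit message, the sketch below strips the leading timestamp from a container log line and keeps only the JSON alert. The fluent-bit configuration itself is not shown in this issue, so the input format is an assumption: the CRI container log layout (`<timestamp> <stream> <P|F> <message>`). The names `CRI_LINE` and `strip_date` are hypothetical.

```python
import json
import re

# Assumed input format (CRI container logs):
#   <RFC 3339 timestamp> <stdout|stderr> <P|F> <message>
CRI_LINE = re.compile(
    r"^(?P<time>\S+) (?P<stream>stdout|stderr) (?P<tag>[PF]) (?P<log>.*)$"
)


def strip_date(raw_line: str) -> str:
    """Return only the message part of a CRI log line.

    Lines that do not match the assumed format are returned unchanged.
    """
    match = CRI_LINE.match(raw_line)
    if match is None:
        return raw_line
    return match.group("log")


raw = '2021-03-09T10:00:00.123456789Z stdout F {"status":"firing","alerts":[]}'
message = strip_date(raw)
alert = json.loads(message)  # the remaining text parses directly as JSON
```

With the date removed, a consumer can pass each log line straight to a JSON parser without any prior trimming.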
alexandre-allard added a commit that referenced this issue Mar 9, 2021
We now send all the alerts to the alert-logger
receiver.

Refs: #3180
alexandre-allard added a commit that referenced this issue Mar 9, 2021
This scenario checks that we can retrieve the
cluster alerts from the Loki API.

Refs: #3180
alexandre-allard added the complexity:medium and kind:enhancement labels Mar 9, 2021
alexandre-allard added a commit that referenced this issue Mar 10, 2021
We also need to enable CORS to allow cross-origin
requests coming from the web UI.

Refs: #3180
alexandre-allard added a commit that referenced this issue Apr 6, 2021
This document describes how the alert history
feature works and explains the reasoning behind it.

Refs: #3180
alexandre-allard added a commit that referenced this issue Apr 7, 2021
Remove the date from the logs once they are
parsed by fluent-bit: we no longer need it,
and dropping it improves log readability.
It is also needed by the alert history
feature: we end up with only a JSON-formatted
alert in the logs, which is easier for other
components to parse and use.

Refs: #3180
(cherry picked from commit 73f46cf)