[grafana] Alerting: add options and documentation to deploy a HA cluster #865
Conversation
Force-pushed 4fe3f03 to 99e3b45
Signed-off-by: Jean-Philippe Quéméner <jeanphilippe.quemener@grafana.com>

Force-pushed 99e3b45 to 205682d
Signed-off-by: Jean-Philippe Quéméner <jeanphilippe.quemener@grafana.com>

Force-pushed d95242b to 8f42e54
Hi! Very interested in this being merged, as we've been seeing odd behavior with our current HA Grafana deployments using legacy alerts. Thank you for your work!
```yaml
- protocol: TCP
  port: 3000
  targetPort: 3000
```
Shouldn't this be set to 9094, so that Grafana can query `{{ Name }}-headless:9094`?
It doesn't really matter; the port is only there so that a normal deployment binds to the service, which it wouldn't do without one. But I can change it to 9094 so it better reflects how the service is used.
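A minimal sketch of what that change could look like in the rendered headless service (the port name is illustrative, not taken from this PR):

```yaml
# Illustrative ports block after switching to the alertmanager gossip
# port. For a headless service (clusterIP: None) DNS resolves directly
# to pod IPs, so the port value is mostly documentation, but matching
# 9094 makes the intended usage explicit.
ports:
  - name: gossip
    protocol: TCP
    port: 9094
    targetPort: 9094
```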
Ah, understood, that makes sense.
I'm still seeing an issue on a local copy of this; I'll comment in the main thread to discuss further.
To test this out locally, I checked out the latest version of this chart, which renders the following headless service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grafana-headless
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: grafana
    app.kubernetes.io/instance: <my release name here>
  type: ClusterIP
  ports:
    - protocol: TCP
      port: 3000
      targetPort: 3000
```

We deploy the Grafana chart with HA enabled through an external PSQL database, and have three replicas. I amended the grafana.ini:

```yaml
alerting:
  enabled: false
unified_alerting:
  enabled: true
  ha_peers: grafana-headless:9094
```

I can see the headless service being created and taking on the right IPs. After the pods have finished booting up, the headless service picks them up, adds them as valid endpoints, and the unified alerting gossip mechanism starts working as expected.

However, `kubectl rollout restart deploy grafana` yields a series of:

```
t=2021-11-30T10:17:42+0000 lvl=info msg="component=cluster level=debug memberlist=\"2021/11/30 10:17:42 [DEBUG] memberlist: Failed to join 10.0.2.200: dial tcp 10.0.2.200:9094: connect: no route to host\\n\"" logger=ngalert.multiorg.alertmanager
```

where 10.0.2.200 is the IP of one of the old, now-terminated pods. Maybe there's an easy fix, or I just missed something obvious, but I couldn't find anything relevant. I'm not sure how to fix this, or whether it should be addressed at the Grafana level or the chart level. Let me know if there's any additional info I can add, or if you think this deserves its own issue, either here or on the Grafana repo.
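For reference, a hedged sketch of the values fragment such a setup could correspond to, assuming the toggle added in this PR is named `headlessService` and that the external PostgreSQL connection is configured separately:

```yaml
# Hypothetical values.yaml fragment for a 3-replica HA deployment.
# `headlessService` and the rendered service name are assumptions;
# check the chart's values.yaml for the actual option names.
replicas: 3
headlessService: true

grafana.ini:
  alerting:
    enabled: false
  unified_alerting:
    enabled: true
    ha_peers: grafana-headless:9094
```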
Yes, for now there is nothing we can do about this. We are thinking of changing the default timeouts or making them configurable: grafana/grafana#42300
To be clear, this does not affect the system in any way. It works fine; it just spills out those messages for a certain time. This is nothing we can fix at the chart level.
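For anyone watching the same log noise, one way to confirm that gossip has converged on the new pod IPs (assuming the service name `grafana-headless` and the chart's standard labels from the manifest above):

```sh
# Show the pod IPs currently backing the headless service; once these
# match the IPs of the running pods, the stale "no route to host"
# messages die down on their own.
kubectl get endpoints grafana-headless -o yaml
kubectl get pods -l app.kubernetes.io/name=grafana -o wide
```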
Appreciate the feedback, thank you!
This PR adds the possibility to create a headless service for normal deployments and adds some documentation on how to set up a cluster for unified alerting.
fixes #747