Description
What Grafana version are you using?
5.2.2
What datasource are you using?
Mixed (irrelevant to my question)
What OS are you running grafana on?
Linux
What did you do?
Grafana is one of our main tools for troubleshooting during outages, and we are looking for a way to make it as reliable as possible.
We have a big installation of Grafana (around 90 instances in the same Domain/Org)
We are using MySQL to store both Grafana config and Sessions.
We are using a MySQL Master->many_Slaves replication chain managed by Orchestrator (https://github.com/github/orchestrator).
Since Grafana can use only a single MySQL server (the Master) for its internal needs (#399, #3676), we provide service discovery through short-lived DNS records that point to the current Master and are updated by Orchestrator for redundancy. This setup worked fine until recently.
A recent DNS outage showed that this DNS dependency is better avoided: if DNS fails, we have no way to discover MySQL, and Grafana cannot perform any operation without MySQL available. Essentially, if DNS is broken, everything else is broken.
We were thinking about adding extra fail-over logic on the Grafana boxes that would pull the list of available slaves and make one of them available to Grafana locally, for instance via /etc/hosts. This should keep Grafana available for troubleshooting; however, Grafana cannot work with a read-only database.
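A minimal Go sketch of the fail-over helper we have in mind, not something that exists today. The replica IPs, the `grafana-db.internal` hostname, and the function names (`probe`, `pickReplica`, `hostsLine`) are all hypothetical; it only prints the /etc/hosts line and leaves writing the file to whatever tooling manages the box.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// probe reports whether a TCP connection to addr succeeds within timeout.
func probe(addr string, timeout time.Duration) bool {
	conn, err := net.DialTimeout("tcp", addr, timeout)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

// pickReplica returns the first candidate IP for which reachable returns true.
func pickReplica(candidates []string, reachable func(ip string) bool) (string, bool) {
	for _, ip := range candidates {
		if reachable(ip) {
			return ip, true
		}
	}
	return "", false
}

// hostsLine renders one /etc/hosts entry mapping ip to name.
func hostsLine(ip, name string) string {
	return fmt.Sprintf("%s\t%s", ip, name)
}

func main() {
	// Hypothetical replica IPs; in practice these would come from Orchestrator.
	candidates := []string{"10.0.0.11", "10.0.0.12"}
	reachable := func(ip string) bool {
		return probe(net.JoinHostPort(ip, "3306"), 500*time.Millisecond)
	}
	if ip, ok := pickReplica(candidates, reachable); ok {
		fmt.Println(hostsLine(ip, "grafana-db.internal"))
	} else {
		fmt.Println("# no replica reachable")
	}
}
```

The reachability check is passed in as a function so the selection logic can be exercised without a network. A TCP connect only shows that the port answers, not that MySQL is healthy or up to date.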
What was the expected result?
I would expect it to keep providing read-only access to all dashboards; search and navigation should still work.
What happened instead?
If restarted, Grafana fails to start:
t=2018-09-24T13:43:23+0200 lvl=eror msg="Server shutdown" logger=server reason="Service init failed: Datasource provisioning error: Error 1290: The MySQL server is running with the --read-only option so it cannot execute this statement"
If the MySQL address is switched to a read-only (RO) server while Grafana is running, dashboard search fails and navigation is impossible; the UI shows "Server error" or "Unknown error".
A proper solution, as I see it:
- Teach Grafana to work with write and read-only (RO) connections separately
- Fail gracefully if the write connection is unavailable