Introducing the data source health check subsystem #20

amotl · 2022-06-20T08:35:48Z

This originates from How to find out unused datasources? and our discussion at Finding unhealthy data sources. With #19, we have finally started to lay the foundation. Health checks for a few data source types are supported already ¹, but completing the list will require dedicated work.

Documentation on how to work with the health check subsystem, based on the example program examples/datasource-health.py, is available at examples/datasource-health.rst. The program is intended for evaluating the new subsystem against different databases, in order to gradually improve the implementation. We tried to make it easy for others to run, in order to support this endeavor.

When the feature is reasonably ready over here, we will return to grafana-toolbox/grafana-wtf#19 to continue the discussion about how to use it appropriately within grafana-wtf.

If this resonates with you, you might want to lend a hand. All kinds of feedback, both testing and further contributions, will be greatly appreciated. Thank you very much in advance.

With kind regards,
Andreas.

Usage

The documentation at examples/datasource-health.rst will guide you through a full development sandbox installation, including running Grafana and some database services as Docker containers, and setting up the working tree from the Git repository.

In this section, we outline an alternative way to work with the feature using the example program examples/datasource-health.py. It might save a few keystrokes, specifically when working against an existing infrastructure.

# Setup
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade git+https://github.com/panodata/grafana-client
wget https://raw.githubusercontent.com/panodata/grafana-client/main/examples/datasource-health.py

# Run
export GRAFANA_URL=http://daq.example.org:3000
export GRAFANA_TOKEN=eyJrIjoiUWVrYXJh....aWQiOjJ9
python datasource-health.py --type=influxdb --url=http://daq.example.org:8086
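
If you want to drive the same two environment variables from a script of your own, a minimal sketch could look like this. The helper name `grafana_settings` and the `http://localhost:3000` default are assumptions for illustration, not taken from the example program. Grafana API tokens are sent as a bearer token in the `Authorization` header.

```python
import os


def grafana_settings(environ=os.environ):
    """Read the Grafana URL and API token from the environment.

    Hypothetical helper: the fallback URL is an assumption.
    """
    url = environ.get("GRAFANA_URL", "http://localhost:3000")
    token = environ.get("GRAFANA_TOKEN")
    # Only add the header when a token is actually configured.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return url.rstrip("/"), headers
```

The function returns the base URL without a trailing slash, plus a header dictionary that can be handed to any HTTP client.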

Details

The example program will create a data source item named probe-{dstype}, with the designated database target URL. Then, it will run a data source health check on it and report the outcome.
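
To illustrate the naming scheme, here is a minimal sketch of what such a data source definition might look like. `make_probe_datasource` is a hypothetical helper, not the program's actual code, and `"access": "proxy"` is an assumed default. The field names `name`, `type`, `url`, and `access` are the ones Grafana's data source API uses.

```python
def make_probe_datasource(dstype: str, url: str) -> dict:
    """Build a minimal data source definition named probe-{dstype}."""
    if not dstype:
        raise ValueError("data source type must not be empty")
    return {
        "name": f"probe-{dstype}",
        "type": dstype,
        "url": url,
        # Assumption: let the Grafana server connect to the database.
        "access": "proxy",
    }


probe = make_probe_datasource("influxdb", "http://daq.example.org:8086")
```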

This will work well when running the database services as Docker containers, as outlined in the reference documentation at examples/datasource-health.rst. It might not work well in other situations, where the data source configuration needs further adjustments.

In this case, don't hesitate to adjust the datasource_factory() code to match your setup.
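
One possible shape for such an adjustment is to layer setup-specific fields on top of a base definition. This is a sketch only: `adjust_datasource` is a hypothetical helper and does not reflect how datasource_factory() is written, and the user name and `jsonData` values are example values.

```python
import copy


def adjust_datasource(base: dict, **overrides) -> dict:
    """Return a copy of `base` with setup-specific fields applied on top."""
    adjusted = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(adjusted.get(key), dict):
            # Merge nested settings such as jsonData instead of replacing them.
            adjusted[key].update(value)
        else:
            adjusted[key] = value
    return adjusted


# Example values only; adapt them to your own database.
base = {
    "name": "probe-postgres",
    "type": "postgres",
    "url": "localhost:5432",
    "jsonData": {"sslmode": "disable"},
}
custom = adjust_datasource(base, user="grafana", jsonData={"postgresVersion": 1200})
```

The base definition stays untouched, so the same starting point can be reused for several setups.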

References

Original feature request: Finding unhealthy data sources grafana-wtf#19
Discussion: Introducing the data source health check subsystem #20
First patch: Improve data source API by adding a data source health-check probe #19
Documentation: examples/datasource-health.rst
Example program: examples/datasource-health.py

¹ CrateDB, Elasticsearch, InfluxDB, PostgreSQL, Prometheus, Testdata

jangaraj · 2022-06-20T12:26:08Z

Grafana has introduced an API for data source health:

Ref:

I would use this API now.

amotl · 2022-06-20T13:02:49Z

Dear Jan,

thank you for sharing this information. This is very sweet. I will improve the implementation to use the new endpoint from Grafana 9 onwards.

Edit: A draft has been submitted at #21. However, I am still observing problems with the new /health endpoint. Maybe you can have a look and hopefully tell me what I am doing wrong?

With kind regards,
Andreas.

amotl · 2022-06-29T22:04:55Z

Dear @jangaraj, @chenlujjj, and @peekjef72,

with #24 and #25, the data source health check subsystem became mature enough that adding the most prominent missing data sources with #27 was a real breeze. You can take that as a reference if you need to add health check capabilities for further data sources.

Effectively, unless a data source has a totally different shape in terms of query and response formats, the gist is to fill in the gap solely within grafana_client/knowledge.py. If you are interested in how this works for the individual data sources, I recommend looking at the individual commits of this patch (#27).
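
To give an idea of what "filling in the gap" per data source type can mean, here is a purely illustrative sketch of a lookup table keyed by data source type. It is NOT the actual content of grafana_client/knowledge.py. The table, the function name, and the chosen probe queries are assumptions; only the query field names `rawSql` and `expr` follow Grafana's usual query models.

```python
# Hypothetical per-type probe queries, for illustration only.
PROBE_QUERIES = {
    "mysql": {"rawSql": "SELECT 1", "format": "table"},
    "postgres": {"rawSql": "SELECT 1", "format": "table"},
    "prometheus": {"expr": "1+1"},
}


def probe_query_for(dstype: str) -> dict:
    """Return a trivial query that should succeed on a healthy data source."""
    try:
        # Hand out a copy so callers cannot modify the table by accident.
        return dict(PROBE_QUERIES[dstype])
    except KeyError:
        raise NotImplementedError(f"no health check adapter for {dstype!r}") from None
```

With a table like this, supporting a further data source type amounts to adding one more entry, as long as its query and response formats fit the common shape.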

For testing, I've scanned both play.grafana.org and weather.hiveeyes.org, running data source health checks on all data sources. The machinery is not able to scan swarm.hiveeyes.org, because it is still running Grafana 6. Data source health checks will only work on Grafana >= 7.
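
When scanning several instances, it can help to skip those that are too old up front. A minimal sketch of such a guard follows. The function names are hypothetical, and the version string is assumed to look like `9.0.1` or `v6.7.4`.

```python
def major_version(version: str) -> int:
    """Extract the major version number from a string like '9.0.1'."""
    return int(version.strip().lstrip("v").split(".", 1)[0])


def supports_health_checks(version: str) -> bool:
    """Data source health checks only work on Grafana >= 7."""
    return major_version(version) >= 7
```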

The list of supported data sources is now:

[
  "elasticsearch",
  "fetzerch-sunandmoon-datasource",
  "grafana-simple-json-datasource",
  "graphite",
  "influxdb",
  "influxdb+flux",
  "influxdb+influxql",
  "jaeger",
  "loki",
  "mssql",
  "mysql",
  "opentsdb",
  "postgres",
  "prometheus",
  "simpod-json-datasource",
  "tempo",
  "testdata",
  "zipkin"
]

Do you think we are missing any important item here?
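
To find out which data source types of your own instance are not covered yet, a small helper along these lines might be handy. It is a sketch: `split_by_support` is a hypothetical name, and the input is assumed to be a plain list of data source type identifiers.

```python
SUPPORTED = {
    "elasticsearch", "fetzerch-sunandmoon-datasource",
    "grafana-simple-json-datasource", "graphite", "influxdb",
    "influxdb+flux", "influxdb+influxql", "jaeger", "loki", "mssql",
    "mysql", "opentsdb", "postgres", "prometheus",
    "simpod-json-datasource", "tempo", "testdata", "zipkin",
}


def split_by_support(types):
    """Partition data source types into (supported, unsupported), both sorted."""
    seen = set(types)
    return sorted(seen & SUPPORTED), sorted(seen - SUPPORTED)


supported, unsupported = split_by_support(["influxdb", "cloudwatch", "loki", "influxdb"])
```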

I am sure the implementation still contains bugs. On this matter, I will be very happy to hear back from the community about spots where things go south with specific data sources / conditions.

With kind regards,
Andreas.

P.S.: @jangaraj: Regarding our last conversation, the data source health check subsystem now consumes the new server-side per-datasource health check endpoint available with Grafana >= 9. If that fails, it will fall back to the client-side implementation provided by grafana-client. -- #21 (comment).
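
The fallback behaviour described above can be sketched as follows. This is an illustration of the pattern under stated assumptions, not the code in grafana-client: both checkers are passed in as plain callables, and the `strategy` and `fallback_reason` keys are made up for this example.

```python
from typing import Callable


def check_health(
    uid: str,
    server_side: Callable[[str], dict],
    client_side: Callable[[str], dict],
) -> dict:
    """Prefer the server-side check; fall back to the client-side one on failure."""
    try:
        return dict(server_side(uid), strategy="server-side")
    except Exception as exc:  # broad on purpose: any failure triggers the fallback
        return dict(
            client_side(uid),
            strategy="client-side",
            fallback_reason=str(exc),
        )
```

Passing the two checkers as arguments keeps the sketch independent of any HTTP client, and makes the fallback path easy to exercise with stand-in functions.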

amotl · 2022-07-02T10:09:13Z

Hi again,

grafana-client==3.0.0 has been released. With this release, you will be able to use the new data source health check subsystem. We will be happy to hear about your observations and any bug reports.

With kind regards,
Andreas.

amotl · 2023-09-19T20:02:32Z

With GH-112 by @peekjef72 (thanks again!), this subsystem can now also be used to run generic queries against the databases behind Grafana data sources. People have already started using it for that purpose, see GH-85.

This was referenced Jun 20, 2022

Finding unhealthy data sources grafana-toolbox/grafana-wtf#19

Open

Improve data source API by adding a data source health-check probe #19

Merged

amotl mentioned this issue Jun 20, 2022

Support data source health check endpoint introduced with Grafana 9 #21

Merged

amotl mentioned this issue Jun 29, 2022

Add more data source health check adapters: Jaeger, Loki, Microsoft SQL Server, Tempo, Zipkin #27

Merged

amotl added the enhancement New feature or request label Jul 2, 2022

amotl mentioned this issue Sep 19, 2022

unhashable type: 'dict' grafana-toolbox/grafana-snapshots-tool#2

Open

amotl mentioned this issue May 3, 2023

Grafana Smart Query Prometheus missing key #85

Closed

amotl mentioned this issue Sep 19, 2023

Extract all unique Prometheus label names referenced on a given dashboard or all Grafana objects grafana-toolbox/grafana-wtf#67

Open
