Introducing the data source health check subsystem #20

Open
amotl opened this issue Jun 20, 2022 · 5 comments
Labels
enhancement New feature or request

Comments

@amotl
Contributor

amotl commented Jun 20, 2022

Dear @chenlujjj and @jangaraj,

This effort originally stems from "How to find out unused datasources?" and builds on our discussion at "Finding unhealthy data sources". We have finally started to lay the foundation with #19. Health checks for a few data source types are already supported [1], but completing the list will require dedicated work.

Documentation on how to work with the health check subsystem, using the example program examples/datasource-health.py, is available at examples/datasource-health.rst. The program is intended to evaluate the new subsystem against different databases, so that the implementation can be improved gradually. We tried to make it easy for others to run, in order to support this endeavor.

When the feature is reasonably ready over here, we will return to grafana-toolbox/grafana-wtf#19 to continue the discussion about how to use it appropriately within grafana-wtf.

If this resonates with you, would you like to lend a hand? All kinds of feedback, both testing and further contributions, will be greatly appreciated. Thank you very much in advance.

With kind regards,
Andreas.


Usage

The documentation at examples/datasource-health.rst will guide you through a full development sandbox installation, including running Grafana and some database services as Docker containers, and setting up the working tree from the Git repository.

In this section, we outline an alternative approach to working with the feature, using the example program examples/datasource-health.py. It might save a few keystrokes, especially when working against existing infrastructure.

```shell
# Setup
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade git+https://github.com/panodata/grafana-client
wget https://raw.githubusercontent.com/panodata/grafana-client/main/examples/datasource-health.py

# Run
export GRAFANA_URL=http://daq.example.org:3000
export GRAFANA_TOKEN=eyJrIjoiUWVrYXJh....aWQiOjJ9
python datasource-health.py --type=influxdb --url=http://daq.example.org:8086
```
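To illustrate what the two environment variables above carry, here is a minimal sketch of how a Python program could pick them up. The helper names `GrafanaSettings` and `settings_from_env` are hypothetical and not taken from datasource-health.py. The only assumption about Grafana is that the token travels as a Bearer token in the `Authorization` header, which is how Grafana accepts API keys and service account tokens.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class GrafanaSettings:
    """Connection settings read from the environment (hypothetical helper)."""

    url: str
    token: str

    def headers(self) -> dict:
        # Grafana API keys and service account tokens are sent as Bearer tokens.
        return {
            "Authorization": f"Bearer {self.token}",
            "Accept": "application/json",
        }


def settings_from_env(environ=None) -> GrafanaSettings:
    """Read GRAFANA_URL and GRAFANA_TOKEN, failing early if either is missing."""
    env = os.environ if environ is None else environ
    missing = [name for name in ("GRAFANA_URL", "GRAFANA_TOKEN") if not env.get(name)]
    if missing:
        raise KeyError(f"Missing environment variable(s): {', '.join(missing)}")
    # Strip a trailing slash so API paths like /api/health can be appended safely.
    return GrafanaSettings(url=env["GRAFANA_URL"].rstrip("/"), token=env["GRAFANA_TOKEN"])
```

Failing early on a missing variable gives a clearer message than an authentication error from Grafana later on.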

Details

The example program will create a data source item named probe-{dstype}, pointing to the designated database target URL. Then, it will run a data source health check on it and report its outcome.

This works well when the database services run as Docker containers, as outlined in the reference documentation at examples/datasource-health.rst. It might not work as well in other situations, where the data source configuration may need further adjustments.

In that case, don't hesitate to adjust the datasource_factory() code to match your setup.
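For orientation when adapting the factory, here is a hypothetical stand-in for `datasource_factory()`, not its actual code. It builds the `probe-{dstype}` definition from the fields Grafana's data source HTTP API uses (`name`, `type`, `url`, `access`, `jsonData`). Which additional keys a particular type needs, and whether they belong at the top level or inside `jsonData`, differs between data source types and Grafana versions, so treat the extra keys in the usage note as assumptions.

```python
def probe_datasource(dstype: str, url: str, json_data=None, **overrides) -> dict:
    """Build a data source definition named probe-{dstype} (hypothetical sketch)."""
    datasource = {
        "name": f"probe-{dstype}",
        "type": dstype,
        "url": url,
        # "proxy": the Grafana server, not the browser, connects to the database.
        "access": "proxy",
        # Copy, so callers can reuse their dict without it being mutated later.
        "jsonData": dict(json_data or {}),
    }
    # Type-specific top-level settings, e.g. database or user, go here.
    datasource.update(overrides)
    return datasource
```

For example, `probe_datasource("postgres", "postgres.example.org:5432", json_data={"sslmode": "disable"}, database="postgres", user="grafana")` describes a PostgreSQL probe with SSL disabled, assuming your Grafana version still reads `database` and `user` at the top level.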

References

Footnotes

  1. CrateDB, Elasticsearch, InfluxDB, PostgreSQL, Prometheus, Testdata

@jangaraj

Grafana has introduced an API for datasource health:

Ref:

I would use this API now.

@amotl
Contributor Author

amotl commented Jun 20, 2022

Dear Jan,

thank you for sharing this information. This is very sweet. I will improve the implementation to use the new endpoint from Grafana 9 onwards.

Edit: A draft has been submitted at #21. However, I am still observing problems with the new /health endpoint. Maybe you can have a look and hopefully tell me what I am doing wrong?

With kind regards,
Andreas.

@amotl
Contributor Author

amotl commented Jun 29, 2022

Dear @jangaraj, @chenlujjj, and @peekjef72,

with #24 and #25, the data source health check subsystem has become mature enough that adding the most prominent missing data sources in #27 was a real breeze. You can use that patch as a reference if you need to add health check capabilities for further data sources.

In effect, unless a data source has an entirely different shape in terms of query and response formats, the gist is to fill in the gap solely within grafana_client/knowledge.py. If you are interested in how this works for the individual data sources, I recommend looking at the individual commits of this patch (#27).
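To picture what filling in the gap within a single knowledge module can look like, here is a hypothetical table-driven sketch. It is not the content of grafana_client/knowledge.py; the structure, names, and probe queries are illustrative assumptions. The idea is that supporting a new type means adding one entry, while the lookup code stays untouched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HealthCheckQuery:
    """What to send to a data source to find out whether it responds (hypothetical)."""

    query: str
    description: str


# Hypothetical knowledge table: one entry per data source type.
HEALTH_CHECK_QUERIES = {
    "postgres": HealthCheckQuery("SELECT 1", "trivial SQL statement"),
    "mysql": HealthCheckQuery("SELECT 1", "trivial SQL statement"),
    "mssql": HealthCheckQuery("SELECT 1", "trivial SQL statement"),
    "influxdb+influxql": HealthCheckQuery("SHOW RETENTION POLICIES", "InfluxQL metadata query"),
    "prometheus": HealthCheckQuery("1+1", "constant PromQL expression"),
}


def health_check_query(dstype: str) -> HealthCheckQuery:
    """Look up the probe for a data source type, or report that it is missing."""
    try:
        return HEALTH_CHECK_QUERIES[dstype]
    except KeyError:
        raise NotImplementedError(
            f"No health check query known for data source type '{dstype}'"
        ) from None
```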

For testing, I've scanned both play.grafana.org and weather.hiveeyes.org, running data source health checks on all data sources. The machinery is not able to scan swarm.hiveeyes.org, because it is still running Grafana 6. Data source health checks will only work on Grafana >= 7.
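The version constraints mentioned in this thread (no health checks before Grafana 7, the server-side endpoint from Grafana 9 onwards) translate into a small decision function. Grafana reports its version, for example, in the `version` field of `GET /api/health`. The function names below are hypothetical.

```python
def parse_version(version: str) -> tuple:
    """Turn a Grafana version string like '9.0.1' or '9.1.0-beta1' into (9, 0, 1)."""
    # Drop pre-release ("-beta1") and build ("+security-01") suffixes.
    core = version.split("-", 1)[0].split("+", 1)[0]
    return tuple(int(part) for part in core.split(".")[:3])


def health_check_strategy(version: str) -> str:
    """Pick a health check strategy from the Grafana version (hypothetical helper)."""
    major = parse_version(version)[0]
    if major < 7:
        return "unsupported"
    if major < 9:
        return "client-side"
    # Grafana 9+: try the server-side endpoint first.
    return "server-side"
```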

The list of supported data sources is now:

```json
[
  "elasticsearch",
  "fetzerch-sunandmoon-datasource",
  "grafana-simple-json-datasource",
  "graphite",
  "influxdb",
  "influxdb+flux",
  "influxdb+influxql",
  "jaeger",
  "loki",
  "mssql",
  "mysql",
  "opentsdb",
  "postgres",
  "prometheus",
  "simpod-json-datasource",
  "tempo",
  "testdata",
  "zipkin"
]
```
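The `influxdb+flux` and `influxdb+influxql` entries suggest that InfluxDB is keyed by its query language. Here is a hypothetical sketch of mapping a data source definition onto this list. It assumes the query language is stored in `jsonData.version` as `"Flux"` or `"InfluxQL"`, as Grafana's InfluxDB data source does in recent versions.

```python
SUPPORTED_TYPES = frozenset({
    "elasticsearch", "fetzerch-sunandmoon-datasource", "grafana-simple-json-datasource",
    "graphite", "influxdb", "influxdb+flux", "influxdb+influxql", "jaeger", "loki",
    "mssql", "mysql", "opentsdb", "postgres", "prometheus", "simpod-json-datasource",
    "tempo", "testdata", "zipkin",
})


def effective_type(datasource: dict) -> str:
    """Map a data source definition to a key of the list above (hypothetical)."""
    dstype = datasource["type"]
    if dstype == "influxdb":
        dialect = (datasource.get("jsonData") or {}).get("version")
        if dialect:
            # "Flux" -> "influxdb+flux", "InfluxQL" -> "influxdb+influxql".
            return f"influxdb+{dialect.lower()}"
    return dstype


def is_supported(datasource: dict) -> bool:
    return effective_type(datasource) in SUPPORTED_TYPES
```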

Do you think we are missing any important item here?

I am sure the implementation still contains bugs. On this matter, I will be very happy to hear back from the community about spots where things go wrong with specific data sources or conditions.

With kind regards,
Andreas.

P.S.: @jangaraj: Regarding our last conversation, the data source health check subsystem now consumes the new server-side per-datasource health check endpoint available with Grafana >= 9. If that fails, it will fall back to the client-side implementation provided by grafana-client. -- #21 (comment).
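The server-first, client-fallback behavior described in the P.S. can be sketched with the standard library alone. `GET /api/datasources/uid/{uid}/health` is the per-datasource endpoint of Grafana 9. `server_side_health` and `check_health` are hypothetical names, not grafana-client's API, and the client-side check is passed in as a plain callable.

```python
import json
import urllib.request
from typing import Callable, Dict


def server_side_health(base_url: str, uid: str, headers: Dict[str, str], timeout: float = 10.0) -> dict:
    """Ask Grafana (>= 9) to check one data source via its health endpoint."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/datasources/uid/{uid}/health", headers=headers
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx answers, e.g. on older Grafana.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def check_health(
    uid: str,
    server_side: Callable[[str], dict],
    client_side: Callable[[str], dict],
) -> dict:
    """Prefer the server-side check; on any error, fall back to the client-side one.

    Simplification: a real implementation would distinguish "endpoint not
    available" from "data source reported unhealthy" before falling back.
    """
    try:
        result = dict(server_side(uid))
        result["checked_by"] = "server-side"
    except Exception as error:  # broad on purpose: any failure triggers the fallback
        result = dict(client_side(uid))
        result["checked_by"] = "client-side"
        result["server_side_error"] = str(error)
    return result
```

In practice, `server_side` could be `lambda uid: server_side_health(url, uid, headers)`. Injecting both checks as callables keeps the fallback logic testable without a running Grafana.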

@amotl
Contributor Author

amotl commented Jul 2, 2022

Hi again,

grafana-client==3.0.0 has been released. With this release, you can use the new data source health check subsystem. We will be happy to hear about your observations and any bug reports.

With kind regards,
Andreas.

@amotl
Contributor Author

amotl commented Sep 19, 2023

With GH-112 by @peekjef72 (thanks again!), this subsystem can now also be used to run generic queries against Grafana data source databases, and people have already started using it for that purpose, see GH-85.
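The API introduced by GH-112 is not shown in this thread. To illustrate what a generic query against a data source boils down to, here is a sketch against Grafana's `POST /api/ds/query` endpoint. It assumes an SQL data source that accepts `rawSql`; `ds_query_payload` and `frame_rows` are hypothetical helpers. Responses arrive as column-oriented data frames under `results.<refId>.frames`.

```python
def ds_query_payload(datasource_uid: str, raw_sql: str, time_from: str = "now-5m", time_to: str = "now") -> dict:
    """Build a request body for POST /api/ds/query (assumes an SQL data source)."""
    return {
        "queries": [
            {
                "refId": "A",
                "datasource": {"uid": datasource_uid},
                "rawSql": raw_sql,
                "format": "table",
            }
        ],
        "from": time_from,
        "to": time_to,
    }


def frame_rows(response: dict, ref_id: str = "A") -> list:
    """Flatten the column-oriented frames of a /api/ds/query response into row tuples."""
    rows = []
    for frame in response["results"][ref_id].get("frames", []):
        columns = frame["data"]["values"]
        # zip(*columns) turns [[c1, c1'], [c2, c2']] into [(c1, c2), (c1', c2')].
        rows.extend(zip(*columns))
    return rows
```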
