hive: metrics support with prometheus and grafana #665

Open
wants to merge 1 commit into
base: master


@protolambda protolambda commented Dec 11, 2022

This PR implements support for Prometheus metrics collection and automated Grafana setup 🎉

Like hiveproxy, hive now optionally runs two additional global containers in the background:

  • Prometheus: the metrics datasource service that scrapes all configured client instances
  • Grafana: metrics frontend, automatically provisioned with dashboards and datasources, so the hive user does not have to set anything up manually :)

Run hive in dev-mode to keep the Hive server with its metrics containers running across test runs, and explore the metrics during/after the simulator test runs.

Hive automatically adds metrics scrape targets to the prometheus instance, and removes them again, for all containers with configured metrics options. The hive API that creates client containers uses the client's hive metadata to pass the metrics scrape target as an option to the container-creation function.
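A minimal sketch of how the add/remove bookkeeping could look. This is not the code from this PR: it assumes the targets are handed to prometheus through file-based service discovery (a JSON list of target groups), and the names scrapeTarget, targetRegistry, Add, Remove and Render are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"sync"
)

// scrapeTarget is a hypothetical description of one container to scrape.
type scrapeTarget struct {
	Address string            // "ip:port" of the metrics endpoint
	Labels  map[string]string // labels attached to samples from this target
}

// targetGroup mirrors one entry of a Prometheus file_sd JSON document.
type targetGroup struct {
	Targets []string          `json:"targets"`
	Labels  map[string]string `json:"labels,omitempty"`
}

// targetRegistry tracks scrape targets, keyed by container ID (assumption).
type targetRegistry struct {
	mu      sync.Mutex
	targets map[string]scrapeTarget
}

func newTargetRegistry() *targetRegistry {
	return &targetRegistry{targets: make(map[string]scrapeTarget)}
}

// Add registers the scrape target of a container.
func (r *targetRegistry) Add(containerID string, t scrapeTarget) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.targets[containerID] = t
}

// Remove drops the scrape target of a container, if present.
func (r *targetRegistry) Remove(containerID string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.targets, containerID)
}

// Render returns the file_sd JSON for all current targets, ordered by
// container ID so that the output is deterministic.
func (r *targetRegistry) Render() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	ids := make([]string, 0, len(r.targets))
	for id := range r.targets {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	groups := make([]targetGroup, 0, len(ids))
	for _, id := range ids {
		t := r.targets[id]
		groups = append(groups, targetGroup{Targets: []string{t.Address}, Labels: t.Labels})
	}
	return json.Marshal(groups)
}

func main() {
	reg := newTargetRegistry()
	// Address and label values are made up for illustration.
	reg.Add("c1", scrapeTarget{Address: "172.17.0.5:5054", Labels: map[string]string{"suite": "example"}})
	out, err := reg.Render()
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
	reg.Remove("c1")
	out, _ = reg.Render()
	fmt.Println(string(out))
}
```

Writing the rendered JSON to a file that prometheus watches would let targets come and go without restarting the prometheus container, which is why this sketch picks file-based discovery.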

See the updated hive docs on client configuration. TL;DR: an optional metrics entry in the hive.yaml of the client defines a scrape target with port and labels, and hive adds labels like suite/test/version/etc. dynamically.
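To illustrate the static plus dynamic labels idea, here is a rough sketch. The metricsConfig struct is a hypothetical Go view of the metrics entry (the real field names are in the hive docs), and letting runtime labels win on conflict is my assumption for this example, not something the PR specifies.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// metricsConfig is a hypothetical view of the optional metrics entry in a
// client's hive.yaml: a port to scrape and some static labels.
type metricsConfig struct {
	Port   uint16
	Labels map[string]string
}

// mergeLabels combines the static labels from the client definition with
// labels derived at runtime (suite, test, version, ...). Runtime labels
// overwrite static ones with the same name (assumption).
func mergeLabels(static, dynamic map[string]string) map[string]string {
	out := make(map[string]string, len(static)+len(dynamic))
	for k, v := range static {
		out[k] = v
	}
	for k, v := range dynamic {
		out[k] = v
	}
	return out
}

// scrapeAddress builds the "ip:port" address prometheus would scrape.
func scrapeAddress(containerIP string, cfg metricsConfig) string {
	return net.JoinHostPort(containerIP, strconv.Itoa(int(cfg.Port)))
}

func main() {
	// All values below are made up for illustration.
	cfg := metricsConfig{Port: 5054, Labels: map[string]string{"layer": "consensus"}}
	dynamic := map[string]string{"suite": "example-suite", "test": "example-test"}
	fmt.Println(scrapeAddress("172.17.0.5", cfg), mergeLabels(cfg.Labels, dynamic))
}
```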

Metrics are disabled by default, but can be enabled and configured with 3 new flags:

  -metrics
        Flag to enable metrics collection with prometheus
  -metrics.grafana uint
        Host port to bind grafana frontend to, grafana will not run if this is 0. (default 8080)
  -metrics.prometheus uint
        Host port to bind prometheus to, prometheus will run but not be exposed to the host if this is 0 (host port is not required for plugging into grafana).
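For reference, a sketch of how these three flags and their "0 means off" semantics could be expressed with the standard flag package. The flag names, the 8080 default and the described behavior come from the help text above; the metricsOptions type and its helper methods are hypothetical and not the code in this PR.

```go
package main

import (
	"flag"
	"fmt"
)

// metricsOptions is a hypothetical container for the three flag values.
type metricsOptions struct {
	Enabled        bool
	GrafanaPort    uint
	PrometheusPort uint
}

// parseMetricsFlags parses the metrics flags from args.
func parseMetricsFlags(args []string) (metricsOptions, error) {
	var opts metricsOptions
	fs := flag.NewFlagSet("hive", flag.ContinueOnError)
	fs.BoolVar(&opts.Enabled, "metrics", false, "Flag to enable metrics collection with prometheus")
	fs.UintVar(&opts.GrafanaPort, "metrics.grafana", 8080, "Host port to bind grafana frontend to, grafana will not run if this is 0.")
	fs.UintVar(&opts.PrometheusPort, "metrics.prometheus", 0, "Host port to bind prometheus to, not exposed to the host if this is 0.")
	err := fs.Parse(args)
	return opts, err
}

// runPrometheus: prometheus runs whenever metrics are enabled.
func (o metricsOptions) runPrometheus() bool { return o.Enabled }

// exposePrometheus: the host port binding is optional.
func (o metricsOptions) exposePrometheus() bool { return o.Enabled && o.PrometheusPort != 0 }

// runGrafana: grafana is skipped entirely when its port is 0.
func (o metricsOptions) runGrafana() bool { return o.Enabled && o.GrafanaPort != 0 }

func main() {
	opts, err := parseMetricsFlags([]string{"-metrics", "-metrics.prometheus=9090"})
	if err != nil {
		panic(err)
	}
	fmt.Println(opts.runPrometheus(), opts.exposePrometheus(), opts.runGrafana())
}
```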

Long-term we could also consider adding a third optional metrics container: there's a grafana renderer docker image available that runs grafana headlessly and exposes an API to generate images of dashboards or individual panels. That way we could generate and persist metrics reports for simulator runs! For now we can just start with regular grafana, which is useful during development, and we can start making nice Hive grafana dashboards.

Prometheus

Example of the prometheus admin frontend (when exposed to host with -metrics.prometheus=9090), the targets tab:
[image: prometheus admin frontend, targets tab]

These targets will be available for grafana charts to query from, and the labels can be used to filter the data of different test-runs, clients, etc.
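As a sketch of the label filtering, this builds an instant-query URL for the Prometheus HTTP API (/api/v1/query), usable when prometheus is exposed with -metrics.prometheus=9090. The up metric is the built-in per-target health metric; the label values are made up, and instantQueryURL is a hypothetical helper.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
	"strings"
)

// instantQueryURL builds a Prometheus instant-query URL that selects the
// given metric, filtered by exact label matches.
func instantQueryURL(base, metric string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic matcher order
	matchers := make([]string, 0, len(keys))
	for _, k := range keys {
		// %q yields a double-quoted, escaped string, which PromQL accepts.
		matchers = append(matchers, fmt.Sprintf("%s=%q", k, labels[k]))
	}
	query := metric + "{" + strings.Join(matchers, ",") + "}"
	return base + "/api/v1/query?" + url.Values{"query": {query}}.Encode()
}

func main() {
	// Hypothetical label values; suite and test are label names hive adds.
	u := instantQueryURL("http://localhost:9090", "up", map[string]string{
		"suite": "example-suite",
		"test":  "example-test",
	})
	fmt.Println(u)
}
```

The same selector syntax works in grafana panel queries, so a dashboard can be narrowed to one suite or test run.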

Grafana

The default port is 8080, but this can be changed with the -metrics.grafana flag.

Example of the Lighthouse Summary dashboard (taken from here: https://github.com/sigp/lighthouse-metrics and then modified to use the provisioned prometheus datasource):

[image: Lighthouse Summary dashboard in grafana]

With the upcoming eth2 testnet setup deduplication work and new simulators, we can build a better Hive ethereum testnet dashboard. And maybe the client teams can add dashboards for their respective clients.

All dashboards are put in the internal/libdocker/graf/dashboards directory, and can be grouped using a nested directory structure. Just make sure you use the prometheus datasource (the UID is hardcoded and won't change):

{
  // ... my panel json data
  "datasource": { "type": "prometheus", "uid": "P1809F7CD0C75ACF3" }
}
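If it helps when importing dashboards from elsewhere, here is a small sketch of a check that every datasource object in a dashboard JSON file points at the provisioned UID. The helper names are hypothetical and this is not part of the PR. Note that it flags any other datasource object too, such as grafana's built-in annotation datasource, so a real check would probably need an allowlist.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// provisionedUID is the hardcoded UID of the provisioned prometheus datasource.
const provisionedUID = "P1809F7CD0C75ACF3"

// foreignDatasources walks decoded dashboard JSON and returns the UIDs of
// all "datasource" objects that do not use the provisioned datasource.
func foreignDatasources(node any) []string {
	var bad []string
	switch v := node.(type) {
	case map[string]any:
		for key, child := range v {
			if ds, ok := child.(map[string]any); ok && key == "datasource" {
				if uid, _ := ds["uid"].(string); uid != provisionedUID {
					bad = append(bad, uid)
				}
				continue
			}
			bad = append(bad, foreignDatasources(child)...)
		}
	case []any:
		for _, child := range v {
			bad = append(bad, foreignDatasources(child)...)
		}
	}
	return bad
}

// checkDashboard parses raw dashboard JSON and reports foreign datasource UIDs.
func checkDashboard(raw []byte) ([]string, error) {
	var doc any
	if err := json.Unmarshal(raw, &doc); err != nil {
		return nil, err
	}
	return foreignDatasources(doc), nil
}

func main() {
	raw := []byte(`{"panels":[{"datasource":{"type":"prometheus","uid":"SOME_OTHER_UID"}}]}`)
	bad, err := checkDashboard(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println("datasources to fix:", bad)
}
```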

Reviewing

The diff is only 600 lines, but the Summary.json dashboard source file is 4300 lines. Let me know if I can help explain/document the hive changes themselves better.

@protolambda

Rebased on master, ready for review again.

@fjl fjl commented Jan 4, 2023

Sorry it's taking a while to review.
