
Add prometheus metrics for health check results. #2759

Merged
4 commits merged on Nov 28, 2022

Conversation

Kircheneer
Contributor

@Kircheneer Kircheneer commented Nov 7, 2022

Closes: DNE

What's Changed

Exposes metrics for the health check results under /metrics as follows:

# HELP health_check_database_info Multiprocess metric
# TYPE health_check_database_info gauge
health_check_database_info 1.0
# HELP health_check_cache_ops_redis_info Multiprocess metric
# TYPE health_check_cache_ops_redis_info gauge
health_check_cache_ops_redis_info 0.0
# HELP health_check_redis_backend_info Multiprocess metric
# TYPE health_check_redis_backend_info gauge
health_check_redis_backend_info 1.0
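
As a rough illustration of how output like the above can be produced, here is a minimal sketch using prometheus-client. It is not the code from this PR: the `CHECK_RESULTS` dictionary, the `render_health_metrics` helper, and the help strings are hypothetical, and it uses a private single-process registry rather than the multiprocess setup that yields the "Multiprocess metric" help text.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# Hypothetical health check results: True means the check passed.
CHECK_RESULTS = {
    "database": True,
    "cache_ops_redis": False,
    "redis_backend": True,
}


def render_health_metrics(results):
    """Render one gauge per health check in the Prometheus text format."""
    # A private registry keeps this example isolated from the global default.
    registry = CollectorRegistry()
    for name, healthy in results.items():
        gauge = Gauge(
            f"health_check_{name}_info",
            f"State of the {name} health check",  # illustrative help text
            registry=registry,
        )
        # 1 for a passing check, 0 for a failing one.
        gauge.set(1 if healthy else 0)
    return generate_latest(registry).decode("utf-8")


metrics_text = render_health_metrics(CHECK_RESULTS)
print(metrics_text)
```

Using a gauge with values 1 and 0 keeps each check a single time series, so alerting on a failing check is a simple `== 0` comparison.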

I have defaulted to enabling the metrics endpoint in the development configuration. I am happy to change this back if you want.

I have explicitly added

# Custom prometheus metrics
prometheus-client = "~0.14.1"

to pyproject.toml. This is a transitive dependency through django-prometheus, so it is not technically a new dependency.

TODO

  • Explanation of Change(s)
  • Added change log fragment(s) (for more information see the documentation)
  • Attached Screenshots, Payload Example
  • Unit, Integration Tests
  • Documentation Updates (when adding/changing features)
  • Outline Remaining Work, Constraints from Design

@Kircheneer Kircheneer changed the title from "Add prometheus metrics for health check results." to "WIP: Add prometheus metrics for health check results." Nov 7, 2022
@Kircheneer Kircheneer marked this pull request as draft November 7, 2022 15:47
@Kircheneer
Contributor Author

For some reason the cache ops health check always seems to fail, but that health check is not included in the (default) /health endpoint anyway. Possibly it's broken?

@Kircheneer Kircheneer marked this pull request as ready for review November 8, 2022 12:58
@Kircheneer Kircheneer changed the title from "WIP: Add prometheus metrics for health check results." to "Add prometheus metrics for health check results." Nov 8, 2022
@bryanculver
Member

For some reason the cache ops health check always seems to fail, but that health check is not included in the (default) /health endpoint anyway. Possibly it's broken?

Well we are ripping out Cacheops in 2.0 and it's already been disabled by default for new installs in 1.5.

What is the intended consumer of the /metrics endpoint?

@bryanculver bryanculver self-assigned this Nov 22, 2022
@Kircheneer
Contributor Author

What is the intended consumer of the /metrics endpoint?

Prometheus/Telegraf for telemetry purposes.
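
To illustrate the consumer side, here is a minimal, dependency-free sketch of how a scraper could pick out failing checks from the text served at /metrics. The `failing_health_checks` helper is hypothetical, and the parsing assumes unlabelled samples without timestamps, as in the example output in the PR description. Real consumers such as Prometheus or Telegraf use their own full parsers.

```python
def failing_health_checks(exposition_text):
    """Return names of health_check_* gauges whose sample value is 0."""
    failing = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # Simplifying assumption: "<metric_name> <value>" with no labels.
        name, _, value = line.rpartition(" ")
        if name.startswith("health_check_") and float(value) == 0.0:
            failing.append(name)
    return failing


# Sample payload copied from the PR description.
SAMPLE = """\
# HELP health_check_database_info Multiprocess metric
# TYPE health_check_database_info gauge
health_check_database_info 1.0
# HELP health_check_cache_ops_redis_info Multiprocess metric
# TYPE health_check_cache_ops_redis_info gauge
health_check_cache_ops_redis_info 0.0
# HELP health_check_redis_backend_info Multiprocess metric
# TYPE health_check_redis_backend_info gauge
health_check_redis_backend_info 1.0
"""

failing = failing_health_checks(SAMPLE)
print(failing)
```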

Review comments on nautobot/extras/health_checks.py (2 threads, outdated, resolved)
@Kircheneer Kircheneer force-pushed the lk-health-check-metrics branch 2 times, most recently from 7185892 to a168e8e Compare November 23, 2022 08:24
@bryanculver bryanculver mentioned this pull request Nov 28, 2022
6 tasks
@bryanculver bryanculver merged commit 61c8663 into nautobot:develop Nov 28, 2022
2 participants