move to non-deprecated sql exporter #110

Draft: wants to merge 8 commits into main

mrjones-plip (Collaborator) commented May 2, 2024

Following the recent work done in medic/cht-watchdog#81, this PR

  • updates the compose file to replace the deprecated SQL exporter (prometheuscommunity/postgres-exporter) with the supported one (burningalchemist/sql_exporter)
  • converts queries from the old YAML format to the new one
  • updates the Prometheus scrape files
  • updates the Grafana JSON dashboard files

per #112

mrjones-plip (Collaborator, Author) commented May 2, 2024

test steps

  1. in watchdog repo, check out branch mrjones-sql-exporter-data-ingest-dont-merge
  2. in watchdog, make sure you have a symlink data-ingest -> ../cht-app-monitoring-data-ingest/watchdog-config
  3. in data ingest, check out branch mrjones-migrate-sql-exporter
  4. in data ingest, copy watchdog-config/sql_servers_example.yml to watchdog-config/sql_servers.yml
  5. in data ingest, edit sql_servers.yml to have a valid Postgres username, password and IP based on your local RDBMS tunnel
  6. in data ingest, edit scrape.yml so that scrape_interval: 10s and scrape_timeout: 5s - don't commit these values though!
  7. in watchdog, run ./development/kill.start.ips.sh
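
For anyone who wants to script the steps above, here is a rough sketch. It assumes the two repos are checked out side by side in directories named cht-watchdog and cht-app-monitoring-data-ingest (the watchdog directory name is a guess; the data ingest one is taken from the symlink target in step 2). The function names are made up for illustration. Steps 5 and 6 stay manual because they need your own credentials and temporary scrape values.

```shell
#!/bin/sh
# Sketch only: WATCHDOG_DIR and INGEST_DIR are assumed locations, adjust to your setup.
set -eu

WATCHDOG_DIR="${WATCHDOG_DIR:-./cht-watchdog}"
INGEST_DIR="${INGEST_DIR:-./cht-app-monitoring-data-ingest}"

# Steps 1 and 3: check out the two test branches named in the steps above.
checkout_branches() {
  git -C "$WATCHDOG_DIR" checkout mrjones-sql-exporter-data-ingest-dont-merge
  git -C "$INGEST_DIR" checkout mrjones-migrate-sql-exporter
}

# Step 2: create the data-ingest symlink in watchdog unless something is already there.
link_data_ingest() {
  link="$WATCHDOG_DIR/data-ingest"
  if [ ! -L "$link" ] && [ ! -e "$link" ]; then
    ln -s ../cht-app-monitoring-data-ingest/watchdog-config "$link"
  fi
}

# Step 4: copy the example config, never overwriting an existing sql_servers.yml.
copy_sql_servers() {
  cfg="$INGEST_DIR/watchdog-config"
  [ -f "$cfg/sql_servers.yml" ] || cp "$cfg/sql_servers_example.yml" "$cfg/sql_servers.yml"
}

# Step 7: restart watchdog with the script from the steps above.
restart_watchdog() {
  (cd "$WATCHDOG_DIR" && ./development/kill.start.ips.sh)
}

# Suggested order (steps 5 and 6 are manual edits before the restart):
#   checkout_branches && link_data_ingest && copy_sql_servers
#   ...edit sql_servers.yml and scrape.yml by hand...
#   restart_watchdog
```

The symlink and copy helpers are safe to re-run: they leave an existing link or an already edited sql_servers.yml untouched.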

demo video of test steps

data-ingest-exporter-demo.webm

demo mapping new metrics to old metrics

Taking the dwh_replication_by_status metric as an example, we can open the panel in the "Edit" view and compare it to the live metric on Watchdog. We get metric parity by:

  1. selecting the new metric: replication_by_status_replication_failure_count -> dwh_replication_by_status
  2. adding another label filter: type = replication_failure_count

old: last_over_time(replication_by_status_replication_failure_count{cht_instance="$cht_instance"}[$__interval])
new: last_over_time(dwh_replication_by_status{cht_instance="$cht_instance", type="replication_failure_count"}[$__interval])
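
The same rewrite can be sketched as a small shell helper. This is an illustration only: `map_metric` is a hypothetical name, and the pattern assumes every old metric follows the `replication_by_status_<type>{...}` shape shown above, which this PR only demonstrates for replication_failure_count.

```shell
#!/bin/sh
# Hypothetical helper: rewrite an old-style expression to the new metric plus a type label.
# Assumes the old metric already has a non-empty {label} selector, as in the example above.
map_metric() {
  printf '%s\n' "$1" |
    sed -E 's/replication_by_status_([a-z_]+)\{([^}]*)\}/dwh_replication_by_status{\2, type="\1"}/'
}

# Single quotes keep $cht_instance and $__interval as literal Grafana variables.
map_metric 'last_over_time(replication_by_status_replication_failure_count{cht_instance="$cht_instance"}[$__interval])'
```

Running it on the old expression should print the new expression from the comparison above; other panels would still need a manual check against the live metric.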

