CHT Sync monitoring #84

Open
andrablaj opened this issue Apr 12, 2024 · 1 comment
Labels
Priority: 3 - Low Can be bumped from the release

Comments

@andrablaj
Member

Set up and document a monitoring solution for CHT Sync (CHT Watchdog), together with relevant metrics and alerts.

@mrjones-plip
Contributor

mrjones-plip commented May 17, 2024

I think one of the key metrics to have, from an "is my dashboard up to date" perspective, is a "last sequence ID synced from CouchDB" for every database being synced. I suggest we treat this as a key feature for launch.

Based on how Watchdog monitors this today, a drop-in replacement would be a /metrics HTTP endpoint that looks like this (I chose "logstash" as the metric prefix, but it can be whatever name makes sense):

# HELP logstash_progress_sequence cht-sync backlog.
# TYPE logstash_progress_sequence counter
logstash_progress_sequence{cht_instance="cht.example.com",db="_users",job="db_targets",target="postgres.example.com"} 4
logstash_progress_sequence{cht_instance="cht.example.com",db="medic",job="db_targets",target="postgres.example.com"} 232
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-logs",job="db_targets",target="postgres.example.com"} 21
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-sentinel",job="db_targets",target="postgres.example.com"} 130
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-users-meta",job="db_targets",target="postgres.example.com"} 6
# HELP scrape_duration_seconds How long it took to scrape the target in seconds
# TYPE scrape_duration_seconds gauge
scrape_duration_seconds{job="db_targets",target="postgres.example.com"} 0.000498091
# HELP up 1 if the target is reachable, or 0 if the scrape failed
# TYPE up gauge
up{job="db_targets",target="postgres.example.com"} 1
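To make the shape of that output concrete, here is a minimal Python sketch that renders only the `logstash_progress_sequence` lines in the Prometheus text exposition format. The function names `escape_label` and `render_progress_metrics` are hypothetical, and the metric name, HELP text and labels are simply copied from the example above; this is not how CHT Sync or Watchdog actually implements it.

```python
def escape_label(value: str) -> str:
    """Escape a label value for the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_progress_metrics(cht_instance, target, sequences, job="db_targets"):
    """Render one sample per database.

    `sequences` is a hypothetical mapping of database name -> last synced
    sequence count, e.g. {"medic": 232}. Names and labels mirror the example
    in this issue and are assumptions, not a fixed contract.
    """
    lines = [
        "# HELP logstash_progress_sequence cht-sync backlog.",
        "# TYPE logstash_progress_sequence counter",
    ]
    for db in sorted(sequences):
        labels = {
            "cht_instance": cht_instance,
            "db": db,
            "job": job,
            "target": target,
        }
        label_text = ",".join(
            f'{name}="{escape_label(value)}"' for name, value in labels.items()
        )
        lines.append(f"logstash_progress_sequence{{{label_text}}} {sequences[db]}")
    # The exposition format expects the payload to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(
        render_progress_metrics(
            "cht.example.com",
            "postgres.example.com",
            {"_users": 4, "medic": 232},
        ),
        end="",
    )
```

The `scrape_duration_seconds` and `up` samples are left out on purpose, since the sketch only covers the per-database progress metric.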

If exposing it as a Prometheus-native endpoint is too hard, then simply mirroring the SQL schema used in couch2pg will be fine. Here's the couchdb_progress schema:

CREATE TABLE
  public.couchdb_progress (
    seq character varying NULL,
    source character varying NOT NULL
  );

And here are four example rows. Note that each row tells you which CHT Core instance is being maintained, which database it is, the sequence count, and the sequence ID. The sequence IDs are truncated here for brevity; the real ones are much longer:

"132-g1AAAAOReJyV0s1N-SNIP-tdx7tu4XR6D2hQ"	"cht.example.com/medic"
"20-g1AAAANheJyV0s9Nw-SNIP-cs5OlfYfl5HuKQ"	"cht.example.com/medic-logs"
"3-g1AAAANBeJyV0s9Nwz-SNIP-lX0qHS_QDY5OzD"	"cht.example.com/medic-users-meta"
"72-g1AAAAOReJyd0s1Nw-SNIP-XB7blz_Q_E5fc2"	"cht.example.com/medic-sentinel"
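As a rough illustration of how a monitor could read those rows, here is a small Python sketch that splits one `couchdb_progress` row into instance, database and sequence count. The `Progress` type and `parse_progress_row` function are hypothetical names, and the sketch assumes that `seq` has the form `<count>-<opaque id>` and that `source` has the form `<instance>/<database>`, as in the rows above. Since the schema allows `seq` to be NULL, a missing value is passed through as `None`.

```python
from typing import NamedTuple, Optional


class Progress(NamedTuple):
    cht_instance: str
    db: str
    sequence_count: Optional[int]


def parse_progress_row(seq: Optional[str], source: str) -> Progress:
    """Split one couchdb_progress row into its parts.

    Assumes seq looks like "<count>-<opaque id>" and source looks like
    "<instance>/<database>"; both formats are inferred from the example rows.
    """
    # The database name is whatever follows the last "/" in source.
    cht_instance, _, db = source.rpartition("/")
    if seq is None:
        # The schema allows NULL, which presumably means nothing synced yet.
        return Progress(cht_instance, db, None)
    # The sequence count is the integer before the first "-".
    count_text, _, _opaque_id = seq.partition("-")
    return Progress(cht_instance, db, int(count_text))


if __name__ == "__main__":
    row = ("132-g1AAAAOReJyV0s1N-SNIP-tdx7tu4XR6D2hQ", "cht.example.com/medic")
    print(parse_progress_row(*row))
```

The sequence count is the number that would feed the per-database metric value; the opaque part of the sequence ID is ignored here.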

@andrablaj andrablaj added this to the CHT Sync Production milestone May 30, 2024
@andrablaj andrablaj added the Priority: 3 - Low Can be bumped from the release label May 30, 2024