I think one of the key metrics to have, from an "is my dashboard up to date" perspective, is the last sequence ID synced from CouchDB for every database being synced. I suggest we treat this as a key feature for launch.
Based on how Watchdog monitors this today, a drop-in replacement would be a /metrics HTTP endpoint that looks like this (I chose "logstash" as the metric prefix, but it can be whatever name makes sense):
# HELP logstash_progress_sequence cht-sync backlog.
# TYPE logstash_progress_sequence counter
logstash_progress_sequence{cht_instance="cht.example.com",db="_users",job="db_targets",target="postgres.example.com"} 4
logstash_progress_sequence{cht_instance="cht.example.com",db="medic",job="db_targets",target="postgres.example.com"} 232
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-logs",job="db_targets",target="postgres.example.com"} 21
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-sentinel",job="db_targets",target="postgres.example.com"} 130
logstash_progress_sequence{cht_instance="cht.example.com",db="medic-users-meta",job="db_targets",target="postgres.example.com"} 6
# HELP scrape_duration_seconds How long it took to scrape the target in seconds
# TYPE scrape_duration_seconds gauge
scrape_duration_seconds{job="db_targets",target="postgres.example.com"} 0.000498091
# HELP up 1 if the target is reachable, or 0 if the scrape failed
# TYPE up gauge
up{job="db_targets",target="postgres.example.com"} 1
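As a rough illustration of how little is needed to produce the progress lines above, here is a minimal Python sketch that renders them in the Prometheus text exposition format. The function names (`render_metrics`, `escape_label`) and the shape of the `sequences` input are assumptions made for this example; they are not part of CHT Sync or Watchdog, and the metric name is only the placeholder suggested above.

```python
def escape_label(value: str) -> str:
    """Escape a label value per the Prometheus text format (backslash, quote, newline)."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(cht_instance, target, sequences,
                   metric="logstash_progress_sequence", job="db_targets"):
    """Render one sample per database.

    `sequences` is a hypothetical mapping of database name -> last synced
    sequence number, e.g. {"medic": 232}.
    """
    lines = [
        f"# HELP {metric} cht-sync backlog.",
        f"# TYPE {metric} counter",
    ]
    instance_label = escape_label(cht_instance)
    job_label = escape_label(job)
    target_label = escape_label(target)
    for db in sorted(sequences):
        db_label = escape_label(db)
        lines.append(
            f'{metric}{{cht_instance="{instance_label}",db="{db_label}",'
            f'job="{job_label}",target="{target_label}"}} {sequences[db]}'
        )
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics("cht.example.com", "postgres.example.com",
                         {"_users": 4, "medic": 232}), end="")
```

Serving this string from a /metrics route with a plain-text content type should be enough for a Prometheus scraper to pick it up; the `scrape_duration_seconds` and `up` series are left out of this sketch.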
If exposing it as a Prometheus-native endpoint is too hard, then simply mirroring the SQL schema used in couch2pg will be fine. Here's the couchdb_progress schema:
CREATE TABLE public.couchdb_progress (
  seq character varying NULL,
  source character varying NOT NULL
);
And here are 4 example rows. Note that each row tells you which CHT Core instance is being synced, which database it is, the sequence count, and the sequence ID. The sequence IDs are truncated for brevity; the real ones are much longer:
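To connect the two options, a row from this table could be reduced to the same number the metric would report. The sketch below is in Python and rests on two assumptions that should be checked against real data: that `source` has the form `<cht instance>/<database>`, and that `seq` is a CouchDB sequence ID whose leading integer (before the first `-`) is the sequence count. `Progress` and `parse_progress_row` are hypothetical names, and the values in the usage example are made up for illustration.

```python
from typing import NamedTuple, Optional


class Progress(NamedTuple):
    cht_instance: str
    db: str
    sequence: Optional[int]


def parse_progress_row(seq: Optional[str], source: str) -> Progress:
    """Split one couchdb_progress row into instance, database and sequence count.

    Assumes `source` looks like "<cht instance>/<database>" and `seq` looks
    like "<count>-<opaque id>". `seq` is nullable in the schema, so None is
    passed through as an unknown sequence.
    """
    # Split on the last "/" so a host containing slashes stays in one piece.
    cht_instance, _, db = source.rpartition("/")
    if seq is None:
        number = None
    else:
        prefix = seq.split("-", 1)[0]
        number = int(prefix) if prefix.isdigit() else None
    return Progress(cht_instance, db, number)


if __name__ == "__main__":
    # Hypothetical row, not taken from a real deployment.
    print(parse_progress_row("232-g1AAAA", "cht.example.com/medic"))
```

Keeping the unknown case as `None` rather than `0` lets an alert distinguish "sync has not started" from "sync is at sequence zero".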
Set up and document a monitoring solution for CHT Sync (CHT Watchdog), together with relevant metrics and alerts.