CHT Sync monitoring #84
Comments
I think one of the key metrics to have, from an "is my dashboard up to date" perspective, is a "last sequence ID synced from CouchDB" for all databases being synced. I suggest we need this as a key feature for launch. Based on how Watchdog monitors this today, a drop-in replacement would be to have a
If exposing it as a Prometheus-native endpoint is too hard, then simply mirroring the SQL schema used in couch2pg will be fine. Here's the CREATE TABLE:
CREATE TABLE public.couchdb_progress (
    seq character varying NULL,
    source character varying NOT NULL
);
And here are 4 example rows. Note that each row lets you know which CHT Core instance is being maintained, which database it is, the sequence count, and the sequence ID. Sequence IDs are truncated for brevity; they're much longer:
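As a rough illustration of how a `couchdb_progress` row could feed a monitoring metric, here is a minimal Python sketch. It assumes the `seq` value has the usual CouchDB `<count>-<opaque id>` shape and that `source` looks like `host/database`; both layouts, the metric name `couchdb_last_seq`, the label names, and the helper names are hypothetical and not taken from CHT Sync or couch2pg.

```python
from typing import Optional, Tuple


def parse_seq(seq: Optional[str]) -> Tuple[Optional[int], Optional[str]]:
    """Split a sequence string into (count, opaque_id).

    Assumes a "<count>-<opaque id>" layout. Returns (None, None) when
    seq is NULL/empty or does not start with an integer.
    """
    if not seq:
        return None, None
    count, _, opaque = seq.partition("-")
    if not count.isdigit():
        return None, None
    return int(count), opaque


def split_source(source: str) -> Tuple[str, str]:
    """Split an assumed "host/database" source into (instance, database)."""
    instance, _, database = source.rpartition("/")
    return instance, database


def to_metric_line(source: str, seq: Optional[str]) -> Optional[str]:
    """Render one row as a Prometheus text-format sample, or None if unknown."""
    count, _ = parse_seq(seq)
    if count is None:
        return None
    instance, database = split_source(source)
    # Metric and label names are illustrative only.
    return (
        f'couchdb_last_seq{{instance="{instance}",database="{database}"}} {count}'
    )
```

Keeping the instance and database as separate labels is one way to preserve the "which CHT instance, which database" information per row; a real exporter might name things differently.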
Having metrics scraped from SQL is convenient, since most of what we need is already there or easy to add. For the metrics themselves
Potentially we could do (dashboard last update - current time), but that would have to be dynamic somehow
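The "(dashboard last update - current time)" idea could be computed at scrape time rather than stored, which is one way to make it dynamic. A minimal Python sketch, assuming the last update is available as a timezone-aware timestamp; the function name `lag_seconds` is hypothetical:

```python
from datetime import datetime, timezone
from typing import Optional


def lag_seconds(
    last_update: Optional[datetime],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Seconds elapsed since the dashboard data was last updated.

    Returns None when the last update time is unknown. Both datetimes
    are assumed to be timezone-aware; the result is clamped at zero to
    tolerate small clock skew.
    """
    if last_update is None:
        return None
    if now is None:
        now = datetime.now(timezone.utc)
    return max(0.0, (now - last_update).total_seconds())
```

For example, a last update at 00:00 UTC evaluated at 00:05 UTC gives a lag of 300 seconds.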
Looking good @witash! Agreed that remote DB permissions vs. exposing ingress is a tricky choice to make. I defer to the eco team on how best to proceed, but I suspect that leaving it in the DB would be fine, as this is the status quo compared to couch2pg, and we can always improve it later. Whichever route we go, be sure the stats end up including the URL of the CHT instance! This is critical for multi-tenant CHT Sync deployments, which I think MoH KE wanted.
* feat(#84): add optional sql exporter and ingress
* feat(#84): adding pending and update time to couch2pg
* feat(#84): adding dbt monitoring queries
* chore(#84): fix lints
* chore(#84): fix tests
* feat(#84): separate request to get pending, and null if unknown
* chore(#84): adding tests
* feat(#84): better query for dbt_latency
* chore(#84): fixing lint
* chore(#84): fixing tests
* chore(#84): adding upgrade script
@witash this issue was moved to
Set up and document a monitoring solution for CHT Sync (CHT Watchdog), together with relevant metrics and alerts.