This affects replication monitoring: if only pg_up and pg_replication_lag_seconds are monitored on the Secondary servers and there is a network outage between the Primary and Secondary servers, the Secondary servers fall behind without any alarm being triggered.
It seems more reasonable to monitor replication using data from the Primary server:
SELECT COUNT(*) FROM pg_stat_replication WHERE client_addr='SLAVE_IP' AND state = 'streaming';
If it returns 0, we have an unreachable Secondary server.
SELECT COALESCE(EXTRACT(EPOCH FROM replay_lag)::bigint, 0) AS replay_lag FROM pg_stat_replication WHERE client_addr='SLAVE_IP';
If it returns more than X seconds, we have a lagged Secondary server.
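The two checks above can be combined into a single decision on the Primary side. The following is only a minimal sketch, not how postgres_exporter does it: `ReplicaRow`, `Status`, and `Classify` are hypothetical names, and the rows are assumed to have already been read from pg_stat_replication with the lag converted to seconds as in the second query. Unlike that query, the sketch only checks lag for rows whose state is 'streaming'.

```go
package main

// ReplicaRow is a hypothetical, trimmed-down view of one row of
// pg_stat_replication as seen from the Primary server.
type ReplicaRow struct {
	ClientAddr       string // client_addr of the Secondary
	State            string // e.g. "streaming", "catchup"
	ReplayLagSeconds int64  // COALESCE(EXTRACT(EPOCH FROM replay_lag)::bigint, 0)
}

// Status is the monitoring verdict for one Secondary (hypothetical type).
type Status string

const (
	StatusOK          Status = "ok"
	StatusUnreachable Status = "unreachable"
	StatusLagged      Status = "lagged"
)

// Classify mirrors the two queries: a Secondary with no row in state
// "streaming" is treated as unreachable (the COUNT(*) = 0 case), and a
// streaming Secondary whose replay lag exceeds maxLagSeconds is lagged.
func Classify(rows []ReplicaRow, addr string, maxLagSeconds int64) Status {
	for _, r := range rows {
		if r.ClientAddr != addr || r.State != "streaming" {
			continue
		}
		if r.ReplayLagSeconds > maxLagSeconds {
			return StatusLagged
		}
		return StatusOK
	}
	return StatusUnreachable
}
```

For example, with a single streaming row for 10.0.0.2 and a threshold of 30 seconds, `Classify(rows, "10.0.0.2", 30)` reports the Secondary as ok when its lag is 3 seconds, while `Classify(rows, "10.0.0.3", 30)` reports an address that has no row as unreachable. The point of evaluating this on the Primary is that a network outage shows up as a missing row, which is the case the Secondary-side metrics miss.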
Proposal
There are existing queries for pg_stat_replication in cmd/postgres_exporter/queries.go. These metrics should be migrated to the collector package.