@theTibi theTibi commented Oct 20, 2025

…metrics

Updated the pg_custom_replication_wal query to conditionally return the correct WAL metrics based on the database recovery state. Added logic to differentiate between received and current WAL LSNs, and adjusted lag calculation accordingly. Set the 'master' flag to true for this query to ensure it runs on the primary database instance.

@theTibi theTibi requested a review from a team as a code owner October 20, 2025 13:21
theTibi commented Oct 20, 2025

@kaminenibhargav, could you take a look at this? Thanks.

Updated the pg_custom_replication_wal query to improve readability and ensure accurate reporting of WAL metrics based on the node type (primary or replica). Adjusted the descriptions of the metrics to specify their relevance to either the primary or replica nodes, enhancing the overall understanding of the metrics collected.
@theTibi theTibi enabled auto-merge October 22, 2025 10:24
@theTibi theTibi disabled auto-merge October 22, 2025 10:24
@theTibi theTibi self-assigned this Oct 22, 2025
@kaminenibhargav
Collaborator

Hi team, on some PostgreSQL versions we see the following error:

Oct 22 23:04:22 ip-10-25-2-73 pmm-agent[1574337]: time="2025-10-22T23:04:22.222-04:00" level=info msg="ts=2025-10-23T03:04:22.222Z caller=namespace.go:241 level=info err=\"Unexpected error parsing column: pg_custom_replication_wal replayed_lsn [52 56 68 67 47 56 55 70 70 70 55 54 48]\\n\"" agentID=c6398591-a78b-4e07-9a1f-3592432c40bf component=agent-process type=postgres_exporter
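For readers decoding the log line above: the bracketed numbers look like the raw bytes of the column value. A minimal Python sketch, assuming they are ASCII codes, shows that they spell out an LSN in its textual `hi/lo` hex form, which would explain why a numeric gauge could not be parsed from it. The helper names `decode_column` and `lsn_to_bytes` are hypothetical and are not part of the exporter.

```python
# Byte values copied from the "Unexpected error parsing column" log line.
raw = [52, 56, 68, 67, 47, 56, 55, 70, 70, 70, 55, 54, 48]


def decode_column(values):
    """Interpret the logged byte values as ASCII text (an assumption)."""
    return bytes(values).decode("ascii")


def lsn_to_bytes(lsn):
    """Convert textual 'HI/LO' hex LSN into an absolute byte position.

    The part before the slash is the high 32 bits, the part after it
    is the low 32 bits.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


text = decode_column(raw)
print(text)                # prints 48DC/87FFF760
print(lsn_to_bytes(text))  # the same position as a plain integer
```

The integer form is the kind of value a GAUGE metric can carry, whereas the textual form is not.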

I think it's better to use pg_wal_lsn_diff to compute both the LSN values and the lag, so that every column comes back as a number of bytes. Please check the query below:

pg_custom_replication_wal:
  master: true
  query: |
    SELECT
      CASE WHEN pg_is_in_recovery() THEN 'replica' ELSE 'primary' END AS node_type,
      CASE WHEN pg_is_in_recovery() THEN pg_wal_lsn_diff(pg_last_wal_receive_lsn(), '0/0')::bigint ELSE NULL END AS received_lsn,
      CASE WHEN pg_is_in_recovery() THEN pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/0')::bigint ELSE NULL END AS replayed_lsn,
      CASE WHEN pg_is_in_recovery() THEN NULL ELSE pg_wal_lsn_diff(pg_current_wal_lsn(), '0/0')::bigint END AS current_lsn,
      CASE
        WHEN pg_is_in_recovery() THEN pg_wal_lsn_diff(pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn())
        ELSE NULL
      END AS lag_bytes;
  metrics:
    - node_type:
        usage: "LABEL"
        description: "Type of node (primary or replica)."
    - received_lsn:
        usage: "GAUGE"
        description: "Last WAL location received by the standby server (replica only) in bytes."
    - replayed_lsn:
        usage: "GAUGE"
        description: "Last WAL location replayed by the standby server (replica only) in bytes."
    - current_lsn:
        usage: "GAUGE"
        description: "Current WAL location on the primary server (primary only) in bytes."
    - lag_bytes:
        usage: "GAUGE"
        description: "Current WAL replication lag in bytes (replica only)."

pg_wal_lsn_diff(lsn1, lsn2) → returns the difference lsn1 - lsn2 in bytes (as numeric)
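
To illustrate what the query computes, here is a rough Python sketch of the same arithmetic. It is not PostgreSQL's implementation: `parse_lsn`, `wal_lsn_diff` and `replica_lag_bytes` are hypothetical helpers, and the LSN values in the example are made up.

```python
from typing import Optional


def parse_lsn(lsn: str) -> int:
    """Turn a textual 'HI/LO' hex LSN into an absolute byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) + int(lo, 16)


def wal_lsn_diff(lsn1: str, lsn2: str) -> int:
    """Mimic pg_wal_lsn_diff(lsn1, lsn2): lsn1 minus lsn2, in bytes."""
    return parse_lsn(lsn1) - parse_lsn(lsn2)


def replica_lag_bytes(received: Optional[str], replayed: Optional[str]) -> Optional[int]:
    """Mirror the lag_bytes column: received minus replayed.

    Returns None when either LSN is missing, much as the SQL expression
    yields NULL on a primary.
    """
    if received is None or replayed is None:
        return None
    return wal_lsn_diff(received, replayed)


# Made-up LSNs: the replica has received 0x60 bytes more than it has replayed.
print(replica_lag_bytes("0/3000060", "0/3000000"))  # prints 96
print(replica_lag_bytes(None, None))                # prints None
```

Diffing against '0/0', as the query does for received_lsn, replayed_lsn and current_lsn, simply yields the absolute byte position of that LSN.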

@ademidoff ademidoff merged commit 4c057a6 into main Oct 29, 2025
6 checks passed
@ademidoff ademidoff deleted the PMM-14411 branch October 29, 2025 14:14