Added wal-g exporter to prometheus #3
base: master
exporter/README.md (outdated)
The exporter provides the following metrics:

### Backup Metrics
- `walg_backup_lag_seconds{backup_type}` - Time since last backup-push in seconds
For all timestamps, let's state clearly whether it is the timestamp of the beginning of the process or of its end.
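One way to make the start/end distinction explicit is to read both times per backup and export them separately. The sketch below assumes the JSON shape of `wal-g backup-list --detail --json` (keys `backup_name`, `start_time`, `finish_time`); `backupDetail` and `backupTimestamps` are hypothetical names, not code from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// backupDetail keeps only the fields needed here. The JSON keys are assumed
// to match `wal-g backup-list --detail --json`; check them against the
// WAL-G version in use.
type backupDetail struct {
	BackupName string    `json:"backup_name"`
	StartTime  time.Time `json:"start_time"`  // when the backup began
	FinishTime time.Time `json:"finish_time"` // when the backup completed
}

// backupTimestamps returns Unix-second values keyed by backup name: one map
// for the start of each backup and one for its end, so neither value is
// ambiguous about which moment it marks.
func backupTimestamps(raw []byte) (start, finish map[string]float64, err error) {
	var details []backupDetail
	if err = json.Unmarshal(raw, &details); err != nil {
		return nil, nil, err
	}
	start = make(map[string]float64, len(details))
	finish = make(map[string]float64, len(details))
	for _, d := range details {
		start[d.BackupName] = float64(d.StartTime.Unix())
		finish[d.BackupName] = float64(d.FinishTime.Unix())
	}
	return start, finish, nil
}

func main() {
	sample := []byte(`[{"backup_name":"base_000000010000000000000025",
		"start_time":"2024-01-01T00:00:00Z","finish_time":"2024-01-01T00:10:00Z"}]`)
	start, finish, err := backupTimestamps(sample)
	if err != nil {
		panic(err)
	}
	name := "base_000000010000000000000025"
	fmt.Println(start[name], finish[name])
}
```

The two maps would feed gauges such as `walg_backup_start_timestamp` and `walg_backup_finish_timestamp`, so each exported value states whether it marks the beginning or the end of a backup.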
exporter/README.md (outdated)
- `walg_backup_count{backup_type}` - Number of backups (full/delta)
Successful attempts only, or all of them?
exporter/README.md (outdated)
- `walg_backup_timestamp{backup_type}` - Timestamp of last backup

### WAL Metrics
- `walg_wal_lag_seconds{timeline}` - Time since last wal-push in seconds
Suggested change:
- `walg_wal_lag_seconds{timeline}` - Time since last successful wal-push in seconds
Another question: "time since" is a derived metric. Wouldn't it be better to export timestamps and let the monitoring system decide what to show to users or AI: raw timestamps, lag values, or both?
exporter/README.md (outdated)
- `walg_wal_integrity_status{timeline}` - WAL integrity status (1 = OK, 0 = ERROR)

### PITR Metrics
- `walg_pitr_window_seconds` - Point-in-time recovery window size in seconds
What if there are gaps, i.e. multiple recovery windows?
…ed walg_backup_start_timestamp
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…from internal/databases/postgres/lsn.go
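The last commit above refers to `internal/databases/postgres/lsn.go`, and `wal_lag.go` is described below as doing LSN parsing and WAL lag calculation. As a hedged illustration of what that involves in PostgreSQL terms, this sketch parses the textual `X/Y` LSN form, takes byte differences, and maps an LSN to its 24-hex-digit WAL segment name. `ParseLSN`, `WalFileName`, and the fixed 16 MiB segment size are assumptions, not WAL-G's or the PR's code.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// walSegmentSize assumes the PostgreSQL default of 16 MiB; clusters
// initialised with a different --wal-segsize need the real value.
const walSegmentSize uint64 = 16 * 1024 * 1024

// ParseLSN converts PostgreSQL's textual "X/Y" LSN (two hex numbers, the
// high and low 32 bits) into a single 64-bit byte position.
func ParseLSN(s string) (uint64, error) {
	hi, lo, ok := strings.Cut(s, "/")
	if !ok {
		return 0, fmt.Errorf("invalid LSN %q: missing '/'", s)
	}
	h, err := strconv.ParseUint(hi, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	l, err := strconv.ParseUint(lo, 16, 32)
	if err != nil {
		return 0, fmt.Errorf("invalid LSN %q: %w", s, err)
	}
	return h<<32 | l, nil
}

// WalFileName returns the segment file name containing lsn: timeline,
// then the "log" and "segment" parts of the segment number, 8 hex digits each.
func WalFileName(timeline uint32, lsn uint64) string {
	segNo := lsn / walSegmentSize
	segsPerLog := uint64(0x100000000) / walSegmentSize
	return fmt.Sprintf("%08X%08X%08X", timeline, segNo/segsPerLog, segNo%segsPerLog)
}

func main() {
	archived, err := ParseLSN("0/25000000")
	if err != nil {
		panic(err)
	}
	current, err := ParseLSN("0/26000000")
	if err != nil {
		panic(err)
	}
	fmt.Println(current - archived)       // 16777216 (bytes of lag)
	fmt.Println(WalFileName(1, archived)) // 000000010000000000000025
}
```

The segment name produced here has the same form as the `base_` backup names in the PR description; for example, LSN `5/7000000` on timeline 1 maps to `000000010000000500000007`.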
Database name
PostgreSQL

This PR adds a Prometheus exporter for WAL-G PostgreSQL backup and WAL monitoring.
Pull request description
Describe what this PR adds
This PR introduces a WAL-G Prometheus Exporter that provides comprehensive observability for WAL-G backup operations for PostgreSQL databases.
🎯 What This PR Adds
This PR adds a complete Prometheus exporter (`/exporter` directory) with the following capabilities:

Core Exporter Components:
- `exporter.go` - Main Prometheus collector implementation
- `main.go` - HTTP server and CLI interface
- `pitr.go` - Point-in-time recovery window calculations
- `wal_lag.go` - LSN parsing and WAL lag calculation logic
- `mock-wal-g` - Mock script for testing and development
- `go.mod`/`go.sum` - Go module dependencies

Key Features:
📊 Backup Monitoring
- Uses WAL-G's `_D_` suffix naming convention to correctly distinguish full vs. incremental backups
- Incremental backups carry a `base_backup` label showing which full backup they are based on

📈 WAL Stream Monitoring
🔍 Storage Health Monitoring
⏰ PITR & Recovery Monitoring
🔧 Operational Metrics
📊 Metrics Provided
Backup Metrics
Critical Labels:
- `backup_type`: `full` or `delta` (correctly determined by the presence of the `_D_` suffix)
- `base_backup`: for incremental backups, shows which full backup they are based on
- `backup_name`: complete backup identifier

WAL Metrics
Storage & Health Metrics
🔧 Technical Implementation Highlights
✅ Correct Backup Type Detection
One of the key technical achievements is accurate backup type classification.

The Problem: Naive implementations often mark all backups as "full" because every backup name starts with the `base_` prefix.

The Solution: This exporter uses WAL-G's actual naming convention:
- Full backup: `base_000000010000000000000025` (no `_D_` suffix)
- Delta backup: `base_000000010000000500000007_D_000000010000000000000025` (contains `_D_`)
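The naming rule above can be sketched in a few lines. `BackupLabels` and `Classify` are hypothetical names; the assumption that the text after `_D_` identifies the parent backup's WAL segment is inferred from the example names, not taken from the PR's `exporter.go`.

```go
package main

import (
	"fmt"
	"strings"
)

// deltaMarker separates a delta backup's own WAL segment from the segment
// of the backup it was taken from (assumed from the example names).
const deltaMarker = "_D_"

// BackupLabels holds the label values described in this PR.
type BackupLabels struct {
	BackupName string // backup_name
	BackupType string // backup_type: "full" or "delta"
	BaseBackup string // base_backup: empty for full backups
}

// Classify derives labels from a WAL-G backup name. Assumption: the part
// after "_D_" is the parent backup's WAL segment, so "base_" + that part is
// the parent's name when the parent is a full backup.
func Classify(name string) BackupLabels {
	l := BackupLabels{BackupName: name, BackupType: "full"}
	if _, parentSeg, found := strings.Cut(name, deltaMarker); found {
		l.BackupType = "delta"
		l.BaseBackup = "base_" + parentSeg
	}
	return l
}

func main() {
	for _, n := range []string{
		"base_000000010000000000000025",
		"base_000000010000000500000007_D_000000010000000000000025",
	} {
		fmt.Printf("%+v\n", Classify(n))
	}
}
```

If deltas are chained, the parent found this way may itself be a delta whose full name carries its own `_D_` suffix, so resolving `base_backup` to the underlying full backup would mean following the chain through the backup list.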
⏱️ Dual Timestamp Architecture
- `walg_backup_start_timestamp` - When the backup operation started
- `walg_backup_finish_timestamp` - When the backup completed successfully

🧪 Comprehensive Testing Framework
🚀 Usage
Basic Usage
Configuration Options
Prometheus Integration
📈 Monitoring Examples
Backup Age Monitoring
Storage Health
🧪 Testing
Development Testing
Integration Testing
📋 Files Added
This PR adds the complete `/exporter` directory with:
- `exporter.go` - Core Prometheus collector (466 lines)
- `main.go` - HTTP server and CLI interface
- `pitr.go` - PITR window calculation logic
- `wal_lag.go` - LSN parsing and lag calculation
- `mock-wal-g` - Testing mock script
- `README.md` - Comprehensive documentation
- `go.mod`/`go.sum` - Go module configuration

🎯 Value Proposition
This exporter turns WAL-G from a "black box" backup solution into an observable system.
🔗 Dependencies
The exporter requires access to the WAL-G binary, whose location can be set with `--walg.path`.
📚 Documentation
Complete documentation is provided in `/exporter/README.md`.