Postgres Orchestrated Docker Stack
High availability PostgreSQL cluster with 1 primary (R/W) and 2 read replicas,
using native streaming replication and PgBouncer connection pooling — all in Docker.
# Launch the entire stack
docker compose up -d
# Check all services are running
docker compose ps
# Verify replication is streaming
docker exec pg_master psql -U postgres -d appdb \
-c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"
# Tear down (data is preserved in ./data/)
docker compose down

| Component | Role | Host Port | Details |
|---|---|---|---|
| Master | Primary (R/W) | 5440 | Accepts all writes, streams WAL to replicas |
| Replica 1 | Standby (R/O) | 5433 | Hot standby, receives WAL via replica1_slot |
| Replica 2 | Standby (R/O) | 5434 | Hot standby, receives WAL via replica2_slot |
| Replica N | Standby (R/O) | 5435+ | Dynamically added via ./pods.sh add |
| PgBouncer | Connection Pooler | 6432 | Transaction-mode pooling with named read/write pools |
| Health API | Monitoring | 8080 | REST API exposing cluster health, replication lag, node status |
| Prometheus | Metrics | 9090 | Scrapes all exporters, stores time-series data |
| Grafana | Dashboards | 3000 | Pre-built dashboards for cluster, replication, resources |
| Backup | Scheduled Backups | n/a | pg_basebackup + WAL archiving, S3/local, cron-scheduled |
| cAdvisor | Resources | 8081 | Container CPU, RAM, and network metrics |

Client App
    │
    ▼
PgBouncer (:6432)
    │
    ├── appdb_write ──→ Master (:5440) ── R/W
    ├── appdb_read1 ──→ Replica1 (:5433) ── R/O
    ├── appdb_read2 ──→ Replica2 (:5434) ── R/O
    └── appdb ────────→ Master (:5440) ── default

Master ──── WAL Stream ───→ Replica1
       ├─── WAL Stream ───→ Replica2
       └─── WAL Stream ───→ ReplicaN... (dynamically scaled)

Health API (:8080)
    │
    ├── /health ──────────────→ Full cluster overview (auto-discovers all replicas)
    ├── /health/master ───────→ Master node status
    ├── /health/replicaN ─────→ Specific replica status
    └── /health/replication ──→ Replication lag & slot details

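The PgBouncer pool layout in the first diagram lends itself to a simple read/write split in application code. The following is a minimal sketch, not part of this repository: the helper names (`pool_for`, `dsn_for`) and the round-robin choice are assumptions, and only the pool names, host, and port come from the diagram. Because replicas receive WAL as a stream, a read routed to a replica may briefly lag behind the master.

```python
from itertools import cycle

# Values taken from the architecture diagram.
PGBOUNCER_HOST = "localhost"
PGBOUNCER_PORT = 6432
WRITE_POOL = "appdb_write"
READ_POOLS = ["appdb_read1", "appdb_read2"]

# Hypothetical policy: rotate read traffic across the read pools.
_read_cycle = cycle(READ_POOLS)


def pool_for(read_only: bool) -> str:
    """Return the PgBouncer pool name for a read-only or read/write session."""
    return next(_read_cycle) if read_only else WRITE_POOL


def dsn_for(read_only: bool, user: str = "postgres") -> str:
    """Build a libpq-style connection string that targets the chosen pool."""
    return (
        f"host={PGBOUNCER_HOST} port={PGBOUNCER_PORT} "
        f"dbname={pool_for(read_only)} user={user}"
    )
```

The resulting string can be passed to any libpq-based driver. Pools created later by ./pods.sh add would have to be appended to `READ_POOLS` by the application.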
# Direct connections
psql -h localhost -p 5440 -U postgres -d appdb # Master (R/W)
psql -h localhost -p 5433 -U postgres -d appdb # Replica 1 (R/O)
psql -h localhost -p 5434 -U postgres -d appdb # Replica 2 (R/O)
# Via PgBouncer (pooled)
psql -h localhost -p 6432 -U postgres -d appdb_write # → Master
psql -h localhost -p 6432 -U postgres -d appdb_read1 # → Replica 1
psql -h localhost -p 6432 -U postgres -d appdb_read2 # → Replica 2
psql -h localhost -p 6432 -U postgres -d appdb # → Master (default)

Dynamically add or remove read replicas without editing any config files.
./pods.sh add # Auto-adds replica3 (next available)
./pods.sh add replica5 # Adds a specific replica
./pods.sh remove replica3 # Removes a dynamic replica
./pods.sh list # Lists all replicas with status
./pods.sh status # Full cluster health overview

Each add automatically: creates a replication slot, generates a compose override, adds a PgBouncer read pool, spins up a postgres_exporter, updates Prometheus targets, starts the container, and waits for streaming confirmation. See Scaling Documentation for details.

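To make the naming involved in an add concrete, here is a rough sketch of the values such a step would need to derive for replica N. It is illustrative only and does not reproduce pods.sh: the `ReplicaPlan` type, the `plan_replica` helper, the `pg_replicaN` container hostname (by analogy with pg_master), and the "host port = 5432 + N" rule (inferred from 5433, 5434, 5435+) are all assumptions. The slot and pool naming follows replica1_slot and appdb_read1 from this README.

```python
from dataclasses import dataclass

BASE_HOST_PORT = 5432  # assumption: replicaN is published on host port 5432 + N


@dataclass(frozen=True)
class ReplicaPlan:
    name: str
    slot: str
    pool: str
    host_port: int
    create_slot_sql: str
    pgbouncer_entry: str


def plan_replica(n: int, dbname: str = "appdb") -> ReplicaPlan:
    """Derive the slot, pool, and port a new replica would use (hypothetical helper)."""
    if n < 1:
        raise ValueError("replica index must be >= 1")
    name = f"replica{n}"
    slot = f"{name}_slot"
    pool = f"{dbname}_read{n}"
    return ReplicaPlan(
        name=name,
        slot=slot,
        pool=pool,
        host_port=BASE_HOST_PORT + n,
        # pg_create_physical_replication_slot() is a built-in PostgreSQL function.
        create_slot_sql=f"SELECT pg_create_physical_replication_slot('{slot}');",
        # A [databases] entry for pgbouncer.ini; the hostname is an assumption.
        pgbouncer_entry=f"{pool} = host=pg_{name} port=5432 dbname={dbname}",
    )
```

For example, `plan_replica(3)` yields the slot replica3_slot, the pool appdb_read3, and host port 5435.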
REST API on port 8080 for monitoring cluster health. Auto-discovers all replicas — no config changes needed when scaling.
curl http://localhost:8080/health # Full cluster status (200 = healthy, 503 = degraded)
curl http://localhost:8080/health/master # Master node only
curl http://localhost:8080/health/replica1 # Specific replica
curl http://localhost:8080/health/replication # Replication lag & slots

See Health API Documentation for response schemas and integration guide.

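A load balancer or deploy script can act on the status code alone. The sketch below relies only on what is stated above (200 = healthy, 503 = degraded on /health); the function names and the "unknown" and "unreachable" labels are assumptions, and the response body is deliberately ignored because its schema is defined in the Health API Documentation.

```python
import urllib.error
import urllib.request


def classify(status_code: int) -> str:
    """Map the /health status code to a coarse cluster state."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "degraded"
    return "unknown"


def check_cluster(url: str = "http://localhost:8080/health", timeout: float = 3.0) -> str:
    """Query the health endpoint and return healthy, degraded, unknown, or unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx responses; the code is still available.
        return classify(err.code)
    except OSError:
        # Connection refused, DNS failure, or timeout.
        return "unreachable"


if __name__ == "__main__":
    print(check_cluster())
```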
Full observability with a pre-built dashboard. Open http://localhost:3000 (default: admin / pods_admin).
Dashboard panels:
- Cluster Overview — UP/DOWN status per node, active connections, database size
- Replication — Lag in bytes over time, WAL retained per slot
- Queries & Throughput — TPS per node, active/idle queries, row operations (insert/update/delete rates)
- Connections per Node — Connection state breakdown for master and each replica
- PgBouncer — Client/server connections, waiting clients, queries per second per pool
- Container CPU — CPU usage over time + current snapshot per container
- Container Memory — RAM usage over time + current snapshot per container
- Resource Summary — Total CPU/RAM for PG nodes and full stack
Exporters auto-scale with replicas — ./pods.sh add creates a postgres_exporter and updates Prometheus targets automatically.
See Monitoring Documentation for metrics reference, PromQL queries, and configuration.
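
Replication lag in bytes is the distance between two WAL positions (LSNs). As a small illustration, assuming lag is measured between the sent_lsn and replay_lsn columns of pg_stat_replication (the dashboard may use a different pair), the arithmetic looks like this. An LSN is printed as two hexadecimal numbers, the high and low 32 bits of a 64-bit byte position; the helper names are hypothetical.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert an LSN such as '0/3000148' to an absolute WAL byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL sent to a replica but not yet replayed by it."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)


print(lag_bytes("0/3000148", "0/3000060"))  # 232
```

On the server side, the built-in pg_wal_lsn_diff() function performs the same subtraction.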
Automated backups with scheduled pg_basebackup snapshots and continuous WAL archiving. Supports local storage and S3.
./pods.sh backup # Immediate full backup
./pods.sh backup-list # List all backups with sizes
./pods.sh restore full_25-03-2026_14-30-00.tar.gz # Restore a backup

- Schedule: Configurable via BACKUP_SCHEDULE (default: daily at 2 AM)
- Retention: Auto-cleanup after BACKUP_RETENTION_DAYS (default: 7 days)
- Storage: Auto-detects S3 when AWS_ACCESS_KEY_ID + AWS_S3_BUCKET are set; otherwise stores locally
- WAL archiving: Continuous via archive_command for point-in-time recovery capability
- Naming: full_DD-MM-YYYY_HH-MM-SS.tar.gz

See Backup Documentation for S3 setup, restore procedures, and architecture details.
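
Because backup names use DD-MM-YYYY, they do not sort chronologically as plain strings, so any tooling built around them should parse the timestamp. The sketch below shows one way to do that and to apply a retention window. It is not taken from backup.sh: the helper names are hypothetical, timestamps are treated as naive local times, and only the naming pattern and the 7-day default come from the list above.

```python
from datetime import datetime, timedelta

# strftime/strptime pattern matching full_DD-MM-YYYY_HH-MM-SS.tar.gz
NAME_FORMAT = "full_%d-%m-%Y_%H-%M-%S.tar.gz"


def backup_name(taken_at: datetime) -> str:
    """Format a backup file name for the given timestamp."""
    return taken_at.strftime(NAME_FORMAT)


def backup_time(name: str) -> datetime:
    """Parse the timestamp back out of a backup file name."""
    return datetime.strptime(name, NAME_FORMAT)


def expired_backups(names, now: datetime, retention_days: int = 7):
    """Return the names older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    return sorted((n for n in names if backup_time(n) < cutoff), key=backup_time)


print(backup_name(datetime(2026, 3, 25, 14, 30, 0)))
# full_25-03-2026_14-30-00.tar.gz
```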
pods/
├── docker-compose.yml # Service orchestration (base stack)
├── docker-compose.override.yml # Dynamic replicas (auto-generated by pods.sh)
├── .env # Credentials & config (gitignored)
├── .gitignore
├── pods.sh # Replica scaling management script
├── config/
│   ├── pgbouncer/
│   │   ├── pgbouncer.ini # Pool definitions & settings
│   │   └── userlist.txt # PgBouncer auth credentials
│   ├── prometheus/
│   │   ├── prometheus.yml # Scrape targets & jobs
│   │   └── pg_targets.json # Dynamic replica targets (auto-managed)
│   └── grafana/
│       ├── provisioning/ # Auto-provisioned datasource & dashboard config
│       └── dashboards/
│           └── pods-cluster.json # Pre-built cluster overview dashboard
├── scripts/
│   ├── master-init.sh # Creates replication user & slots
│   └── replica-init.sh # Runs pg_basebackup & configures standby
├── health-api/
│   ├── Dockerfile # Python 3.13 slim image
│   ├── requirements.txt # FastAPI, uvicorn, psycopg2
│   └── main.py # Health API with auto-discovery
├── backup/
│   ├── Dockerfile # Backup container (postgres:18 + cron + awscli)
│   ├── backup.sh # Full backup, WAL archive, cleanup, list
│   ├── restore.sh # Restore a backup to target directory
│   └── entrypoint.sh # Cron scheduler entrypoint
├── data/ # Persistent PostgreSQL data (gitignored)
│   ├── master/
│   ├── replica1/
│   ├── replica2/
│   └── replicaN/ # Created dynamically by pods.sh
└── documentation/
    ├── pods.png # Architecture diagram
    ├── master.md # Master node deep-dive
    ├── replication.md # Streaming replication explained
    ├── pgbouncer.md # Connection pooling config
    ├── environment.md # Environment variables reference
    ├── operations.md # Ops, monitoring & troubleshooting
    ├── health-api.md # Health API endpoints & responses
    ├── scaling.md # Dynamic replica scaling guide
    ├── monitoring.md # Prometheus, Grafana, exporters, metrics
    └── backup.md # Backup & restore guide

| Module | What's Inside |
|---|---|
| Master Node | WAL parameters, pg_hba.conf rules, replication user & slot creation, health checks |
| Replication | How streaming replication works, pg_basebackup flags, standby.signal, slot mechanics, why not Patroni |
| PgBouncer | Pool modes (session/transaction/statement), every pgbouncer.ini parameter, monitoring commands |
| Environment | All .env variables, defaults, how they flow through the stack, production recommendations |
| Operations | Start/stop, log viewing, replication monitoring queries, data reset, troubleshooting guide |
| Health API | REST endpoints, response schemas, auto-discovery, integration with load balancers |
| Scaling | pods.sh commands, how dynamic replicas work, override file, PgBouncer auto-config |
| Monitoring | Prometheus config, Grafana dashboards, all exporters, metrics reference, PromQL queries |
| Backup & Restore | Scheduled backups, WAL archiving, S3/local storage, restore procedures |
| Decision | Reasoning |
|---|---|
| Native streaming replication over Patroni | Fixed topology (1 primary + 2 replicas) — no need for automatic leader election. Simpler, fewer dependencies |
| Physical replication slots | Guarantees WAL retention per replica, so a replica that goes offline temporarily can catch up without a full re-sync |
| PgBouncer in transaction mode | Best connection reuse for web workloads without sacrificing transaction safety |
| Bind-mount volumes over Docker volumes | Data lives in ./data/: visible, portable, easy to back up and inspect |
| PostgreSQL 18 + io_uring | Native async I/O reduces syscall overhead, with measurable throughput gains on read-heavy replicas |
| Auto-discovery in Health API | Queries pg_replication_slots to find replicas — no config changes needed when scaling |
| docker-compose.override.yml for scaling | Dynamic replicas live in an auto-generated override, so the base compose file stays clean and version-controlled |
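
The slot-based auto-discovery decision can be illustrated with a short sketch. This is not the code in health-api/main.py: the query, the `replicas_from_slots` helper, and the returned field names are assumptions, and the sketch relies only on the replicaN_slot naming convention and on standard PostgreSQL catalog columns and functions (pg_replication_slots, pg_wal_lsn_diff, pg_current_wal_lsn). Rows are expected in the shape a driver such as psycopg2 would return from `cursor.fetchall()`.

```python
# Run on the master; lists physical slots and the WAL each one is holding back.
DISCOVERY_SQL = """
SELECT slot_name,
       active,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots
WHERE slot_type = 'physical'
"""

SLOT_SUFFIX = "_slot"


def replicas_from_slots(rows):
    """Map (slot_name, active, retained_bytes) rows to a dict keyed by replica name."""
    replicas = {}
    for slot_name, active, retained in rows:
        if not slot_name.endswith(SLOT_SUFFIX):
            continue  # ignore slots that do not follow the replicaN_slot convention
        replicas[slot_name[: -len(SLOT_SUFFIX)]] = {
            "streaming": bool(active),
            "retained_wal_bytes": None if retained is None else int(retained),
        }
    return replicas
```

Watching the retained WAL per slot matters because an inactive slot keeps WAL on the master until the replica returns or the slot is dropped.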
Built with ❤️ PostgreSQL, PgBouncer, and Docker Compose
