P.O.D.S.

Postgres Orchestrated Docker Stack

High-availability PostgreSQL cluster with one primary (R/W) and two read replicas,
using native streaming replication and PgBouncer connection pooling, all in Docker.

Quick Start

# Launch the entire stack
docker compose up -d

# Check all services are running
docker compose ps

# Verify replication is streaming
docker exec pg_master psql -U postgres -d appdb \
  -c "SELECT client_addr, state, sent_lsn, replay_lsn FROM pg_stat_replication;"

# Tear down (data is preserved in ./data/)
docker compose down
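
The `sent_lsn` and `replay_lsn` columns returned by the replication check are PostgreSQL LSNs, printed as two hexadecimal numbers separated by a slash. As a rough illustration of how to read them, the Python sketch below converts two LSNs into a replay lag in bytes. The helper names `lsn_to_int` and `replay_lag_bytes` are hypothetical and not part of P.O.D.S.; on the server side, `pg_wal_lsn_diff()` performs the same subtraction.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a textual LSN such as '0/3000148' into an absolute byte position."""
    high, low = lsn.split("/")
    # The part before the slash is the upper 32 bits, the part after is the lower 32 bits.
    return (int(high, 16) << 32) | int(low, 16)


def replay_lag_bytes(sent_lsn: str, replay_lsn: str) -> int:
    """Bytes of WAL the primary has sent that the replica has not replayed yet."""
    return lsn_to_int(sent_lsn) - lsn_to_int(replay_lsn)


if __name__ == "__main__":
    # Example values only; real ones come from pg_stat_replication.
    print(replay_lag_bytes("0/3000148", "0/3000060"))
```

A lag that stays near zero suggests the replica is keeping up; a steadily growing value is worth investigating.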

Architecture

| Component | Role | Host Port | Details |
|-----------|------|-----------|---------|
| Master | Primary (R/W) | `5440` | Accepts all writes, streams WAL to replicas |
| Replica 1 | Standby (R/O) | `5433` | Hot standby, receives WAL via `replica1_slot` |
| Replica 2 | Standby (R/O) | `5434` | Hot standby, receives WAL via `replica2_slot` |
| Replica N | Standby (R/O) | `5435+` | Dynamically added via `./pods.sh add` |
| PgBouncer | Connection Pooler | `6432` | Transaction-mode pooling with named read/write pools |
| Health API | Monitoring | `8080` | REST API exposing cluster health, replication lag, node status |
| Prometheus | Metrics | `9090` | Scrapes all exporters, stores time-series data |
| Grafana | Dashboards | `3000` | Pre-built dashboards for cluster, replication, resources |
| Backup | Scheduled Backups | n/a | pg_basebackup + WAL archiving, S3/local, cron-scheduled |
| cAdvisor | Resources | `8081` | Container CPU, RAM, and network metrics |

Data Flow

Client App
    │
    ▼
PgBouncer (:6432)
    │
    ├── appdb_write ──→ Master (:5440)  ── R/W
    ├── appdb_read1 ──→ Replica1 (:5433) ── R/O
    ├── appdb_read2 ──→ Replica2 (:5434) ── R/O
    └── appdb ────────→ Master (:5440)  ── default

Master ──── WAL Stream ───→ Replica1
       ├─── WAL Stream ───→ Replica2
       └─── WAL Stream ───→ ReplicaN...  (dynamically scaled)

Health API (:8080)
    │
    ├── /health             ──→ Full cluster overview (auto-discovers all replicas)
    ├── /health/master      ──→ Master node status
    ├── /health/replicaN    ──→ Specific replica status
    └── /health/replication ──→ Replication lag & slot details

Connection Details

# Direct connections
psql -h localhost -p 5440 -U postgres -d appdb        # Master (R/W)
psql -h localhost -p 5433 -U postgres -d appdb        # Replica 1 (R/O)
psql -h localhost -p 5434 -U postgres -d appdb        # Replica 2 (R/O)

# Via PgBouncer (pooled)
psql -h localhost -p 6432 -U postgres -d appdb_write  # → Master
psql -h localhost -p 6432 -U postgres -d appdb_read1  # → Replica 1
psql -h localhost -p 6432 -U postgres -d appdb_read2  # → Replica 2
psql -h localhost -p 6432 -U postgres -d appdb        # → Master (default)
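
To illustrate how an application might choose between the named pools, the sketch below builds connection URIs for the write pool and rotates reads across the two read pools. The `PoolRouter` class and its round-robin policy are assumptions made for illustration; only the host, port, user, and pool names come from the commands above.

```python
from itertools import cycle


class PoolRouter:
    """Hypothetical helper that maps read/write intent to a PgBouncer pool name."""

    def __init__(self, host="localhost", port=6432, user="postgres",
                 write_pool="appdb_write",
                 read_pools=("appdb_read1", "appdb_read2")):
        self.host, self.port, self.user = host, port, user
        self.write_pool = write_pool
        # Round-robin over the read pools; an assumed policy, not PgBouncer behaviour.
        self._reads = cycle(read_pools)

    def dsn(self, readonly: bool = False) -> str:
        """Return a libpq-style connection URI for the next suitable pool."""
        pool = next(self._reads) if readonly else self.write_pool
        return f"postgresql://{self.user}@{self.host}:{self.port}/{pool}"


if __name__ == "__main__":
    router = PoolRouter()
    print(router.dsn())               # write pool
    print(router.dsn(readonly=True))  # first read pool
    print(router.dsn(readonly=True))  # second read pool
```

Reads sent to a replica may trail the primary slightly, so anything that must see its own writes is usually safer on the write pool.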

Replica Scaling

Dynamically add or remove read replicas without editing any config files.

./pods.sh add              # Auto-adds replica3 (next available)
./pods.sh add replica5     # Adds a specific replica
./pods.sh remove replica3  # Removes a dynamic replica
./pods.sh list             # Lists all replicas with status
./pods.sh status           # Full cluster health overview

Each ./pods.sh add run creates a replication slot, generates a compose override, adds a PgBouncer read pool, spins up a postgres_exporter, updates Prometheus targets, starts the container, and waits for streaming confirmation. See the Scaling Documentation (documentation/scaling.md) for details.
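
The "next available" selection can be pictured with a small sketch. This is not the logic in pods.sh, which is not shown here; it assumes the lowest unused index is chosen and that host ports follow the 5432 + N pattern suggested by the Architecture table. The function name `next_replica` is hypothetical.

```python
import re


def next_replica(existing: list[str]) -> tuple[str, int]:
    """Return (name, host_port) for the lowest unused replica index."""
    used = set()
    for name in existing:
        match = re.fullmatch(r"replica(\d+)", name)
        if match:
            used.add(int(match.group(1)))
    index = 1
    while index in used:
        index += 1
    # Assumed port scheme: replica1 -> 5433, replica2 -> 5434, replica3 -> 5435, ...
    return f"replica{index}", 5432 + index


if __name__ == "__main__":
    print(next_replica(["replica1", "replica2"]))
```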

Health API

REST API on port 8080 for monitoring cluster health. Auto-discovers all replicas — no config changes needed when scaling.

curl http://localhost:8080/health          # Full cluster status (200 = healthy, 503 = degraded)
curl http://localhost:8080/health/master   # Master node only
curl http://localhost:8080/health/replica1 # Specific replica
curl http://localhost:8080/health/replication # Replication lag & slots
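
For use from a script or a load-balancer check, the same status codes can be read programmatically. The sketch below uses only the Python standard library and classifies the cluster from the HTTP status alone; it does not parse the response body, because the schema is not described in this README. The function names `describe` and `cluster_status` and the "unknown" and "unreachable" labels are illustrative assumptions.

```python
import urllib.error
import urllib.request


def describe(status_code: int) -> str:
    """Map the documented HTTP status codes to a short label."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "degraded"
    return "unknown"


def cluster_status(url: str = "http://localhost:8080/health",
                   timeout: float = 3.0) -> str:
    """Call the health endpoint and classify the cluster by status code only."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return describe(response.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, so a 503 arrives here.
        return describe(err.code)
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout: the API is not reachable.
        return "unreachable"


if __name__ == "__main__":
    print(cluster_status())
```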

See the Health API Documentation (documentation/health-api.md) for response schemas and an integration guide.

Monitoring (Prometheus + Grafana)

Full observability with a pre-built dashboard. Open http://localhost:3000 and log in with the default credentials (admin / pods_admin).

Dashboard panels:

Cluster Overview — UP/DOWN status per node, active connections, database size
Replication — Lag in bytes over time, WAL retained per slot
Queries & Throughput — TPS per node, active/idle queries, row operations (insert/update/delete rates)
Connections per Node — Connection state breakdown for master and each replica
PgBouncer — Client/server connections, waiting clients, queries per second per pool
Container CPU — CPU usage over time + current snapshot per container
Container Memory — RAM usage over time + current snapshot per container
Resource Summary — Total CPU/RAM for PG nodes and full stack

Exporters auto-scale with replicas — ./pods.sh add creates a postgres_exporter and updates Prometheus targets automatically.

See the Monitoring Documentation (documentation/monitoring.md) for the metrics reference, PromQL queries, and configuration.

Backup & Restore

Automated backups with scheduled pg_basebackup snapshots and continuous WAL archiving. Supports local storage and S3.

./pods.sh backup             # Immediate full backup
./pods.sh backup-list        # List all backups with sizes
./pods.sh restore full_25-03-2026_14-30-00.tar.gz  # Restore a backup

Schedule: Configurable via BACKUP_SCHEDULE (default: daily at 2 AM)
Retention: Auto-cleanup after BACKUP_RETENTION_DAYS (default: 7 days)
Storage: Auto-detects S3 when AWS_ACCESS_KEY_ID + AWS_S3_BUCKET are set; otherwise stores locally
WAL archiving: Continuous via archive_command for point-in-time recovery capability
Naming: full_DD-MM-YYYY_HH-MM-SS.tar.gz
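
Because the timestamp is embedded in the file name, backups can be dated without reading file metadata. The sketch below parses the documented naming pattern and lists the files that fall outside a retention window. It only illustrates the naming scheme; the actual cleanup in backup.sh may work differently (for example by file modification time), and the names `backup_time` and `expired` are hypothetical.

```python
from datetime import datetime, timedelta

# strptime pattern matching full_DD-MM-YYYY_HH-MM-SS.tar.gz
NAME_FORMAT = "full_%d-%m-%Y_%H-%M-%S.tar.gz"


def backup_time(filename: str) -> datetime:
    """Parse the timestamp embedded in a backup file name."""
    return datetime.strptime(filename, NAME_FORMAT)


def expired(filenames, now: datetime, retention_days: int = 7):
    """Return names older than the retention window, oldest first."""
    cutoff = now - timedelta(days=retention_days)
    old = [name for name in filenames if backup_time(name) < cutoff]
    return sorted(old, key=backup_time)


if __name__ == "__main__":
    names = [
        "full_25-03-2026_14-30-00.tar.gz",
        "full_10-03-2026_02-00-00.tar.gz",
    ]
    print(expired(names, now=datetime(2026, 3, 26)))
```

Note that DD-MM-YYYY names do not sort chronologically as plain strings, which is why the sketch sorts by the parsed timestamp.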

See the Backup Documentation (documentation/backup.md) for S3 setup, restore procedures, and architecture details.

Project Structure

pods/
├── docker-compose.yml              # Service orchestration (base stack)
├── docker-compose.override.yml     # Dynamic replicas (auto-generated by pods.sh)
├── .env                            # Credentials & config (gitignored)
├── .gitignore
├── pods.sh                         # Replica scaling management script
├── config/
│   ├── pgbouncer/
│   │   ├── pgbouncer.ini           # Pool definitions & settings
│   │   └── userlist.txt            # PgBouncer auth credentials
│   ├── prometheus/
│   │   ├── prometheus.yml          # Scrape targets & jobs
│   │   └── pg_targets.json         # Dynamic replica targets (auto-managed)
│   └── grafana/
│       ├── provisioning/           # Auto-provisioned datasource & dashboard config
│       └── dashboards/
│           └── pods-cluster.json   # Pre-built cluster overview dashboard
├── scripts/
│   ├── master-init.sh              # Creates replication user & slots
│   └── replica-init.sh             # Runs pg_basebackup & configures standby
├── health-api/
│   ├── Dockerfile                  # Python 3.13 slim image
│   ├── requirements.txt            # FastAPI, uvicorn, psycopg2
│   └── main.py                     # Health API with auto-discovery
├── backup/
│   ├── Dockerfile                  # Backup container (postgres:18 + cron + awscli)
│   ├── backup.sh                   # Full backup, WAL archive, cleanup, list
│   ├── restore.sh                  # Restore a backup to target directory
│   └── entrypoint.sh               # Cron scheduler entrypoint
├── data/                           # Persistent PostgreSQL data (gitignored)
│   ├── master/
│   ├── replica1/
│   ├── replica2/
│   └── replicaN/                   # Created dynamically by pods.sh
└── documentation/
    ├── pods.png                    # Architecture diagram
    ├── master.md                   # Master node deep-dive
    ├── replication.md              # Streaming replication explained
    ├── pgbouncer.md                # Connection pooling config
    ├── environment.md              # Environment variables reference
    ├── operations.md               # Ops, monitoring & troubleshooting
    ├── health-api.md               # Health API endpoints & responses
    ├── scaling.md                  # Dynamic replica scaling guide
    ├── monitoring.md               # Prometheus, Grafana, exporters, metrics
    └── backup.md                   # Backup & restore guide

Documentation

| Module | What's Inside |
|--------|---------------|
| Master Node | WAL parameters, `pg_hba.conf` rules, replication user & slot creation, health checks |
| Replication | How streaming replication works, `pg_basebackup` flags, `standby.signal`, slot mechanics, why not Patroni |
| PgBouncer | Pool modes (session/transaction/statement), every `pgbouncer.ini` parameter, monitoring commands |
| Environment | All `.env` variables, defaults, how they flow through the stack, production recommendations |
| Operations | Start/stop, log viewing, replication monitoring queries, data reset, troubleshooting guide |
| Health API | REST endpoints, response schemas, auto-discovery, integration with load balancers |
| Scaling | `pods.sh` commands, how dynamic replicas work, override file, PgBouncer auto-config |
| Monitoring | Prometheus config, Grafana dashboards, all exporters, metrics reference, PromQL queries |
| Backup & Restore | Scheduled backups, WAL archiving, S3/local storage, restore procedures |

Key Design Decisions

| Decision | Reasoning |
|----------|-----------|
| Native streaming replication over Patroni | Fixed topology (1 primary + 2 replicas), so no automatic leader election is needed. Simpler, fewer dependencies |
| Physical replication slots | Guarantees WAL retention per replica, so a replica that goes offline temporarily can catch up without a full re-sync |
| PgBouncer in transaction mode | Best connection reuse for web workloads without sacrificing transaction safety |
| Bind-mount volumes over Docker volumes | Data lives in `./data/`, so it is visible, portable, and easy to back up and inspect |
| PostgreSQL 18 + `io_uring` | Native async I/O reduces syscall overhead, with measurable throughput gains on read-heavy replicas |
| Auto-discovery in Health API | Queries `pg_replication_slots` to find replicas, so no config changes are needed when scaling |
| `docker-compose.override.yml` for scaling | Dynamic replicas live in an auto-generated override, so the base compose file stays clean and version-controlled |
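
The auto-discovery decision can be sketched as follows. This is not the code in health-api/main.py; it assumes every replica owns a physical slot named `replicaN_slot` (the pattern shown in the Architecture section) and that rows of `(slot_name, active)` have already been fetched with a query like `SLOT_QUERY`. The function name `replicas_from_slots` is hypothetical.

```python
# Columns slot_name, active and slot_type exist in the pg_replication_slots view.
SLOT_QUERY = (
    "SELECT slot_name, active FROM pg_replication_slots "
    "WHERE slot_type = 'physical'"
)


def replicas_from_slots(rows):
    """Turn (slot_name, active) rows into {replica_name: active}, skipping other slots."""
    suffix = "_slot"
    found = {}
    for slot_name, active in rows:
        # Assumed naming convention: replica1_slot, replica2_slot, ...
        if slot_name.startswith("replica") and slot_name.endswith(suffix):
            found[slot_name[: -len(suffix)]] = bool(active)
    return found


if __name__ == "__main__":
    rows = [("replica1_slot", True), ("replica2_slot", False)]
    print(replicas_from_slots(rows))
```

An inactive slot is a useful signal in its own right: the replica exists in the topology but is not currently streaming.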

Built with ❤️ PostgreSQL, PgBouncer, and Docker Compose
