A production-ready PostgreSQL 18 image optimized for data lake and analytical workloads. Built on Debian Bookworm with pre-configured extensions for modern data processing, foreign data wrappers, vector operations, and geospatial analysis.
- PostgreSQL Contrib: uuid-ossp,pgcrypto,hstore,ltree,tablefunc,postgres_fdw,file_fdw
- Statistics & Monitoring: pg_stat_statementswith enhanced tracking
- Job Scheduling: pg_cronfor automated database tasks
- file_fdw: Built-in support for CSV, TSV, and other delimited files
- pg_parquet: Coming soon - Native Parquet support (requires Rust toolchain)
- postgres_fdw: Connect to other PostgreSQL databases
- ClickHouse FDW: Temporarily unavailable - Waiting for PostgreSQL 18 compatibility
- pg_partman: Automated table partitioning and maintenance
- pgvector: Vector similarity search and embeddings (AI/ML workloads)
- PostGIS: Full spatial database capabilities
- PostGIS Raster: Raster data support
- PostGIS Topology: Topological analysis
# Pull and run the image
docker run -d \
  --name postgres-datalake \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=your_secure_password \
  -e POSTGRES_DB=datalake \
  rozaydin/postgres-datalake:latest# Create a volume for data persistence
docker volume create postgres-datalake-data
# Run with persistent storage
docker run -d \
  --name postgres-datalake \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=your_secure_password \
  -e POSTGRES_DB=datalake \
  -v postgres-datalake-data:/var/lib/postgresql/data \
  rozaydin/postgres-datalake:latestversion: "3.8"
services:
  postgres-datalake:
    image: rozaydin/postgres-datalake:latest
    container_name: postgres-datalake
    environment:
      POSTGRES_PASSWORD: your_secure_password
      POSTGRES_DB: datalake
      POSTGRES_USER: postgres
    ports:
      - "5432:5432"
    volumes:
      - postgres-datalake-data:/var/lib/postgresql/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres"]
      interval: 30s
      timeout: 5s
      retries: 5
volumes:
  postgres-datalake-data:The image comes with optimized settings for 2-core/8GB RAM systems:
- Memory: 2GB shared_buffers, 6GB effective_cache_size
- Connections: Up to 80 concurrent connections
- Parallelism: Configured for 2-4 worker processes
- WAL: Optimized for performance with compression
- Logging: Comprehensive query and performance logging
All standard PostgreSQL environment variables are supported:
- POSTGRES_PASSWORD- Required: Database superuser password
- POSTGRES_USER- Optional: Superuser name (default: postgres)
- POSTGRES_DB- Optional: Default database name
- POSTGRES_INITDB_ARGS- Optional: Additional initdb arguments
To override the default configuration:
# Mount your own postgresql.conf
docker run -d \
  --name postgres-datalake \
  -v /path/to/your/postgresql.conf:/etc/postgresql/postgresql.conf \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=your_password \
  rozaydin/postgres-datalake:latest-- Create a table with vector embeddings
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding VECTOR(384)
);
-- Find similar documents
SELECT content
FROM documents
ORDER BY embedding <-> '[0.1,0.2,0.3,...]'::vector
LIMIT 5;-- Create a foreign table for CSV files
CREATE EXTENSION IF NOT EXISTS file_fdw;
CREATE SERVER file_server FOREIGN DATA WRAPPER file_fdw;
-- Map a CSV file as a foreign table
CREATE FOREIGN TABLE sales_data (
    id INTEGER,
    product_name TEXT,
    sale_date DATE,
    amount NUMERIC
) SERVER file_server
OPTIONS (filename '/data/sales.csv', format 'csv', header 'true');
-- Query CSV data directly
SELECT product_name, SUM(amount) as total_sales
FROM sales_data 
WHERE sale_date >= '2024-01-01'
GROUP BY product_name;
-- Create table from CSV data
CREATE TABLE imported_sales AS 
SELECT * FROM sales_data WHERE amount > 100;
```### Automated Partitioning
```sql
-- Set up automatic monthly partitioning
SELECT partman.create_parent(
    p_parent_table => 'public.sales',
    p_control => 'sale_date',
    p_type => 'range',
    p_interval => 'monthly'
);-- Schedule a daily cleanup job
SELECT cron.schedule('daily-cleanup', '0 2 * * *', 'VACUUM ANALYZE;');
-- List scheduled jobs
SELECT * FROM cron.job;- CPU: 2 cores
- RAM: 4GB (8GB recommended)
- Storage: 10GB+ for database files
- OS: Any Docker-compatible system
- CPU: 2-4 cores
- RAM: 8GB+
- Storage: SSD recommended for best performance
For different memory configurations, adjust these parameters:
| System RAM | shared_buffers | effective_cache_size | work_mem | 
|---|---|---|---|
| 4GB | 1GB | 3GB | 16MB | 
| 8GB | 2GB | 6GB | 32MB | 
| 16GB | 4GB | 12GB | 64MB | 
| 32GB | 8GB | 24GB | 128MB | 
For high-concurrency workloads, consider using pgBouncer:
# Add to docker-compose.yml
pgbouncer:
  image: pgbouncer/pgbouncer:latest
  environment:
    DATABASES_HOST: postgres-datalake
    DATABASES_PORT: 5432
    DATABASES_USER: postgres
    DATABASES_PASSWORD: your_password
    DATABASES_DBNAME: datalake
  ports:
    - "6432:5432"# Check container health
docker ps
# View logs
docker logs postgres-datalake
# Connect to database
docker exec -it postgres-datalake psql -U postgres -d datalake-- Check slow queries
SELECT query, mean_exec_time, calls
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Monitor table sizes
SELECT schemaname, tablename,
       pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) as size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC;# Clone the repository
git clone https://github.com/rozaydin/postgres-docker.git
cd postgres-docker
# Build the image
docker build -t postgres-datalake:latest .
# Run locally
docker run -d \
  --name postgres-datalake \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=test123 \
  postgres-datalake:latest| Extension | Source | Purpose | Status | 
|---|---|---|---|
| file_fdw | PostgreSQL Contrib | CSV/TSV file support | β Available | 
| pg_parquet | adriangb/pg_parquet | Native Parquet support | β³ Requires Rust toolchain | 
| clickhouse_fdw | ildus/clickhouse_fdw | ClickHouse connectivity | β³ Awaiting PG18 support | 
| pg_partman | pgpartman/pg_partman | Automated partitioning | β Available | 
| pgvector | pgvector/pgvector | Vector operations | β Available | 
| pg_cron | PostgreSQL APT | Job scheduling | β Available | 
| PostGIS | PostgreSQL APT | Geospatial analysis | β Available | 
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Issues: GitHub Issues
- Documentation: PostgreSQL Documentation
- Extensions: Check individual extension documentation for specific usage
Built with β€οΈ for modern data workloads