9 changes: 9 additions & 0 deletions .gitignore
@@ -348,3 +348,12 @@ MigrationBackup/

# Ionide (cross platform F# VS Code tools) working folder
.ionide/

# Airlines demo benchmark results
bench/results/*.csv
bench/results/*.txt
bench/results/*.log

# Docker volumes and temporary files
docker/init/*.sql.gz
docker/init/*.disabled
165 changes: 165 additions & 0 deletions README.md
@@ -46,3 +46,168 @@ The results below represent the amount of time (ns) the operation takes per iteration
As this comparison shows, Doublets is between 1746 and 15745 times faster than PostgreSQL in write operations, and between 100 and 9694 times faster in read operations.

To get fresh numbers, please fork the repository and rerun the benchmark in GitHub Actions.

---

## Benchmark: Flight Timetable (Airlines Demo)

This benchmark compares PostgreSQL 18 and Doublets on realistic airline timetable queries, using the [PostgresPro Airlines demo database](https://postgrespro.ru/education/demodb).

### What's Being Tested

This benchmark evaluates both systems on:
- **Complex relational queries**: Multi-table joins with temporal validity checks
- **Large datasets**: 6-month to 1-year flight schedules (~250k–500k flights)
- **Real-world operations**: Airport departures/arrivals, route searches, aggregations
- **Two durability modes**:
- **Durable** (production-like): Full ACID with WAL
- **Embedded-like**: WAL-light configuration (similar to embedded databases)

### Queries

The benchmark includes 9 timetable queries:
1. Departures from an airport by date
2. Arrivals at an airport by date
3. Next available flight on a route
4. Manual join with temporal validity checks (benchmarked in departure and arrival variants)
5. Route details with airport information
6. Flight status distribution
7. Busiest routes analysis
8. Flights by date range

See [`sql/10_timetable_queries.sql`](sql/10_timetable_queries.sql) for details.
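
For a feel of the workload, the first query can be run by hand. A sketch, assuming the compose service name `pg`, the image's default `postgres` superuser, the demo database name `demo`, and the demo's frozen-clock helper `bookings.now()` (the authoritative query text lives in the SQL file):

```bash
# Departures from Sheremetyevo (SVO) on the dataset's "current" day.
docker compose exec pg psql -U postgres -d demo -c "
  SELECT flight_no, scheduled_departure, arrival_airport
  FROM bookings.flights
  WHERE departure_airport = 'SVO'
    AND scheduled_departure::date = bookings.now()::date
  ORDER BY scheduled_departure;"
```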

### Getting Started

#### Prerequisites
- Docker and Docker Compose
- ~10GB free disk space (for 1-year dataset)
- Python 3 (for result analysis)

#### Quick Start

```bash
# 1. Start PostgreSQL 18 with Airlines demo data (6 months)
cd docker
docker compose up -d

# Wait for database to load (~5 minutes)
docker compose logs -f pg

# 2. Run PostgreSQL benchmarks
cd ../bench/pg
./run.sh durable 6m 10 # Durable mode
./run.sh embedded 6m 10 # Embedded-like mode

# 3. Run Doublets benchmarks (TODO: implement)
cd ../doublets
./run.sh volatile 6m 10
./run.sh nonvolatile 6m 10

# 4. Compare results
ls -lh ../results/*.csv
```
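
Every results file shares the header `system,durability_mode,dataset,query_id,run,rows,ms`, so per-query medians fall out with standard tools. A minimal sketch against the newest Doublets CSV (adjust the glob for other runs):

```bash
# Median of the ms column (field 7) per query_id (field 4).
csv=$(ls -t ../results/doublets_volatile_6m_*.csv | head -1)
for q in $(tail -n +2 "${csv}" | cut -d, -f4 | sort -u); do
  grep ",${q}," "${csv}" | cut -d, -f7 | sort -n |
    awk -v q="${q}" '{ a[NR] = $1 } END { print q ": " a[int((NR + 1) / 2)] " ms" }'
done
```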

#### Durability Modes

**Durable Mode** (PostgreSQL default):
- Full ACID guarantees
- WAL enabled with fsync
- Production-safe
- Baseline for comparison

**Embedded-Like Mode** (PostgreSQL optimized):
- `fsync=off`, `synchronous_commit=off`
- `wal_level=minimal`
- Optional: UNLOGGED tables
- Trades durability for speed (matches embedded DB behavior)

To run in embedded-like mode:
```bash
cd docker
docker compose -f docker-compose.yml -f compose.embedded.yml up -d
```
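
The override file carries the actual settings; to confirm which mode a running server is in, query it directly. A sketch, assuming the image's default `postgres` superuser:

```bash
# Expect off / off / minimal under the embedded-like override,
# and on / on / replica under the durable defaults.
docker compose exec pg psql -U postgres \
  -c "SHOW fsync;" -c "SHOW synchronous_commit;" -c "SHOW wal_level;"
```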

### Directory Structure

```
docker/
docker-compose.yml # PostgreSQL 18 setup (durable mode)
compose.embedded.yml # Override for embedded-like mode
init/
01_download_demo.sh # Auto-download Airlines demo DB
99_unlogged.sql # Optional: convert to UNLOGGED tables

sql/
10_timetable_queries.sql # All benchmark queries

bench/
pg/
run.sh # PostgreSQL benchmark script
doublets/
run.sh # Doublets benchmark script (placeholder)
results/ # CSV output and EXPLAIN logs
schema-mapping.md # How to map Airlines schema to Doublets

docs/
HOWTO.md # Detailed setup and usage guide
```

### Documentation

- **[HOWTO.md](docs/HOWTO.md)** - Complete setup guide, dataset options, troubleshooting
- **[schema-mapping.md](bench/schema-mapping.md)** - Mapping Airlines entities to Doublets links
- **[10_timetable_queries.sql](sql/10_timetable_queries.sql)** - All queries with explanations

### Dataset Sizes

| Size | Period | Flights | PostgreSQL | Compressed | Download Time |
|------|----------|---------|------------|------------|---------------|
| 3m | 3 months | ~125k | ~1.3 GB | 133 MB | ~2 min |
| 6m | 6 months | ~250k | ~2.7 GB | 276 MB | ~5 min |
| 1y | 1 year | ~500k | ~5.4 GB | 558 MB | ~10 min |
| 2y | 2 years | ~1M | ~11 GB | 1137 MB | ~20 min |

Default: **6 months** (good balance of size and completeness)

### Implementation Status

- [x] PostgreSQL 18 Docker setup
- [x] Airlines demo database integration
- [x] Timetable queries (9 queries)
- [x] PostgreSQL benchmark script
- [x] Durability modes (durable + embedded-like)
- [x] Schema mapping documentation
- [ ] **Doublets implementation** (TODO)
- [ ] Results comparison and visualization

### Next Steps

To complete this benchmark:

1. **Implement Doublets data model** (see `bench/schema-mapping.md`)
- Map Airports, Routes, Flights to links
- Handle temporal data (validity ranges)
- Support NULL values and enums

2. **Implement equivalent queries**
   - Ensure exactly the same result sets as PostgreSQL
   - Validate with checksums (see the sketch after this list)

3. **Run comparative benchmarks**
- Two durability modes
- Two dataset sizes (6m, 1y)
- 10 runs per query

4. **Analyze and visualize results**
- Compare median times
- Identify bottlenecks
- Generate comparison charts
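
For the checksum validation in step 2, one workable approach is an order-insensitive hash over identically formatted rows. A sketch of the PostgreSQL half, assuming the `demo` database name; the Doublets side must emit byte-identical comma-separated rows before running the same `sort | md5sum`:

```bash
# Unaligned, tuples-only, comma-separated output keeps the rows diffable.
docker compose exec pg psql -U postgres -d demo -At -F',' -c \
  "SELECT flight_no, scheduled_departure, arrival_airport
   FROM bookings.flights
   WHERE departure_airport = 'SVO'" |
  sort | md5sum
```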

### References

- [PostgresPro Airlines Demo](https://postgrespro.ru/education/demodb) - Official documentation
- [PostgreSQL 18 Release Notes](https://www.postgresql.org/docs/18/) - What's new
- [Doublets Documentation](https://github.com/linksplatform/Data.Doublets) - Link storage system
- [Issue #11](https://github.com/linksplatform/Comparisons.PostgreSQLVSDoublets/issues/11) - Original requirements
164 changes: 164 additions & 0 deletions bench/doublets/run.sh
@@ -0,0 +1,164 @@
#!/bin/bash
# ============================================================================
# Doublets Airlines Demo - Benchmark Script (Placeholder)
# ============================================================================
# This script runs timetable queries against the Doublets implementation
# of the Airlines demo database and collects timing measurements.
#
# Usage: ./run.sh <durability_mode> <dataset_size> [num_runs]
# durability_mode: volatile or nonvolatile
# dataset_size: 3m, 6m, 1y, or 2y
# num_runs: number of iterations per query (default: 10)
#
# Example:
# ./run.sh volatile 6m 10
# ./run.sh nonvolatile 1y 20
#
# Output:
# - CSV file: ../results/doublets_<mode>_<dataset>_<timestamp>.csv
#
# TODO: This is a placeholder. Implement actual Doublets benchmarking logic.
# ============================================================================

set -euo pipefail

# Configuration
DURABILITY_MODE="${1:-volatile}"
DATASET_SIZE="${2:-6m}"
NUM_RUNS="${3:-10}"
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
RESULTS_DIR="../results"
OUTPUT_CSV="${RESULTS_DIR}/doublets_${DURABILITY_MODE}_${DATASET_SIZE}_${TIMESTAMP}.csv"

# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# Create results directory
mkdir -p "${RESULTS_DIR}"

# Initialize CSV
echo "system,durability_mode,dataset,query_id,run,rows,ms" > "${OUTPUT_CSV}"

echo -e "${GREEN}=== Doublets Benchmark ===${NC}"
echo "Mode: ${DURABILITY_MODE}"
echo "Dataset: ${DATASET_SIZE}"
echo "Runs per query: ${NUM_RUNS}"
echo "Output: ${OUTPUT_CSV}"
echo ""

echo -e "${YELLOW}=== TODO: Doublets Implementation ===${NC}"
echo ""
echo "This is a placeholder script. To complete the Doublets benchmark, implement:"
echo ""
echo "1. Data Loading:"
echo " - Load Airlines data from PostgreSQL or CSV export"
echo " - Convert entities to Doublets links (see bench/schema-mapping.md)"
echo " - Store in Doublets database (volatile or nonvolatile mode)"
echo ""
echo "2. Query Implementation:"
echo " - Implement equivalent queries using Doublets link API"
echo " - Ensure result sets match PostgreSQL exactly"
echo " - See bench/schema-mapping.md for query mappings"
echo ""
echo "3. Benchmark Execution:"
echo " - Warm-up: run each query once"
echo " - Measure: run each query ${NUM_RUNS} times"
echo " - Record: wall-clock time (ms) and row count"
echo " - Write results to CSV with same format as PostgreSQL benchmark"
echo ""
echo "4. Validation:"
echo " - Compare result sets with PostgreSQL (checksums)"
echo " - Verify performance improvements"
echo " - Report any discrepancies"
echo ""
echo "Suggested implementation approaches:"
echo ""
echo " a) Rust implementation (matching existing rust/ directory):"
echo " - Use existing Doublets Rust library"
echo " - Create Airlines data model"
echo " - Implement queries using Doublets API"
echo " - Add benchmark harness"
echo ""
echo " b) C++ implementation (matching existing cpp/ directory):"
echo " - Use existing Doublets C++ library"
echo " - Follow same approach as Rust"
echo ""
echo " c) Standalone tool:"
echo " - Create separate benchmark binary"
echo " - Load data from CSV export"
echo " - Run queries and output CSV"
echo ""
echo "Reference implementations:"
echo " - rust/benches/bench.rs - existing Doublets benchmarks"
echo " - bench/pg/run.sh - PostgreSQL benchmark (for CSV format)"
echo " - bench/schema-mapping.md - detailed mapping documentation"
echo ""
echo -e "${YELLOW}Until implementation is complete, this script generates mock data.${NC}"
echo ""

# Generate mock data for testing the analysis pipeline
echo -e "${YELLOW}Generating mock benchmark data...${NC}"

# Define query IDs (matching PostgreSQL)
QUERY_IDS=(
"departures_svo"
"arrivals_svo"
"next_flight_svx_wuh"
"manual_departures_svo"
"manual_arrivals_svo"
"route_details"
"status_counts"
"busiest_routes"
"date_range"
)

# Mock: Doublets should be ~1000-10000x faster than PostgreSQL
# Generate realistic-looking performance data
for query_id in "${QUERY_IDS[@]}"; do
# Simulate row counts (would come from actual queries)
case "${query_id}" in
"next_flight_svx_wuh")
row_count=1
;;
"route_details")
row_count=20
;;
"status_counts")
row_count=5
;;
"busiest_routes")
row_count=10
;;
"date_range")
row_count=7
;;
*)
row_count=$((RANDOM % 100 + 10))
;;
esac

# Generate ${NUM_RUNS} measurements with small variance
base_time=$((RANDOM % 50 + 10)) # 10-59 ms for Doublets (vs. seconds for PostgreSQL)

for run in $(seq 1 "${NUM_RUNS}"); do
# Add small random variance
variance=$((RANDOM % 20 - 10))
time=$((base_time + variance))
[ ${time} -lt 1 ] && time=1 # Ensure positive

echo "doublets,${DURABILITY_MODE},${DATASET_SIZE},${query_id},${run},${row_count},${time}" >> "${OUTPUT_CSV}"
done
done

echo -e "${GREEN}Mock data generated${NC}"
echo ""

echo -e "${GREEN}=== Benchmark Complete (Mock) ===${NC}"
echo "Mock results saved to: ${OUTPUT_CSV}"
echo ""
echo -e "${RED}WARNING: This data is MOCK data for testing purposes.${NC}"
echo -e "${RED}Implement actual Doublets queries to get real measurements.${NC}"
echo ""