Increase benchmark runs and fix fixed-iteration ceiling #348

@jathavaan

Summary

The current configuration (BENCHMARK_RUNS=1, BENCHMARK_MAX_FIXED_WINDOW_SECONDS=4500) produces too few samples for reliable statistical analysis. The thesis (§4.5, Table 4.5.1) acknowledges that a bootstrap CI computed from fewer than 30 samples is noisy. The current 4500 s (75 min) fixed-window ceiling also cuts several RQ2 experiments off before they reach their target iteration count.
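
To make the sample-size concern concrete, here is a minimal sketch of a percentile bootstrap CI for the median, using only the standard library. The helper name `bootstrap_ci_median`, the resample count, and the idea of treating each benchmark run as one sample are illustrative assumptions; none of them are taken from the benchmark code or the thesis.

```python
import random
import statistics


def bootstrap_ci_median(samples, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the median (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(samples)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(samples, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles of the medians.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return medians[lo_idx], medians[hi_idx]
```

With a single sample, every resample draws the same value, so the interval collapses to a point and carries no information about variability. More runs give the resampling step more distinct values to work with.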

Proposed changes:

  • Split BENCHMARK_RUNS into per-workload counts: 30 RQ1 runs, 3 RQ2 runs
  • BENCHMARK_MAX_ITERATION_SECONDS: 4500 → 3600 (60 min)
  • BENCHMARK_MAX_FIXED_WINDOW_SECONDS: 75 * 60 → 5 * BENCHMARK_MAX_ITERATION_SECONDS (= 18,000 s = 300 min)
  • Skip national-scale-spatial-join-duckdb-large — exceeds 60-min per-iteration threshold (median 70.8 min). Existing data from run 2026-05-24-HBJYYT is sufficient. Report as a finding: single-node processing becomes impractical at large scale.
  • main.py: partition experiments into RQ1 (pip, knn, bbox) and RQ2 (national-scale-spatial-join) by ID prefix, loop each group with its own run count.
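
A rough sketch of how the split and the per-group loop could look. The constant names `BENCHMARK_RUNS_RQ1` and `BENCHMARK_RUNS_RQ2`, the functions `partition_experiments` and `run_suite`, and the `run_experiment` callback are all hypothetical; the issue only specifies the prefixes, the run counts, and the new ceiling values.

```python
# Hypothetical config names; only the values come from the proposal.
BENCHMARK_RUNS_RQ1 = 30
BENCHMARK_RUNS_RQ2 = 3
BENCHMARK_MAX_ITERATION_SECONDS = 3600
BENCHMARK_MAX_FIXED_WINDOW_SECONDS = 5 * BENCHMARK_MAX_ITERATION_SECONDS  # 18,000 s

RQ1_PREFIXES = ("pip", "knn", "bbox")
RQ2_PREFIXES = ("national-scale-spatial-join",)
SKIPPED = {"national-scale-spatial-join-duckdb-large"}


def partition_experiments(experiment_ids):
    """Split experiment IDs into (rq1, rq2) by ID prefix, dropping skipped ones."""
    rq1, rq2 = [], []
    for exp_id in experiment_ids:
        if exp_id in SKIPPED:
            continue
        if exp_id.startswith(RQ1_PREFIXES):
            rq1.append(exp_id)
        elif exp_id.startswith(RQ2_PREFIXES):
            rq2.append(exp_id)
        else:
            raise ValueError(f"Experiment {exp_id!r} matches no RQ group")
    return rq1, rq2


def run_suite(experiment_ids, run_experiment):
    """Loop each group with its own run count; run_experiment is a placeholder callback."""
    rq1, rq2 = partition_experiments(experiment_ids)
    for group, runs in ((rq1, BENCHMARK_RUNS_RQ1), (rq2, BENCHMARK_RUNS_RQ2)):
        for run_index in range(runs):
            for exp_id in group:
                run_experiment(exp_id, run_index)
```

Raising on an unmatched ID is a design choice in this sketch: it surfaces a misnamed experiment immediately instead of silently leaving it out of both groups.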

Estimated wall-clock per benchmark run

RQ1 — Sequential stopping (includes 10 min ingestion delay per experiment)

| Batch | Experiments | Wall-clock |
| --- | --- | --- |
| pip-small | duckdb, local, postgis | 15 min |
| pip-large | duckdb, postgis | 29 min |
| knn-small | duckdb, local, postgis | 52 min |
| knn-large | duckdb, postgis | 30 min |
| bbox-small | duckdb, local, postgis | 15 min |
| bbox-large | duckdb, postgis | 38 min |
| RQ1 total (batched + cleanup) | 15 experiments | 189 min = 3.2h |

RQ2 — Fixed iteration (1 warmup + 5 timed, 60 min/iter, 300 min ceiling, includes 10 min ingestion)

| Experiment | Per-iter | Iters | Total | Note |
| --- | --- | --- | --- | --- |
| duckdb-small | 2.8 min | 6 | 26.6 min | |
| duckdb-medium | 24.6 min | 6 | 157.3 min | |
| duckdb-large | 70.8 min | | | SKIP: exceeds 60 min threshold |
| postgis-small | 0.2 min | 6 | 11.4 min | |
| postgis-medium | 23.0 min | 6 | 148.1 min | |
| broadcast-2-nodes-medium | 3.3 min | 6 | 30.1 min | |
| broadcast-2-nodes-large | 7.5 min | 6 | 55.1 min | |
| broadcast-4-nodes-small | 0.3 min | 6 | 12.0 min | |
| broadcast-4-nodes-medium | 1.8 min | 6 | 20.6 min | |
| broadcast-4-nodes-large | 4.0 min | 6 | 34.0 min | |
| broadcast-8-nodes-small | 0.3 min | 6 | 11.6 min | |
| broadcast-8-nodes-medium | 1.0 min | 6 | 15.7 min | |
| broadcast-8-nodes-large | 2.1 min | 6 | 22.6 min | |
| broadcast-16-nodes-large | 1.2 min | 6 | 17.4 min | |
| partitioned-4-nodes-small | 18.4 min | 6 | 120.2 min | |
| partitioned-8-nodes-small | 9.7 min | 6 | 68.5 min | |

| Batch | Wall-clock |
| --- | --- |
| small-main (duckdb, postgis, broadcast-8, partitioned-8) | 71 min |
| small-4node (broadcast-4, partitioned-4) | 123 min |
| med-main (duckdb, postgis, broadcast-8) | 160 min |
| med-4node (broadcast-4) | 24 min |
| med-2node (broadcast-2) | 33 min |
| lrg-8node (broadcast-8) | 26 min |
| lrg-broadcast (broadcast-2, broadcast-4, broadcast-16) | 58 min |
| RQ2 total (batched + cleanup) | 505 min = 8.4h |
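
The per-experiment totals appear consistent with (warmup + timed iterations) × per-iteration median + 10 min ingestion, to within about 0.3 min of rounding. That formula is an inference from the numbers, not something stated in the benchmark code. A sketch under that assumption, with hypothetical names throughout:

```python
MAX_ITERATION_MIN = 60   # per-iteration threshold from the proposal
WARMUP_ITERATIONS = 1
TIMED_ITERATIONS = 5
INGESTION_MIN = 10
FIXED_WINDOW_MIN = TIMED_ITERATIONS * MAX_ITERATION_MIN  # 300 min ceiling


def estimate_total_minutes(per_iter_min):
    """Estimated wall-clock for one RQ2 experiment, or None if it should be skipped."""
    if per_iter_min > MAX_ITERATION_MIN:
        return None  # e.g. duckdb-large at a 70.8 min median
    iterations = WARMUP_ITERATIONS + TIMED_ITERATIONS
    return iterations * per_iter_min + INGESTION_MIN
```

For duckdb-medium this gives 6 × 24.6 + 10 = 157.6 min against the listed 157.3 min; the gap is presumably because the per-iteration medians are rounded to one decimal.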

Skipped experiments (9)

| Experiment | Reason |
| --- | --- |
| national-scale-spatial-join-duckdb-large | NEW: exceeds 60-min per-iteration threshold |
| national-scale-spatial-join-postgis-large | OOM on Azure PostgreSQL Flex Server |
| national-scale-spatial-join-databricks-partitioned-2-nodes-medium | executor OOM |
| national-scale-spatial-join-databricks-partitioned-2-nodes-large | executor OOM |
| national-scale-spatial-join-databricks-partitioned-4-nodes-medium | executor OOM |
| national-scale-spatial-join-databricks-partitioned-4-nodes-large | executor OOM |
| national-scale-spatial-join-databricks-partitioned-8-nodes-medium | executor OOM |
| national-scale-spatial-join-databricks-partitioned-8-nodes-large | executor OOM |
| national-scale-spatial-join-databricks-partitioned-16-nodes-large | executor OOM |

Total suite estimate

| Component | Per run | Runs | Total time | Cost |
| --- | --- | --- | --- | --- |
| RQ1 | 3.2h | 30 | 94.6h | ~$14.64 |
| RQ2 | 8.4h | 3 | 25.3h | ~$37.97 |
| Total | | | 119.9h = 5.0 days | ~$52.61 |

Per-iteration times are median estimates from run 2026-05-24-HBJYYT. Cost estimates are based on Azure ACI, Databricks, and PostgreSQL pricing from the same run, adjusted for the increased iteration counts.
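
As a sanity check on the suite total, the arithmetic can be reproduced from the rounded per-run minutes (189 and 505). The function and constant names are hypothetical. Because the inputs are rounded, the result lands a few tenths of an hour below the listed figures.

```python
RQ1_MIN_PER_RUN = 189
RQ2_MIN_PER_RUN = 505
RQ1_RUNS = 30
RQ2_RUNS = 3


def suite_hours():
    """Return (rq1_hours, rq2_hours, total_hours) for the whole suite."""
    rq1 = RQ1_MIN_PER_RUN * RQ1_RUNS / 60
    rq2 = RQ2_MIN_PER_RUN * RQ2_RUNS / 60
    return rq1, rq2, rq1 + rq2


rq1_h, rq2_h, total_h = suite_hours()
print(f"RQ1 {rq1_h:.1f}h, RQ2 {rq2_h:.1f}h, total {total_h:.1f}h = {total_h / 24:.1f} days")
```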
