Problem
The benchmark matrix for Sedona experiments is incomplete. Only 4 and 8 nodes exist at small scale, and 12-node clusters are missing entirely. The intended matrix is 2, 4, 8, 12, 16 nodes × all dataset sizes for both broadcast and partitioned strategies.
Target matrix
Broadcast join
| Nodes | Small | Medium | Large |
|-------|-------|--------|-------|
| 2 | new | exists | exists |
| 4 | exists | exists | exists |
| 8 | exists | exists | exists |
| 12 | new | new | new |
| 16 | new | new | exists |

Partitioned join
| Nodes | Small | Medium | Large |
|-------|-------|--------|-------|
| 2 | new | skip:failed | skip:failed |
| 4 | exists | skip:failed | skip:failed |
| 8 | exists | skip:failed | skip:failed |
| 12 | new | skip:failed new | skip:failed new |
| 16 | new | skip:failed new | skip:failed new |

All partitioned medium/large entries are marked `skip: failed` (executor OOM on the RangeJoin spatial index).
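To make the scope of the work concrete, the two tables above can be written out as data and filtered for cells marked `new`. This is only a sketch: the `STATUS` dictionary is transcribed from the tables, while the names `NODES`, `SIZES`, `STATUS`, and `new_entries` are hypothetical and not taken from the repository.

```python
from itertools import product

NODES = [2, 4, 8, 12, 16]
SIZES = ["small", "medium", "large"]

# Status per (strategy, nodes) row, in SIZES order, transcribed from the tables.
STATUS = {
    "broadcast": {
        2: ["new", "exists", "exists"],
        4: ["exists", "exists", "exists"],
        8: ["exists", "exists", "exists"],
        12: ["new", "new", "new"],
        16: ["new", "new", "exists"],
    },
    "partitioned": {
        2: ["new", "skip:failed", "skip:failed"],
        4: ["exists", "skip:failed", "skip:failed"],
        8: ["exists", "skip:failed", "skip:failed"],
        12: ["new", "skip:failed new", "skip:failed new"],
        16: ["new", "skip:failed new", "skip:failed new"],
    },
}


def new_entries():
    """Return (strategy, nodes, size, status) for every cell marked 'new'."""
    found = []
    for strategy, rows in STATUS.items():
        for nodes, size_idx in product(NODES, range(len(SIZES))):
            status = rows[nodes][size_idx]
            # "skip:failed new" counts as new; plain "skip:failed" does not.
            if "new" in status.split():
                found.append((strategy, nodes, SIZES[size_idx], status))
    return found


if __name__ == "__main__":
    for strategy, nodes, size, status in new_entries():
        print(f"{strategy:<12} {nodes:>2} nodes  {size:<6}  {status}")
```

Under this reading of the tables, six broadcast and seven partitioned cells need a new entry. Four of the partitioned ones would be added already marked as skipped.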
Required changes per new entry
- Entry in `benchmarks.yml` with `id`, `image`, `cpu`, `memory_gb`, `dataset_size`, `runs`, `related_script_ids`
- Docker image for 12-node variants (new image tag)
- Entrypoint in `src/presentation/entrypoints/` for 12-node variants
- `case` arm in `benchmark_runner.py` for 12-node variants
- Batch pairing must satisfy the four constraints (same query type, same dataset_size, at most one PostGIS, Databricks vCPU sum ≤ 200)
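
The four batch-pairing constraints lend themselves to a small automated check. The sketch below is one possible way to express them, not the project's actual validator. The `BatchEntry` fields (`query_type`, `platform`, `vcpus`) and the platform strings are assumptions made for illustration. It also assumes the vCPU limit applies to the sum over the Databricks entries in a single batch.

```python
from dataclasses import dataclass

DATABRICKS_VCPU_LIMIT = 200  # limit quoted in the constraint list


@dataclass(frozen=True)
class BatchEntry:
    # Hypothetical shape; the real benchmarks.yml schema may differ.
    id: str
    query_type: str
    dataset_size: str
    platform: str  # assumed values: "sedona", "postgis", "databricks"
    vcpus: int


def batch_violations(batch):
    """Return a list of violated constraints; an empty list means the batch is valid."""
    violations = []
    if len({e.query_type for e in batch}) > 1:
        violations.append("mixed query types")
    if len({e.dataset_size for e in batch}) > 1:
        violations.append("mixed dataset sizes")
    if sum(1 for e in batch if e.platform == "postgis") > 1:
        violations.append("more than one PostGIS entry")
    databricks_vcpus = sum(e.vcpus for e in batch if e.platform == "databricks")
    if databricks_vcpus > DATABRICKS_VCPU_LIMIT:
        violations.append(
            f"Databricks vCPU sum {databricks_vcpus} exceeds {DATABRICKS_VCPU_LIMIT}"
        )
    return violations
```

Returning every violation at once, rather than failing on the first, makes it easier to fix a bad pairing in one pass when many entries are added together.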