Add missing Sedona node counts (2, 12, 16) to benchmark matrix #357

@jathavaan

Problem

The benchmark matrix for Sedona experiments is incomplete. At small scale, only the 4- and 8-node configurations exist, and 12-node clusters are missing at every dataset size. The intended matrix is 2, 4, 8, 12, and 16 nodes × all dataset sizes (small, medium, large) for both the broadcast and partitioned join strategies.

Target matrix

Broadcast join

| Nodes | Small  | Medium | Large  |
|-------|--------|--------|--------|
| 2     | new    | exists | exists |
| 4     | exists | exists | exists |
| 8     | exists | exists | exists |
| 12    | new    | new    | new    |
| 16    | new    | new    | exists |

Partitioned join

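| Nodes | Small  | Medium             | Large              |
|-------|--------|--------------------|--------------------|
| 2     | new    | skip: failed       | skip: failed       |
| 4     | exists | skip: failed       | skip: failed       |
| 8     | exists | skip: failed       | skip: failed       |
| 12    | new    | skip: failed (new) | skip: failed (new) |
| 16    | new    | skip: failed (new) | skip: failed (new) |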

All partitioned medium and large entries are marked skip: failed (executor OOM on the RangeJoin spatial index).
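As a rough cross-check of the two tables, the entries that still need to be added and can actually run could be enumerated like this. This is only a sketch: the set names and the `missing_entries` helper are hypothetical and not part of the repository, and partitioned medium/large cells are left out because they are marked skip: failed.

```python
from itertools import product

NODES = (2, 4, 8, 12, 16)
SIZES = ("small", "medium", "large")

# Broadcast entries marked "exists" in the table (transcribed by hand).
EXISTING_BROADCAST = {
    (2, "medium"), (2, "large"),
    (4, "small"), (4, "medium"), (4, "large"),
    (8, "small"), (8, "medium"), (8, "large"),
    (16, "large"),
}

# Partitioned entries only run at small scale; these node counts exist.
EXISTING_PARTITIONED_SMALL_NODES = {4, 8}


def missing_entries():
    """Return (strategy, nodes, dataset_size) tuples that are still missing."""
    missing = []
    for nodes, size in product(NODES, SIZES):
        if (nodes, size) not in EXISTING_BROADCAST:
            missing.append(("broadcast", nodes, size))
    for nodes in NODES:
        if nodes not in EXISTING_PARTITIONED_SMALL_NODES:
            missing.append(("partitioned", nodes, "small"))
    return missing


if __name__ == "__main__":
    for strategy, nodes, size in missing_entries():
        print(f"{strategy:12s} {nodes:2d} nodes  {size}")
```

Under these assumptions the helper yields six broadcast entries and three partitioned small entries, matching the cells marked "new" for runs that are not skipped.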

Required changes per new entry

  1. Entry in benchmarks.yml with id, image, cpu, memory_gb, dataset_size, runs, related_script_ids
  2. Docker image for 12-node variants (new image tag)
  3. Entrypoint in src/presentation/entrypoints/ for 12-node variants
  4. case arm in benchmark_runner.py for 12-node variants
  5. Batch pairing must satisfy the four constraints (same query type, same dataset_size, at most one PostGIS, Databricks vCPU sum ≤ 200)
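Items 1 and 5 above lend themselves to a quick pre-flight check before new entries are committed. The following is a minimal sketch, not the project's actual validation code. The required keys come from item 1, while `query_type` and `platform` are hypothetical fields introduced here for illustration, and `cpu` is assumed to be the entry's total vCPU count.

```python
# Keys listed in item 1 of the required changes.
REQUIRED_KEYS = {
    "id", "image", "cpu", "memory_gb",
    "dataset_size", "runs", "related_script_ids",
}

MAX_DATABRICKS_VCPUS = 200  # limit stated in item 5


def missing_keys(entry):
    """Return the required keys that an entry does not define."""
    return REQUIRED_KEYS - set(entry)


def batch_violations(batch):
    """Return human-readable violations of the four batch constraints.

    Assumes each entry carries hypothetical 'query_type' and 'platform'
    fields, and that 'cpu' is the total vCPU count of the entry.
    """
    problems = []
    if len({e["query_type"] for e in batch}) > 1:
        problems.append("mixed query types")
    if len({e["dataset_size"] for e in batch}) > 1:
        problems.append("mixed dataset sizes")
    if sum(e["platform"] == "postgis" for e in batch) > 1:
        problems.append("more than one PostGIS entry")
    vcpus = sum(e["cpu"] for e in batch if e["platform"] == "databricks")
    if vcpus > MAX_DATABRICKS_VCPUS:
        problems.append(
            f"Databricks vCPU sum {vcpus} exceeds {MAX_DATABRICKS_VCPUS}"
        )
    return problems
```

An empty list from `batch_violations` means the batch satisfies all four constraints as interpreted here. Returning messages instead of raising on the first failure lets one run report every problem in a proposed pairing.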

Labels: enhancement
