Feature/348 increase benchmark runs and fix fixed iteration ceiling by jathavaan · Pull Request #350 · kartAI/doppa

jathavaan · 2026-05-26T09:21:43Z

This pull request updates the benchmark configuration and runner to improve experiment tracking and handling of skipped benchmarks. The most significant changes are the addition of an explicit runs parameter for all experiments in benchmarks.yml, and an enhancement to how skipped experiments and their reasons are recorded and reported in benchmark_runner.py.

Benchmark configuration improvements:

Added a runs parameter to every experiment in benchmarks.yml, specifying the number of times each benchmark should be executed (typically set to 30 for smaller experiments and 3 for larger or resource-intensive ones).
Updated the skip field for certain experiments to specify more descriptive reasons (e.g., timeout, failed), replacing simple boolean values. This provides clearer context for why experiments are skipped, such as exceeding time limits or encountering out-of-memory errors.
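As a rough illustration of the configuration shape described above, the sketch below validates experiment entries as they might look after benchmarks.yml has been parsed into a dictionary. The experiment names, the validate_experiment helper, and the ALLOWED_SKIP_VALUES set are assumptions made for this example; only the runs and skip fields and the timeout and failed values come from the PR description.

```python
from typing import Any

# Assumption: the PR only names `timeout` and `failed` as skip values.
ALLOWED_SKIP_VALUES = {"timeout", "failed"}


def validate_experiment(name: str, entry: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one parsed experiment entry."""
    problems: list[str] = []

    runs = entry.get("runs")
    # bool is a subclass of int, so it is excluded explicitly.
    if isinstance(runs, bool) or not isinstance(runs, int) or runs < 1:
        problems.append(f"{name}: 'runs' must be a positive integer, got {runs!r}")

    skip = entry.get("skip")
    if skip is not None and not (isinstance(skip, str) and skip in ALLOWED_SKIP_VALUES):
        problems.append(
            f"{name}: 'skip' must be one of {sorted(ALLOWED_SKIP_VALUES)}, got {skip!r}"
        )
    return problems


# Hypothetical entries, not copied from the real benchmarks.yml.
experiments = {
    "example-small-experiment": {"runs": 30},
    "example-large-experiment": {"runs": 3, "skip": "timeout"},
    "example-legacy-entry": {"skip": True},  # old boolean style, no runs
}

for name, entry in experiments.items():
    for problem in validate_experiment(name, entry):
        print(problem)
```

In this sketch a legacy boolean skip is reported as a problem; whether the real configuration still tolerates skip: true is not stated in the PR.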

Benchmark runner enhancements:

Refactored the skip logic in benchmark_runner.py to use a new _get_skip_reason() function, which returns a StopReason enum value (or None) instead of a boolean. This enables more detailed reporting and handling of different skip scenarios.
Updated the main experiment runner to log and record the specific skip reason in the metadata when an experiment is skipped, rather than just marking it as failed.
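The change from a boolean check to an enum-returning check could look roughly like the following. This is a minimal sketch, not the code from benchmark_runner.py: the StopReason members, the metadata keys, and the build_run_metadata helper are assumptions, and the real from_skip() may treat boolean or unknown values differently.

```python
from enum import Enum
from typing import Any, Optional


class StopReason(Enum):
    # Assumed members; the PR only mentions the skip values `timeout` and `failed`.
    TIMEOUT = "timeout"
    FAILED = "failed"

    @classmethod
    def from_skip(cls, value: Any) -> Optional["StopReason"]:
        """Map a raw `skip` value from the config to a StopReason, or None."""
        if value is None or value is False:
            return None  # the experiment is not skipped
        if isinstance(value, str):
            try:
                return cls(value.strip().lower())
            except ValueError:
                pass  # fall through to the error below
        raise ValueError(f"Unsupported skip value: {value!r}")


def get_skip_reason(experiment: dict) -> Optional[StopReason]:
    """Stand-in for _get_skip_reason(): a reason instead of a bare True/False."""
    return StopReason.from_skip(experiment.get("skip"))


def build_run_metadata(experiment: dict) -> dict:
    """Record why an experiment was skipped (hypothetical metadata layout)."""
    reason = get_skip_reason(experiment)
    return {
        "experiment": experiment["name"],
        "skipped": reason is not None,
        "stop_reason": reason.value if reason is not None else None,
    }
```

Returning Optional[StopReason] keeps the old truthiness check usable (None means "run it") while letting the caller persist the specific reason instead of a generic failure.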

BENCHMARK_RUNS: 1 → 30, BENCHMARK_MAX_ITERATION_SECONDS: 4500 → 3600, BENCHMARK_MAX_FIXED_WINDOW_SECONDS: 75 min → 300 min.

RQ1 experiments: runs: 30. RQ2 experiments: runs: 3. national-scale-spatial-join-duckdb-large: skip: true (exceeds 60-min threshold).

Orchestrator checks experiment["runs"] before launching a batch. When benchmark_run > runs, marks experiment and related peers as completed.
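The runs check described here might be sketched as follows. The function names, the completed set, and the related field used to find peers are assumptions for illustration; the PR does not show how main.py tracks related experiments.

```python
def mark_exhausted(experiments: list[dict], benchmark_run: int, completed: set[str]) -> None:
    """Mark experiments whose run budget is used up, together with their peers."""
    for exp in experiments:
        if benchmark_run > exp["runs"]:
            completed.add(exp["name"])
            # `related` is a hypothetical field listing peer experiment names.
            completed.update(exp.get("related", ()))


def experiments_to_launch(
    experiments: list[dict], benchmark_run: int, completed: set[str]
) -> list[dict]:
    """Return the experiments that should still be launched for this run number."""
    # Mark first, then filter, so a peer listed before the exhausted
    # experiment is excluded as well.
    mark_exhausted(experiments, benchmark_run, completed)
    return [exp for exp in experiments if exp["name"] not in completed]


# Hypothetical batch: "b" allows 3 runs and names "c" as a related peer.
batch = [
    {"name": "c", "runs": 30},
    {"name": "a", "runs": 30},
    {"name": "b", "runs": 3, "related": ["c"]},
]

done: set[str] = set()
print([e["name"] for e in experiments_to_launch(batch, 3, done)])  # run 3: nothing exhausted
print([e["name"] for e in experiments_to_launch(batch, 4, done)])  # run 4: "b" and peer "c" dropped
```

The comparison is strict (benchmark_run > runs), matching the description: run number 3 of an experiment with runs: 3 is still launched.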

benchmarks.yml: skip: true → skip: timeout | skip: failed per experiment. StopReason.from_skip(): parse YAML skip value into enum. benchmark_runner: _is_skipped → _get_skip_reason returning StopReason. Config: LOGGING_LEVEL changed to DEBUG.

Signed-off-by: Jathavaan Shankarr <jathavaan12@gmail.com>

Copilot

Pull request overview

This PR updates the benchmarking suite configuration and orchestration to support per-experiment run counts and more descriptive skip reasons, while also adjusting benchmark time/ceiling defaults to better match the intended execution constraints.

Changes:

Add explicit runs per experiment (and richer skip reasons) in benchmarks.yml.
Update orchestration (main.py) to skip launching experiments when benchmark_run exceeds an experiment’s configured runs.
Update in-container runner (benchmark_runner.py) to persist skip reasons in run metadata; add StopReason.from_skip() helper and adjust benchmark defaults in Config.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


File	Description
src/domain/enums/stop_reason.py	Adds helper to map `benchmarks.yml` skip values to `StopReason`.
src/config.py	Updates benchmark defaults (runs/time ceilings) and logging level.
main.py	Skips experiments (and their related batch members) once their configured `runs` limit is exceeded.
benchmarks.yml	Adds per-experiment `runs` and replaces boolean skips with descriptive reasons (e.g., `timeout`, `failed`).
benchmark_runner.py	Refactors skip handling to record a specific skip reason in persisted run metadata.

Comments suppressed due to low confidence (1)

src/config.py:57

Setting the default LOGGING_LEVEL to DEBUG will significantly increase log volume (console + file) for all runs, which can impact performance/cost and make logs noisy. Consider keeping the default at INFO and making DEBUG opt-in via an environment variable (e.g., LOGGING_LEVEL), especially for benchmark/orchestrator runs.

    # LOGGING
    LOGGING_LEVEL: int = logging.INFO
    LOG_FILE: Path = LOG_DIR / f"{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
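One way to make DEBUG opt-in, as the comment suggests, is to resolve the level from an environment variable and fall back to INFO. The sketch below is an assumption about how that could be wired up; resolve_logging_level and the name lookup table are not part of the repository, and only the LOGGING_LEVEL variable name comes from the review comment.

```python
import logging
import os

# Standard level names mapped to their numeric values.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_logging_level(default: int = logging.INFO) -> int:
    """Read LOGGING_LEVEL from the environment, falling back to `default`."""
    name = os.environ.get("LOGGING_LEVEL", "").strip().upper()
    # Unset, empty, or unrecognized values all fall back to the default.
    return _LEVELS.get(name, default)


# Usage: INFO unless the caller explicitly exports LOGGING_LEVEL=DEBUG.
logging.basicConfig(level=resolve_logging_level())
```

With this approach the default stays quiet for benchmark and orchestrator runs, and verbose logging is enabled per invocation rather than in the committed config.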


jathavaan added 4 commits May 26, 2026 10:51

#348 Increase benchmark runs, lower per-iteration timeout, raise fixed-iteration ceiling

c9e937c

BENCHMARK_RUNS: 1 → 30, BENCHMARK_MAX_ITERATION_SECONDS: 4500 → 3600, BENCHMARK_MAX_FIXED_WINDOW_SECONDS: 75 min → 300 min.

#348 Add per-experiment runs field and skip duckdb-large

0acd238

RQ1 experiments: runs: 30. RQ2 experiments: runs: 3. national-scale-spatial-join-duckdb-large: skip: true (exceeds 60-min threshold).

#348 Skip experiments that exceed their per-experiment runs limit

4580606

Orchestrator checks experiment["runs"] before launching a batch. When benchmark_run > runs, marks experiment and related peers as completed.

jathavaan self-assigned this May 26, 2026

Copilot AI review requested due to automatic review settings May 26, 2026 09:21

jathavaan linked an issue May 26, 2026 that may be closed by this pull request

Increase benchmark runs and fix fixed-iteration ceiling #348

Closed

Copilot started reviewing on behalf of jathavaan May 26, 2026 09:22

#348 Changed logging level to INFO

115cada

Signed-off-by: Jathavaan Shankarr <jathavaan12@gmail.com>

jathavaan enabled auto-merge May 26, 2026 09:22

Copilot AI reviewed May 26, 2026


Comment thread src/domain/enums/stop_reason.py

Comment thread main.py

Comment thread benchmark_runner.py

jathavaan merged commit 9f0be0a into main May 26, 2026
23 checks passed

jathavaan deleted the feature/348-increase-benchmark-runs-and-fix-fixed-iteration-ceiling branch May 26, 2026 09:39

jathavaan merged 5 commits into main from feature/348-increase-benchmark-runs-and-fix-fixed-iteration-ceiling
