Feature/348 increase benchmark runs and fix fixed iteration ceiling #350
Merged
jathavaan merged 5 commits on May 26, 2026
…d-iteration ceiling BENCHMARK_RUNS: 1 → 30, BENCHMARK_MAX_ITERATION_SECONDS: 4500 → 3600, BENCHMARK_MAX_FIXED_WINDOW_SECONDS: 75 min → 300 min.
RQ1 experiments: runs: 30. RQ2 experiments: runs: 3. national-scale-spatial-join-duckdb-large: skip: true (exceeds 60-min threshold).
Orchestrator checks experiment["runs"] before launching a batch. When benchmark_run > runs, marks experiment and related peers as completed.
benchmarks.yml: skip: true → skip: timeout | skip: failed per experiment. StopReason.from_skip(): parse YAML skip value into enum. benchmark_runner: _is_skipped → _get_skip_reason returning StopReason. Config: LOGGING_LEVEL changed to DEBUG.
Signed-off-by: Jathavaan Shankarr <jathavaan12@gmail.com>
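As a quick sanity check on the new ceilings, the sketch below restates the values from the commit messages in one place. `BenchmarkDefaults` is a hypothetical stand-in for the benchmark fields of `Config`, and it assumes the fixed window is stored in seconds (300 min expressed as 18000 s); the real field layout may differ.

```python
from dataclasses import dataclass

SECONDS_PER_MINUTE = 60


@dataclass(frozen=True)
class BenchmarkDefaults:
    """Hypothetical stand-in for the benchmark-related fields of Config."""

    # Values taken from the commit message; storage in seconds is an assumption.
    BENCHMARK_RUNS: int = 30
    BENCHMARK_MAX_ITERATION_SECONDS: int = 3600
    BENCHMARK_MAX_FIXED_WINDOW_SECONDS: int = 300 * SECONDS_PER_MINUTE


defaults = BenchmarkDefaults()

# The per-iteration ceiling expressed in minutes, for comparison with the
# 60-minute skip threshold mentioned for the large DuckDB spatial join.
iteration_ceiling_minutes = (
    defaults.BENCHMARK_MAX_ITERATION_SECONDS // SECONDS_PER_MINUTE
)
```

Under these assumptions the per-iteration ceiling works out to 60 minutes, which lines up with the threshold cited for skipping `national-scale-spatial-join-duckdb-large`.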
Pull request overview
This PR updates the benchmarking suite configuration and orchestration to support per-experiment run counts and more descriptive skip reasons, while also adjusting benchmark time/ceiling defaults to better match the intended execution constraints.
Changes:
- Add explicit `runs` per experiment (and richer `skip` reasons) in `benchmarks.yml`.
- Update orchestration (`main.py`) to skip launching experiments when `benchmark_run` exceeds an experiment's configured `runs`.
- Update in-container runner (`benchmark_runner.py`) to persist skip reasons in run metadata; add `StopReason.from_skip()` helper and adjust benchmark defaults in `Config`.
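The orchestration change can be pictured with a minimal sketch. It assumes each experiment is a dict carrying `runs` (from `benchmarks.yml`) and a hypothetical `name` key, and that a batch is a list of related experiments; `plan_batch` is an illustrative helper, not the function in `main.py`.

```python
from typing import Any


def plan_batch(
    batch: list[dict[str, Any]], benchmark_run: int
) -> tuple[list[str], list[str]]:
    """Return (names to launch, names to mark completed) for one batch.

    If any experiment in the batch has used up its configured runs, the
    whole batch is marked completed and nothing is launched.
    """
    exhausted = [e for e in batch if benchmark_run > int(e["runs"])]
    names = [e["name"] for e in batch]
    if exhausted:
        return [], names
    return names, []


# Hypothetical batch: one 30-run experiment grouped with a 3-run peer.
batch = [{"name": "small-join", "runs": 30}, {"name": "large-join", "runs": 3}]
launch_on_run_3 = plan_batch(batch, benchmark_run=3)
launch_on_run_4 = plan_batch(batch, benchmark_run=4)
```

In this sketch the comparison is strict, so the run whose number equals `runs` still executes and only later runs are skipped.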
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/domain/enums/stop_reason.py | Adds helper to map benchmarks.yml skip values to StopReason. |
| src/config.py | Updates benchmark defaults (runs/time ceilings) and logging level. |
| main.py | Skips experiments (and their related batch members) once their configured runs limit is exceeded. |
| benchmarks.yml | Adds per-experiment runs and replaces boolean skips with descriptive reasons (e.g., timeout, failed). |
| benchmark_runner.py | Refactors skip handling to record a specific skip reason in persisted run metadata. |
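A helper like `StopReason.from_skip()` might look roughly as follows. This is a sketch under assumptions: the enum members shown are only the two skip values named in this PR (the real enum likely has more), and rejecting unknown or legacy boolean `true` values with `ValueError` is an illustrative choice, not confirmed behaviour.

```python
from enum import Enum
from typing import Optional


class StopReason(Enum):
    # Only the two values named in the PR; other members are omitted here.
    TIMEOUT = "timeout"
    FAILED = "failed"

    @classmethod
    def from_skip(cls, value: object) -> Optional["StopReason"]:
        """Map a YAML `skip` value to a StopReason, or None if not skipped."""
        if value is None or value is False:
            return None
        if isinstance(value, str):
            try:
                return cls(value.strip().lower())
            except ValueError:
                raise ValueError(f"Unknown skip value: {value!r}") from None
        # Assumption: anything else (including a bare `true`) is rejected.
        raise ValueError(f"Unsupported skip value: {value!r}")
```

Failing loudly on unrecognised values keeps a typo in `benchmarks.yml` from silently turning a skipped experiment into a running one.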
Comments suppressed due to low confidence (1)
src/config.py:57
- Setting the default `LOGGING_LEVEL` to `DEBUG` will significantly increase log volume (console + file) for all runs, which can impact performance/cost and make logs noisy. Consider keeping the default at `INFO` and making DEBUG opt-in via an environment variable (e.g., `LOGGING_LEVEL`), especially for benchmark/orchestrator runs.
# LOGGING
LOGGING_LEVEL: int = logging.INFO
LOG_FILE: Path = LOG_DIR / f"{datetime.now().strftime('%Y%m%d_%H%M%S')}.log"
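One way to make DEBUG opt-in, as the comment suggests, is to resolve the level from the environment with `INFO` as the fallback. The helper name `resolve_logging_level` is hypothetical, and the use of a `LOGGING_LEVEL` environment variable follows the reviewer's example rather than anything confirmed in the codebase.

```python
import logging
import os

# Explicit mapping so that unknown names fall back to the default.
_LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def resolve_logging_level(default: int = logging.INFO) -> int:
    """Read LOGGING_LEVEL from the environment, falling back to `default`."""
    name = os.environ.get("LOGGING_LEVEL", "").strip().upper()
    return _LEVELS.get(name, default)
```

`Config` could then assign `LOGGING_LEVEL: int = resolve_logging_level()`, so a single run can be switched to DEBUG without changing the committed default.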
This pull request updates the benchmark configuration and runner to improve experiment tracking and handling of skipped benchmarks. The most significant changes are the addition of an explicit `runs` parameter for all experiments in `benchmarks.yml`, and an enhancement to how skipped experiments and their reasons are recorded and reported in `benchmark_runner.py`.

Benchmark configuration improvements:
- Added a `runs` parameter to every experiment in `benchmarks.yml`, specifying the number of times each benchmark should be executed (typically set to 30 for smaller experiments and 3 for larger or resource-intensive ones).
- Updated the `skip` field for certain experiments to specify more descriptive reasons (e.g., `timeout`, `failed`), replacing simple boolean values. This provides clearer context for why experiments are skipped, such as exceeding time limits or encountering out-of-memory errors.

Benchmark runner enhancements:
- Refactored the skip logic in `benchmark_runner.py` to use a new `_get_skip_reason()` function, which returns a `StopReason` enum value (or `None`) instead of a boolean. This enables more detailed reporting and handling of different skip scenarios.
- Updated the main experiment runner to log and record the specific skip reason in the metadata when an experiment is skipped, rather than just marking it as failed.