Skip to content

feat(harbor): validate partition task names against task_source at build time#17

Open
shehabyasser-scale wants to merge 2 commits into
harbor-3-compiler-fixesfrom
harbor-3-partition-name-validation
Open

feat(harbor): validate partition task names against task_source at build time#17
shehabyasser-scale wants to merge 2 commits into
harbor-3-compiler-fixesfrom
harbor-3-partition-name-validation

Conversation

@shehabyasser-scale

@shehabyasser-scale shehabyasser-scale commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

Stacked on #9; companion to #16 (defense in depth for the same live incident). A Mode B partition with bare TB2 names (break-filter-js-from-html instead of terminal-bench/break-filter-js-from-html) compiled fine, then zeroed an entire optimization trial at eval time after burning its budget. #16 makes the eval-time failure loud; this PR catches it at build time where it costs $0.

The compiler enumerates the registry task_source (best-effort, via harbor's DatasetConfig) and fails the build naming the unknown tasks + canonical examples. Offline compiles warn and continue; VERO_SKIP_TASK_NAME_CHECK=1 bypasses; local inner_task sources are exempt.

3 tests (unknown fails, canonical passes, offline warns). 11 pass.

🤖 Generated with Claude Code

Greptile Summary

This PR adds a build-time guard that validates partition task names against the registry's canonical <org>/<name> form, catching the class of bug where bare names compiled silently and zeroed an entire optimization trial at eval time. The implementation is best-effort: offline or harbor-unavailable compiles warn and continue; VERO_SKIP_TASK_NAME_CHECK=1 bypasses; inner_task local sources are exempt.

  • _resolve_task_source_names enumerates task names via DatasetConfig.get_task_configs(), handling both sync and async-caller contexts via a ThreadPoolExecutor workaround for running-loop environments.
  • _validate_partition_names raises a descriptive ValueError on unknown names, treats both None and empty-set returns as warn + continue, and is inserted into compile_task before dataset registration so a bad config costs nothing.
  • Five tests cover the unknown/canonical/offline/empty-source/bypass-env-var paths, addressing all three issues flagged in the prior review round.

Confidence Score: 5/5

Safe to merge — the new validation is best-effort and fails open when enumeration is unavailable, so it cannot block legitimate offline builds or introduce regressions on existing compile paths.

All three issues from the prior review round are addressed: the nested-event-loop bypass, the confusing empty-source error, and the missing bypass-env-var test. The validation is inserted before any filesystem side effects on the build output, and the async thread-worker approach for running-loop contexts is correct. No logic errors or unsafe paths were found.

No files require special attention.

Important Files Changed

Filename Overview
vero/src/vero/harbor/build/compiler.py Adds _resolve_task_source_names and _validate_partition_names; hooks validation into compile_task before dataset registration. Handles async-context edge case via ThreadPoolExecutor. Logic is sound and well-guarded.
vero/tests/test_harbor_build.py Adds 5-test TestPartitionNameValidation class covering all key branches (unknown fails, canonical passes, offline warns, empty-source warns, env-var bypass). Responds to all three prior review threads.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[compile_task Mode B] --> B{task_source set and no inner_task?}
    B -- No --> E[build_harbor_dataset + register]
    B -- Yes --> C{VERO_SKIP_TASK_NAME_CHECK?}
    C -- Yes --> E
    C -- No --> D[_resolve_task_source_names]
    D --> F{names returned?}
    F -- None or empty set --> G[logger.warning: skipping check]
    G --> E
    F -- non-empty set --> H{unknown names in partition?}
    H -- No --> E
    H -- Yes --> I[raise ValueError with sample canonical names]
    D --> J{Running event loop?}
    J -- No --> K[asyncio.run coro]
    J -- Yes --> L[ThreadPoolExecutor asyncio.run in worker thread]
    K --> M[return set of names]
    L --> M
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[compile_task Mode B] --> B{task_source set and no inner_task?}
    B -- No --> E[build_harbor_dataset + register]
    B -- Yes --> C{VERO_SKIP_TASK_NAME_CHECK?}
    C -- Yes --> E
    C -- No --> D[_resolve_task_source_names]
    D --> F{names returned?}
    F -- None or empty set --> G[logger.warning: skipping check]
    G --> E
    F -- non-empty set --> H{unknown names in partition?}
    H -- No --> E
    H -- Yes --> I[raise ValueError with sample canonical names]
    D --> J{Running event loop?}
    J -- No --> K[asyncio.run coro]
    J -- Yes --> L[ThreadPoolExecutor asyncio.run in worker thread]
    K --> M[return set of names]
    L --> M
Loading

Reviews (2): Last reviewed commit: "fix(harbor): partition-name validation s..." | Re-trigger Greptile

…ild time

Companion to the _collate mismatch guard: a bare or misspelled task name
in a Mode B partition compiles fine and fails only at eval time, as an
all-zero experiment that looks like total agent failure (and burns real
eval budget first). Catch it at build time, where it costs nothing.

The compiler now enumerates the registry task_source (best-effort via
harbor's DatasetConfig) and fails the build listing the unknown names
plus canonical examples. Offline compiles warn and continue;
VERO_SKIP_TASK_NAME_CHECK=1 bypasses. Local inner_task sources are
exempt (their naming is not registry-governed).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread vero/src/vero/harbor/build/compiler.py
Comment thread vero/src/vero/harbor/build/compiler.py
Comment thread vero/tests/test_harbor_build.py Outdated
…ty sources

Greptile on #17: (1) asyncio.run raised RuntimeError inside a running
event loop (async caller, pytest-asyncio, notebook), the bare except
swallowed it, and validation silently became a no-op behind a misleading
'offline?' warning. Enumeration now detects a running loop and runs on
its own loop in a worker thread. (2) An EMPTY task source made every
partition name 'unknown' with the useless hint 'e.g. []'; empty now
warns-and-skips like enumeration failure. (3) The
VERO_SKIP_TASK_NAME_CHECK bypass gets a test so it stays honest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant