feat(harbor): validate partition task names against task_source at build time#17
Open
shehabyasser-scale wants to merge 2 commits into
Open
feat(harbor): validate partition task names against task_source at build time#17shehabyasser-scale wants to merge 2 commits into
shehabyasser-scale wants to merge 2 commits into
Conversation
…ild time Companion to the _collate mismatch guard: a bare or misspelled task name in a Mode B partition compiles fine and fails only at eval time, as an all-zero experiment that looks like total agent failure (and burns real eval budget first). Catch it at build time, where it costs nothing. The compiler now enumerates the registry task_source (best-effort via harbor's DatasetConfig) and fails the build listing the unknown names plus canonical examples. Offline compiles warn and continue; VERO_SKIP_TASK_NAME_CHECK=1 bypasses. Local inner_task sources are exempt (their naming is not registry-governed). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty sources Greptile on #17: (1) asyncio.run raised RuntimeError inside a running event loop (async caller, pytest-asyncio, notebook), the bare except swallowed it, and validation silently became a no-op behind a misleading 'offline?' warning. Enumeration now detects a running loop and runs on its own loop in a worker thread. (2) An EMPTY task source made every partition name 'unknown' with the useless hint 'e.g. []'; empty now warns-and-skips like enumeration failure. (3) The VERO_SKIP_TASK_NAME_CHECK bypass gets a test so it stays honest. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #9; companion to #16 (defense in depth for the same live incident). A Mode B partition with bare TB2 names (
break-filter-js-from-htmlinstead ofterminal-bench/break-filter-js-from-html) compiled fine, then zeroed an entire optimization trial at eval time after burning its budget. #16 makes the eval-time failure loud; this PR catches it at build time where it costs $0.The compiler enumerates the registry
task_source(best-effort, via harbor'sDatasetConfig) and fails the build naming the unknown tasks + canonical examples. Offline compiles warn and continue;VERO_SKIP_TASK_NAME_CHECK=1bypasses; localinner_tasksources are exempt.3 tests (unknown fails, canonical passes, offline warns). 11 pass.
🤖 Generated with Claude Code
Greptile Summary
This PR adds a build-time guard that validates partition task names against the registry's canonical
<org>/<name>form, catching the class of bug where bare names compiled silently and zeroed an entire optimization trial at eval time. The implementation is best-effort: offline or harbor-unavailable compiles warn and continue;VERO_SKIP_TASK_NAME_CHECK=1bypasses;inner_tasklocal sources are exempt._resolve_task_source_namesenumerates task names viaDatasetConfig.get_task_configs(), handling both sync and async-caller contexts via aThreadPoolExecutorworkaround for running-loop environments._validate_partition_namesraises a descriptiveValueErroron unknown names, treats bothNoneand empty-set returns as warn + continue, and is inserted intocompile_taskbefore dataset registration so a bad config costs nothing.Confidence Score: 5/5
Safe to merge — the new validation is best-effort and fails open when enumeration is unavailable, so it cannot block legitimate offline builds or introduce regressions on existing compile paths.
All three issues from the prior review round are addressed: the nested-event-loop bypass, the confusing empty-source error, and the missing bypass-env-var test. The validation is inserted before any filesystem side effects on the build output, and the async thread-worker approach for running-loop contexts is correct. No logic errors or unsafe paths were found.
No files require special attention.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[compile_task Mode B] --> B{task_source set and no inner_task?} B -- No --> E[build_harbor_dataset + register] B -- Yes --> C{VERO_SKIP_TASK_NAME_CHECK?} C -- Yes --> E C -- No --> D[_resolve_task_source_names] D --> F{names returned?} F -- None or empty set --> G[logger.warning: skipping check] G --> E F -- non-empty set --> H{unknown names in partition?} H -- No --> E H -- Yes --> I[raise ValueError with sample canonical names] D --> J{Running event loop?} J -- No --> K[asyncio.run coro] J -- Yes --> L[ThreadPoolExecutor asyncio.run in worker thread] K --> M[return set of names] L --> M%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[compile_task Mode B] --> B{task_source set and no inner_task?} B -- No --> E[build_harbor_dataset + register] B -- Yes --> C{VERO_SKIP_TASK_NAME_CHECK?} C -- Yes --> E C -- No --> D[_resolve_task_source_names] D --> F{names returned?} F -- None or empty set --> G[logger.warning: skipping check] G --> E F -- non-empty set --> H{unknown names in partition?} H -- No --> E H -- Yes --> I[raise ValueError with sample canonical names] D --> J{Running event loop?} J -- No --> K[asyncio.run coro] J -- Yes --> L[ThreadPoolExecutor asyncio.run in worker thread] K --> M[return set of names] L --> MReviews (2): Last reviewed commit: "fix(harbor): partition-name validation s..." | Re-trigger Greptile