docs(harbor): warn in the task instruction that baseline evals create no candidate #12

Open
shehabyasser-scale wants to merge 2 commits into harbor-3-compiler-fixes from harbor-3-compiler-instruction-warning

@shehabyasser-scale shehabyasser-scale commented Jul 2, 2026

Stacked on #9 (harbor-3-compiler-fixes). Companion to #11: the generated instruction.md (auto_best branch) now warns the optimizer that only commits other than the seeded baseline are selectable, and that evaluating the unmodified baseline spends budget without creating a candidate.

Found in the same live Mode B smoke run as #11: the optimizer spent its whole budget measuring the baseline and walked blind into finalize's empty candidate pool. #11 makes that outcome score 0.0 instead of erroring; this PR makes the agent unlikely to hit it at all.
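
The selection rule described above can be sketched as follows. This is a minimal illustration, not the finalize code: `select_best`, `final_score`, the `(sha, score)` tuple layout, and the SHA values are all hypothetical, and the 0.0 fallback simply mirrors the behaviour this description attributes to #11.

```python
from typing import Optional

BASELINE = "base000"  # hypothetical SHA standing in for the seeded baseline commit


def select_best(
    evals: list[tuple[str, float]], baseline: str
) -> Optional[tuple[str, float]]:
    """Return the best-scoring non-baseline (sha, score), or None if there is none."""
    # Baseline evals are filtered out: they cost budget but never become candidates.
    candidates = [(sha, score) for sha, score in evals if sha != baseline]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])


def final_score(evals: list[tuple[str, float]], baseline: str) -> float:
    """Score the run; an empty candidate pool scores 0.0 (assumed #11 fallback)."""
    best = select_best(evals, baseline)
    return 0.0 if best is None else best[1]


# An optimizer that only ever measured the baseline ends up with no candidate.
baseline_only = [(BASELINE, 0.7), (BASELINE, 0.7)]
# A single eval of a modified commit is enough to populate the pool.
with_change = [(BASELINE, 0.7), ("abc123", 0.4)]
```

In this sketch the modified commit is selected even though it scores below the baseline, because the baseline is never in the pool.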

Adds one rendered-content test (test_instruction_warns_baseline_not_selectable); 9 tests pass.

🤖 Generated with Claude Code

Greptile Summary

This PR adds a runtime warning to the auto_best branch of the Harbor task instruction template, telling optimizers that evaluating the unmodified baseline commit spends budget without creating a selectable candidate. It was motivated by a live smoke-run failure where an optimizer exhausted its entire budget on the baseline and then hit an empty candidate pool at finalize.

  • Template change (instruction.md.j2): Appends three sentences to the {% else %} (auto_best) block explaining the baseline-exclusion rule and urging the optimizer to include at least one eval of a modified commit.
  • Test coverage (test_harbor_build.py): Adds both a positive assertion (warning text present in auto_best output) and a negative assertion (warning text absent in submit-mode output), directly closing the gap flagged in the previous review thread about the missing conditional-boundary test.
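
A rough sketch of the positive and negative assertions summarized above, under stated assumptions: `render_instruction` is a plain-Python stand-in that mirrors only the if/else shape of instruction.md.j2, and the phrases and sentence wording are illustrative rather than the template's exact text. The real tests in test_harbor_build.py compile a task and may look different.

```python
# Illustrative phrases; the real template wording may differ.
WARNING_PHRASES = (
    "other than the seeded baseline",
    "without creating a candidate",
)


def render_instruction(submit_enabled: bool) -> str:
    """Hypothetical stand-in for rendering instruction.md.j2."""
    lines = ["# Task"]
    if submit_enabled:
        # Corresponds to the submit branch of the template.
        lines.append("Step 5: nominate your best commit with `vero harbor submit`.")
    else:
        # Corresponds to the {% else %} (auto_best) branch, where the warning lives.
        lines.append("The best commit on the selection split is selected automatically.")
        lines.append("Only commits other than the seeded baseline are selectable.")
        lines.append(
            "Evaluating the unmodified baseline spends budget without creating a candidate."
        )
    return "\n".join(lines)


def test_warning_present_in_auto_best() -> None:
    text = render_instruction(submit_enabled=False)
    assert all(phrase in text for phrase in WARNING_PHRASES)


def test_warning_absent_in_submit_mode() -> None:
    # Guards the conditional boundary: the warning must not leak into submit mode.
    text = render_instruction(submit_enabled=True)
    assert not any(phrase in text for phrase in WARNING_PHRASES)
```

If the warning lines were moved outside the `else` block, the first test would still pass and only the second would fail, which is why both directions are checked.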

Confidence Score: 5/5

Safe to merge — the change is additive text in a documentation template and two focused tests with no logic side-effects.

The template edit is confined to the {% else %} (auto_best) branch and does not touch any control-flow code. Both the positive and negative test cases pass, and the new negative test explicitly guards the conditional boundary that was missing in the previous iteration of this PR.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| vero/src/vero/harbor/build/templates/instruction.md.j2 | Adds baseline-exclusion warning to the auto_best ({% else %}) branch of the instruction template; the warning correctly does not appear in the submit_enabled branch. |
| vero/tests/test_harbor_build.py | Adds two tests: one positive (auto_best warning is present) and one negative (submit mode has no warning), pinning the conditional boundary against future template regressions. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[instruction.md.j2 rendered] --> B{submit_enabled?}
    B -- Yes --> C["Step 5: nominate best commit\n(vero harbor submit)"]
    B -- No/auto_best --> D["Best commit on selection_split\nselected automatically"]
    D --> E["⚠️ WARNING: Only commits\nother than seeded baseline\nare selectable"]
    E --> F["Evaluating unmodified baseline\nspends budget without creating\na candidate"]
    F --> G["Make sure ≥1 eval is of\na commit with your changes"]

Reviews (2): Last reviewed commit: "test(harbor): pin the baseline warning t..."

… no candidate

Companion to the finalize no-candidate fallback (PR #11): in auto_best
mode the generated instruction now tells the optimizer that only
non-baseline commits are selectable and that evaluating the unmodified
baseline spends budget without creating a candidate. Found live: an
optimizer that spent its whole budget measuring the baseline walked into
finalize's empty candidate pool blind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Comment thread vero/tests/test_harbor_build.py
…ranch

Greptile on #12: only the positive case was tested, so moving the warning
outside the {% else %} would pass while polluting submit-mode tasks. A
submit-mode compile now asserts both phrases are absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>