docs(harbor): warn in the task instruction that baseline evals create no candidate by shehabyasser-scale · Pull Request #12 · scaleapi/vero

shehabyasser-scale · 2026-07-02T10:34:53Z

Stacked on #9 (harbor-3-compiler-fixes). Companion to #11: the generated instruction.md (auto_best branch) now warns the optimizer that only commits other than the seeded baseline are selectable, and that evaluating the unmodified baseline spends budget without creating a candidate.

Found in the same live Mode B smoke run as #11: the optimizer spent its whole budget measuring the baseline and walked blind into finalize's empty candidate pool. #11 makes that outcome score 0.0 instead of erroring; this PR makes the agent unlikely to hit it at all.
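The failure mode described above can be sketched in a few lines. This is an illustration only, not vero's finalize code: `EvalRecord`, `select_best`, `final_score`, the commit names, and the scores are all hypothetical. It assumes finalize picks the highest-scoring evaluated commit after excluding the seeded baseline, and falls back to 0.0 when nothing is left, as described for #11.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalRecord:
    """One budgeted evaluation (hypothetical shape): the commit scored and its score."""
    commit: str
    score: float


def select_best(evals: list[EvalRecord], baseline_commit: str) -> Optional[EvalRecord]:
    """Return the best-scoring non-baseline eval, or None if the pool is empty."""
    candidates = [e for e in evals if e.commit != baseline_commit]
    return max(candidates, key=lambda e: e.score, default=None)


def final_score(evals: list[EvalRecord], baseline_commit: str) -> float:
    """Assumed #11 behaviour: an empty candidate pool scores 0.0 instead of erroring."""
    best = select_best(evals, baseline_commit)
    return best.score if best is not None else 0.0


# Whole budget spent on the baseline: every eval is filtered out.
baseline_only = [EvalRecord("base", 0.62), EvalRecord("base", 0.61)]

# One eval of a modified commit is enough to populate the pool.
with_change = [EvalRecord("base", 0.62), EvalRecord("abc123", 0.70)]
```

Under these assumptions, `final_score(baseline_only, "base")` falls back to 0.0, while `final_score(with_change, "base")` returns the modified commit's score. The warning added in this PR aims to keep the optimizer out of the first case.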

One rendered-content test added (test_instruction_warns_baseline_not_selectable). 9 pass.

🤖 Generated with Claude Code

Greptile Summary

This PR adds a runtime warning to the auto_best branch of the Harbor task instruction template, telling optimizers that evaluating the unmodified baseline commit spends budget without creating a selectable candidate. It was motivated by a live smoke-run failure where an optimizer exhausted its entire budget on the baseline and then hit an empty candidate pool at finalize.

Template change (instruction.md.j2): Appends three sentences to the {% else %} (auto_best) block explaining the baseline-exclusion rule and urging the optimizer to include at least one eval of a modified commit.
Test coverage (test_harbor_build.py): Adds both a positive assertion (warning text present in auto_best output) and a negative assertion (warning text absent in submit-mode output), directly closing the gap flagged in the previous review thread about the missing conditional-boundary test.
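The two changes summarized above can be pictured with a small stand-in. The sketch below is not the real template or test file: `render_instruction` is a hypothetical plain-Python substitute for rendering `instruction.md.j2`, the warning wording is paraphrased from the PR description rather than copied from the template, and the test names are invented for the example.

```python
# Paraphrased warning text; the exact wording in instruction.md.j2 may differ.
WARNING = (
    "Only commits other than the seeded baseline are selectable. "
    "Evaluating the unmodified baseline spends budget without creating a candidate. "
    "Make sure at least one eval is of a commit with your changes."
)


def render_instruction(submit_enabled: bool) -> str:
    """Hypothetical stand-in mirroring the template's if submit_enabled / else split."""
    if submit_enabled:
        # Submit mode: the optimizer nominates a commit itself, so no warning.
        return "Step 5: nominate your best commit with vero harbor submit."
    # auto_best mode: selection is automatic, so the baseline warning applies.
    return "The best commit on selection_split is selected automatically. " + WARNING


def test_warning_present_in_auto_best() -> None:
    assert WARNING in render_instruction(submit_enabled=False)


def test_warning_absent_in_submit_mode() -> None:
    assert WARNING not in render_instruction(submit_enabled=True)
```

The positive test checks that the warning reaches auto_best tasks; the negative test checks that it stays out of submit-mode tasks.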

Confidence Score: 5/5

Safe to merge — the change is additive text in a documentation template and two focused tests with no logic side-effects.

The template edit is confined to the {% else %} (auto_best) branch and does not touch any control-flow code. Both the positive and negative test cases pass, and the new negative test explicitly guards the conditional boundary that was missing in the previous iteration of this PR.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| vero/src/vero/harbor/build/templates/instruction.md.j2 | Adds baseline-exclusion warning to the auto_best ({% else %}) branch of the instruction template; the warning correctly does not appear in the submit_enabled branch. |
| vero/tests/test_harbor_build.py | Adds two tests: one positive (auto_best warning is present) and one negative (submit mode has no warning), pinning the conditional boundary against future template regressions. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[instruction.md.j2 rendered] --> B{submit_enabled?}
    B -- Yes --> C["Step 5: nominate best commit\n(vero harbor submit)"]
    B -- No/auto_best --> D["Best commit on selection_split\nselected automatically"]
    D --> E["⚠️ WARNING: Only commits\nother than seeded baseline\nare selectable"]
    E --> F["Evaluating unmodified baseline\nspends budget without creating\na candidate"]
    F --> G["Make sure ≥1 eval is of\na commit with your changes"]

Reviews (2): Last reviewed commit: "test(harbor): pin the baseline warning t..."

… no candidate

Companion to the finalize no-candidate fallback (PR #11): in auto_best mode the generated instruction now tells the optimizer that only non-baseline commits are selectable and that evaluating the unmodified baseline spends budget without creating a candidate. Found live: an optimizer that spent its whole budget measuring the baseline walked into finalize's empty candidate pool blind.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ranch

Greptile on #12: only the positive case was tested, so moving the warning outside the {% else %} would pass while polluting submit-mode tasks. A submit-mode compile now asserts both phrases are absent.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
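To see why the positive case alone was not enough, consider a sketch of the regression the commit message describes. Everything here is hypothetical: `render` imitates the template in plain Python, `warning_inside_else` simulates whether the warning sits inside or outside the {% else %} block, and `PHRASES` stands in for the two phrases the real test asserts on, which are not quoted in this PR.

```python
from functools import partial

# Hypothetical marker phrases; the real test's phrases are not shown in the PR.
PHRASES = ("seeded baseline", "without creating a candidate")
WARNING = (
    "Only commits other than the seeded baseline are selectable; evaluating "
    "the unmodified baseline spends budget without creating a candidate."
)


def render(submit_enabled: bool, warning_inside_else: bool) -> str:
    """Imitate the template, with the warning either inside or outside the else branch."""
    if submit_enabled:
        body = "Nominate your best commit with vero harbor submit."
    else:
        body = "The best commit on selection_split is selected automatically."
        if warning_inside_else:
            body += " " + WARNING
    if not warning_inside_else:
        # Simulated regression: the warning now leaks into both modes.
        body += " " + WARNING
    return body


def positive_check(render_fn) -> bool:
    """auto_best output contains every phrase."""
    return all(p in render_fn(False) for p in PHRASES)


def negative_check(render_fn) -> bool:
    """submit-mode output contains none of the phrases."""
    return not any(p in render_fn(True) for p in PHRASES)


correct = partial(render, warning_inside_else=True)
regressed = partial(render, warning_inside_else=False)
```

In this sketch `positive_check` passes for both `correct` and `regressed`, so it cannot tell them apart; only `negative_check` fails on `regressed`, which is the gap the second commit closes.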

greptile-apps Bot reviewed Jul 2, 2026

Comment thread vero/tests/test_harbor_build.py

This was referenced Jul 3, 2026

feat(harbor): the agent's first baseline eval is budget-free #25

Open

fix(harbor): instruction advertises the free baseline eval, gated on sidecar support #26

Open

shehabyasser-scale wants to merge 2 commits into harbor-3-compiler-fixes from harbor-3-compiler-instruction-warning
