fix(harbor): warn that sample_timeout is not enforced in Mode B#20
Open
shehabyasser-scale wants to merge 2 commits into
Open
fix(harbor): warn that sample_timeout is not enforced in Mode B#20shehabyasser-scale wants to merge 2 commits into
shehabyasser-scale wants to merge 2 commits into
Conversation
Found live: a Mode B config carried sample_timeout: 1200 through several optimization trials and it never did anything. The nested harbor run applies each task's own harbor-configured timeouts; the only vero-side cap is timeout on the whole nested run. An author who set sample_timeout expecting a per-task cap silently gets none, which matters when nested tasks can legitimately run 30-40 minutes (TB2). The sidecar now warns at startup when sample_timeout is explicitly set alongside a Mode B config, pointing at harbor.extra_args (e.g. --agent-timeout-multiplier) as the real lever. Mode A is unaffected. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Greptile on #20: .message is only set by Formatter.format(), which pytest's capture handler never calls; caplog.messages goes through getMessage() and is stable across pytest versions. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
ee72f91 to
f93e25f
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Stacked on #19. Found live: a Mode B config carried
sample_timeout: 1200through several optimization trials and it never did anything. In Mode B the nestedharbor runapplies each task's own harbor-configured timeouts; the only vero-side cap istimeouton the whole nested run. TB2 tasks can legitimately run 30-40 minutes, so an author relying on a silent no-op cap gets surprising wall-clock and cost.The sidecar now warns at startup when
sample_timeoutis explicitly set (viamodel_fields_set, so the default doesn't warn) alongside a Mode B config, pointing atharbor.extra_args(e.g.--agent-timeout-multiplier) as the real lever. Mode A behavior unchanged.1 test (warns in Mode B when set; silent for default and for Mode A). 13 pass.
🤖 Generated with Claude Code
Greptile Summary
Adds a startup warning when
sample_timeoutis explicitly set alongside a Mode B (harbor) config, where it has no effect. The check uses Pydantic v2'smodel_fields_setso only explicit author-supplied values trigger the warning, not the 180 s default._warn_mode_b_sample_timeoutis a small, focused function placed inbuild_componentsafter the existing Mode A integrity guard; it correctly short-circuits on both the absence ofharborand the absence of an explicitsample_timeout.harbor=None.Confidence Score: 5/5
Safe to merge — adds a diagnostic-only warning with no changes to runtime behavior or data paths.
The change is purely additive: a warning emitted at startup when a no-op config key is detected. No evaluation logic, budget accounting, or API surface is modified. The model_fields_set gate correctly distinguishes explicit author intent from the default, and the test validates all three cases.
No files require special attention.
Important Files Changed
_warn_mode_b_sample_timeouthelper and calls it inbuild_components; logic is correct, uses lazy%slog formatting, andmodel_fields_setdetection is idiomatic Pydantic v2.test_mode_b_sample_timeout_warnscovers warn/no-warn paths correctly and usescaplog.messages(notr.message) per the existing review thread fix.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[build_components called] --> B{config.harbor is None?} B -- Yes, Mode A --> C{task_project set?} C -- No --> D[raise ValueError] C -- Yes --> E[_warn_mode_b_sample_timeout] B -- No, Mode B --> E E --> F{"sample_timeout in\nmodel_fields_set?"} F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B] F -- No --> H[Silent — default value,\nno author intent] G --> I[Continue: build workspace,\nledger, evaluator, engine] H --> I%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%% flowchart TD A[build_components called] --> B{config.harbor is None?} B -- Yes, Mode A --> C{task_project set?} C -- No --> D[raise ValueError] C -- Yes --> E[_warn_mode_b_sample_timeout] B -- No, Mode B --> E E --> F{"sample_timeout in\nmodel_fields_set?"} F -- Yes --> G[logger.warning:\nsample_timeout not enforced\nin Mode B] F -- No --> H[Silent — default value,\nno author intent] G --> I[Continue: build workspace,\nledger, evaluator, engine] H --> IReviews (2): Last reviewed commit: "test(harbor): use caplog.messages instea..." | Re-trigger Greptile