feat(core): wire validator history and surface validationOutcome (#429)#463
feat(core): wire validator history and surface validationOutcome (#429)#463lmorchard wants to merge 4 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR enhances task validation in packages/core by giving the validator more execution context (recent conversation history) and by surfacing whether a “successful” run was actually validator-approved vs force-accepted after hitting maxValidationAttempts.
Changes:
- Include wrapped recent conversation history in the task validation prompt and add trajectory-review instructions to the validator rubric.
- Thread
validationOutcome?: "accepted" | "force-accepted"through the WebAgent execution pipeline and intoTaskExecutionResult. - Add/extend unit tests covering the new prompt content and the
validationOutcomeresult field across key scenarios.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| packages/core/src/prompts.ts | Injects wrapped conversation history into the validation template and updates evaluator instructions. |
| packages/core/src/utils/promptSecurity.ts | Adds conversation-history as an allowed ExternalContentLabel. |
| packages/core/src/webAgent.ts | Plumbs validationOutcome through validation, execution state, and final result building. |
| packages/core/test/prompts.test.ts | Adds tests asserting prompt includes wrapped history, warning text, and trajectory-review instruction. |
| packages/core/test/webAgent.test.ts | Adds tests asserting validationOutcome for accepted/force-accepted/undefined paths. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| success: boolean; | ||
| /** Final answer or result from the agent */ | ||
| finalAnswer: string | null; | ||
| /** How validation resolved: 'accepted' = validator approved, 'force-accepted' = max attempts hit; undefined = validation did not run */ |
There was a problem hiding this comment.
Good catch — fixed in e2dbc77. The JSDoc now reflects that undefined means "no answer was ever accepted," covering the validator-rejected-then-max-iterations path you noted.
| wrappedConversationHistory: wrapExternalContentWithWarning( | ||
| conversationHistory, | ||
| ExternalContentLabel.ConversationHistory, | ||
| ), | ||
| currentDate: getCurrentFormattedDate(), |
There was a problem hiding this comment.
Acknowledged — this trade-off was explicit in the spec. The shared EXTERNAL_CONTENT_WARNING text is a poor fit for non-page content; we chose to accept that here rather than fork the helper signature for per-callsite customization in this PR. The threat-model intent (treat-as-data, don't follow embedded instructions) carries even when the prose mentions "page text" — but you're right that the language drift is real. Filing as a follow-up to add label-specific warning text on wrapExternalContentWithWarning() rather than fixing here.
|
Filed #464 as the follow-up for the |
Summary
validationOutcome?: "accepted" | "force-accepted"toTaskExecutionResultso callers (eval-judge, telemetry) can distinguish a real validator accept from a force-accept aftermaxValidationAttempts. Today these were indistinguishable: both surfaced assuccess: true.This is PR1 of a planned two-PR sequence. Core changes only — consumer plumbing (CLI display, extension UI) is deliberately deferred to PR2. The eval-judge / telemetry signal lands here; the server SSE
completeevent auto-forwards the new optional field through existing serialization (no server code change needed).Design Decisions
conversationHistoryinto the template, don't delete the dead helper.formatConversationHistoryalready exists and builds a 30-message string; the template just never referenced it. Wiring it gives the validator real signal about whether the trajectory matches the claimed result."accepted"and"force-accepted". Field optional.undefinedis the implicit "validation didn't run" case (task aborted, max iterations). Skipped"rejected"/"skipped"enum values — neither has a firing code path today; trivial to expand later when one does."force-accepted". Both are "the validator did not actively endorse this answer." A finer split (e.g.,"force-accepted-error") is a follow-up if eval data shows it matters.<EXTERNAL-CONTENT label="conversation-history">…</EXTERNAL-CONTENT>via the existingwrapExternalContentWithWarninghelper. NewConversationHistoryvariant added toExternalContentLabel. (Note: the shared warning text mentions "page text" — imperfect fit, but the threat-model intent of "treat as data, not instructions" is consistent.)formatConversationHistoryshape unchanged. Stillthis.messages.slice(-30). Reshape work (e.g., "first user message + last 20") is speculative; ship the wiring first.Changes
packages/core/src/:prompts.ts—taskValidationTemplatereferences{{ wrappedConversationHistory }};buildTaskValidationPromptwraps the history before passing into the template; adds a trajectory-review step to the evaluation instructions.utils/promptSecurity.ts—ConversationHistory = "conversation-history"added toExternalContentLabel.webAgent.ts—validationOutcome?threaded throughTaskExecutionResult,ExecutionState,validateTaskCompletion,generateAndProcessAction,runMainLoop, andbuildResult. Conditional spread inbuildResultmirrors howerroris spread.packages/core/test/:prompts.test.ts— 3 new tests asserting the validation prompt includes the wrapped history, the safety warning, and the trajectory-review instruction.webAgent.test.ts— 4 new tests coveringvalidationOutcome === "accepted"on first-attempt accept,"force-accepted"via validator rejecting to max attempts,"force-accepted"via validator throwing to max attempts, andundefinedwhen the task fails beforedone()(max iterations path).Test Plan
pnpm run checkpasses (core 682, server 96, cli 221, extension 266 tests)pnpm run typecheckpassespnpm run format:checkpassesgitleaks detect --log-opts="880db9f..HEAD"clean on branch commitsTaskExecutionResult.validationOutcomereads cleanly in the eval-judge integration (the originating use case)References
EXTERNAL_CONTENT_WARNINGtext — page-specific phrasing flagged by Copilot; deferred per spec design decision)