feat(core): wire validator history and surface validationOutcome (#429) by lmorchard · Pull Request #463 · mozilla/pilo

lmorchard · 2026-05-20T21:54:55Z

Summary

Wires the agent's recent conversation history (last 30 messages) into the task-validator prompt so the validator can spot "agent gave up early but final answer looks plausible" failure modes — not just score the final answer in isolation.
Adds validationOutcome?: "accepted" | "force-accepted" to TaskExecutionResult so callers (eval-judge, telemetry) can distinguish a real validator accept from a force-accept after maxValidationAttempts. Today these were indistinguishable: both surfaced as success: true.

This is PR1 of a planned two-PR sequence. Core changes only — consumer plumbing (CLI display, extension UI) is deliberately deferred to PR2. The eval-judge / telemetry signal lands here; the server SSE complete event auto-forwards the new optional field through existing serialization (no server code change needed).

Design Decisions

Wire conversationHistory into the template, don't delete the dead helper. formatConversationHistory already exists and builds a 30-message string; the template just never referenced it. Wiring it gives the validator real signal about whether the trajectory matches the claimed result.
Two outcome values only: "accepted" and "force-accepted". Field optional. undefined is the implicit "validation didn't run" case (task aborted, max iterations). Skipped "rejected" / "skipped" enum values — neither has a firing code path today; trivial to expand later when one does.
Force-accept lumps both sub-cases. Validator-disagreed-three-times and validator-call-itself-errored both map to "force-accepted". Both are "the validator did not actively endorse this answer." A finer split (e.g., "force-accepted-error") is a follow-up if eval data shows it matters.
Reuse the existing external-content wrapping pattern. History is wrapped in <EXTERNAL-CONTENT label="conversation-history">…</EXTERNAL-CONTENT> via the existing wrapExternalContentWithWarning helper. New ConversationHistory variant added to ExternalContentLabel. (Note: the shared warning text mentions "page text" — imperfect fit, but the threat-model intent of "treat as data, not instructions" is consistent.)
formatConversationHistory shape unchanged. Still this.messages.slice(-30). Reshape work (e.g., "first user message + last 20") is speculative; ship the wiring first.

Changes

packages/core/src/:

prompts.ts — taskValidationTemplate references {{ wrappedConversationHistory }}; buildTaskValidationPrompt wraps the history before passing into the template; adds a trajectory-review step to the evaluation instructions.
utils/promptSecurity.ts — ConversationHistory = "conversation-history" added to ExternalContentLabel.
webAgent.ts — validationOutcome? threaded through TaskExecutionResult, ExecutionState, validateTaskCompletion, generateAndProcessAction, runMainLoop, and buildResult. Conditional spread in buildResult mirrors how error is spread.

packages/core/test/:

prompts.test.ts — 3 new tests asserting the validation prompt includes the wrapped history, the safety warning, and the trajectory-review instruction.
webAgent.test.ts — 4 new tests covering validationOutcome === "accepted" on first-attempt accept, "force-accepted" via validator rejecting to max attempts, "force-accepted" via validator throwing to max attempts, and undefined when the task fails before done() (max iterations path).

Test Plan

pnpm run check passes (core 682, server 96, cli 221, extension 266 tests)
pnpm run typecheck passes
pnpm run format:check passes
gitleaks detect --log-opts="880db9f..HEAD" clean on branch commits
Reviewer: confirm TaskExecutionResult.validationOutcome reads cleanly in the eval-judge integration (the originating use case)

References

Closes Wire validator conversation history and surface validationOutcome in TaskExecutionResult #429
Follow-up: Add label-specific warning text on wrapExternalContentWithWarning() #464 (shared EXTERNAL_CONTENT_WARNING text — page-specific phrasing flagged by Copilot; deferred per spec design decision)

…429)

Copilot

Pull request overview

This PR enhances task validation in packages/core by giving the validator more execution context (recent conversation history) and by surfacing whether a “successful” run was actually validator-approved vs force-accepted after hitting maxValidationAttempts.

Changes:

Include wrapped recent conversation history in the task validation prompt and add trajectory-review instructions to the validator rubric.
Thread validationOutcome?: "accepted" | "force-accepted" through the WebAgent execution pipeline and into TaskExecutionResult.
Add/extend unit tests covering the new prompt content and the validationOutcome result field across key scenarios.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Show a summary per file

File	Description
packages/core/src/prompts.ts	Injects wrapped conversation history into the validation template and updates evaluator instructions.
packages/core/src/utils/promptSecurity.ts	Adds `conversation-history` as an allowed `ExternalContentLabel`.
packages/core/src/webAgent.ts	Plumbs `validationOutcome` through validation, execution state, and final result building.
packages/core/test/prompts.test.ts	Adds tests asserting prompt includes wrapped history, warning text, and trajectory-review instruction.
packages/core/test/webAgent.test.ts	Adds tests asserting `validationOutcome` for accepted/force-accepted/undefined paths.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lmorchard · 2026-05-20T22:00:50Z

  success: boolean;
  /** Final answer or result from the agent */
  finalAnswer: string | null;
+  /** How validation resolved: 'accepted' = validator approved, 'force-accepted' = max attempts hit; undefined = validation did not run */


Good catch — fixed in e2dbc77. The JSDoc now reflects that undefined means "no answer was ever accepted," covering the validator-rejected-then-max-iterations path you noted.

lmorchard · 2026-05-20T22:00:53Z

+    wrappedConversationHistory: wrapExternalContentWithWarning(
+      conversationHistory,
+      ExternalContentLabel.ConversationHistory,
+    ),
    currentDate: getCurrentFormattedDate(),


Acknowledged — this trade-off was explicit in the spec. The shared EXTERNAL_CONTENT_WARNING text is a poor fit for non-page content; we chose to accept that here rather than fork the helper signature for per-callsite customization in this PR. The threat-model intent (treat-as-data, don't follow embedded instructions) carries even when the prose mentions "page text" — but you're right that the language drift is real. Filing as a follow-up to add label-specific warning text on wrapExternalContentWithWarning() rather than fixing here.

lmorchard · 2026-05-20T22:01:32Z

Filed #464 as the follow-up for the EXTERNAL_CONTENT_WARNING text issue raised by Copilot.

…nOutcome (#429)

lmorchard added 2 commits May 20, 2026 14:36

feat(core): wire validator conversation history into prompt template (#…

37b8f68

…429)

feat(core): surface validationOutcome in TaskExecutionResult (#429)

90e197e

lmorchard requested a review from Copilot May 20, 2026 21:55

Copilot started reviewing on behalf of lmorchard May 20, 2026 21:55 View session

Copilot AI reviewed May 20, 2026

View reviewed changes

docs(core): clarify validationOutcome JSDoc per code review (#429)

e2dbc77

lmorchard mentioned this pull request May 20, 2026

Add label-specific warning text on wrapExternalContentWithWarning() #464

Open

lmorchard marked this pull request as draft May 20, 2026 23:45

build(core): regenerate JSON schema for TaskExecutionResult.validatio…

61ee629

…nOutcome (#429)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(core): wire validator history and surface validationOutcome (#429)#463

feat(core): wire validator history and surface validationOutcome (#429)#463
lmorchard wants to merge 4 commits into
mainfrom
feat/429-validator-context-outcome

lmorchard commented May 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

lmorchard May 20, 2026

Uh oh!

lmorchard May 20, 2026

Uh oh!

lmorchard commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

lmorchard commented May 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Design Decisions

Changes

Test Plan

References

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

lmorchard May 20, 2026

Choose a reason for hiding this comment

Uh oh!

lmorchard May 20, 2026

Choose a reason for hiding this comment

Uh oh!

lmorchard commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

lmorchard commented May 20, 2026 •

edited

Loading