Emit compact end-of-run failure index for LLM/agent consumption #8827

Description

@Evangelink

Summary

At the end of a Microsoft.Testing.Platform test run, emit a compact, sentinel-delimited footer that lists every failed test as a single line of Test FQN | first-line error | file:line. Default-on in LLM-detected environments only (see LLMEnvironmentDetector).

Motivation

When a test run fails, an LLM agent currently has to re-read the entire terminal transcript to figure out what failed and where. The per-failure blocks are paragraph-style (message + expected/actual + stack), suitable for a human scanning visually but expensive in tokens when the agent only needs the structured summary to plan its next step.

A small machine-friendly footer at a stable position is the lowest-cost way to give the agent a "table of contents" of failures.

Proposed shape

##[failures]
TestFx.UnitTests.FooTests.WhenBarThenBaz | Expected 42 but was 41 | src/Foo.cs:128
TestFx.UnitTests.FooTests.WhenBarThenQux | NullReferenceException: Object reference not set... | src/Foo.cs:142
##[/failures]

Properties:

  • Sentinel-delimited (##[failures] ... ##[/failures]) so an agent can extract the block with a 2-line regex without parsing the rest of the transcript.
  • One line per failure. Fields separated by |. Pipe-in-value escaped as \|.
  • Test FQN comes from the test node display path.
  • First-line error: first non-empty line of the failure message, truncated to ~200 chars (configurable).
  • file:line is extracted from the first user-code frame (after the filter from #8825, "Honor NO_COLOR environment variable in MTP terminal output", or related work). Empty if not found.
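
To illustrate the consumer side, an agent could extract and parse the footer roughly as follows. This is a minimal Python sketch, not part of the proposal: it assumes each sentinel sits on its own line, that \| is the only escape sequence, and that a value never ends in a literal backslash. The function name parse_failure_index and the dictionary keys are hypothetical.

```python
import re

# Sentinels from the proposed shape. DOTALL lets the body span several lines;
# MULTILINE anchors each sentinel at the start of a line.
BLOCK_RE = re.compile(
    r"^##\[failures\]\n(.*?)^##\[/failures\]$",
    re.DOTALL | re.MULTILINE,
)
# A field separator is any pipe that is not preceded by a backslash.
FIELD_SEP_RE = re.compile(r"(?<!\\)\|")


def parse_failure_index(transcript: str) -> list[dict[str, str]]:
    """Return one dict per failure line, or [] when no footer is present."""
    # Normalize Windows line endings so the anchors above still match.
    match = BLOCK_RE.search(transcript.replace("\r\n", "\n"))
    if match is None:
        return []
    failures = []
    for line in match.group(1).splitlines():
        parts = [p.strip().replace("\\|", "|") for p in FIELD_SEP_RE.split(line)]
        if len(parts) != 3:
            continue  # Skip blank lines and any non-entry line such as a truncation marker.
        test, error, location = parts
        failures.append({"test": test, "error": error, "location": location})
    return failures
```

An empty location string corresponds to the "Empty if not found" case above.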

Behavior

  • Default-on when LLMEnvironmentDetector.IsLLMEnvironment() returns true.
  • Default-off otherwise, to keep the human terminal transcript stable.
  • For v1 do not add a CLI option, to avoid --help/--info acceptance churn and the XLF round.
  • If desired in v2, add --failure-summary <auto|on|off>.
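
The gating rules above reduce to a small decision function. The sketch below models it in Python for brevity (the reporter itself is .NET code); the name should_emit_failure_index is hypothetical, and the mode parameter mirrors the --failure-summary values proposed for v2. In v1 there is no option, which corresponds to always passing "auto".

```python
def should_emit_failure_index(mode: str, is_llm_environment: bool) -> bool:
    """Decide whether the footer is written.

    mode: "auto", "on", or "off" (the values proposed for a v2 option).
    is_llm_environment: stands in for LLMEnvironmentDetector.IsLLMEnvironment().
    """
    if mode == "on":
        return True
    if mode == "off":
        return False
    if mode == "auto":
        # v1 behavior: on in LLM-detected environments, off otherwise.
        return is_llm_environment
    raise ValueError(f"unknown failure-summary mode: {mode!r}")
```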

Implementation notes

  • TerminalTestReporter already tracks per-assembly state and writes a run summary via AppendTestRunSummary at the end (see TerminalTestReporter.Summary.cs). The failure index should be appended right after that summary block.
  • Today TerminalTestReporter.TestCompleted renders failure details and then discards them. To build the index, capture a bounded list of (displayName, firstErrorLine, firstUserCodeFileLine) per failed test as it is rendered.
  • Cap the captured list (e.g. at 5,000 entries) to avoid OOM in giant runs; if the cap is exceeded, emit a marker line such as ## ...truncated, N more failures...
  • Verify behavior under multi-assembly runs (one footer per assembly vs one global footer at the end — global at the end is preferable for agents).
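
The capture, cap, and render steps in these notes can be modeled as below. This is a Python sketch of the logic only, under stated assumptions: the real change would live in the .NET TerminalTestReporter, the class and function names here are hypothetical, and placing the truncation marker inside the sentinel block is one possible choice that the issue does not specify.

```python
from dataclasses import dataclass, field

MAX_ENTRIES = 5000      # example cap from the notes above
MAX_ERROR_CHARS = 200   # approximate truncation length from the proposal


def first_error_line(message: str, limit: int = MAX_ERROR_CHARS) -> str:
    """Return the first non-empty line of the message, truncated to limit chars."""
    for line in message.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped[:limit]
    return ""


def escape_field(value: str) -> str:
    """Escape pipes so they cannot be mistaken for field separators."""
    return value.replace("|", "\\|")


@dataclass
class FailureIndex:
    max_entries: int = MAX_ENTRIES
    entries: list[tuple[str, str, str]] = field(default_factory=list)
    dropped: int = 0  # failures seen after the cap was reached

    def record(self, display_name: str, message: str, location: str = "") -> None:
        """Capture one failed test; beyond the cap, only count it."""
        if len(self.entries) >= self.max_entries:
            self.dropped += 1
            return
        self.entries.append((display_name, first_error_line(message), location))

    def render(self) -> str:
        """Build the footer text, or an empty string when nothing failed."""
        if not self.entries:
            return ""
        lines = ["##[failures]"]
        for entry in self.entries:
            lines.append(" | ".join(escape_field(part) for part in entry))
        if self.dropped:
            lines.append(f"## ...truncated, {self.dropped} more failures...")
        lines.append("##[/failures]")
        return "\n".join(lines)
```

Keeping a single FailureIndex for the whole run, rather than one per assembly, would yield the single global footer that the last note prefers.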

Risks

  • Adding any new lines to the terminal transcript can break existing scrapers. Mitigation: LLM-mode-gated default.
  • Capturing failure state grows reporter memory. Mitigation: cap + truncate.

Acceptance criteria

  • In LLM mode, after a failed run the transcript contains exactly one ##[failures]/##[/failures] block listing all failures.
  • In human mode (no LLM env vars), output is identical to today.
  • Tests cover: zero failures (no block), single failure, many failures, multi-line error message, missing file path.

Labels

area/mtp: Microsoft.Testing.Platform core library.
area/terminal-reporter: Console / terminal test reporter.
needs/triage: Needs triage by a maintainer.