
feat: structured progress reports on agent iteration/time limits #548

Merged
buger merged 1 commit into main from feat/agent-progress-report-on-limit
Mar 27, 2026


@buger (Collaborator) commented Mar 27, 2026

Summary

  • When the agent hits its iteration limit or time budget, it now produces a structured progress report instead of a generic failure message
  • The report includes: Task, Completed Work, Key Findings, Attempted but Inconclusive, Not Started/Remaining, and Suggested Next Steps — enabling a follow-up agent to continue without starting over
  • Enriched _toolCallLog with result briefs, step numbers, and assistant text fragments for richer programmatic fallback reports
  • Updated all 3 shutdown paths: last-iteration prompt, time-budget exhaustion prompt, and negotiated timeout summary prompt

Test plan

  • All 3104 existing tests pass
  • Updated assertions in graceful-timeout.test.js, negotiated-timeout.test.js, code-searcher-iteration-limit.test.js
  • Manual test with Gemini at MAX_TOOL_ITERATIONS=3 — confirmed structured progress report is produced on iteration limit
  • Syntax validation passes

🤖 Generated with Claude Code

…ime limits

When the agent reaches its max iteration or time budget limit, instead of
a generic "unable to complete" message, it now produces a structured
progress report (Task, Completed Work, Key Findings, Attempted but
Inconclusive, Not Started/Remaining, Suggested Next Steps) so that a
follow-up agent can continue without starting from scratch.

Also enriches _toolCallLog with result briefs and step numbers, and
improves tool arg capture for the programmatic fallback report.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@buger merged commit 88434b1 into main on Mar 27, 2026
13 checks passed
@probelabs bot (Contributor) commented Mar 27, 2026

PR Overview: Structured Progress Reports on Agent Iteration/Time Limits

Summary

This PR enhances the ProbeAgent's shutdown behavior when hitting iteration limits or time budgets. Instead of generic failure messages, the agent now produces structured progress reports that enable follow-up agents to continue work without starting from scratch.

Key Changes

1. Enhanced Tool Call Logging (_toolCallLog)

Location: npm/src/agent/ProbeAgent.js:4208-4230

The tool call tracking now captures richer metadata:

  • resultBrief: First 500 chars of tool result output
  • step: Iteration number when the tool was called
  • _assistant_text entries: Captures AI's text output per step (up to 1000 chars)

Before:

_toolCallLog.push({ name: tc.toolName, args: tc.args || {} });

After:

const resultBrief = tr ? raw.substring(0, 500) : '';
_toolCallLog.push({ name: tc.toolName, args: tcArgs, resultBrief, step: currentIteration });
// Also tracks assistant text fragments
_toolCallLog.push({ name: '_assistant_text', args: {}, resultBrief: text.substring(0, 1000), step: currentIteration });
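To make the enriched log concrete outside ProbeAgent.js, here is a minimal sketch of how one step's tool calls and assistant text could be appended to such a log. It assumes a step object with toolCalls, toolResults and text fields, loosely modeled on the excerpt above; recordStep and the limit constants are hypothetical names, not the code merged in this PR. It also reads toolResults[i] defensively, since a results array shorter than the calls array is one of the review findings further down.

```javascript
// Hypothetical limits, matching the 500/1000-char truncation described above.
const RESULT_BRIEF_MAX = 500;
const ASSISTANT_TEXT_MAX = 1000;

// Turn a tool result into a string before truncating.
// Assumption: results are strings or JSON-serializable values.
function toText(result) {
  if (result === undefined || result === null) return '';
  return typeof result === 'string' ? result : JSON.stringify(result) ?? '';
}

// Append one step's tool calls and assistant text to the log.
// `step` is a hypothetical shape: { toolCalls, toolResults, text }.
function recordStep(log, step, currentIteration) {
  const toolCalls = step.toolCalls || [];
  const toolResults = step.toolResults || [];
  toolCalls.forEach((tc, i) => {
    const tr = toolResults[i]; // may be undefined if there are fewer results than calls
    const raw = tr ? toText(tr.result) : '';
    log.push({
      name: tc.toolName,
      args: tc.args || {},
      resultBrief: raw.substring(0, RESULT_BRIEF_MAX),
      step: currentIteration,
    });
  });
  // Record the model's own text for this step as a pseudo-tool entry.
  if (step.text && step.text.trim()) {
    log.push({
      name: '_assistant_text',
      args: {},
      resultBrief: step.text.substring(0, ASSISTANT_TEXT_MAX),
      step: currentIteration,
    });
  }
  return log;
}
```

A tool whose result is missing simply gets an empty resultBrief instead of throwing, which keeps the log usable for the fallback report even when a step was cut short.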

2. Structured Progress Report Prompts

Three shutdown paths now request structured progress reports:

A. Last Iteration Warning (prepareStep)

Location: npm/src/agent/ProbeAgent.js:4097-4145

When stepNumber === maxIterations - 1, the agent receives a detailed prompt requesting:

## Task
What was the original request / goal.

## Completed Work
What you successfully accomplished — include ALL findings, code snippets, file paths, data, and conclusions gathered.

## Key Findings
Concrete facts, answers, or data points you discovered. Include file paths with line numbers, code snippets, configuration values, etc.

## Attempted but Inconclusive
What you tried that did not yield clear results — include the approach and why it was inconclusive.

## Not Started / Remaining
What parts of the task you did not get to, and any recommendations for how to approach them.

## Suggested Next Steps
Specific, actionable steps for a follow-up agent to continue this work efficiently.

The prompt also includes a detailed activity log showing all tool calls with arguments and result briefs:

Tool activity so far:
  [step 1] search(query="authentication" path=src) → Found 15 files...
  [step 2] extract(targets=["src/auth.ts:42-80"]) → Extracted function...
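A sketch of how lines like these could be rendered from _toolCallLog entries. The search special case (query, path, exact) mirrors the sample lines above and the 200/150-char display limits are the ones mentioned in the review comments; formatActivityLog and formatArgs are hypothetical names, not ProbeAgent's actual helpers.

```javascript
// Hypothetical display limits (the review mentions 200 for args, 150 for briefs).
const ARG_DISPLAY_MAX = 200;
const BRIEF_DISPLAY_MAX = 150;

// Render tool arguments compactly; search calls get a readable query form.
function formatArgs(name, args) {
  const a = args || {};
  if (name === 'search') {
    let s = `query=${JSON.stringify(String(a.query ?? ''))}`;
    if (a.path) s += ` path=${a.path}`;
    if (a.exact) s += ' exact';
    return s;
  }
  // Other tools: key=JSON pairs, cut to the display limit.
  return Object.entries(a)
    .map(([k, v]) => `${k}=${JSON.stringify(v)}`)
    .join(' ')
    .substring(0, ARG_DISPLAY_MAX);
}

// Build the "Tool activity so far" block, skipping assistant-text entries.
function formatActivityLog(log) {
  const lines = log
    .filter((entry) => entry.name !== '_assistant_text')
    .map((entry) => {
      const brief = entry.resultBrief
        ? ` → ${entry.resultBrief.replace(/\s+/g, ' ').substring(0, BRIEF_DISPLAY_MAX)}`
        : '';
      return `  [step ${entry.step}] ${entry.name}(${formatArgs(entry.name, entry.args)})${brief}`;
    });
  return lines.length ? `Tool activity so far:\n${lines.join('\n')}` : '';
}
```

Collapsing whitespace in the brief keeps each tool call on a single line, so multi-line tool output does not break the log layout inside the prompt.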

B. Graceful Timeout Wind-Down

Location: npm/src/agent/ProbeAgent.js:4073-4095

When gracefulTimeoutState.triggered is set and the agent is on its first bonus step, it receives the same structured report request, with added emphasis that it is stopping because of a time budget constraint, not a system error.

C. Negotiated Timeout Summary

Location: npm/src/agent/ProbeAgent.js:4632-4650

When the timeout observer declines extension and tools are aborted, the summary prompt now requests the same structured format.
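The three triggers above can be condensed into one decision function for illustration. This is a sketch under assumptions: selectShutdownPath and its input fields (bonusStepIndex, observerDeclined, toolsAborted) are hypothetical, the precedence between paths is a guess, and in the PR the checks live in separate places (the prepareStep callback for A and B, the main loop's timeout handling for C).

```javascript
// Hypothetical labels for the three prompt-driven shutdown paths.
const ShutdownPath = Object.freeze({
  NONE: 'none',
  LAST_ITERATION: 'last-iteration',
  GRACEFUL_TIMEOUT: 'graceful-timeout',
  NEGOTIATED_TIMEOUT: 'negotiated-timeout',
});

// Decide whether the next step should carry a progress-report prompt.
// Assumed precedence: negotiated timeout > graceful timeout > last iteration.
function selectShutdownPath({
  stepNumber,
  maxIterations,
  gracefulTimeoutTriggered = false,
  bonusStepIndex = -1, // hypothetical: 0 on the first bonus step after the graceful timeout fires
  observerDeclined = false,
  toolsAborted = false,
}) {
  if (observerDeclined && toolsAborted) return ShutdownPath.NEGOTIATED_TIMEOUT; // path C
  if (gracefulTimeoutTriggered && bonusStepIndex === 0) return ShutdownPath.GRACEFUL_TIMEOUT; // path B
  if (stepNumber === maxIterations - 1) return ShutdownPath.LAST_ITERATION; // path A
  return ShutdownPath.NONE;
}
```

With the default limit of 30 iterations, for example, path A fires at stepNumber 29, one step before the hard limit, which leaves the model exactly one turn to write its report.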

3. Enhanced Fallback Report on Max Iterations

Location: npm/src/agent/ProbeAgent.js:4785-4828

When the agent hits the hard iteration limit, the fallback message now includes:

## Progress Report (iteration limit reached after N steps)

### Tool Usage Summary
search: 5x, extract: 3x

### Search Queries Attempted
"authentication", "verify_credentials", "login"

### Step-by-Step Activity Log
  [step 1] search(query="authentication" exact) → Found 15 files...
  [step 2] extract(targets=["src/auth.ts:42-80"]) → Extracted function...

### Partial Findings
[Assistant text fragments from each step]

### Recommendation for Follow-Up
The iteration limit was reached before the task could be completed...

Key improvements:

  • Shows step numbers for each tool call
  • Includes result briefs (first 200 chars of output)
  • Lists assistant text fragments as partial findings
  • Provides actionable recommendations for follow-up agents
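Because this fallback has to work even when the model never replies, it can be assembled from the log alone. A minimal sketch follows: the section headings match the format shown above, while buildFallbackReport, the 200-char brief limit (taken from the list above) and the recommendation wording are assumptions for illustration.

```javascript
const FALLBACK_BRIEF_MAX = 200; // per "first 200 chars of output" above

// Build a Markdown progress report purely from _toolCallLog-style entries.
function buildFallbackReport(log, stepsTaken) {
  const toolEntries = log.filter((e) => e.name !== '_assistant_text');
  const assistantTexts = log
    .filter((e) => e.name === '_assistant_text' && e.resultBrief)
    .map((e) => e.resultBrief);

  // Count calls per tool, in first-seen order (Map preserves insertion order).
  const counts = new Map();
  for (const e of toolEntries) counts.set(e.name, (counts.get(e.name) || 0) + 1);
  const usage = [...counts].map(([name, n]) => `${name}: ${n}x`).join(', ');

  // Distinct search queries, in the order they were tried.
  const queries = [...new Set(
    toolEntries.filter((e) => e.name === 'search' && e.args?.query).map((e) => e.args.query)
  )];

  const timeline = toolEntries.map((e) => {
    const brief = e.resultBrief ? ` → ${e.resultBrief.substring(0, FALLBACK_BRIEF_MAX)}` : '';
    return `  [step ${e.step}] ${e.name}(${JSON.stringify(e.args || {})})${brief}`;
  });

  const sections = [
    `## Progress Report (iteration limit reached after ${stepsTaken} steps)`,
    `### Tool Usage Summary\n${usage || 'No tools were called.'}`,
  ];
  if (queries.length) {
    sections.push(`### Search Queries Attempted\n${queries.map((q) => JSON.stringify(q)).join(', ')}`);
  }
  if (timeline.length) sections.push(`### Step-by-Step Activity Log\n${timeline.join('\n')}`);
  if (assistantTexts.length) sections.push(`### Partial Findings\n${assistantTexts.join('\n\n')}`);
  // Hypothetical wording for the closing section.
  sections.push(
    '### Recommendation for Follow-Up\n' +
      'The iteration limit was reached before the task could be completed. ' +
      'Continue from the activity log above instead of repeating these searches.'
  );
  return sections.join('\n\n');
}
```

Optional sections are omitted when empty, so a run that never called a tool still produces a short, well-formed report rather than empty headings.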

4. Test Updates

All three test files were updated to expect the new "PROGRESS REPORT" language:

  • npm/tests/unit/code-searcher-iteration-limit.test.js:87
  • npm/tests/unit/graceful-timeout.test.js:123
  • npm/tests/unit/negotiated-timeout.test.js:448, 486

Architecture & Impact Assessment

What This PR Accomplishes

  1. Enables agent handoff: When an agent runs out of iterations or time, it produces a structured report that a follow-up agent can use to continue efficiently without repeating work.

  2. Improves debugging: The enhanced _toolCallLog with result briefs and step numbers provides better visibility into what the agent actually did.

  3. Better user experience: Instead of generic "I couldn't complete your request" messages, users get detailed progress reports showing what was accomplished and what remains.

Key Technical Changes

| Component | Change | Impact |
|---|---|---|
| _toolCallLog structure | Added resultBrief, step fields | Enables detailed activity logs in shutdown messages |
| _assistant_text tracking | New entry type for AI text output | Captures reasoning and partial findings |
| prepareStep last-iteration prompt | Structured report format + activity log | Better handoff for iteration limit |
| Graceful timeout prompt | Structured report format | Better handoff for time budget exhaustion |
| Negotiated timeout summary | Structured report format | Better handoff when observer declines extension |
| Max iterations fallback | Enhanced with timeline and findings | Fallback still useful even when LLM doesn't respond |

Affected System Components

graph TD
    A[ProbeAgent Main Loop] --> B{Check Limits}
    B -->|Last Iteration| C[prepareStep - Structured Report Prompt]
    B -->|Time Budget Exhausted| D[Graceful Wind-Down - Structured Report]
    B -->|Negotiated Timeout Declined| E[Abort Summary - Structured Report]
    B -->|Hard Limit Hit| F[Fallback Report with Timeline]
    
    C --> G[Enhanced _toolCallLog]
    D --> G
    E --> G
    F --> G
    
    G --> H[resultBrief + step Numbers]
    G --> I[_assistant_text Tracking]
    
    H --> J[Structured Progress Report]
    I --> J
    
    J --> K[Follow-up Agent Can Continue]

Component Relationships

sequenceDiagram
    participant Main as ProbeAgent Main Loop
    participant prepareStep as prepareStep Callback
    participant Logger as _toolCallLog
    participant LLM as AI Model
    
    Main->>Logger: Track tool calls with resultBrief
    Main->>Logger: Track assistant text fragments
    
    Note over Main: Check iteration/time limits
    
    alt Last Iteration
        prepareStep->>LLM: Inject structured report prompt + activity log
        LLM->>Main: Generate progress report
    else Graceful Timeout
        prepareStep->>LLM: Inject structured report prompt
        LLM->>Main: Generate progress report
    else Negotiated Timeout Declined
        Main->>LLM: Request structured summary
        LLM->>Main: Generate progress report
    else Hard Limit Hit
        Main->>Main: Build fallback report from _toolCallLog
    end
    
    Main->>Main: Return structured progress report

Scope Discovery & Context Expansion

Direct Impact

  • Core agent behavior: All three shutdown paths (iteration limit, graceful timeout, negotiated timeout) now produce structured reports
  • Tool call tracking: Enhanced logging affects every tool execution in the main loop
  • Memory overhead: _toolCallLog now stores more data (result briefs up to 500 chars, assistant text up to 1000 chars)

Related Components (Inferred)

  1. Subagent delegation (src/delegate.js): Subagents inherit timeout behavior and may produce structured reports that parent agents receive

  2. Code-searcher subagents: promptType === 'code-searcher' gets special handling; these subagents output structured JSON even for partial results (confidence: "low")

  3. MCP servers: Two-phase graceful stop coordination (_initiateGracefulStop) signals MCP servers to wind down

  4. Telemetry/tracing: The enhanced _toolCallLog data could be exposed via tracer events for debugging

  5. Configuration:

    • MAX_TOOL_ITERATIONS: Default 30 (range 1-200)
    • gracefulTimeoutBonusSteps: Default 4 (range 1-20)
    • negotiatedTimeoutBudget: Default 1800000ms (30 min)
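The defaults and ranges listed above can be enforced by clamping when configuration is read. In this sketch, reading MAX_TOOL_ITERATIONS from the environment mirrors the manual test in the PR description; resolveLimits, the option field names, and clamping (rather than rejecting) out-of-range values are assumptions. No range is given for negotiatedTimeoutBudget, so the sketch only requires a positive number.

```javascript
// Parse an integer and clamp it to [min, max]; fall back to the default on bad input.
function clampInt(value, def, min, max) {
  const n = Number.parseInt(value, 10);
  if (!Number.isFinite(n)) return def;
  return Math.min(max, Math.max(min, n));
}

// Hypothetical resolver for the limits listed above.
function resolveLimits(options = {}, env = {}) {
  const budget = Number(options.negotiatedTimeoutBudget);
  return {
    maxIterations: clampInt(env.MAX_TOOL_ITERATIONS, 30, 1, 200),
    gracefulTimeoutBonusSteps: clampInt(options.gracefulTimeoutBonusSteps, 4, 1, 20),
    // 1800000 ms = 30 minutes
    negotiatedTimeoutBudget: Number.isFinite(budget) && budget > 0 ? budget : 1800000,
  };
}
```

A caller would use something like resolveLimits({}, process.env); passing env explicitly keeps the function easy to test.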

Potential Follow-Up Areas

  1. Progress report parsing: Follow-up agents could be enhanced to parse and understand the structured report format

  2. Report schema: Could define a formal JSON schema for progress reports to enable programmatic consumption

  3. Compression: For long-running agents, _toolCallLog could grow large - consider compression or summarization

  4. UI display: Frontend could render structured progress reports in a user-friendly format
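For the first follow-up idea, a consuming agent could split the report on its ## headings. A hypothetical sketch: parseProgressReport assumes the model reproduced the six headings from the prompt verbatim, and returns the list of missing headings so a caller can decide how much to trust the handoff. The same check could also serve as a structural test assertion, as one of the quality findings below suggests.

```javascript
// The six headings requested by the shutdown prompts.
const REPORT_SECTIONS = [
  'Task',
  'Completed Work',
  'Key Findings',
  'Attempted but Inconclusive',
  'Not Started / Remaining',
  'Suggested Next Steps',
];

// Split a Markdown report into { heading: body } and list expected headings that are absent.
// Only level-2 headings ("## ") start a section; "###" lines stay in the body.
function parseProgressReport(markdown) {
  const collected = {};
  let current = null;
  for (const line of String(markdown).split('\n')) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      current = match[1];
      collected[current] = [];
    } else if (current !== null) {
      collected[current].push(line); // text before the first heading is ignored
    }
  }
  const sections = {};
  for (const [heading, lines] of Object.entries(collected)) {
    sections[heading] = lines.join('\n').trim();
  }
  const missing = REPORT_SECTIONS.filter((heading) => !(heading in sections));
  return { sections, missing };
}
```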

Files Changed Analysis

| File | Additions | Deletions | Net Change | Purpose |
|---|---|---|---|---|
| npm/src/agent/ProbeAgent.js | +86 | -26 | +60 | Core implementation of structured reports and enhanced logging |
| npm/tests/unit/code-searcher-iteration-limit.test.js | +1 | -1 | 0 | Update assertion to expect "PROGRESS REPORT" |
| npm/tests/unit/graceful-timeout.test.js | +1 | -1 | 0 | Update assertion to expect "PROGRESS REPORT" |
| npm/tests/unit/negotiated-timeout.test.js | +2 | -2 | 0 | Update assertions to expect "PROGRESS REPORT" |

Total: 90 additions, 30 deletions across 4 files

Notable Patterns

  1. Consistent structure: All three shutdown paths use the same 6-section format (Task, Completed Work, Key Findings, Attempted but Inconclusive, Not Started/Remaining, Suggested Next Steps)

  2. Data inclusion emphasis: Prompts explicitly instruct the AI to "include ALL useful data you gathered inline — do not just say 'I found X', actually include X"

  3. Activity log injection: The tool activity log is dynamically generated from _toolCallLog and injected into prompts to provide context

  4. Backward compatibility: Code-searcher subagents still output structured JSON (unchanged), just with enhanced search details

References

Code Locations

  • Enhanced tool call logging: npm/src/agent/ProbeAgent.js:4208-4230
  • Last iteration structured prompt: npm/src/agent/ProbeAgent.js:4097-4145
  • Graceful timeout structured prompt: npm/src/agent/ProbeAgent.js:4073-4095
  • Negotiated timeout structured summary: npm/src/agent/ProbeAgent.js:4632-4650
  • Enhanced fallback report: npm/src/agent/ProbeAgent.js:4785-4828
  • Test updates:
    • npm/tests/unit/code-searcher-iteration-limit.test.js:87
    • npm/tests/unit/graceful-timeout.test.js:123
    • npm/tests/unit/negotiated-timeout.test.js:448, 486

Related Documentation

  • Timeout modes: docs/probe-agent/sdk/timeout-modes.md:55-85
  • Limits reference: docs/reference/limits.md
  • Code-searcher prompt: npm/src/agent/shared/prompts.js:36-80
  • CODE_SEARCH_SCHEMA: npm/src/tools/vercel.js:89-130
Metadata
  • Review Effort: 2 / 5
  • Primary Label: feature

Powered by Visor from Probelabs

Last updated: 2026-03-27T21:43:25.733Z | Triggered by: pr_opened | Commit: 3157d21

💡 TIP: You can chat with Visor using /visor ask <your question>

@probelabs bot (Contributor) commented Mar 27, 2026

Security Issues (7)

Severity Location Issue
🟢 Info npm/src/agent/ProbeAgent.js:4222-4228
The _toolCallLog array grows unbounded with each tool call and assistant text. Each entry stores up to 500 chars of resultBrief plus args, and with long-running agents this could consume significant memory.
💡 Suggestion: Implement a size limit for _toolCallLog, for example: if (_toolCallLog.length > MAX_LOG_ENTRIES) { _toolCallLog.shift(); }, or use a circular buffer with a fixed capacity.
🟡 Warning npm/src/agent/ProbeAgent.js:4222-4228
Tool result data is stored without sanitization before being embedded in progress reports. The resultBrief field captures raw tool output (up to 500 chars) which may contain malicious content, HTML, JavaScript, or control characters that could cause injection attacks when the progress report is displayed in web UIs or logged.
💡 Suggestion: Sanitize tool result data before storing it in resultBrief: strip HTML tags, escape special characters, or use a dedicated sanitization library. For example: resultBrief: sanitizeHtml(raw).substring(0, 500) or resultBrief: escapeHtmlEntities(raw).substring(0, 500)
🟡 Warning npm/src/agent/ProbeAgent.js:4231-4232
Assistant text output is stored without sanitization before being embedded in progress reports. The text field captures raw LLM output (up to 1000 chars) which may contain markdown, HTML, or malicious content that could cause XSS when displayed in web UIs.
💡 Suggestion: Sanitize assistant text before storing it in _toolCallLog, using a markdown sanitizer or HTML entity escaping: resultBrief: escapeHtmlEntities(text.substring(0, 1000))
🟡 Warning npm/src/agent/ProbeAgent.js:4110-4118
Tool activity log is constructed by directly embedding tool arguments and results without sanitization. The argStr and brief variables concatenate raw data that could contain injection payloads, which are then embedded in user messages sent to the LLM.
💡 Suggestion: Sanitize tool arguments and results before embedding them in the activity log, using JSON.stringify with proper escaping or a sanitization function: const brief = tc.resultBrief ? ` → ${sanitizeText(tc.resultBrief.substring(0, 150))}` : ''
🟡 Warning npm/src/agent/ProbeAgent.js:4807-4813
Tool timeline construction directly embeds tool arguments and results without sanitization. The argStr and brief variables concatenate raw data using JSON.stringify().substring() which could truncate in the middle of escape sequences, creating malformed JSON or injection vectors.
💡 Suggestion: Use proper JSON serialization with error handling instead of substring truncation, for example: const argStr = tc.name === 'search' ? `query="${escapeString(tc.args.query || '')}"${tc.args.exact ? ' exact' : ''}` : truncateSafely(JSON.stringify(tc.args || {}), 150)
🟡 Warning npm/src/agent/ProbeAgent.js:4818-4820
Assistant text fragments are collected and embedded in progress reports without sanitization. The assistantTexts array directly concatenates resultBrief values which may contain malicious content.
💡 Suggestion: Sanitize assistant text fragments before including them in the progress report: const assistantTexts = _toolCallLog.filter(tc => tc.name === '_assistant_text' && tc.resultBrief).map(tc => sanitizeText(tc.resultBrief))
🟡 Warning npm/src/agent/ProbeAgent.js:4222-4228
Tool results are logged in plaintext without checking for sensitive data. The resultBrief field captures up to 500 characters of tool output which may contain API keys, passwords, tokens, or other sensitive information that gets logged and potentially exposed.
💡 Suggestion: Filter sensitive data before logging. Check for common patterns (API keys, tokens, passwords) and redact them: resultBrief: redactSensitiveData(raw).substring(0, 500)
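Two of these suggestions, capping _toolCallLog and redacting secrets from resultBrief, can be sketched in a few lines. MAX_LOG_ENTRIES, pushBounded and the pattern list are hypothetical; the patterns catch only a few common secret shapes (key=value pairs, bearer tokens, AWS access key IDs) and are no substitute for a real secret scanner. HTML escaping is left out on the assumption that it belongs wherever the report is finally rendered.

```javascript
const MAX_LOG_ENTRIES = 200; // hypothetical cap

// Illustrative [pattern, replacement] pairs; not an exhaustive secret scanner.
const REDACTIONS = [
  // key=value or key: value secrets; keep the key name so the log stays readable
  [/\b(api[_-]?key|token|secret|password)(\s*[:=]\s*)["']?[^\s"',]+["']?/gi, '$1$2[REDACTED]'],
  // HTTP bearer tokens
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g, 'Bearer [REDACTED]'],
  // AWS access key IDs
  [/\bAKIA[0-9A-Z]{16}\b/g, '[REDACTED]'],
];

function redactSensitiveData(text) {
  return REDACTIONS.reduce(
    (out, [pattern, replacement]) => out.replace(pattern, replacement),
    String(text ?? '')
  );
}

// Append an entry, dropping the oldest entries once the cap is exceeded.
function pushBounded(log, entry, max = MAX_LOG_ENTRIES) {
  log.push(entry);
  if (log.length > max) log.splice(0, log.length - max);
  return log;
}
```

Dropping the oldest entries keeps memory flat but discards the earliest steps, which a follow-up agent may still need; keeping the first few entries plus a recent window is one possible alternative.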

Performance Issues (1)

Severity Location Issue
🟠 Error contract:0
Output schema validation failed: must have required property 'issues'

Quality Issues (13)

Severity Location Issue
🟠 Error npm/src/agent/ProbeAgent.js:4217-4232
Tool result extraction lacks error handling for malformed tool results. The code assumes toolResults[i] exists and has a result property, but doesn't handle cases where toolResults array might be shorter than toolCalls or results might be null/undefined.
💡 Suggestion: Add null checks and bounds checking before accessing toolResults[i].result; consider optional chaining and default values.
🟠 Error npm/src/agent/ProbeAgent.js:4224-4225
JSON.parse is wrapped in an IIFE without proper error handling: if tc.input contains invalid JSON, the parse fails silently and returns {}, which could mask data corruption issues.
💡 Suggestion: Add a try/catch around the JSON.parse and log a warning when parsing fails, rather than silently returning {}.
🟡 Warning npm/src/agent/ProbeAgent.js:4227-4232
Magic numbers used for string truncation (500, 1000) without named constants. These arbitrary limits make it difficult to understand the intent and modify consistently.
💡 Suggestion: Extract these into named constants such as RESULT_BRIEF_MAX_LENGTH = 500 and ASSISTANT_TEXT_MAX_LENGTH = 1000.
🟡 Warning npm/src/agent/ProbeAgent.js:4102-4109
Multiple magic numbers (200, 150) for string truncation in tool activity log generation. These limits are arbitrary and lack documentation explaining why these specific values were chosen.
💡 Suggestion: Extract them into documented named constants: TOOL_ARG_MAX_LENGTH = 200, RESULT_BRIEF_DISPLAY_MAX = 150.
🟡 Warning npm/src/agent/ProbeAgent.js:4235-4239
Assistant text tracking duplicates the pattern used for tool call logging. Both sections push to _toolCallLog with similar structure but different field names, creating maintenance burden.
💡 Suggestion: Extract a helper such as logToolActivity(name, args, resultBrief, step) to reduce duplication and keep the formatting consistent.
🟡 Warning npm/src/agent/ProbeAgent.js:4800-4836
Iteration limit fallback handler has high cyclomatic complexity with nested conditionals for code-searcher vs regular agent paths. The logic for building toolTimeline and assistantTexts arrays is intertwined with the summary generation.
💡 Suggestion: Extract the tool-timeline and assistant-text collection into separate helper functions, and consider a strategy pattern for code-searcher vs. regular-agent summary generation.
🟡 Warning npm/tests/unit/code-searcher-iteration-limit.test.js:87
The test asserts on the 'PROGRESS REPORT' string, which is an implementation detail rather than a behavioral requirement. The test would break if the wording changed, even if the functionality remained correct.
💡 Suggestion: Test for the structural requirement (e.g., that the message contains sections such as 'Task', 'Completed Work', 'Key Findings') rather than the exact phrase 'PROGRESS REPORT'.
🟡 Warning npm/tests/unit/graceful-timeout.test.js:123
The test asserts on the 'PROGRESS REPORT' string, which is an implementation detail. The test is brittle and would fail if the prompt wording changed.
💡 Suggestion: Test the functional requirement that the message instructs the AI to produce a structured report with specific sections, rather than checking for the exact phrase.
🟡 Warning npm/tests/unit/negotiated-timeout.test.js:448
The test asserts on the 'PROGRESS REPORT' string, which is an implementation detail. This makes the test fragile to wording changes.
💡 Suggestion: Test for the presence of key sections (Task, Completed Work, Key Findings, etc.) rather than the exact 'PROGRESS REPORT' phrase.
🟡 Warning npm/src/agent/ProbeAgent.js:4217-4232
No tests verify the behavior when the toolResults array is shorter than the toolCalls array (an index-out-of-bounds scenario). The code accesses toolResults[i] without checking whether it exists.
💡 Suggestion: Add a test case where toolResults.length < toolCalls.length to verify that the code handles this gracefully without throwing.
🟡 Warning npm/src/agent/ProbeAgent.js:4800-4836
No tests verify the behavior when _toolCallLog contains malformed entries (e.g., missing resultBrief, null args, or invalid step numbers). The code assumes well-formed data.
💡 Suggestion: Add tests that verify graceful handling of corrupted _toolCallLog entries, including null/undefined values and missing properties.
🟡 Warning npm/src/agent/ProbeAgent.js:4102-4109
The tool activity log truncates argStr to 200 characters and resultBrief to 150 characters without indicating the truncation, so users may not know that critical information was cut off.
💡 Suggestion: Add an ellipsis (...) when content is truncated, or log a warning when truncation occurs.
🟡 Warning npm/src/agent/ProbeAgent.js:4227-4232
Result brief truncated to 500 characters and assistant text to 1000 characters without truncation indicators. Critical information may be silently dropped.
💡 Suggestion: Append a '...' suffix when truncating to show that data was cut off, or log the truncation for debugging.
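Several of these findings (magic numbers, silent truncation, the JSON.parse fallback) can be addressed with two small helpers. A hypothetical sketch: truncateWithMarker cuts on code-point boundaries and appends an ellipsis only when something was dropped, and parseToolArgs still falls back to {} but reports the failure through a warn callback. The LIMITS values are copied from the findings; all names are illustrative.

```javascript
// Named limits instead of bare numbers (values from the findings above; names are hypothetical).
const LIMITS = Object.freeze({
  resultBrief: 500,
  assistantText: 1000,
  argDisplay: 200,
  briefDisplay: 150,
});

// Truncate to at most `max` code points; append "…" only when text was actually dropped.
function truncateWithMarker(text, max) {
  const chars = Array.from(String(text ?? ''));
  if (chars.length <= max) return chars.join('');
  return chars.slice(0, Math.max(0, max - 1)).join('') + '…';
}

// Parse tool input that may be an object or a JSON string.
// Falls back to {} like the reviewed code, but reports the failure instead of hiding it.
function parseToolArgs(input, warn = console.warn) {
  if (input !== null && typeof input === 'object') return input;
  if (typeof input !== 'string' || input.trim() === '') return {};
  try {
    const parsed = JSON.parse(input);
    return parsed !== null && typeof parsed === 'object' ? parsed : {};
  } catch (err) {
    warn(`parseToolArgs: invalid JSON tool input (${err.message})`);
    return {};
  }
}
```

Splitting with Array.from avoids cutting a surrogate pair in half, which plain substring can do on emoji or other astral characters.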


Last updated: 2026-03-27T21:42:30.338Z | Triggered by: pr_opened | Commit: 3157d21

