feat: structured progress reports on agent iteration/time limits by buger · Pull Request #548 · probelabs/probe

buger · 2026-03-27T21:25:08Z

Summary

When the agent hits its iteration limit or time budget, it now produces a structured progress report instead of a generic failure message
The report includes: Task, Completed Work, Key Findings, Attempted but Inconclusive, Not Started/Remaining, and Suggested Next Steps — enabling a follow-up agent to continue without starting over
Enriched _toolCallLog with result briefs, step numbers, and assistant text fragments for richer programmatic fallback reports
Updated all 3 shutdown paths: last-iteration prompt, time-budget exhaustion prompt, and negotiated timeout summary prompt

Test plan

All 3104 existing tests pass
Updated assertions in graceful-timeout.test.js, negotiated-timeout.test.js, code-searcher-iteration-limit.test.js
Manual test with Gemini at MAX_TOOL_ITERATIONS=3 — confirmed structured progress report is produced on iteration limit
Syntax validation passes

🤖 Generated with Claude Code

…ime limits

When the agent reaches its max iteration or time budget limit, instead of a generic "unable to complete" message, it now produces a structured progress report (Task, Completed Work, Key Findings, Attempted but Inconclusive, Not Started/Remaining, Suggested Next Steps) so that a follow-up agent can continue without starting from scratch.

Also enriches _toolCallLog with result briefs and step numbers, and improves tool arg capture for the programmatic fallback report.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

probelabs · 2026-03-27T21:29:54Z

PR Overview: Structured Progress Reports on Agent Iteration/Time Limits

Summary

This PR enhances the ProbeAgent's shutdown behavior when hitting iteration limits or time budgets. Instead of generic failure messages, the agent now produces structured progress reports that enable follow-up agents to continue work without starting from scratch.

Key Changes

1. Enhanced Tool Call Logging (`_toolCallLog`)

Location: npm/src/agent/ProbeAgent.js:4208-4230

The tool call tracking now captures richer metadata:

resultBrief: First 500 chars of tool result output
step: Iteration number when the tool was called
_assistant_text entries: Captures AI's text output per step (up to 1000 chars)

Before:

_toolCallLog.push({ name: tc.toolName, args: tc.args || {} });

After:

const resultBrief = tr ? raw.substring(0, 500) : '';
_toolCallLog.push({ name: tc.toolName, args: tcArgs, resultBrief, step: currentIteration });
// Also tracks assistant text fragments
_toolCallLog.push({ name: '_assistant_text', args: {}, resultBrief: text.substring(0, 1000), step: currentIteration });
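
The excerpt above relies on variables from the surrounding loop (`tr`, `raw`, `text`, `tcArgs`). As a minimal standalone sketch of the same logging idea, assuming hypothetical helper names (`toText`, `logToolCall`, `logAssistantText`) and constant names that do not appear in the PR:

```javascript
// Hypothetical constants mirroring the 500/1000-character caps described above.
const RESULT_BRIEF_MAX = 500;
const ASSISTANT_TEXT_MAX = 1000;

// Convert any tool result into a string; null/undefined become ''.
function toText(value) {
  if (value == null) return '';
  return typeof value === 'string' ? value : JSON.stringify(value) ?? '';
}

// Record one tool call with a truncated result brief and its step number.
function logToolCall(log, toolName, args, result, step) {
  log.push({
    name: toolName,
    args: args || {},
    resultBrief: toText(result).substring(0, RESULT_BRIEF_MAX),
    step,
  });
}

// Record the assistant's text for a step under the reserved '_assistant_text' name.
function logAssistantText(log, text, step) {
  if (!text) return;
  log.push({
    name: '_assistant_text',
    args: {},
    resultBrief: text.substring(0, ASSISTANT_TEXT_MAX),
    step,
  });
}

const toolCallLog = [];
logToolCall(toolCallLog, 'search', { query: 'authentication' }, 'Found 15 files', 1);
logAssistantText(toolCallLog, 'The auth flow starts in src/auth.ts', 1);
```

Keeping both entry types in one array preserves ordering across tool calls and text, at the cost of every consumer having to filter on `name`.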

2. Structured Progress Report Prompts

Three shutdown paths now request structured progress reports:

A. Last Iteration Warning (`prepareStep`)

Location: npm/src/agent/ProbeAgent.js:4097-4145

When stepNumber === maxIterations - 1, the agent receives a detailed prompt requesting:

## Task
What was the original request / goal.

## Completed Work
What you successfully accomplished — include ALL findings, code snippets, file paths, data, and conclusions gathered.

## Key Findings
Concrete facts, answers, or data points you discovered. Include file paths with line numbers, code snippets, configuration values, etc.

## Attempted but Inconclusive
What you tried that did not yield clear results — include the approach and why it was inconclusive.

## Not Started / Remaining
What parts of the task you did not get to, and any recommendations for how to approach them.

## Suggested Next Steps
Specific, actionable steps for a follow-up agent to continue this work efficiently.

The prompt also includes a detailed activity log showing all tool calls with arguments and result briefs:

Tool activity so far:
  [step 1] search(query="authentication" path=src) → Found 15 files...
  [step 2] extract(targets=["src/auth.ts:42-80"]) → Extracted function...
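
One way such an activity log could be rendered from `_toolCallLog` entries is sketched below. The function name `formatActivityLog` is hypothetical, the 200/150-character display limits are taken from the review comments further down, and the argument formatting (JSON-quoted values) is an assumption that differs slightly from the sample output above:

```javascript
// Render log entries as "[step N] name(args) → brief", one per line.
// Assistant-text entries are skipped because they are not tool calls.
function formatActivityLog(log, { maxArgLen = 200, maxBriefLen = 150 } = {}) {
  return log
    .filter((tc) => tc.name !== '_assistant_text')
    .map((tc) => {
      const argStr = Object.entries(tc.args || {})
        .map(([key, value]) => `${key}=${JSON.stringify(value)}`)
        .join(' ')
        .substring(0, maxArgLen);
      const brief = tc.resultBrief
        ? ` → ${tc.resultBrief.substring(0, maxBriefLen)}`
        : '';
      return `  [step ${tc.step}] ${tc.name}(${argStr})${brief}`;
    })
    .join('\n');
}

const sampleLog = [
  { name: 'search', args: { query: 'authentication' }, resultBrief: 'Found 15 files', step: 1 },
  { name: '_assistant_text', args: {}, resultBrief: 'Looking at auth next', step: 1 },
];
console.log(formatActivityLog(sampleLog));
```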

B. Graceful Timeout Wind-Down

Location: npm/src/agent/ProbeAgent.js:4073-4095

When gracefulTimeoutState.triggered is set and the agent is on its first bonus step, the same structured report format is requested, with emphasis that this is a time-budget constraint, not a system error.

C. Negotiated Timeout Summary

Location: npm/src/agent/ProbeAgent.js:4632-4650

When the timeout observer declines extension and tools are aborted, the summary prompt now requests the same structured format.
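
Since all three prompts request the same six sections, they could plausibly be assembled from a single shared template. The sketch below is illustrative only: `REPORT_SECTIONS`, `buildReportPrompt`, and `isLastIteration` are hypothetical names, the section hints are abbreviated from the prompt text above, and the instruction wording is an assumption rather than the PR's actual prompt:

```javascript
// Section titles as listed in the PR; hints are shortened for the example.
const REPORT_SECTIONS = [
  ['Task', 'What was the original request / goal.'],
  ['Completed Work', 'What you successfully accomplished.'],
  ['Key Findings', 'Concrete facts, answers, or data points you discovered.'],
  ['Attempted but Inconclusive', 'What you tried that did not yield clear results.'],
  ['Not Started / Remaining', 'What parts of the task you did not get to.'],
  ['Suggested Next Steps', 'Specific, actionable steps for a follow-up agent.'],
];

// Build one prompt; `reason` explains why the agent is being stopped.
function buildReportPrompt(reason, activityLog = '') {
  const sections = REPORT_SECTIONS
    .map(([title, hint]) => `## ${title}\n${hint}`)
    .join('\n\n');
  const activity = activityLog ? `\n\nTool activity so far:\n${activityLog}` : '';
  return `${reason}\nProduce a PROGRESS REPORT with these sections:\n\n${sections}${activity}`;
}

// The last-iteration condition quoted above.
function isLastIteration(stepNumber, maxIterations) {
  return stepNumber === maxIterations - 1;
}

const prompt = buildReportPrompt(
  'This is a time budget constraint, not a system error.'
);
console.log(prompt.split('\n')[0]);
```

A single template keeps the three paths from drifting apart when a section is renamed or added.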

3. Enhanced Fallback Report on Max Iterations

Location: npm/src/agent/ProbeAgent.js:4785-4828

When the agent hits the hard iteration limit, the fallback message now includes:

## Progress Report (iteration limit reached after N steps)

### Tool Usage Summary
search: 5x, extract: 3x

### Search Queries Attempted
"authentication", "verify_credentials", "login"

### Step-by-Step Activity Log
  [step 1] search(query="authentication" exact) → Found 15 files...
  [step 2] extract(targets=["src/auth.ts:42-80"]) → Extracted function...

### Partial Findings
[Assistant text fragments from each step]

### Recommendation for Follow-Up
The iteration limit was reached before the task could be completed...

Key improvements:

Shows step numbers for each tool call
Includes result briefs (first 200 chars of output)
Lists assistant text fragments as partial findings
Provides actionable recommendations for follow-up agents
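
To make the shape of this fallback concrete, here is a minimal sketch that derives the report from the log alone, with no LLM call. `buildFallbackReport` is a hypothetical name, the 150/200-character display limits are assumptions, and the final recommendation section is left out for brevity:

```javascript
// Build a Markdown fallback report from _toolCallLog-style entries.
function buildFallbackReport(log, steps) {
  const calls = log.filter((tc) => tc.name !== '_assistant_text');

  // Tool usage summary, e.g. "search: 5x, extract: 3x".
  const counts = {};
  for (const tc of calls) counts[tc.name] = (counts[tc.name] || 0) + 1;
  const usage = Object.entries(counts)
    .map(([name, n]) => `${name}: ${n}x`)
    .join(', ');

  // Queries from search calls, quoted and comma-separated.
  const queries = calls
    .filter((tc) => tc.name === 'search' && tc.args && tc.args.query)
    .map((tc) => `"${tc.args.query}"`)
    .join(', ');

  // One timeline line per tool call.
  const timeline = calls
    .map((tc) => {
      const argStr = JSON.stringify(tc.args || {}).substring(0, 150);
      const brief = tc.resultBrief ? ` → ${tc.resultBrief.substring(0, 200)}` : '';
      return `  [step ${tc.step}] ${tc.name}(${argStr})${brief}`;
    })
    .join('\n');

  // Assistant text fragments become the partial findings.
  const findings = log
    .filter((tc) => tc.name === '_assistant_text' && tc.resultBrief)
    .map((tc) => tc.resultBrief)
    .join('\n\n');

  return [
    `## Progress Report (iteration limit reached after ${steps} steps)`,
    `### Tool Usage Summary\n${usage || '(no tool calls)'}`,
    `### Search Queries Attempted\n${queries || '(none)'}`,
    `### Step-by-Step Activity Log\n${timeline || '(empty)'}`,
    `### Partial Findings\n${findings || '(none)'}`,
  ].join('\n\n');
}
```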

4. Test Updates

All three test files were updated to expect the new "PROGRESS REPORT" language:

npm/tests/unit/code-searcher-iteration-limit.test.js:87
npm/tests/unit/graceful-timeout.test.js:123
npm/tests/unit/negotiated-timeout.test.js:448, 486

Architecture & Impact Assessment

What This PR Accomplishes

Enables agent handoff: When an agent runs out of iterations or time, it produces a structured report that a follow-up agent can use to continue efficiently without repeating work.
Improves debugging: The enhanced _toolCallLog with result briefs and step numbers provides better visibility into what the agent actually did.
Better user experience: Instead of generic "I couldn't complete your request" messages, users get detailed progress reports showing what was accomplished and what remains.

Key Technical Changes

Component	Change	Impact
`_toolCallLog` structure	Added `resultBrief`, `step` fields	Enables detailed activity logs in shutdown messages
`_assistant_text` tracking	New entry type for AI text output	Captures reasoning and partial findings
`prepareStep` last-iteration prompt	Structured report format + activity log	Better handoff for iteration limit
Graceful timeout prompt	Structured report format	Better handoff for time budget exhaustion
Negotiated timeout summary	Structured report format	Better handoff when observer declines extension
Max iterations fallback	Enhanced with timeline and findings	Fallback still useful even when LLM doesn't respond

Affected System Components

graph TD
    A[ProbeAgent Main Loop] --> B{Check Limits}
    B -->|Last Iteration| C[prepareStep - Structured Report Prompt]
    B -->|Time Budget Exhausted| D[Graceful Wind-Down - Structured Report]
    B -->|Negotiated Timeout Declined| E[Abort Summary - Structured Report]
    B -->|Hard Limit Hit| F[Fallback Report with Timeline]
    
    C --> G[Enhanced _toolCallLog]
    D --> G
    E --> G
    F --> G
    
    G --> H[resultBrief + step Numbers]
    G --> I[_assistant_text Tracking]
    
    H --> J[Structured Progress Report]
    I --> J
    
    J --> K[Follow-up Agent Can Continue]

Component Relationships

sequenceDiagram
    participant Main as ProbeAgent Main Loop
    participant prepareStep as prepareStep Callback
    participant Logger as _toolCallLog
    participant LLM as AI Model
    
    Main->>Logger: Track tool calls with resultBrief
    Main->>Logger: Track assistant text fragments
    
    Note over Main: Check iteration/time limits
    
    alt Last Iteration
        prepareStep->>LLM: Inject structured report prompt + activity log
        LLM->>Main: Generate progress report
    else Graceful Timeout
        prepareStep->>LLM: Inject structured report prompt
        LLM->>Main: Generate progress report
    else Negotiated Timeout Declined
        Main->>LLM: Request structured summary
        LLM->>Main: Generate progress report
    else Hard Limit Hit
        Main->>Main: Build fallback report from _toolCallLog
    end
    
    Main->>Main: Return structured progress report

Scope Discovery & Context Expansion

Direct Impact

Core agent behavior: All three shutdown paths (iteration limit, graceful timeout, negotiated timeout) now produce structured reports
Tool call tracking: Enhanced logging affects every tool execution in the main loop
Memory overhead: _toolCallLog now stores more data (result briefs up to 500 chars, assistant text up to 1000 chars)

Related Components (Inferred)

Subagent delegation (src/delegate.js): Subagents inherit timeout behavior and may produce structured reports that parent agents receive
Code-searcher subagents: Special handling for promptType === 'code-searcher' - they output structured JSON even on partial results (confidence: "low")
MCP servers: Two-phase graceful stop coordination (_initiateGracefulStop) signals MCP servers to wind down
Telemetry/tracing: The enhanced _toolCallLog data could be exposed via tracer events for debugging
Configuration:
- MAX_TOOL_ITERATIONS: Default 30 (range 1-200)
- gracefulTimeoutBonusSteps: Default 4 (range 1-20)
- negotiatedTimeoutBudget: Default 1800000ms (30 min)
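
As an illustration of how these defaults and ranges might be applied, the sketch below clamps out-of-range values. Whether ProbeAgent clamps, rejects, or ignores invalid values is not stated here, so the clamping behavior and the names `clampInt` and `resolveLimits` are assumptions:

```javascript
// Parse an integer setting, falling back to a default and clamping to [min, max].
function clampInt(value, fallback, min, max) {
  const n = Number.parseInt(value, 10);
  if (Number.isNaN(n)) return fallback;
  return Math.min(max, Math.max(min, n));
}

// Resolve the three limits listed above from environment variables and options.
function resolveLimits(env = {}, options = {}) {
  return {
    maxIterations: clampInt(env.MAX_TOOL_ITERATIONS, 30, 1, 200),
    gracefulTimeoutBonusSteps: clampInt(options.gracefulTimeoutBonusSteps, 4, 1, 20),
    negotiatedTimeoutBudget: options.negotiatedTimeoutBudget ?? 1800000, // 30 min in ms
  };
}

// Mirrors the manual test in the PR description (MAX_TOOL_ITERATIONS=3).
const limits = resolveLimits({ MAX_TOOL_ITERATIONS: '3' });
console.log(limits.maxIterations);
```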

Potential Follow-Up Areas

Progress report parsing: Follow-up agents could be enhanced to parse and understand the structured report format
Report schema: Could define a formal JSON schema for progress reports to enable programmatic consumption
Compression: For long-running agents, _toolCallLog could grow large - consider compression or summarization
UI display: Frontend could render structured progress reports in a user-friendly format
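
As a rough sketch of the parsing follow-up, a consumer could split a report on its `## ` headings and check which of the six sections are present. The helper names `parseProgressReport` and `missingSections` are hypothetical, and the sketch assumes the model emitted the headings exactly as requested:

```javascript
const EXPECTED_SECTIONS = [
  'Task',
  'Completed Work',
  'Key Findings',
  'Attempted but Inconclusive',
  'Not Started / Remaining',
  'Suggested Next Steps',
];

// Split a Markdown report into { heading: body } using level-2 headings only.
function parseProgressReport(markdown) {
  const sections = {};
  let current = null;
  for (const line of markdown.split('\n')) {
    const match = /^##\s+(.+?)\s*$/.exec(line);
    if (match) {
      current = match[1];
      sections[current] = [];
    } else if (current !== null) {
      sections[current].push(line);
    }
  }
  for (const key of Object.keys(sections)) {
    sections[key] = sections[key].join('\n').trim();
  }
  return sections;
}

// List the expected sections that the report does not contain.
function missingSections(markdown) {
  const parsed = parseProgressReport(markdown);
  return EXPECTED_SECTIONS.filter((title) => !(title in parsed));
}
```

A check like `missingSections` could also back a test assertion that is less tied to exact prompt wording than matching a single phrase.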

Files Changed Analysis

File	Additions	Deletions	Net Change	Purpose
`npm/src/agent/ProbeAgent.js`	+86	-26	+60	Core implementation of structured reports and enhanced logging
`npm/tests/unit/code-searcher-iteration-limit.test.js`	+1	-1	0	Update assertion to expect "PROGRESS REPORT"
`npm/tests/unit/graceful-timeout.test.js`	+1	-1	0	Update assertion to expect "PROGRESS REPORT"
`npm/tests/unit/negotiated-timeout.test.js`	+2	-2	0	Update assertions to expect "PROGRESS REPORT"

Total: 90 additions, 30 deletions across 4 files

Notable Patterns

Consistent structure: All three shutdown paths use the same 6-section format (Task, Completed Work, Key Findings, Attempted but Inconclusive, Not Started/Remaining, Suggested Next Steps)
Data inclusion emphasis: Prompts explicitly instruct the AI to "include ALL useful data you gathered inline — do not just say 'I found X', actually include X"
Activity log injection: The tool activity log is dynamically generated from _toolCallLog and injected into prompts to provide context
Backward compatibility: Code-searcher subagents still output structured JSON (unchanged), just with enhanced search details

References

Code Locations

Enhanced tool call logging: npm/src/agent/ProbeAgent.js:4208-4230
Last iteration structured prompt: npm/src/agent/ProbeAgent.js:4097-4145
Graceful timeout structured prompt: npm/src/agent/ProbeAgent.js:4073-4095
Negotiated timeout structured summary: npm/src/agent/ProbeAgent.js:4632-4650
Enhanced fallback report: npm/src/agent/ProbeAgent.js:4785-4828
Test updates:
- npm/tests/unit/code-searcher-iteration-limit.test.js:87
- npm/tests/unit/graceful-timeout.test.js:123
- npm/tests/unit/negotiated-timeout.test.js:448, 486

Security Issues (7)

Severity	Location	Issue
🟢 Info	`npm/src/agent/ProbeAgent.js:4222-4228`	The _toolCallLog array grows unbounded with each tool call and assistant text. Each entry stores up to 500 chars of resultBrief plus args, and with long-running agents this could consume significant memory. 💡 Suggestion Implement a size limit for _toolCallLog. Consider: if (_toolCallLog.length > MAX_LOG_ENTRIES) { _toolCallLog.shift(); } or use a circular buffer with fixed capacity
🟡 Warning	`npm/src/agent/ProbeAgent.js:4222-4228`	Tool result data is stored without sanitization before being embedded in progress reports. The resultBrief field captures raw tool output (up to 500 chars) which may contain malicious content, HTML, JavaScript, or control characters that could cause injection attacks when the progress report is displayed in web UIs or logged. 💡 Suggestion Sanitize tool result data before storing in resultBrief. Strip HTML tags, escape special characters, or use a dedicated sanitization library. Consider: resultBrief: sanitizeHtml(raw).substring(0, 500) or resultBrief: escapeHtmlEntities(raw).substring(0, 500)
🟡 Warning	`npm/src/agent/ProbeAgent.js:4231-4232`	Assistant text output is stored without sanitization before being embedded in progress reports. The text field captures raw LLM output (up to 1000 chars) which may contain markdown, HTML, or malicious content that could cause XSS when displayed in web UIs. 💡 Suggestion Sanitize assistant text before storing in _toolCallLog. Use a markdown sanitizer or HTML entity escaping: resultBrief: escapeHtmlEntities(text.substring(0, 1000))
🟡 Warning	`npm/src/agent/ProbeAgent.js:4110-4118`	Tool activity log is constructed by directly embedding tool arguments and results without sanitization. The argStr and brief variables concatenate raw data that could contain injection payloads, which are then embedded in user messages sent to the LLM. 💡 Suggestion Sanitize tool arguments and results before embedding in activity log. Use JSON.stringify with proper escaping or a sanitization function: const brief = tc.resultBrief ? ` → ${sanitizeText(tc.resultBrief.substring(0, 150))}` : ''
🟡 Warning	`npm/src/agent/ProbeAgent.js:4807-4813`	Tool timeline construction directly embeds tool arguments and results without sanitization. The argStr and brief variables concatenate raw data using JSON.stringify().substring() which could truncate in the middle of escape sequences, creating malformed JSON or injection vectors. 💡 Suggestion Use proper JSON serialization with error handling instead of substring truncation. Consider: const argStr = tc.name === 'search' ? `query="${escapeString(tc.args.query || '')}"${tc.args.exact ? ' exact' : ''}` : truncateSafely(JSON.stringify(tc.args || {}), 150)
🟡 Warning	`npm/src/agent/ProbeAgent.js:4818-4820`	Assistant text fragments are collected and embedded in progress reports without sanitization. The assistantTexts array directly concatenates resultBrief values which may contain malicious content. 💡 Suggestion Sanitize assistant text fragments before including in progress report: const assistantTexts = _toolCallLog.filter(tc => tc.name === '_assistant_text' && tc.resultBrief).map(tc => sanitizeText(tc.resultBrief))
🟡 Warning	`npm/src/agent/ProbeAgent.js:4222-4228`	Tool results are logged in plaintext without checking for sensitive data. The resultBrief field captures up to 500 characters of tool output which may contain API keys, passwords, tokens, or other sensitive information that gets logged and potentially exposed. 💡 Suggestion Implement sensitive data filtering before logging. Check for common patterns (API keys, tokens, passwords) and redact them: resultBrief: redactSensitiveData(raw).substring(0, 500)
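
Two of the suggestions above, a bounded log and redaction before storage, could look roughly like the sketch below. `redactSensitiveData` is only named in the review, not defined, so this body is an assumption. The patterns are illustrative and far from exhaustive, and `MAX_LOG_ENTRIES = 200` is an arbitrary example value:

```javascript
const MAX_LOG_ENTRIES = 200; // hypothetical cap, not from the PR

// Append an entry and drop the oldest ones once the cap is exceeded.
function pushBounded(log, entry, max = MAX_LOG_ENTRIES) {
  log.push(entry);
  while (log.length > max) log.shift();
}

// Illustrative patterns only; real secret detection needs a broader rule set.
const SECRET_PATTERNS = [
  /\b(api[_-]?key|token|password|secret)\s*[:=]\s*\S+/gi,
  /\bBearer\s+[A-Za-z0-9._~+\/-]+=*/g,
];

// Replace matched secrets, keeping the key name when the pattern captures one.
function redactSensitiveData(text) {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) =>
      acc.replace(pattern, (match, key) =>
        typeof key === 'string' ? `${key}=[REDACTED]` : '[REDACTED]'
      ),
    text
  );
}

const boundedLog = [];
pushBounded(boundedLog, {
  name: 'search',
  args: {},
  resultBrief: redactSensitiveData('api_key=abc123 in config').substring(0, 500),
  step: 1,
});
```

Redacting before truncation avoids leaving a partially cut secret in the stored brief. Dropping the oldest entries trades completeness of the timeline for a memory bound.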

Performance Issues (1)

Severity	Location	Issue
🟠 Error	`contract:0`	Output schema validation failed: must have required property 'issues'

Quality Issues (13)

Severity	Location	Issue
🟠 Error	`npm/src/agent/ProbeAgent.js:4217-4232`	Tool result extraction lacks error handling for malformed tool results. The code assumes toolResults[i] exists and has a result property, but doesn't handle cases where toolResults array might be shorter than toolCalls or results might be null/undefined. 💡 Suggestion Add null checks and bounds checking before accessing toolResults[i].result. Consider using optional chaining and default values.
🟠 Error	`npm/src/agent/ProbeAgent.js:4224-4225`	JSON.parse wrapped in IIFE without proper error handling. If tc.input contains invalid JSON, the parse will fail silently and return {}, which could mask data corruption issues. 💡 Suggestion Add try-catch around the JSON.parse and log a warning when parsing fails, rather than silently returning {}.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4227-4232`	Magic numbers used for string truncation (500, 1000) without named constants. These arbitrary limits make it difficult to understand the intent and modify consistently. 💡 Suggestion Extract these to named constants like RESULT_BRIEF_MAX_LENGTH = 500 and ASSISTANT_TEXT_MAX_LENGTH = 1000.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4102-4109`	Multiple magic numbers (200, 150) for string truncation in tool activity log generation. These limits are arbitrary and lack documentation explaining why these specific values were chosen. 💡 Suggestion Extract to named constants with documentation: TOOL_ARG_MAX_LENGTH = 200, RESULT_BRIEF_DISPLAY_MAX = 150.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4235-4239`	Assistant text tracking duplicates the pattern used for tool call logging. Both sections push to _toolCallLog with similar structure but different field names, creating maintenance burden. 💡 Suggestion Extract a helper function like logToolActivity(name, args, resultBrief, step) to reduce duplication and ensure consistent formatting.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4800-4836`	Iteration limit fallback handler has high cyclomatic complexity with nested conditionals for code-searcher vs regular agent paths. The logic for building toolTimeline and assistantTexts arrays is intertwined with the summary generation. 💡 Suggestion Extract the tool timeline and assistant text collection into separate helper functions. Consider using a strategy pattern for code-searcher vs regular agent summary generation.
🟡 Warning	`npm/tests/unit/code-searcher-iteration-limit.test.js:87`	Test assertion checks for 'PROGRESS REPORT' string which is implementation detail rather than behavioral requirement. The test would break if the wording changes even if functionality remains correct. 💡 Suggestion Test for the structural requirement (e.g., that the message contains sections like 'Task', 'Completed Work', 'Key Findings') rather than the exact phrase 'PROGRESS REPORT'.
🟡 Warning	`npm/tests/unit/graceful-timeout.test.js:123`	Test assertion checks for 'PROGRESS REPORT' string which is an implementation detail. The test is brittle and would fail if the prompt wording changes. 💡 Suggestion Test for the functional requirement that the message instructs the AI to produce a structured report with specific sections, rather than checking for the exact phrase.
🟡 Warning	`npm/tests/unit/negotiated-timeout.test.js:448`	Test assertion checks for 'PROGRESS REPORT' string which is an implementation detail. This makes the test fragile to wording changes. 💡 Suggestion Test for the presence of key sections (Task, Completed Work, Key Findings, etc.) rather than the exact 'PROGRESS REPORT' phrase.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4217-4232`	No tests verify the behavior when toolResults array is shorter than toolCalls array (index out of bounds scenario). The code accesses toolResults[i] without checking if it exists. 💡 Suggestion Add a test case where toolResults.length < toolCalls.length to verify the code handles this gracefully without throwing.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4800-4836`	No tests verify the behavior when _toolCallLog contains malformed entries (e.g., missing resultBrief, null args, or invalid step numbers). The code assumes well-formed data. 💡 Suggestion Add tests that verify graceful handling of corrupted _toolCallLog entries, including null/undefined values and missing properties.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4102-4109`	Tool activity log truncates argStr to 200 characters and resultBrief to 150 characters without indicating truncation. Users may not know that critical information was cut off. 💡 Suggestion Add an ellipsis (...) indicator when content is truncated, or consider logging a warning when truncation occurs.
🟡 Warning	`npm/src/agent/ProbeAgent.js:4227-4232`	Result brief truncated to 500 characters and assistant text to 1000 characters without truncation indicators. Critical information may be silently dropped. 💡 Suggestion Add '...' suffix when truncating to indicate data was cut off, or log the truncation for debugging.
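
Several of the quality suggestions above (named constants, bounds-safe result access, logged parse failures, truncation markers) fit in a few small helpers. The sketch below uses hypothetical names throughout and assumes tool results arrive as an array of objects with a `result` property, as the review describes:

```javascript
const RESULT_BRIEF_MAX_LENGTH = 500; // constant name suggested in the review

// Truncate and mark the cut so readers know content was dropped.
function truncateWithEllipsis(text, max) {
  if (text.length <= max) return text;
  return text.substring(0, Math.max(0, max - 3)) + '...';
}

// Safely read toolResults[i].result even if the array is short or holds nulls.
function briefForCall(toolResults, i, max = RESULT_BRIEF_MAX_LENGTH) {
  const result = toolResults?.[i]?.result;
  if (result == null) return '';
  const raw = typeof result === 'string' ? result : JSON.stringify(result) ?? '';
  return truncateWithEllipsis(raw, max);
}

// Parse tool input, warning instead of silently swallowing invalid JSON.
function parseToolInput(input, warn = console.warn) {
  if (input && typeof input === 'object') return input;
  try {
    return JSON.parse(input) ?? {};
  } catch (err) {
    warn(`Failed to parse tool input: ${err.message}`);
    return {};
  }
}
```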

Powered by Visor from Probelabs

Last updated: 2026-03-27T21:42:30.338Z | Triggered by: pr_opened | Commit: 3157d21

💡 TIP: You can chat with Visor using /visor ask <your question>

buger merged commit 88434b1 into main Mar 27, 2026
13 checks passed

buger merged 1 commit into main from feat/agent-progress-report-on-limit
