Description
Problem Statement
The FastAgent orchestrator accumulates all text content from every agent response (including tool calls, raw tool results, and intermediate processing steps) and passes this ever-growing context to each subsequent sub-agent. As a result, context size grows as O(N) in the number of steps for the orchestrator and as O(N²) for workers with history enabled, leading to performance degradation, API token exhaustion, and system failures.
Evidence
1. Unbounded Text Accumulation
The orchestrator's _execute_step method captures all agent output without filtering:
# src/mcp_agent/agents/workflow/orchestrator_agent.py, lines 343-347
result = await future_obj
result_text = result.all_text() # Extracts ALL text content
The all_text() method concatenates everything (a simplified sketch follows this list):
- Direct LLM responses
- Tool call arguments and results (including large file contents, database dumps, etc.)
- Error messages and stack traces
- Any embedded resource text
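For illustration only, the following sketch approximates the concatenation behavior described above; the class and method shapes are simplified stand-ins, not the actual fast-agent types:
# Illustrative sketch only: simplified stand-ins for the real response/content types.
from dataclasses import dataclass
from typing import List

@dataclass
class ContentPart:
    kind: str   # e.g. "text", "tool_call", "tool_result", "resource"
    text: str   # raw text payload, however large

@dataclass
class AgentResponse:
    parts: List[ContentPart]

    def all_text(self) -> str:
        # No filtering by kind and no size cap: a 50KB tool result is copied
        # into the orchestrator context verbatim.
        return "\n".join(part.text for part in self.parts)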
2. Full Context Propagation
Every sub-agent receives the complete execution history:
# src/mcp_agent/agents/workflow/orchestrator_agent.py, lines 321-324
task_description = TASK_PROMPT_TEMPLATE.format(
    objective=previous_result.objective,
    task=task.description,
    context=context,  # Contains ALL accumulated results from ALL previous steps
)
The format_plan_result() function serializes the entire PlanResult object, including all StepResult and TaskWithResult entries, into XML format without truncation.
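The pattern is roughly the following; the dataclasses below are placeholders standing in for the real PlanResult, StepResult, and TaskWithResult models, not the actual implementation:
# Illustrative sketch: placeholder models standing in for PlanResult and friends.
from dataclasses import dataclass, field
from typing import List

@dataclass
class TaskWithResult:
    description: str
    result: str  # the full all_text() output of the worker that ran the task

@dataclass
class StepResult:
    objective: str
    tasks: List[TaskWithResult] = field(default_factory=list)

def format_plan_result(steps: List[StepResult]) -> str:
    # Every task result from every completed step is emitted verbatim,
    # so the serialized context grows linearly with the number of steps.
    parts = ["<plan_result>"]
    for step in steps:
        parts.append(f"  <step objective={step.objective!r}>")
        for task in step.tasks:
            parts.append(f"    <task description={task.description!r}>")
            parts.append(f"      {task.result}")  # no truncation or summarization
            parts.append("    </task>")
        parts.append("  </step>")
    parts.append("</plan_result>")
    return "\n".join(parts)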
3. Quadratic Growth with Worker History
Worker agents default to use_history=True, causing them to retain their own message history. Since each message contains the full orchestrator context at that point:
- Iteration 1: the worker receives context of size X
- Iteration 2: the worker receives context of size 2X and still retains the previous X in its history
- Iteration N: the worker's effective prompt size ≈ Σ(k=1 to N) k·X = X·N(N+1)/2 = O(N²), checked numerically below
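A quick numerical check of that sum (plain arithmetic, independent of the codebase):
# Effective prompt size for a worker with history enabled, assuming the
# orchestrator context grows by X per iteration (X in KB here).
X = 10  # KB added per iteration, e.g. one large tool result

def effective_prompt_kb(n: int) -> int:
    # sum over iterations 1..n of k*X  ==  X * n * (n + 1) / 2  ->  O(n^2)
    return sum(k * X for k in range(1, n + 1))

for n in (1, 2, 5, 10):
    print(n, effective_prompt_kb(n), "KB")
# 1 -> 10 KB, 2 -> 30 KB, 5 -> 150 KB, 10 -> 550 KB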
4. No Context Management
The codebase lacks any mechanism for the following (a minimal sketch of what even a basic guard could look like appears after this list):
- Context size limits or token counting
- Result summarization or compression
- Selective context sharing
- Context window management
- Tool result filtering
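None of this exists in the orchestrator path today. For reference, even a minimal guard along these lines (a hypothetical helper, using a rough 4-characters-per-token estimate) would bound the context handed to each worker:
# Hypothetical helper, not part of fast-agent: bound the context size before
# it is formatted into the task prompt. Uses a crude ~4 chars/token estimate.
MAX_CONTEXT_TOKENS = 4_000

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def clamp_context(context: str, max_tokens: int = MAX_CONTEXT_TOKENS) -> str:
    if estimate_tokens(context) <= max_tokens:
        return context
    # Keep the most recent portion; a real fix would summarize or filter instead.
    keep_chars = max_tokens * 4
    return "[earlier results truncated]\n" + context[-keep_chars:]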
Implications
1. Performance Degradation
- Each agent invocation takes progressively longer as context grows
- API latency increases non-linearly with workflow complexity
- Simple 5-step workflows can take 10x longer than necessary
2. Context Window Exhaustion
- Modern LLMs have finite context windows (typically 8k-128k tokens)
- A single large tool result (e.g., reading a 50KB file) gets replicated in every subsequent prompt
- Workflows fail mid-execution when context exceeds model limits
3. Cost Explosion
- API costs scale with token usage
- A workflow with 5 steps and 3 iterations can consume 15x more tokens than the actual content requires
- Each worker processes redundant historical data
4. Quality Degradation
- Relevant task information gets buried in historical noise
- LLMs exhibit "lost in the middle" problems with large contexts
- Agents may focus on irrelevant details from unrelated previous steps
Reproducible Example
# Simple workflow that demonstrates the issue:
1. Agent A: Read a 10KB configuration file
2. Agent B: Analyze the configuration
3. Agent C: Generate a report
# Actual context sizes:
- Agent A receives: objective (0.1KB)
- Agent B receives: objective + Agent A's full output including raw file content (10.1KB)
- Agent C receives: objective + Agent A output + Agent B output (20.2KB+)
# If any agent is reused, its history compounds the problem quadratically (simulated below)
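The sizes in this example can be reproduced without any LLM calls; the following hypothetical script simulates the orchestrator appending each agent's full output to the shared context:
# Simulate the context handed to each agent in the 3-step example above.
objective = "x" * 100             # ~0.1 KB objective
outputs = {
    "Agent A": "c" * 10_000,      # raw 10KB config file echoed via all_text()
    "Agent B": "a" * 10_000,      # analysis that quotes much of the config
    "Agent C": "r" * 2_000,       # final report
}

context = objective
for name, output in outputs.items():
    print(f"{name} receives {len(context) / 1000:.1f} KB of context")
    context += output             # the full output is appended, unfiltered

# Prints roughly 0.1 KB, 10.1 KB, and 20.1 KB, matching the sizes listed above.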
System Impact
This is not an edge case but a fundamental architectural issue that affects:
- Every orchestrator workflow (context always accumulates)
- Every multi-step plan (more steps = more accumulation)
- Every agent reuse (history compounds the problem)
- Every tool that returns substantial output (multiplies the impact)
The current design makes it effectively impossible to run complex, multi-step workflows with multiple iterations without hitting context limits or experiencing severe performance degradation.