[Phase 4] Evaluate Task-Scoped Context vs. Agent-Level Context Tiers #72

@frankbria

Description

Summary

The current tiered memory system operates at the agent level, but context relevance is task-specific. Accumulating context across 10 tasks may carry irrelevant information from early tasks into later ones.

Problem Analysis

Current model (hypothesis):

  • Worker agent has a context store with HOT/WARM/COLD tiers
  • Context accumulates as worker handles multiple tasks in a session
  • Tier promotion/demotion happens based on access patterns across all tasks

The problem:

  • Task 3 involved debugging authentication. Worker loaded auth-related context into HOT tier.
  • Task 10 involves building a data export feature. Auth context is still HOT (recent access).
  • Worker's context window now has authentication details that are completely irrelevant to data export.

This is cross-task contamination: context that was relevant to a previous task pollutes the context for the current task.
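The failure mode can be reproduced with a toy recency-based store (all names and the schema here are hypothetical; a real tier policy would presumably also weigh access frequency):

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    key: str
    task_id: int       # task that loaded this item
    last_access: int   # monotonically increasing access clock

class AgentContext:
    """Naive agent-level store: HOT tier = most recently accessed items."""

    def __init__(self, hot_size: int = 2):
        self.items: dict[str, ContextItem] = {}
        self.clock = 0
        self.hot_size = hot_size

    def touch(self, key: str, task_id: int) -> None:
        self.clock += 1
        self.items[key] = ContextItem(key, task_id, self.clock)

    def hot_tier(self) -> list[str]:
        ranked = sorted(self.items.values(), key=lambda i: -i.last_access)
        return [i.key for i in ranked[: self.hot_size]]

ctx = AgentContext()
ctx.touch("auth_middleware.py", task_id=3)   # task 3: auth debugging
ctx.touch("jwt_utils.py", task_id=3)
ctx.touch("export_service.py", task_id=10)   # task 10: data export begins
# jwt_utils.py stays HOT purely on recency, though irrelevant to the export task
print(ctx.hot_tier())  # → ['export_service.py', 'jwt_utils.py']
```

Recency alone cannot distinguish "recently used because relevant" from "recently used because the previous task just ended."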

State of the Art Comparison

From the subagent pattern:

"Each subagent gets a clean, isolated context window."

From 12-Factor Agents:

"Small focused agents, less than 100 tools, less than 20 steps."

The emphasis is on task-scoped context, not agent-scoped context.

Impact

  1. Relevance dilution: Truly relevant context competes with stale context for attention
  2. Token waste: Paying for context that provides no value to current task
  3. Potential confusion: Model may reference old patterns/code that have since changed
  4. Tier gaming: Frequently accessed across tasks ≠ relevant to current task

Proposed Solutions

Option A: Task-Scoped Tiers

Instead of agent-level tiers, maintain per-task context:

Worker
├── Project Context (shared, stable)
│   └── Architecture, patterns, conventions
├── Task 10 Context (active)
│   └── HOT/WARM/COLD specific to task 10
└── Task Archive (compressed)
    └── Summaries of tasks 1-9

At task start: fresh HOT/WARM/COLD tiers
At task end: summarize, archive, clear
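A minimal sketch of that lifecycle, assuming a summarization callback stands in for an LLM call (class and field names are illustrative, not the existing implementation):

```python
class TaskScopedContext:
    """Option A sketch: fresh HOT/WARM/COLD tiers per task; finished tasks
    survive only as summaries in an archive."""

    def __init__(self, project_context: list[str]):
        self.project = project_context      # shared, stable (architecture, conventions)
        self.archive: dict[int, str] = {}   # task_id -> summary
        self.active_task: int | None = None
        self.tiers: dict[str, list[str]] = {}

    def start_task(self, task_id: int) -> None:
        self.active_task = task_id
        self.tiers = {"HOT": [], "WARM": [], "COLD": []}  # fresh tiers per task

    def end_task(self, summarize) -> None:
        # Summarize, archive, clear -- the lifecycle described above
        self.archive[self.active_task] = summarize(self.tiers)
        self.active_task, self.tiers = None, {}

    def llm_context(self) -> dict:
        return {"project": self.project,
                "task": self.tiers,
                "archive": list(self.archive.values())}

ctx = TaskScopedContext(project_context=["ARCHITECTURE.md"])
ctx.start_task(3)
ctx.tiers["HOT"].append("auth_middleware.py")
ctx.end_task(summarize=lambda tiers: f"touched {sum(map(len, tiers.values()))} items")
ctx.start_task(10)
print(ctx.llm_context())  # task 10 starts with empty tiers; task 3 is only a summary
```

The key property: task 10's tiers start empty, so nothing from task 3 can leak in except through the deliberate, compressed archive.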

Option B: Relevance Filtering

Keep agent-level tiers but filter by task relevance:

def get_context_for_task(agent_context, current_task):
    # Score every stored item against the current task's description
    scores = compute_relevance(agent_context, current_task.description)
    # Keep only items whose score clears the relevance threshold
    return agent_context.filter(lambda item: scores[item.id] > THRESHOLD)

Each LLM call gets only context scored as relevant to this task.
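`compute_relevance` is left unspecified above; as a minimal stand-in, word overlap between item text and the task description works for illustration (a production version would more likely use embedding cosine similarity):

```python
def compute_relevance(items: dict[str, str], task_description: str) -> dict[str, float]:
    """Toy relevance scorer: Jaccard overlap between each item's text and the
    task description. Item ids and texts here are hypothetical."""
    task_words = set(task_description.lower().split())
    scores = {}
    for item_id, text in items.items():
        words = set(text.lower().split())
        union = words | task_words
        scores[item_id] = len(words & task_words) / len(union) if union else 0.0
    return scores

items = {
    "auth": "jwt token validation in auth middleware",
    "export": "csv data export feature for reports",
}
scores = compute_relevance(items, "build a data export feature")
# The auth item from task 3 scores 0.0 against the export task and gets filtered
print(scores)
```

With a threshold anywhere above zero, the stale auth context drops out of the call for the export task.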

Option C: Hybrid Approach

  • Project tier: Always included, never cleared (architecture, conventions)
  • Session tier: Accumulates within session, cleared at session end
  • Task tier: Fresh per task, aggressive management
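The three lifetimes can be sketched as follows (names illustrative; the promotion hook is an assumption about how cross-task learning would survive):

```python
class HybridContext:
    """Option C sketch: three tiers with different lifetimes."""

    def __init__(self, project: list[str]):
        self.project = list(project)  # always included, never cleared
        self.session: list[str] = []  # accumulates within a session
        self.task: list[str] = []     # fresh per task, aggressively managed

    def end_task(self, promote=()) -> None:
        # Task items judged worth keeping are promoted to the session tier
        self.session.extend(promote)
        self.task = []

    def end_session(self) -> None:
        self.session = []

    def assemble(self) -> list[str]:
        return self.project + self.session + self.task

ctx = HybridContext(project=["conventions.md"])
ctx.task += ["auth_middleware.py", "jwt_utils.py"]
ctx.end_task(promote=["fix: token expiry off-by-one"])  # learning survives the task
ctx.task.append("export_service.py")
print(ctx.assemble())  # → ['conventions.md', 'fix: token expiry off-by-one', 'export_service.py']
```

This directly addresses the learning-vs-isolation trade-off: raw task context dies at the task boundary, but explicitly promoted lessons persist for the session.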

Investigation Tasks

  1. Measure cross-task contamination

    • Sample worker contexts at task N vs. task N+5
    • Identify context items from old tasks still in HOT tier
    • Quantify % of context that's not relevant to current task
  2. Correlation analysis

    • Does task success rate decrease as workers handle more tasks?
    • Does token usage increase over session duration?
    • Do later tasks show more references to irrelevant files/code?
  3. Prototype task-scoped context

    • Implement Option A or C for one worker type
    • Compare task success rate and token usage
    • Measure any downsides (lost cross-task learning?)
  4. Define "project context" boundary

    • What should persist across all tasks? (Architecture docs, coding conventions)
    • What should be task-local? (File contents, error messages, intermediate results)
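The contamination metric in step 1 could be as simple as the fraction of HOT-tier items that originated in an earlier task (assuming, hypothetically, that each item records the task that loaded it):

```python
def contamination_rate(hot_tier: list[dict], current_task_id: int) -> float:
    """Fraction of HOT-tier items carried over from earlier tasks.
    Assumes a hypothetical schema where each item records its origin task_id."""
    if not hot_tier:
        return 0.0
    stale = [item for item in hot_tier if item["task_id"] != current_task_id]
    return len(stale) / len(hot_tier)

hot = [
    {"key": "auth_middleware.py", "task_id": 3},
    {"key": "jwt_utils.py", "task_id": 3},
    {"key": "export_service.py", "task_id": 10},
]
print(contamination_rate(hot, current_task_id=10))  # 2 of 3 HOT items are stale
```

Origin-task alone is a proxy (an old item can still be relevant), so a refined version would combine it with the relevance scoring from Option B.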

Success Criteria

  • Quantified cross-task contamination rate
  • Implemented task-scoped context management
  • Measured improvement in later-task success rates
  • Token usage doesn't grow linearly with tasks handled
  • No regression in cross-task learning (project-level context preserved)

Trade-offs to Consider

Learning vs. Isolation: Some cross-task context is valuable ("we fixed a similar bug in task 3"). The solution should preserve this while eliminating noise.

Complexity vs. Benefit: Task-scoped tiers add complexity. Need to measure if the benefit justifies the engineering cost.

Summarization accuracy: If archiving old tasks via summarization, summary quality matters. Bad summaries = lost learning.

Metadata

Labels

  • Future: Deferred - beyond v1/v2 scope, consider for future versions
  • architecture: System architecture and design patterns
  • context-engineering: Context window management and optimization
  • enhancement: New feature or request
  • phase-4: Phase 4: Multi-Agent Coordination
  • priority:medium
