[Phase 4] Evaluate Task-Scoped Context vs. Agent-Level Context Tiers #72

@frankbria

Description

Summary

The current tiered memory system operates at the agent level, but context relevance is task-specific. Accumulating context across 10 tasks may carry irrelevant information from early tasks into later ones.

Problem Analysis

Current model (hypothesis):

  • Worker agent has a context store with HOT/WARM/COLD tiers
  • Context accumulates as worker handles multiple tasks in a session
  • Tier promotion/demotion happens based on access patterns across all tasks

The problem:

  • Task 3 involved debugging authentication. Worker loaded auth-related context into HOT tier.
  • Task 10 involves building a data export feature. Auth context is still HOT (recent access).
  • Worker's context window now has authentication details that are completely irrelevant to data export.

This is cross-task contamination: context that was relevant to a previous task pollutes the context for the current task.
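The failure mode can be reproduced with a toy recency-based store (all names and the schema here are hypothetical; a real tier policy would presumably also weigh access frequency):

```python
from dataclasses import dataclass

@dataclass
class ContextItem:
    key: str
    task_id: int       # task that loaded this item
    last_access: int   # monotonically increasing access clock

class AgentContext:
    """Naive agent-level store: HOT tier = most recently accessed items."""

    def __init__(self, hot_size: int = 2):
        self.items: dict[str, ContextItem] = {}
        self.clock = 0
        self.hot_size = hot_size

    def touch(self, key: str, task_id: int) -> None:
        self.clock += 1
        self.items[key] = ContextItem(key, task_id, self.clock)

    def hot_tier(self) -> list[str]:
        ranked = sorted(self.items.values(), key=lambda i: -i.last_access)
        return [i.key for i in ranked[: self.hot_size]]

ctx = AgentContext()
ctx.touch("auth_middleware.py", task_id=3)   # task 3: auth debugging
ctx.touch("jwt_utils.py", task_id=3)
ctx.touch("export_service.py", task_id=10)   # task 10: data export begins
# jwt_utils.py stays HOT purely on recency, though irrelevant to the export task
print(ctx.hot_tier())  # → ['export_service.py', 'jwt_utils.py']
```

Recency alone cannot distinguish "recently used because relevant" from "recently used because the previous task just ended."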

State of the Art Comparison

From the subagent pattern:

"Each subagent gets a clean, isolated context window."

From 12-Factor Agents:

"Small focused agents, less than 100 tools, less than 20 steps."

The emphasis is on task-scoped context, not agent-scoped context.

Impact

  1. Relevance dilution: Truly relevant context competes with stale context for attention
  2. Token waste: Paying for context that provides no value to current task
  3. Potential confusion: Model may reference old patterns/code that have since changed
  4. Tier gaming: Frequently accessed across tasks ≠ relevant to current task

Proposed Solutions

Option A: Task-Scoped Tiers

Instead of agent-level tiers, maintain per-task context:

Worker
├── Project Context (shared, stable)
│   └── Architecture, patterns, conventions
├── Task 10 Context (active)
│   └── HOT/WARM/COLD specific to task 10
└── Task Archive (compressed)
    └── Summaries of tasks 1-9

At task start: fresh HOT/WARM/COLD tiers
At task end: summarize, archive, clear
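A minimal sketch of that lifecycle, assuming a summarization callback stands in for an LLM call (class and field names are illustrative, not the existing implementation):

```python
class TaskScopedContext:
    """Option A sketch: fresh HOT/WARM/COLD tiers per task; finished tasks
    survive only as summaries in an archive."""

    def __init__(self, project_context: list[str]):
        self.project = project_context      # shared, stable (architecture, conventions)
        self.archive: dict[int, str] = {}   # task_id -> summary
        self.active_task: int | None = None
        self.tiers: dict[str, list[str]] = {}

    def start_task(self, task_id: int) -> None:
        self.active_task = task_id
        self.tiers = {"HOT": [], "WARM": [], "COLD": []}  # fresh tiers per task

    def end_task(self, summarize) -> None:
        # Summarize, archive, clear -- the lifecycle described above
        self.archive[self.active_task] = summarize(self.tiers)
        self.active_task, self.tiers = None, {}

    def llm_context(self) -> dict:
        return {"project": self.project,
                "task": self.tiers,
                "archive": list(self.archive.values())}

ctx = TaskScopedContext(project_context=["ARCHITECTURE.md"])
ctx.start_task(3)
ctx.tiers["HOT"].append("auth_middleware.py")
ctx.end_task(summarize=lambda tiers: f"touched {sum(map(len, tiers.values()))} items")
ctx.start_task(10)
print(ctx.llm_context())  # task 10 starts with empty tiers; task 3 is only a summary
```

The key property: task 10's tiers start empty, so nothing from task 3 can leak in except through the deliberate, compressed archive.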

Option B: Relevance Filtering

Keep agent-level tiers but filter by task relevance:

def get_context_for_task(agent_context, current_task):
    # Score every stored item against the current task's description
    scores = compute_relevance(agent_context, current_task.description)
    # Keep only items whose score clears the relevance threshold
    return agent_context.filter(lambda item: scores[item.id] > THRESHOLD)

Each LLM call gets only context scored as relevant to this task.
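`compute_relevance` is left unspecified above; as a minimal stand-in, word overlap between item text and the task description works for illustration (a production version would more likely use embedding cosine similarity):

```python
def compute_relevance(items: dict[str, str], task_description: str) -> dict[str, float]:
    """Toy relevance scorer: Jaccard overlap between each item's text and the
    task description. Item ids and texts here are hypothetical."""
    task_words = set(task_description.lower().split())
    scores = {}
    for item_id, text in items.items():
        words = set(text.lower().split())
        union = words | task_words
        scores[item_id] = len(words & task_words) / len(union) if union else 0.0
    return scores

items = {
    "auth": "jwt token validation in auth middleware",
    "export": "csv data export feature for reports",
}
scores = compute_relevance(items, "build a data export feature")
# The auth item from task 3 scores 0.0 against the export task and gets filtered
print(scores)
```

With a threshold anywhere above zero, the stale auth context drops out of the call for the export task.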

Option C: Hybrid Approach

  • Project tier: Always included, never cleared (architecture, conventions)
  • Session tier: Accumulates within session, cleared at session end
  • Task tier: Fresh per task, aggressive management
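The three lifetimes can be sketched as follows (names illustrative; the promotion hook is an assumption about how cross-task learning would survive):

```python
class HybridContext:
    """Option C sketch: three tiers with different lifetimes."""

    def __init__(self, project: list[str]):
        self.project = list(project)  # always included, never cleared
        self.session: list[str] = []  # accumulates within a session
        self.task: list[str] = []     # fresh per task, aggressively managed

    def end_task(self, promote=()) -> None:
        # Task items judged worth keeping are promoted to the session tier
        self.session.extend(promote)
        self.task = []

    def end_session(self) -> None:
        self.session = []

    def assemble(self) -> list[str]:
        return self.project + self.session + self.task

ctx = HybridContext(project=["conventions.md"])
ctx.task += ["auth_middleware.py", "jwt_utils.py"]
ctx.end_task(promote=["fix: token expiry off-by-one"])  # learning survives the task
ctx.task.append("export_service.py")
print(ctx.assemble())  # → ['conventions.md', 'fix: token expiry off-by-one', 'export_service.py']
```

This directly addresses the learning-vs-isolation trade-off: raw task context dies at the task boundary, but explicitly promoted lessons persist for the session.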

Investigation Tasks

  1. Measure cross-task contamination

    • Sample worker contexts at task N vs. task N+5
    • Identify context items from old tasks still in HOT tier
    • Quantify % of context that's not relevant to current task
  2. Correlation analysis

    • Does task success rate decrease as workers handle more tasks?
    • Does token usage increase over session duration?
    • Do later tasks show more references to irrelevant files/code?
  3. Prototype task-scoped context

    • Implement Option A or C for one worker type
    • Compare task success rate and token usage
    • Measure any downsides (lost cross-task learning?)
  4. Define "project context" boundary

    • What should persist across all tasks? (Architecture docs, coding conventions)
    • What should be task-local? (File contents, error messages, intermediate results)
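The contamination metric in step 1 could be as simple as the fraction of HOT-tier items that originated in an earlier task (assuming, hypothetically, that each item records the task that loaded it):

```python
def contamination_rate(hot_tier: list[dict], current_task_id: int) -> float:
    """Fraction of HOT-tier items carried over from earlier tasks.
    Assumes a hypothetical schema where each item records its origin task_id."""
    if not hot_tier:
        return 0.0
    stale = [item for item in hot_tier if item["task_id"] != current_task_id]
    return len(stale) / len(hot_tier)

hot = [
    {"key": "auth_middleware.py", "task_id": 3},
    {"key": "jwt_utils.py", "task_id": 3},
    {"key": "export_service.py", "task_id": 10},
]
print(contamination_rate(hot, current_task_id=10))  # 2 of 3 HOT items are stale
```

Origin-task alone is a proxy (an old item can still be relevant), so a refined version would combine it with the relevance scoring from Option B.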

Success Criteria

  • Quantified cross-task contamination rate
  • Implemented task-scoped context management
  • Measured improvement in later-task success rates
  • Token usage doesn't grow linearly with tasks handled
  • No regression in cross-task learning (project-level context preserved)

Trade-offs to Consider

Learning vs. Isolation: Some cross-task context is valuable ("we fixed a similar bug in task 3"). The solution should preserve this while eliminating noise.

Complexity vs. Benefit: Task-scoped tiers add complexity. Need to measure if the benefit justifies the engineering cost.

Summarization accuracy: If archiving old tasks via summarization, summary quality matters. Bad summaries = lost learning.

Metadata

Labels

  • Future: Deferred - beyond v1/v2 scope, consider for future versions
  • architecture: System architecture and design patterns
  • context-engineering: Context window management and optimization
  • enhancement: New feature or request
  • phase-4: Phase 4: Multi-Agent Coordination
  • priority:medium
