-
Notifications
You must be signed in to change notification settings - Fork 22
integration guides self improving workflow
Prerequisites: Tier 1: Bare MCP complete · Claude Code auto-memory enabled · Recommended: Tier 5: Output Styles for full orchestration context (not required)
Cross-references: Quick Start · API Reference · Workflow Guide · Output Styles
- Claude monitors its own MCP tool usage in real time and flags inefficiencies
- Tool friction, bugs, and optimization opportunities are logged as persistent MCP items
- Agent discipline violations are corrected via auto-memory — corrections apply in the next session
- Analysis reports surface patterns after every response involving MCP calls
- A continuous feedback loop: each session's learnings improve the next
Most agent workflows are open-loop: the agent uses tools, produces output, and forgets what worked and what did not. The self-improving workflow closes the loop — the agent actively observes its own tool interactions, detects inefficiencies, and corrects its behavior for future sessions.
This creates two feedback channels:
Session N Session N+1
───────── ───────────
Tool calls → observations detected Memory loaded → better tool calls
↓ ↑
MCP items (friction/bugs/optimization) Read observations, apply corrections
↓ ↑
Memory writes (discipline corrections) ──────────→
Channel 1: Tool observations → MCP items. When the agent detects a tool-side issue (API confusion, missing capability, optimization opportunity), it logs an MCP work item tagged agent-observation. These persist across sessions and can be reviewed, prioritized, and addressed as product improvements.
Channel 2: Agent discipline → auto-memory. When the agent detects a mistake in its own behavior (wrong query pattern, forgotten parameter, sequencing violation), it writes a correction to its persistent memory file. The next session loads this correction automatically.
The agent watches for patterns during MCP interactions:
Token waste patterns:
| Pattern | Detection signal | Preferred alternative |
|---|---|---|
| Over-fetching |
query_items(get) for a status check |
query_items(overview) — 85-90% fewer tokens |
| Missing batch | 3+ individual manage_items calls in one turn |
Use the items array for bulk creates |
| Redundant queries | Same entity queried twice in a turn | Cache the first result |
| Unfiltered search |
query_items(search) with no filters |
Add role, tags, or parentId to narrow |
| Note body waste |
query_notes(includeBody=true) when only keys/roles needed |
Use includeBody=false for metadata-only reads |
Friction patterns:
| Category | What to watch |
|---|---|
| Tool failures | MCP calls that return errors or unexpected empty results |
| Excessive round-trips | Workflows requiring 3+ sequential calls where 1-2 should suffice |
| API confusion | Parameter naming inconsistencies, unclear error messages |
| Silent failures | Operations that succeed but produce no useful effect |
| Missing capability | Gaps requiring workarounds |
When the agent detects an issue, it follows a three-step protocol:
Step 1 — Dedup check. Search for existing observations to avoid duplicates:
query_items(operation="search", tags="agent-observation", query="<topic keyword>")
Step 2 — Create (only if no match found):
manage_items(operation="create", items=[{
title: "[optimization] Use overview instead of get for status dashboards",
summary: "Detected query_items(get) used for a status check. query_items(overview) returns item summaries without note bodies — sufficient for dashboards and 85-90% cheaper.",
tags: "agent-observation,optimization",
priority: "low"
}])
No parentId — observations are standalone top-level items. Default priority="low" keeps them from competing with implementation items in get_next_item.
Step 3 — Fill the observation note:
manage_notes(operation="upsert", notes=[{
itemId: "<new-item-uuid>",
key: "observation-detail",
role: "queue",
body: "**Observed:** query_items(get) followed by query_notes(includeBody=true) to render a dashboard.\n**Expected:** query_items(overview) returns item summaries without note bodies — sufficient for dashboards.\n**Suggestion:** Use overview for any read that does not need note content."
}])
Every observation gets agent-observation as primary tag plus exactly one type tag:
| Type tag | When |
|---|---|
optimization |
Token waste, redundant queries, missed batch opportunities |
friction |
Confusing APIs, excessive round-trips, unclear errors |
bug |
Actual defects in tool behavior |
missing-capability |
Gaps requiring workarounds |
If an observation is immediately actionable — for example, a bug blocking current work — add action-item as an additional tag so it surfaces in dashboards.
The agent-observation note schema (in .taskorchestrator/config.yaml) gives observations a lightweight gate:
note_schemas:
agent-observation:
- key: observation-detail
role: queue
required: true
description: "Expected vs actual behavior and suggested improvement."
guidance: "Describe what you observed (tool call, parameters, result). State what you expected instead. Suggest a concrete improvement — API change, error message improvement, or new capability."One note, queue phase only. Logging should be fast — two MCP calls after the dedup check. Observations never advance past queue; they are reference items, not implementation work.
When the agent detects a mistake in its own orchestration behavior — not a tool issue, but an agent discipline issue — it writes a correction to persistent memory.
| Category | Example signal | Memory entry pattern |
|---|---|---|
| Data integrity | Used a truncated UUID | Always use full UUIDs from query responses |
| Delegation format | Subagent returned raw JSON instead of summary | Specify exact return format in delegation prompts |
| Sequencing | Dispatched agent before materializing items | Materialize all MCP items before dispatching agents |
| Missing parameter | Forgot model on Agent dispatch |
Always set model explicitly: haiku / sonnet / opus |
| Prompt deficiency | Subagent failed due to missing context | Record what context was needed |
This distinction determines where the correction goes:
| Issue type | Persistence | Who fixes it |
|---|---|---|
| Tool friction, bug, missing capability | MCP observation item | Product development |
| Agent discipline, sequencing, format | Auto-memory file | The agent (next session) |
- Detect the issue during the current session
- Check existing memory for similar coverage to avoid duplicates
- Write a concise correction: pattern + correct behavior
-
Report in the session output:
↳ [self-correction] Updated memory: always set model parameter on Agent dispatch
The agent appends an analysis block to the end of every response that involved MCP tool calls or subagent activity.
---
◆ **Analysis** — 2 MCP calls | clean
Or with an issue:
---
◆ **Analysis** — 3 MCP calls | over-fetch: used get+notes for status check
---
### ◆ Workflow Analysis
**MCP Call Efficiency**
↳ 12 calls, 2 batched, no redundant queries
**Friction Points** (0 this session)
↳ None detected
**Observations Logged** (1 new)
↳ [optimization] `a1b2c3d4` — batch dependency creation for linear chains
**Suggestions**
↳ Consider using query_items(overview) for the work-summary dashboard
Inline analysis uses the ↳ [analysis] prefix for real-time visibility during the response body. The end-of-response block is the aggregate summary.
The self-improving workflow can be implemented at several depths:
Add observation and self-correction instructions directly to your CLAUDE.md:
## Self-Improvement Protocol
After every MCP interaction, check:
1. Did any call return more data than needed? Log as agent-observation optimization.
2. Did any call fail unexpectedly? Log as agent-observation friction.
3. Did I repeat a mistake from a previous session? Check and update memory.
When logging observations, always dedup-check first:
query_items(operation="search", tags="agent-observation", query="<topic>")This approach adds the behavior without requiring a custom output style. The CLAUDE.md instructions are active in every session.
For deeper integration, create a custom output style with an analysis zone. Output styles have more behavioral influence than CLAUDE.md because they shape every response, not just tool usage.
Structure your output style in three zones:
- Zone 1 — Orchestration core: delegation rules, planning, phase transitions (mirrors the Workflow Orchestrator)
- Zone 2 — Extended orchestration: any workflow enhancements specific to your setup
- Zone 3 — Analysis layer: monitoring patterns, observation logging protocol, self-correction rules, reporting format
Place your output style in ~/.claude/output-styles/ and activate via .claude/settings.local.json:
{
"outputStyle": "My Custom Orchestrator"
}The workflow-analyst.md output style included with this project is an example of this pattern — Zone 1 mirrors the plugin's Workflow Orchestrator, Zone 2 adds parallel dispatch guidance, and Zone 3 contains the full analysis layer.
The lightest approach: add the agent-observation schema to .taskorchestrator/config.yaml and instruct the agent (via CLAUDE.md or output style) to log observations when it detects issues. No analysis reporting, no memory corrections — just persistent tracking.
Turn 1. Agent dispatches an implementation subagent without setting model:
Agent(prompt="Implement the search API...", isolation="worktree")
// model parameter missing — defaults to opus for sonnet-eligible work
Turn 2. Agent detects the issue when reviewing the subagent's token usage:
↳ [analysis] Delegation without model param — haiku-eligible work ran on opus
Turn 3. Agent checks memory — no existing correction for this pattern. Writes one:
Memory update: "Always set model parameter explicitly on Agent dispatch —
haiku for MCP bulk ops, sonnet for implementation, opus for architecture.
Omitting it wastes tokens by inheriting the orchestrator's model."
Agent logs an MCP observation:
manage_items(create, items=[{
title: "[optimization] Set model parameter on all Agent dispatches",
summary: "Omitting model causes sonnet-eligible work to run on opus. Always set model explicitly.",
tags: "agent-observation,optimization",
priority: "low"
}])
Next session. Memory loads the correction. Agent sets model="sonnet" on the first dispatch without being reminded.
After several sessions, run:
query_items(operation="search", tags="agent-observation")
Group by type tag to see patterns:
- Multiple
optimizationobservations pointing to the same tool → proposal for a new operation mode - Multiple
frictionobservations about the same parameter → documentation or error message improvement - A
bugobservation that recurs → escalate priority
The /session-retrospective skill can aggregate these with session-tracking notes to produce a structured improvement report.
Getting Started
Integration Guides
- Overview
- Bare MCP
- CLAUDE.md-Driven
- Note Schemas
- Plugin: Skills & Hooks
- Output Styles
- Self-Improving Workflow
Reference
Operations
Project