A thinking-first agent framework that inserts a thinking layer between goal and plan, generating three ranked execution paths before any work begins.
Every agent framework does this:
Goal → Plan → Execute → Output
The plan is a single path. If it fails, you get retries. If the approach was wrong, you get a faster wrong answer.
Nobody stops to ask: is the plan itself any good?
XYGLUE inserts a thinking layer before planning. Three agents analyze the task, classify what's actually required vs. what's just convention, and generate three alternative paths — ranked from most creative to most safe.
┌─────────────────────────────────────────┐
│ THINKING LAYER │
│ │
│ ┌─────────┐ Every task has a │
Goal ────►│ │ Scout │ "standard" way. │
│ └────┬────┘ Scout maps it. │
│ │ │
│ ┌────▼──────┐ Which parts are physics │
│ │ Physicist │ (must do) vs habit │
│ └────┬──────┘ (just convention)? │
│ │ │
│ ┌────▼──────┐ Three paths: │
│ │ Architect │ A = break all habits │
│ └──┬──┬──┬──┘ B = break some │
│ │ │ │ C = optimized standard │
└─────┼──┼──┼─────────────────────────────┘
│ │ │
┌───────────────┼──┼──┼──────────────────────┐
│ EXECUTION LAYER │
│ │ │ │ │
│ Path A B C (with fallthrough) │
│ │ │ │ │
│ Self-healing pipeline │
│ 20 seeded failure patterns │
│ Checkpointing + rollback │
└───────────────┼──┼──┼──────────────────────┘
│ │ │
┌───────────────┼──┼──┼──────────────────────┐
│ CAPTURE LAYER │
│ │
│ Ground Tracker — records every run │
│ Template Library — quality ≥ 0.95, │
│ proven 2-3×, single failure resets │
└─────────────────────────────────────────────┘
│
▼
Brief
The key insight: Most of what developers assume is "required" is actually just convention. A REST API doesn't need Express. Auth doesn't need JWT. XYGLUE's Physicist separates what's actually constrained (physics) from what's just popular (habit) — then the Architect generates paths that break the habits.
Path A breaks everything. Path C plays it safe. Path B is the pragmatic middle. The engine runs A first, and if it fails, self-heals and falls through to B, then C. You get the upside of creative approaches with the safety net of conventional ones.
pip install xyglueFor AI-powered thinking (optional):
pip install xyglue[ai]import asyncio
from xyglue import Engine
async def main():
engine = Engine()
result = await engine.run("Build a REST API for user management", mode="auto")
print(result.brief.format())
asyncio.run(main())✅ Complete. Path A succeeded.
🔍 Scout benchmark: web. Standard: code, database.
2 habits broken. 0 physics respected.
No API key needed — every AI component has a rule-based fallback that works out of the box.
| Mode | What happens |
|---|---|
semi (default) |
Jarvis-style. Shows you the Brief with all three paths. You pick one. |
auto |
Full autonomous. Runs Path A → heals → falls through to B → C. |
# Semi: review before executing
engine = Engine()
result = await engine.run("Build a REST API")
print(result.brief.format()) # See the thinking
final = await engine.approve(result, path="A")
# Auto: just run it
result = await engine.run("Build a REST API", mode="auto")| Control | Options | What it does |
|---|---|---|
mode |
semi, auto |
Pause for approval or run autonomously |
divergence |
low, medium, high |
How aggressively Path A breaks from convention |
learning |
supervised, auto |
Whether proven templates auto-promote or need your approval |
engine = Engine(
api_key="sk-ant-...", # Optional: AI-powered thinking
mode="semi",
divergence="high",
learning="supervised",
agents={ # Custom agent names (shown in Brief)
"scout": "Recon",
"physicist": "Newton",
"architect": "Blueprint",
},
)Three agents run in sequence before any execution happens:
Maps the benchmark — how is this task typically done? What tools, patterns, and sequence does everyone use? Also identifies cross-domain components that could apply but typically don't.
Classifies every element the Scout found as either:
- Physics — can't change. Auth is required. TCP handshakes happen. Rate limits exist.
- Habit — just convention. REST is a choice. Express is a choice. PostgreSQL is a choice.
The Physicist is aggressive. Most things developers assume are necessary are just conventional.
Generates three paths from the classified map:
| Path | Strategy | Risk | Confidence |
|---|---|---|---|
| A | Maximum divergence. Break every habit. Pull cross-domain solutions. | High | Lower |
| B | Informed hybrid. Break habits selectively where the payoff is clear. | Medium | Medium |
| C | Optimized benchmark. The conventional approach, but enhanced. The floor. | Low | Higher |
When a step fails during execution, XYGLUE doesn't just retry blindly:
Failure → Detect → Classify → Match Pattern → Apply Fix → Retry
- Detector classifies the error (timeout, auth, rate limit, network, etc.)
- Pattern Library checks 20 seeded patterns for a known fix (fast path)
- Diagnoser runs rule-based or AI diagnosis if no pattern matches (slow path)
- Fix is applied and the step retries
Successful AI-diagnosed fixes get learned as new patterns. Patterns that fail lose confidence and eventually get evicted. The library improves over time.
Every run is recorded by the Ground Tracker — what was the benchmark, which constraints were classified, which path won, what failed and why.
The Template Library is intentionally strict:
- Quality threshold: 0.95 (non-negotiable)
- Must score 0.95+ on 2-3 separate runs to be promoted
- A single failure resets the validation counter
A library of 50 proven templates beats a library of 5,000 unvalidated ones.
from xyglue import Tool, ToolResult
class MyTool(Tool):
@property
def name(self) -> str:
return "my_tool"
async def execute(self, params: dict) -> ToolResult:
return ToolResult(success=True, output="done")
engine = Engine()
engine.register_tool(MyTool())xyglue/
__init__.py # Public API: Engine, RunResult, Brief, Tool, etc.
config.py # EngineConfig with all settings
engine.py # Main orchestrator (thinking → execution → capture)
sdk.py # Public Engine wrapper (one import, one class)
storage.py # SQLite persistence layer
thinking/ # Scout, Physicist, Architect agents
goal/ # Goal processing, normalization, entity extraction
plan/ # Plan models + checkpoint management
execute/ # Step executor with timeout, retry, fallback
heal/ # Failure detection → diagnosis → pattern match → fix
learn/ # Pattern library + 20 seed patterns
capture/ # Ground tracker + template library
brief/ # Brief output formatting
tools/ # Tool ABC, ToolResult, MockTool, ToolRegistry
- Python 3.11+
aiosqlite>=0.19.0(only required dependency)anthropic>=0.25.0(optional, for AI-powered thinking)
- Fork the repo and create a feature branch
- Install dev dependencies:
pip install -e ".[dev]" - Make changes
- Run checks:
pytest # 263 tests ruff check xyglue/ # Lint mypy xyglue/ # Type check
- Open a pull request
See CLAUDE.md for code conventions and how to add features.
MIT -- Copyright (c) 2026 Luke H