82 changes: 82 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,82 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

Agent Kernel is a framework-agnostic foundation for AI agent systems. It separates reasoning (LLM planning) from execution (deterministic tool running), with strict schema contracts, pluggable storage backends, and immutable audit trails. Python 3.11+, local-first (SQLite default, PostgreSQL optional).

## Commands

```bash
make install-dev # Install dev dependencies (uses uv)
make lint # Ruff linting
make format # Ruff format + autofix
make typecheck # mypy src/
make test # All tests
make test-unit # Unit tests only
make test-cov # Tests with coverage report

# Single test file or specific test:
pytest tests/unit/engine/test_thinking_policy.py -v
pytest tests/unit/engine/test_thinking_policy.py::test_name -v
```

## Architecture

### Core Flow

```
ContextAssembler → AgentEngine.propose() → Plan → DeterministicExecutor → ToolBroker → TraceStore
```

1. **Context Assembler** builds a `ContextPacket` from memory stores
2. **AgentEngine** (LLM) produces a `Plan` from context + `AgentProfile`
3. **Executor** validates the plan, gates approvals, executes actions
4. **Tool Broker** is the ONLY component that executes tools (via capability adapters)
5. **Trace Store** writes immutable audit trails for every decision
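
In code terms, the hand-offs look roughly like this — a minimal sketch with hypothetical method names (`assemble()` and `execute()` are illustrative; only `propose()` is the documented protocol method):

```python
# Illustrative wiring only — method names other than propose() are
# assumptions, not the kernel's exact API.
from typing import Any


async def handle_intent(
    assembler: Any, engine: Any, executor: Any, profile: Any, intent: str
) -> Any:
    context_packet = await assembler.assemble(intent)     # 1. build ContextPacket
    plan = await engine.propose(context_packet, profile)  # 2. LLM proposes a Plan
    # 3-4. executor validates the plan, gates approvals, and runs tools
    # via the broker; 5. every decision lands in the trace store en route.
    return await executor.execute(plan)
```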

### Key Subsystems (`src/agent_kernel/`)

- **`core/schemas/`** — Pydantic models that define ALL data contracts. Schema version 1.1.3 with migration support. Everything flows through these models.
- **`core/config.py`** — Pydantic Settings, environment-driven. Database, vector store, LLM provider, tool broker settings.
- **`core/errors.py`** — Exception hierarchy rooted at `AgentKernelError`. Use these, don't invent new base exceptions.
- **`memory/`** — Pluggable stores: document (full-text), graph (nodes/edges), vector (embeddings), event log (append-only), entity, experience. Each has SQLite default + optional PostgreSQL backend.
- **`tools/`** — `CapabilityRegistry` loads YAML from `configs/capabilities/`. `ToolBroker` validates, executes, logs. Includes circuit breaker, rate limiter, adaptive timeout, idempotency.
- **`engine/`** — `AgentEngine` protocol: `async def propose(context_packet, agent_profile) -> Plan` (sketched after this list). Pluggable via entry points. Thinking policy controller with 4-tier escalation (routing → standard → deep → deep+critic).
- **`executor/`** — `DeterministicExecutor` + `ApprovalGate` + `QualityGateRunner`. Plans are validated before execution.
- **`workflows/`** — YAML-defined workflow specs compiled to runners. `WorkflowRunner`: `async def run(workflow_id, intent, **kwargs) -> WorkflowResult`.
- **`tracing/`** — Immutable trace sinks (SQLite, JSONL). Canonical audit trail.
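
A minimal sketch of that engine contract, assuming `typing.Protocol` structural typing (check `engine/` for the real definition):

```python
# Sketch only — assumes Protocol-based typing and that these schemas are
# re-exported from agent_kernel.core.schemas.
from typing import Protocol

from agent_kernel.core.schemas import AgentProfile, ContextPacket, Plan


class AgentEngine(Protocol):
    async def propose(
        self, context_packet: ContextPacket, agent_profile: AgentProfile
    ) -> Plan:
        """Turn assembled context into a Plan; never execute tools here."""
        ...
```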

### Non-Negotiable Rules

1. **Schema-first**: All data crosses boundaries as Pydantic models. New data = new schema first.
2. **Tool Broker is sacred**: No tool execution outside the broker. Ever.
3. **Engines don't call tools**: They propose plans. The executor calls tools via the broker.
4. **Traces are portable**: Never depend on framework-specific logging.
5. **Framework agnosticism**: Orchestration frameworks (LangGraph, etc.) are optional adapters. They consume kernel schemas, call the Tool Broker, and never bypass the executor.

### Thinking Policy (Adaptive Reasoning)

Default to standard tier. Escalate on evidence (quality gate failures, low confidence, high risk), not predictions. See `configs/thinking_tiers.yaml` and `docs/design/11-thinking-policy.md`.
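
The escalation rule, as a sketch (tier names match the docs; the trigger fields and thresholds below are assumptions — the shipped policy lives in `configs/thinking_tiers.yaml`):

```python
# Illustrative escalation logic — thresholds are invented for the example.
TIERS = ["routing", "standard", "deep", "deep+critic"]


def next_tier(current: str, gate_failures: int, confidence: float, risk: str) -> str:
    """Escalate one tier at a time, and only on observed evidence."""
    evidence = gate_failures > 0 or confidence < 0.5 or risk == "high"
    if not evidence:
        return current  # default: stay where you are (standard tier)
    idx = TIERS.index(current)
    return TIERS[min(idx + 1, len(TIERS) - 1)]
```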

## Conventions

- **Line length**: 88 chars (ruff)
- **Linting**: Strict ruff ruleset (E, F, B, W, I, N, UP, ANN, S, C4, DTZ, etc.). Tests exempt from ANN/S101/PLR2004.
- **Async**: Use `async/await` for I/O operations. `pytest-asyncio` with `asyncio_mode = "auto"`.
- **IDs**: ULID-based (`core/ids.py`). Generate once, never regenerate.
- **Config**: Environment variables via pydantic-settings, never hardcoded.
- **Capabilities**: Defined as YAML in `configs/capabilities/`, implemented as adapters in `tools/adapters/`.
- **Entry points**: Engines, stores are pluggable via `pyproject.toml` entry points.

## Adding New Components

- **New schema**: Define in `core/schemas/`, export in `__init__.py`, add tests in `tests/unit/core/schemas/`.
- **New capability**: YAML in `configs/capabilities/`, adapter in `tools/adapters/`, register in capability registry.
- **New engine**: Implement the `AgentEngine` protocol (`propose(context_packet, agent_profile) -> Plan`); see the skeleton after this list. Must NOT call tools directly.
- **New store backend**: Implement the store protocol, register via entry point in `pyproject.toml`.
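
A skeleton for a new engine, as a sketch (the `Plan` construction is schematic — its required fields live in `core/schemas/`):

```python
# Sketch only — a real engine maps LLM output into a Plan; this one just
# shows the shape of the contract.
from agent_kernel.core.schemas import AgentProfile, ContextPacket, Plan


class TemplateEngine:
    """Satisfies the AgentEngine protocol: proposes plans, never runs tools."""

    async def propose(
        self, context_packet: ContextPacket, agent_profile: AgentProfile
    ) -> Plan:
        # Call your LLM here, then build actions drawn only from
        # agent_profile.allowed_capabilities (exact @version names).
        raise NotImplementedError("map LLM output to a Plan here")
```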

## Design Docs

Detailed specs live in `docs/design/` (00-overview through 25-cli-first-evaluation). Read the relevant doc before modifying a subsystem.
24 changes: 24 additions & 0 deletions configs/agents/default.yaml
@@ -0,0 +1,24 @@
agent_profile_id: default
name: Default Agent
description: General purpose agent for routine tasks
engine: custom
llm_config:
  provider: openai
  model: gpt-4o
  temperature: 0.3
  max_tokens: 4096
prompt_config:
  format: markdown
  enable_toon: true
allowed_capabilities:
  - obsidian.*
  - tasks.*
  - notes.*
context_policy:
  must_cite: false
approval_policy:
  require_approval_for: []
  auto_approve_side_effects:
    - none
    - read
  max_auto_approve_risk: medium
36 changes: 31 additions & 5 deletions src/agent_kernel/cli/main.py
@@ -23,6 +23,7 @@ class OutputFormat(str, Enum):
    JSON = "json"

from agent_kernel.context.assembler import ContextAssembler
from agent_kernel.context_graph.hooks import ContextGraphHooks
from agent_kernel.core.config import Settings, get_settings
from agent_kernel.engine.cost_anomaly import CostAnomalyDetector
from agent_kernel.engine.custom_engine import CustomEngine
@@ -312,7 +313,7 @@ async def _run_workflow_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -377,6 +378,10 @@ async def _run_workflow_async(
    )

    # Set up workflow runner with persistent store
    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -387,6 +392,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -1127,7 +1133,7 @@ async def _resume_workflow_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -1183,6 +1189,10 @@ async def _resume_workflow_async(
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    runner = WorkflowRunner(
        context_assembler=assembler,
        executor=executor,
@@ -1192,6 +1202,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -2745,7 +2756,7 @@ async def _run() -> None:
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker, doc_store, graph_store)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -2800,6 +2811,10 @@ async def _run() -> None:
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -2810,6 +2825,7 @@
        workflow_store=workflow_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -3829,7 +3845,7 @@ def serve(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)

@@ -3869,6 +3885,10 @@ def serve(
        event_log=event_log,
        trace_store=multi_trace_store,
    )
    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )

    workflow_runner = WorkflowRunner(
        context_assembler=assembler,
@@ -3878,6 +3898,7 @@
        workflow_store=workflow_store,
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -4314,7 +4335,7 @@ async def _scheduler_start_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -4364,6 +4385,10 @@ async def _scheduler_start_async(
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -4374,6 +4399,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
11 changes: 8 additions & 3 deletions src/agent_kernel/context/assembler.py
@@ -580,16 +580,19 @@ def _search_documents(

        items = []
        for doc in results:
            doc_metadata = doc.get("metadata", {})
            ref = ContextRef(
                ref_type=RefType.DOCUMENT,
                ref_id=doc["doc_id"],
                hash=self._content_hash(doc.get("content", "")),
                metadata=doc.get("metadata", {}),
                metadata=doc_metadata,
            )
            base_score = abs(doc.get("rank", 0))
            importance = float(doc_metadata.get("auto_importance", 0.0))
            item = ContextItem(
                ref=ref,
                excerpt=doc.get("content", "")[:500],
                relevance_score=abs(doc.get("rank", 0)),
                relevance_score=base_score * (1.0 + importance),
                included_reason="keyword_search",
            )
            items.append(item)
@@ -631,10 +634,12 @@ def _search_vectors(
                ref_id=result["item_id"],
                metadata=metadata,
            )
            base_score = result.get("score", 0)
            importance = float(metadata.get("auto_importance", 0.0))
            item = ContextItem(
                ref=ref,
                excerpt=metadata.get("excerpt", ""),
                relevance_score=result.get("score", 0),
                relevance_score=base_score * (1.0 + importance),
                included_reason="semantic_search",
            )
            items.append(item)
3 changes: 3 additions & 0 deletions src/agent_kernel/core/schemas/graph.py
@@ -67,6 +67,9 @@ class NodeType(str, Enum):
    LABEL = "label"  # Semantic label for tasks
    SECTION = "section"  # Section within a project

    # v1.0.7: File ingestion
    FILE = "file"  # Non-text file (pointer-only storage)

    # v1.0.6: Business knowledge (semantic memory)
    DOMAIN = "domain"  # Business domain
    SYSTEM = "system"  # Technical/business system
57 changes: 40 additions & 17 deletions src/agent_kernel/engine/custom_engine.py
@@ -44,9 +44,9 @@
{capabilities_section}

Important rules:
- Only use capabilities from the allowed list above
- Only use capabilities from the allowed list above, using the EXACT name including the @version suffix (e.g., "tasks.create@v1")
- Use ONLY the parameter names shown in each capability's schema - do NOT invent parameters
- Always cite context items you reference (evidence only)
- You MUST include context_refs_used in your response with ref_type and ref_id from the provided context items
- Be specific about action arguments
- Keep summaries concise (1-5 sentences)
- Mark external writes as requiring approval when appropriate
@@ -412,28 +412,51 @@ def _ensure_list(value: Any) -> list[Any]:
            if isinstance(ref_data, dict):
                context_refs.append(ContextRef(**ref_data))
        if not context_refs and agent_profile.context_policy.must_cite:
            from agent_kernel.prompting.system_prompts import is_system_prompt_ref

            evidence_refs = [
                item.ref
                for item in context_packet.items
                if not is_system_prompt_ref(item.ref)
            ]
            if evidence_refs:
                context_refs = evidence_refs[:3]
            logger.warning(
                "plan_missing_citations",
                intent=context_packet.intent[:100],
                context_items=len(context_packet.items),
                msg="LLM did not produce context_refs_used; quality gates will catch this",
            )

        # Build actions
        actions = []
        for action_data in _ensure_list(data.get("actions", [])):
            if isinstance(action_data, dict):
                capability_name = (
                    action_data.get("capability_name")
                    or action_data.get("capability")
                    or action_data.get("tool")
                    or action_data.get("name")
                )
                capability_name = action_data.get("capability_name")
                if not capability_name:
                    # Log when LLM uses wrong key names
                    alt_keys = [k for k in ("capability", "tool", "name") if k in action_data]
                    if alt_keys:
                        logger.warning(
                            "action_used_wrong_key",
                            keys_found=alt_keys,
                            msg="LLM used wrong key for capability_name; action skipped",
                        )
                    continue
                # Validate capability name format
                if "@" not in capability_name:
                    # Try to find matching capability in allowed list
                    matched = None
                    for allowed in agent_profile.allowed_capabilities:
                        base = allowed.split("@")[0]
                        if base == capability_name or base.endswith(capability_name):
                            matched = allowed
                            break
                    if matched:
                        logger.info(
                            "capability_name_normalized",
                            original=capability_name,
                            normalized=matched,
                        )
                        capability_name = matched
                    else:
                        logger.warning(
                            "invalid_capability_name",
                            capability_name=capability_name,
                            msg="Capability name missing @version suffix; action skipped",
                        )
                        continue
                # Parse side_effect - handle string descriptions by defaulting to NONE
                side_effect_str = action_data.get("side_effect", "none")
                if isinstance(side_effect_str, str):