82 changes: 82 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,82 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What This Is

Agent Kernel is a framework-agnostic foundation for AI agent systems. It separates reasoning (LLM planning) from execution (deterministic tool running), with strict schema contracts, pluggable storage backends, and immutable audit trails. Python 3.11+, local-first (SQLite default, PostgreSQL optional).

## Commands

```bash
make install-dev # Install dev dependencies (uses uv)
make lint # Ruff linting
make format # Ruff format + autofix
make typecheck # mypy src/
make test # All tests
make test-unit # Unit tests only
make test-cov # Tests with coverage report

# Single test file or specific test:
pytest tests/unit/engine/test_thinking_policy.py -v
pytest tests/unit/engine/test_thinking_policy.py::test_name -v
```

## Architecture

### Core Flow

```
ContextAssembler → AgentEngine.propose() → Plan → DeterministicExecutor → ToolBroker → TraceStore
```

1. **Context Assembler** builds a `ContextPacket` from memory stores
2. **AgentEngine** (LLM) produces a `Plan` from context + `AgentProfile`
3. **Executor** validates the plan, gates approvals, executes actions
4. **Tool Broker** is the ONLY component that executes tools (via capability adapters)
5. **Trace Store** writes immutable audit trails for every decision
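
In code terms, the hand-offs look roughly like this — a minimal sketch with hypothetical method names (`assemble()` and `execute()` are illustrative; only `propose()` is the documented protocol method):

```python
# Illustrative wiring only — method names other than propose() are
# assumptions, not the kernel's exact API.
from typing import Any


async def handle_intent(
    assembler: Any, engine: Any, executor: Any, profile: Any, intent: str
) -> Any:
    context_packet = await assembler.assemble(intent)     # 1. build ContextPacket
    plan = await engine.propose(context_packet, profile)  # 2. LLM proposes a Plan
    # 3-4. executor validates the plan, gates approvals, and runs tools
    # via the broker; 5. every decision lands in the trace store en route.
    return await executor.execute(plan)
```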

### Key Subsystems (`src/agent_kernel/`)

- **`core/schemas/`** — Pydantic models that define ALL data contracts. Schema version 1.1.3 with migration support. Everything flows through these models.
- **`core/config.py`** — Pydantic Settings, environment-driven. Database, vector store, LLM provider, tool broker settings.
- **`core/errors.py`** — Exception hierarchy rooted at `AgentKernelError`. Use these, don't invent new base exceptions.
- **`memory/`** — Pluggable stores: document (full-text), graph (nodes/edges), vector (embeddings), event log (append-only), entity, experience. Each has SQLite default + optional PostgreSQL backend.
- **`tools/`** — `CapabilityRegistry` loads YAML from `configs/capabilities/`. `ToolBroker` validates, executes, logs. Includes circuit breaker, rate limiter, adaptive timeout, idempotency.
- **`engine/`** — `AgentEngine` protocol: `async def propose(context_packet, agent_profile) -> Plan` (sketched after this list). Pluggable via entry points. Thinking policy controller with 4-tier escalation (routing → standard → deep → deep+critic).
- **`executor/`** — `DeterministicExecutor` + `ApprovalGate` + `QualityGateRunner`. Plans are validated before execution.
- **`workflows/`** — YAML-defined workflow specs compiled to runners. `WorkflowRunner`: `async def run(workflow_id, intent, **kwargs) -> WorkflowResult`.
- **`tracing/`** — Immutable trace sinks (SQLite, JSONL). Canonical audit trail.
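
A minimal sketch of that engine contract, assuming `typing.Protocol` structural typing (check `engine/` for the real definition):

```python
# Sketch only — assumes Protocol-based typing and that these schemas are
# re-exported from agent_kernel.core.schemas.
from typing import Protocol

from agent_kernel.core.schemas import AgentProfile, ContextPacket, Plan


class AgentEngine(Protocol):
    async def propose(
        self, context_packet: ContextPacket, agent_profile: AgentProfile
    ) -> Plan:
        """Turn assembled context into a Plan; never execute tools here."""
        ...
```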

### Non-Negotiable Rules

1. **Schema-first**: All data crosses boundaries as Pydantic models. New data = new schema first.
2. **Tool Broker is sacred**: No tool execution outside the broker. Ever.
3. **Engines don't call tools**: They propose plans. The executor calls tools via the broker.
4. **Traces are portable**: Never depend on framework-specific logging.
5. **Framework agnosticism**: Orchestration frameworks (LangGraph, etc.) are optional adapters. They consume kernel schemas, call the Tool Broker, and never bypass the executor.

### Thinking Policy (Adaptive Reasoning)

Default to standard tier. Escalate on evidence (quality gate failures, low confidence, high risk), not predictions. See `configs/thinking_tiers.yaml` and `docs/design/11-thinking-policy.md`.
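
The escalation rule, as a sketch (tier names match the docs; the trigger fields and thresholds below are assumptions — the shipped policy lives in `configs/thinking_tiers.yaml`):

```python
# Illustrative escalation logic — thresholds are invented for the example.
TIERS = ["routing", "standard", "deep", "deep+critic"]


def next_tier(current: str, gate_failures: int, confidence: float, risk: str) -> str:
    """Escalate one tier at a time, and only on observed evidence."""
    evidence = gate_failures > 0 or confidence < 0.5 or risk == "high"
    if not evidence:
        return current  # default: stay where you are (standard tier)
    idx = TIERS.index(current)
    return TIERS[min(idx + 1, len(TIERS) - 1)]
```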

## Conventions

- **Line length**: 88 chars (ruff)
- **Linting**: Strict ruff ruleset (E, F, B, W, I, N, UP, ANN, S, C4, DTZ, etc.). Tests exempt from ANN/S101/PLR2004.
- **Async**: Use `async/await` for I/O operations. `pytest-asyncio` with `asyncio_mode = "auto"`.
- **IDs**: ULID-based (`core/ids.py`). Generate once, never regenerate.
- **Config**: Environment variables via pydantic-settings, never hardcoded.
- **Capabilities**: Defined as YAML in `configs/capabilities/`, implemented as adapters in `tools/adapters/`.
- **Entry points**: Engines, stores are pluggable via `pyproject.toml` entry points.

## Adding New Components

- **New schema**: Define in `core/schemas/`, export in `__init__.py`, add tests in `tests/unit/core/schemas/`.
- **New capability**: YAML in `configs/capabilities/`, adapter in `tools/adapters/`, register in capability registry.
- **New engine**: Implement the `AgentEngine` protocol (`propose(context_packet, agent_profile) -> Plan`); see the skeleton after this list. Must NOT call tools directly.
- **New store backend**: Implement the store protocol, register via entry point in `pyproject.toml`.
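
A skeleton for a new engine, as a sketch (the `Plan` construction is schematic — its required fields live in `core/schemas/`):

```python
# Sketch only — a real engine maps LLM output into a Plan; this one just
# shows the shape of the contract.
from agent_kernel.core.schemas import AgentProfile, ContextPacket, Plan


class TemplateEngine:
    """Satisfies the AgentEngine protocol: proposes plans, never runs tools."""

    async def propose(
        self, context_packet: ContextPacket, agent_profile: AgentProfile
    ) -> Plan:
        # Call your LLM here, then build actions drawn only from
        # agent_profile.allowed_capabilities (exact @version names).
        raise NotImplementedError("map LLM output to a Plan here")
```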

## Design Docs

Detailed specs live in `docs/design/` (00-overview through 25-cli-first-evaluation). Read the relevant doc before modifying a subsystem.
24 changes: 24 additions & 0 deletions configs/agents/default.yaml
@@ -0,0 +1,24 @@
agent_profile_id: default
name: Default Agent
description: General purpose agent for routine tasks
engine: custom
llm_config:
  provider: openai
  model: gpt-4o
  temperature: 0.3
  max_tokens: 4096
prompt_config:
  format: markdown
  enable_toon: true
allowed_capabilities:
  - obsidian.*
  - tasks.*
  - notes.*
context_policy:
  must_cite: false
approval_policy:
  require_approval_for: []
  auto_approve_side_effects:
    - none
    - read
  max_auto_approve_risk: medium
36 changes: 31 additions & 5 deletions src/agent_kernel/cli/main.py
@@ -23,6 +23,7 @@ class OutputFormat(str, Enum):
    JSON = "json"

from agent_kernel.context.assembler import ContextAssembler
from agent_kernel.context_graph.hooks import ContextGraphHooks
from agent_kernel.core.config import Settings, get_settings
from agent_kernel.engine.cost_anomaly import CostAnomalyDetector
from agent_kernel.engine.custom_engine import CustomEngine
@@ -312,7 +313,7 @@ async def _run_workflow_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -377,6 +378,10 @@ async def _run_workflow_async(
    )

    # Set up workflow runner with persistent store
    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -387,6 +392,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -1127,7 +1133,7 @@ async def _resume_workflow_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -1183,6 +1189,10 @@ async def _resume_workflow_async(
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    runner = WorkflowRunner(
        context_assembler=assembler,
        executor=executor,
@@ -1192,6 +1202,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -2745,7 +2756,7 @@ async def _run() -> None:
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker, doc_store, graph_store)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -2800,6 +2811,10 @@ async def _run() -> None:
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -2810,6 +2825,7 @@
        workflow_store=workflow_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -3829,7 +3845,7 @@ def serve(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)

@@ -3869,6 +3885,10 @@ def serve(
        event_log=event_log,
        trace_store=multi_trace_store,
    )
    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )

    workflow_runner = WorkflowRunner(
        context_assembler=assembler,
@@ -3878,6 +3898,7 @@
        workflow_store=workflow_store,
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
@@ -4314,7 +4335,7 @@ async def _scheduler_start_async(
        enable_circuit_breaker=settings.tool_broker_circuit_breaker_enabled,
        timeout_manager=timeout_manager,
    )
    register_builtin_tools(broker)
    register_builtin_tools(broker, graph_store=graph_store, event_log=event_log)
    _configure_library_tools(broker, settings.configs_dir)
    _configure_skill_scripts(broker, registry, settings)
    await _configure_mcp_adapter(broker, settings.configs_dir)
@@ -4364,6 +4385,10 @@ async def _scheduler_start_async(
        llm_service=llm_service,
    )

    context_graph_hooks = ContextGraphHooks(
        graph_store=graph_store,
        event_log=event_log,
    )
    workflow_store = SQLiteWorkflowRunStore(data_dir / "workflows" / "workflows.db")
    runner = WorkflowRunner(
        context_assembler=assembler,
@@ -4374,6 +4399,7 @@
        trace_store=multi_trace_store,
        cost_anomaly_detector=cost_anomaly_detector,
        experience_miner=experience_miner,
        context_graph_hooks=context_graph_hooks,
    )

    engine = CustomEngine(llm_service=llm_service, capability_registry=registry)
11 changes: 8 additions & 3 deletions src/agent_kernel/context/assembler.py
@@ -580,16 +580,19 @@ def _search_documents(

        items = []
        for doc in results:
            doc_metadata = doc.get("metadata", {})
            ref = ContextRef(
                ref_type=RefType.DOCUMENT,
                ref_id=doc["doc_id"],
                hash=self._content_hash(doc.get("content", "")),
                metadata=doc.get("metadata", {}),
                metadata=doc_metadata,
            )
            base_score = abs(doc.get("rank", 0))
            importance = float(doc_metadata.get("auto_importance", 0.0))
            item = ContextItem(
                ref=ref,
                excerpt=doc.get("content", "")[:500],
                relevance_score=abs(doc.get("rank", 0)),
                relevance_score=base_score * (1.0 + importance),
                included_reason="keyword_search",
            )
            items.append(item)
@@ -631,10 +634,12 @@ def _search_vectors(
                ref_id=result["item_id"],
                metadata=metadata,
            )
            base_score = result.get("score", 0)
            importance = float(metadata.get("auto_importance", 0.0))
            item = ContextItem(
                ref=ref,
                excerpt=metadata.get("excerpt", ""),
                relevance_score=result.get("score", 0),
                relevance_score=base_score * (1.0 + importance),
                included_reason="semantic_search",
            )
            items.append(item)
3 changes: 3 additions & 0 deletions src/agent_kernel/core/schemas/graph.py
@@ -67,6 +67,9 @@ class NodeType(str, Enum):
    LABEL = "label"  # Semantic label for tasks
    SECTION = "section"  # Section within a project

    # v1.0.7: File ingestion
    FILE = "file"  # Non-text file (pointer-only storage)

    # v1.0.6: Business knowledge (semantic memory)
    DOMAIN = "domain"  # Business domain
    SYSTEM = "system"  # Technical/business system
57 changes: 40 additions & 17 deletions src/agent_kernel/engine/custom_engine.py
@@ -44,9 +44,9 @@
{capabilities_section}

Important rules:
- Only use capabilities from the allowed list above
- Only use capabilities from the allowed list above, using the EXACT name including the @version suffix (e.g., "tasks.create@v1")
- Use ONLY the parameter names shown in each capability's schema - do NOT invent parameters
- Always cite context items you reference (evidence only)
- You MUST include context_refs_used in your response with ref_type and ref_id from the provided context items
- Be specific about action arguments
- Keep summaries concise (1-5 sentences)
- Mark external writes as requiring approval when appropriate
@@ -412,28 +412,51 @@ def _ensure_list(value: Any) -> list[Any]:
            if isinstance(ref_data, dict):
                context_refs.append(ContextRef(**ref_data))
        if not context_refs and agent_profile.context_policy.must_cite:
            from agent_kernel.prompting.system_prompts import is_system_prompt_ref

            evidence_refs = [
                item.ref
                for item in context_packet.items
                if not is_system_prompt_ref(item.ref)
            ]
            if evidence_refs:
                context_refs = evidence_refs[:3]
            logger.warning(
                "plan_missing_citations",
                intent=context_packet.intent[:100],
                context_items=len(context_packet.items),
                msg="LLM did not produce context_refs_used; quality gates will catch this",
            )

        # Build actions
        actions = []
        for action_data in _ensure_list(data.get("actions", [])):
            if isinstance(action_data, dict):
                capability_name = (
                    action_data.get("capability_name")
                    or action_data.get("capability")
                    or action_data.get("tool")
                    or action_data.get("name")
                )
                capability_name = action_data.get("capability_name")
                if not capability_name:
                    # Log when LLM uses wrong key names
                    alt_keys = [k for k in ("capability", "tool", "name") if k in action_data]
                    if alt_keys:
                        logger.warning(
                            "action_used_wrong_key",
                            keys_found=alt_keys,
                            msg="LLM used wrong key for capability_name; action skipped",
                        )
                    continue
                # Validate capability name format
                if "@" not in capability_name:
                    # Try to find matching capability in allowed list
                    matched = None
                    for allowed in agent_profile.allowed_capabilities:
                        base = allowed.split("@")[0]
                        if base == capability_name or base.endswith(capability_name):
                            matched = allowed
                            break
                    if matched:
                        logger.info(
                            "capability_name_normalized",
                            original=capability_name,
                            normalized=matched,
                        )
                        capability_name = matched
                    else:
                        logger.warning(
                            "invalid_capability_name",
                            capability_name=capability_name,
                            msg="Capability name missing @version suffix; action skipped",
                        )
                        continue
                # Parse side_effect - handle string descriptions by defaulting to NONE
                side_effect_str = action_data.get("side_effect", "none")
                if isinstance(side_effect_str, str):