shoom1 · Apr 19, 2026 · Apr 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.5.1] - 2026-04-18
+
+### Added
+- **Capability-based permission engine** (PR #72): Framework-independent engine that replaces the ConfirmationPlugin-based HITL. Tools declare capabilities (e.g. `filesystem.write(path=...)`, `http.read`, `shell.exec`) via `@register_tool(capabilities=...)`. Rules are evaluated from four sources: builtin defaults, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, and in-memory session. When no rule matches, the user is prompted with `Allow once / Allow for session / Allow always (save to project) / Deny`; "always" grants persist into project settings.
+- **Matchers**: `PathMatcher` (with `**` glob), `URLMatcher`, `ShellMatcher`, and `StringGlobMatcher`.
+- **ADK `PermissionPlugin`**: capability gating for ADK agents.
+- **LangGraph `wrap_tool_for_permission`**: per-tool permission wrapping for LangGraph `ToolNode`.
+- **`permissions` and `permissions_enabled` settings** for runtime control.
+- **`EXEMPT` sentinel**: opts a tool out of the permission engine (used for backend state tools and ADK `transfer_to_agent`).
+- **Real BM25 backends**: `bm25s` (preferred) and `rank_bm25` fallback.
+
+### Changed
+- **`@register_tool`** now requires the `capabilities=` kwarg.
+- **Default grants broadened**: `filesystem.*` auto-extends to the parent directory; `memory.*` and `kb.*` are allowed by default.
+- **Ask prompt** now shows the directory-scope target.
+- **Workflow init**: `_ensure_managers_initialized` runs in a worker thread to avoid blocking the event loop.
+- **HITL confirmation** extracted into a backend-neutral module during the permissions migration.
+- **Memory tools**: deduped registry-bound and factory-bound entry points.
+
+### Fixed
+- Matcher canonicalize/matches preserves the `*` wildcard.
+- `KnowledgeBaseManager` concurrency contract tightened.
+
+### Removed
+- **`PermissionLevel`** (SAFE / CAUTION / DANGEROUS), `ConfirmationPlugin`, `_wrap_for_confirmation`, `hitl_tools`, and the `hitl_enabled` setting — superseded by the permission engine.
+
 ## [0.5.0] - 2026-04-17
 
 ### Added
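
Review note (illustrative, not part of the patch): the rule evaluation described in the 0.5.1 entry can be sketched roughly as below. `Rule`, `check`, `SOURCE_ORDER`, and the first-match-wins ordering (session, then project, user, builtin) are assumptions made for this sketch. The changelog does not state the engine's precedence, and the real matchers are richer than the stdlib `fnmatch` used here.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule shape; the real one lives in workflow/permissions/rules.py."""
    source: str       # "builtin", "user", "project", or "session"
    capability: str   # glob over capability names, e.g. "memory.*"
    target: str       # glob over the resolved target, e.g. "/repo/*"
    effect: str       # "allow" or "deny"


# Assumed precedence for this sketch only: narrowest source first.
SOURCE_ORDER = ["session", "project", "user", "builtin"]


def check(rules: list[Rule], capability: str, target: str) -> str:
    """Return the first matching rule's effect, or "ask" when nothing matches."""
    ordered = sorted(rules, key=lambda r: SOURCE_ORDER.index(r.source))
    for rule in ordered:
        if fnmatchcase(capability, rule.capability) and fnmatchcase(target, rule.target):
            return rule.effect
    return "ask"


rules = [
    Rule("builtin", "memory.*", "*", "allow"),
    Rule("project", "filesystem.write", "/repo/*", "allow"),
    Rule("session", "shell.exec", "rm *", "deny"),
]

print(check(rules, "memory.save", "note"))                 # builtin default applies
print(check(rules, "filesystem.write", "/repo/src/a.py"))  # project grant applies
print(check(rules, "http.read", "https://example.com"))    # no rule matches, so the user would be prompted
```

An "ask" result is the point where the prompt with the four choices would be shown.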

diff --git a/CLAUDE.md b/CLAUDE.md
@@ -47,7 +47,7 @@ agentic-cli/
 │   │       ├── persistence/  # Checkpointers, stores
 │   │       └── tools/        # LangChain-compatible wrappers
 │   ├── tools/
-│   │   ├── registry.py       # ToolRegistry, @register_tool, ToolCategory, PermissionLevel
+│   │   ├── registry.py       # ToolRegistry, @register_tool, ToolCategory
 │   │   ├── executor.py       # SafePythonExecutor
 │   │   ├── knowledge_tools.py # kb_search, kb_ingest, kb_list, kb_read
 │   │   ├── arxiv_tools.py    # search_arxiv, fetch_arxiv_paper, analyze_arxiv_paper
@@ -62,7 +62,7 @@ agentic-cli/
 │   │   ├── memory_tools.py   # save_memory, search_memory + MemoryStore
 │   │   ├── planning_tools.py # save_plan, get_plan + PlanStore
 │   │   ├── task_tools.py     # save_tasks, get_tasks + TaskStore
-│   │   ├── hitl_tools.py     # request_approval + ApprovalManager, HITLConfig
+│   │   ├── reflection_tools.py # save_reflection + ToolReflectionStore
 │   │   ├── shell/            # 8-layer shell security
 │   │   └── webfetch/         # Fetcher, converter, validator, robots
 │   ├── knowledge_base/
@@ -114,6 +114,9 @@ Workflow:
 2. For features: create `feature/<name>` (or `fix/<name>` or `refactor/<name>`) from `develop`, work there, merge back to `develop`
 3. When ready to release: merge `develop` → `main` and tag the release
 
+### What NOT to commit
+- `docs/` is gitignored on purpose (see `.gitignore`). It is a scratchpad for review notes, plans, and internal analysis. **Never `git add docs/…` or suggest committing anything under `docs/`.** If a document belongs in the repo, it lives elsewhere (README, CHANGELOG, top-level `*.md`).
+
 ## Development Principles
 
 ### Code Style
@@ -130,7 +133,8 @@ Workflow:
 
 ### Key Design Patterns
 - **Tool error handling**: All tools return `{"success": bool, ...}` dicts. Never raise `ToolError`.
-- **Tool registration**: Use `@register_tool(category=..., permission_level=..., description=...)` decorator. Tools are auto-discovered via the global `ToolRegistry`.
+- **Tool registration**: Use the `@register_tool(category=..., capabilities=..., description=...)` decorator. `capabilities=` is required: pass `EXEMPT` for tools that need no permission check, or a list of `Capability(name, target_arg=...)` entries that the engine matches against rules. Tools are auto-discovered via the global `ToolRegistry`.
+- **Permissions**: `workflow/permissions/` holds a framework-independent engine that evaluates declared capabilities against rules from four sources (builtin, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, in-memory session). ADK + LangGraph gate tool calls via `workflow/adk/permission_plugin.py::PermissionPlugin` and `workflow/langgraph/permission_wrap.py::wrap_tool_for_permission`. See `docs/superpowers/specs/2026-04-18-permissions-system-design.md`.
 - **Service registry**: Tools access services and shared state via `get_service(key)` from `workflow.service_registry`. A single ContextVar holds a `dict[str, Any]` set by the workflow manager during processing. Complex services (KBManager, SandboxManager, MemoryStore) are lazily created; simple state (plan string, task list) lives directly in the registry dict.
 - **Manager detection**: Tools decorated with `@requires("kb_manager")` etc. are scanned by `BaseWorkflowManager._detect_required_managers()` which lazily creates only the needed services.
 - **Atomic writes**: Use `atomic_write_json`/`atomic_write_text` from `persistence/_utils.py` for file persistence.
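
Review note (illustrative, not part of the patch): the way a declared `target_arg` might be resolved against a tool call's arguments can be sketched as follows. The `Capability` class here is a local stand-in with only the two fields named in the docs, and `resolve` is a hypothetical helper. How the real engine treats a missing argument is not stated in this diff, so returning `None` is an assumption.

```python
from typing import Any, NamedTuple, Optional


class Capability(NamedTuple):
    """Stand-in for the real Capability; only the two documented fields."""
    name: str
    target_arg: Optional[str] = None


def resolve(
    capabilities: list[Capability], call_args: dict[str, Any]
) -> list[tuple[str, Optional[str]]]:
    """Pair each capability name with the concrete value of its target argument."""
    resolved = []
    for cap in capabilities:
        target = None
        if cap.target_arg is not None:
            # Assumption: a missing argument resolves to None.
            value = call_args.get(cap.target_arg)
            target = None if value is None else str(value)
        resolved.append((cap.name, target))
    return resolved


write_caps = [Capability("filesystem.write", target_arg="path")]
print(resolve(write_caps, {"path": "src/app.py", "content": "..."}))
# prints [('filesystem.write', 'src/app.py')]
```

The resolved `(name, target)` pairs are what a rule matcher would then compare against its patterns.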

diff --git a/README.md b/README.md
@@ -38,7 +38,7 @@ Agentic CLI provides the core infrastructure for building interactive CLI applic
 ├─────────────────────────────┬───────────────────────────────────────┤
 │   GoogleADKWorkflowManager  │     LangGraphWorkflowManager          │
 │   (default)                 │     (optional: langgraph extra)       │
-│   + ConfirmationPlugin      │     + confirmation tool wrapper       │
+│   + PermissionPlugin        │     + wrap_tool_for_permission        │
 │   + LLMLoggingPlugin        │     + native ToolNode                 │
 │   + TaskProgressPlugin      │                                       │
 └─────────────────────────────┴───────────────────────────────────────┘
@@ -145,7 +145,7 @@ manager = GoogleADKWorkflowManager(
 ```
 
 ADK integrations:
-- **ConfirmationPlugin** — intercepts `DANGEROUS` tools and prompts the user for approval
+- **PermissionPlugin** — evaluates each tool's declared capabilities against the permission engine and prompts the user when no rule matches
 - **LLMLoggingPlugin** — structured logging of LLM requests/responses
 - **TaskProgressPlugin** — streams the task checklist into its own thinking box
 
@@ -171,7 +171,7 @@ Features:
 - **Explicit provider support**: Uses `langchain-google-genai` for Gemini (not VertexAI)
 - **Thinking mode**: Native support for Claude and Gemini thinking/reasoning
 - **Retry policies**: Automatic retry with exponential backoff
-- **Confirmation wrapper**: Wraps `DANGEROUS` tools to request HITL approval
+- **Permission wrapper**: Wraps each tool to gate execution through the permission engine
 - **Event streaming**: Real-time workflow events via `WorkflowEvent`
 
 Requires: `pip install agentic-cli[langgraph]`
@@ -186,7 +186,7 @@ Requires: `pip install agentic-cli[langgraph]`
 | State persistence | In-memory | Memory, PostgreSQL, or SQLite |
 | Thinking support | Native (Gemini) | Native (Claude & Gemini) |
 | Retry handling | Built-in | Built-in with backoff |
-| HITL confirmation | ConfirmationPlugin | Tool wrapper |
+| Permission gate | PermissionPlugin | wrap_tool_for_permission |
 | Context trimming | Native | Native |
 
 ### Auto-selection via Settings
@@ -320,11 +320,12 @@ configs = [coordinator, researcher, analyst]
 Tools are regular Python functions with type hints and docstrings. **All tools return `{"success": bool, ...}` dicts** — never raise exceptions.
 
 ```python
-from agentic_cli.tools import register_tool, ToolCategory, PermissionLevel
+from agentic_cli.tools import register_tool, ToolCategory
+from agentic_cli.workflow.permissions import Capability, EXEMPT
 
 @register_tool(
-    category=ToolCategory.DATA,
-    permission_level=PermissionLevel.SAFE,
+    category=ToolCategory.NETWORK,
+    capabilities=[Capability("http.read")],
     description="Search the database for matching records.",
 )
 def search_database(query: str, limit: int = 10) -> dict:
@@ -341,15 +342,14 @@ def search_database(query: str, limit: int = 10) -> dict:
     return {"success": True, "results": results, "count": len(results)}
 ```
 
-Registering via `@register_tool` is optional — you can pass raw callables into `AgentConfig.tools`. Registering gives the tool metadata for the registry, permission-aware HITL wrapping, and tool-summary formatting.
+Registering via `@register_tool` is optional — you can pass raw callables into `AgentConfig.tools`. Registering gives the tool metadata for the registry, capability-aware permission gating, and tool-summary formatting.
 
-### Permission Levels
+### Capabilities
 
-| Level | Behavior |
-|-------|----------|
-| `SAFE` | Runs silently |
-| `CAUTION` | Runs; result is surfaced prominently |
-| `DANGEROUS` | Intercepted by HITL — user must approve before execution |
+Tool access is gated by the **permission engine** (see the HITL section below). Each registered tool declares what it touches via `capabilities=`:
+
+- **`Capability("namespace.action", target_arg="...")`** — e.g. `Capability("filesystem.write", target_arg="path")`, `Capability("http.read", target_arg="url")`, `Capability("shell.exec", target_arg="command")`. The engine resolves `target_arg` against the actual tool-call arguments and matches against rules.
+- **`EXEMPT`** — opts the tool out of the engine entirely. Reserved for backend-internal tools (e.g. ADK `transfer_to_agent`, backend state tools).
 
 ### Built-in Tools
 
@@ -430,7 +430,7 @@ list_dir("src/", include_hidden=False)
 diff_compare(source_a="old.txt", source_b="new.txt")
 ```
 
-**WRITE tools (caution)**
+**WRITE tools**
 
 ```python
 from agentic_cli.tools import write_file, edit_file
@@ -534,15 +534,9 @@ Each tool keeps at most N reflections (FIFO eviction). Reflections can be inject
 
 #### HITL (Human-in-the-Loop)
 
-Two mechanisms:
-
-1. **Automatic confirmation for `DANGEROUS` tools** — handled by ADK's `ConfirmationPlugin` and LangGraph's tool wrapper. No code changes needed; just mark the tool `DANGEROUS`.
-2. **Explicit `request_approval` tool** — for domain-level checkpoints:
+Tool calls are gated by the **permission engine** (`workflow/permissions/`). Each tool declares a list of capabilities (e.g. `filesystem.write(path=...)`); the engine evaluates them against rules from four sources (builtin defaults, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, in-memory session). When no rule matches, the user is prompted with `Allow once / Allow for session / Allow always (save to project) / Deny`. Always-grants persist into the project settings file so the next run picks them up automatically.
 
-   ```python
-   from agentic_cli.tools import hitl_tools
-   # request_approval(message, options) -> {"success": True, "choice": "..."}
-   ```
+See `docs/superpowers/specs/2026-04-18-permissions-system-design.md` for the full design.
 
 ## CLI Commands
 
@@ -718,15 +712,24 @@ agentic-cli/
 │   │   ├── adk/
 │   │   │   ├── manager.py                # GoogleADKWorkflowManager
 │   │   │   ├── event_processor.py
-│   │   │   ├── plugins.py                # ConfirmationPlugin, LLMLoggingPlugin
+│   │   │   ├── plugins.py                # LLMLoggingPlugin
+│   │   │   ├── permission_plugin.py      # PermissionPlugin (capability gating)
 │   │   │   └── task_progress_plugin.py
-│   │   └── langgraph/
-│   │       ├── manager.py                # LangGraphWorkflowManager
-│   │       ├── graph_builder.py
-│   │       ├── state.py
-│   │       └── persistence/              # Checkpointers and stores
+│   │   ├── langgraph/
+│   │   │   ├── manager.py                # LangGraphWorkflowManager
+│   │   │   ├── graph_builder.py
+│   │   │   ├── permission_wrap.py        # wrap_tool_for_permission
+│   │   │   ├── state.py
+│   │   │   └── persistence/              # Checkpointers and stores
+│   │   └── permissions/                  # Framework-independent permission engine
+│   │       ├── capabilities.py           # Capability, ResolvedCapability, EXEMPT
+│   │       ├── rules.py                  # Rule, Effect, RuleSource, CheckResult, AskScope
+│   │       ├── matchers.py               # Path/URL/Shell/StringGlob matchers
+│   │       ├── store.py                  # PermissionContext, BUILTIN_RULES, JSON load/save
+│   │       ├── prompt.py                 # build_request + parse_response
+│   │       └── engine.py                 # PermissionEngine
 │   ├── tools/
-│   │   ├── registry.py           # ToolRegistry, ToolCategory, PermissionLevel
+│   │   ├── registry.py           # ToolRegistry, ToolCategory, register_tool
 │   │   ├── factories.py          # Backend-aware tool factories
 │   │   ├── executor.py           # SafePythonExecutor
 │   │   ├── arxiv_tools.py        # search_arxiv, fetch_arxiv_paper, ingest_arxiv_paper
@@ -744,7 +747,6 @@ agentic-cli/
 │   │   ├── pdf_utils.py          # PDF text extraction helpers
 │   │   ├── memory_tools.py       # save/search/update/delete + MemoryStore
 │   │   ├── reflection_tools.py   # save_reflection + ToolReflectionStore
-│   │   ├── hitl_tools.py         # request_approval
 │   │   ├── _core/                # Shared planning/task logic
 │   │   │   ├── planning.py
 │   │   │   └── tasks.py
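
Review note (illustrative, not part of the patch): the four prompt choices in the README's HITL section imply three different places a grant can end up. A minimal sketch, assuming a hypothetical `Answer` enum, an `apply_answer` helper, and a `"permissions"` list inside the project settings dict (the actual JSON layout is not shown in this diff):

```python
from enum import Enum
from typing import Any


class Answer(Enum):
    ALLOW_ONCE = "once"
    ALLOW_SESSION = "session"
    ALLOW_ALWAYS = "always"
    DENY = "deny"


def apply_answer(
    answer: Answer,
    grant: dict[str, str],
    session_grants: list[dict[str, str]],
    project_settings: dict[str, Any],
) -> bool:
    """Record the grant in the scope the user chose; return True if the tool may run."""
    if answer is Answer.DENY:
        return False
    if answer is Answer.ALLOW_SESSION:
        # Kept in memory only; forgotten when the process exits.
        session_grants.append(grant)
    elif answer is Answer.ALLOW_ALWAYS:
        # Assumed layout; the real engine writes ./.{app_name}/settings.json.
        project_settings.setdefault("permissions", []).append(grant)
    # ALLOW_ONCE stores nothing, so the next call prompts again.
    return True


session: list[dict[str, str]] = []
settings: dict[str, Any] = {}
grant = {"capability": "filesystem.write", "target": "/repo/*"}

print(apply_answer(Answer.ALLOW_ONCE, grant, session, settings), session, settings)
print(apply_answer(Answer.ALLOW_ALWAYS, grant, session, settings), settings)
```

In this sketch the settings dict stays in memory; persisting it to disk would be a separate step.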

diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "agentic-cli"
-version = "0.5.0"
+version = "0.5.1"
 description = "A framework for building domain-specific agentic CLI applications"
 readme = "README.md"
 license = "MIT"
@@ -46,6 +46,8 @@ kb = [
     "torch>=2.2.0",
     "sentence-transformers>=2.0.0,<3.0.0",
     "faiss-cpu>=1.7.0",
+    "bm25s>=0.2.0",
+    "rank-bm25>=0.2.2",
 ]
 langgraph = [
     "langgraph>=0.2.0",

diff --git a/src/agentic_cli/__init__.py b/src/agentic_cli/__init__.py
@@ -92,4 +92,4 @@ def __getattr__(name: str):
     "CLISettingsMixin",
 ]
 
-__version__ = "0.5.0"
+__version__ = "0.5.1"
diff --git a/src/agentic_cli/knowledge_base/_bm25_backends.py b/src/agentic_cli/knowledge_base/_bm25_backends.py
@@ -0,0 +1,137 @@
+"""Real BM25 index backends (bm25s, rank_bm25).
+
+Both implement the same interface as MockBM25Index so create_bm25_index()
+can return any of them interchangeably. Tokenization is lowercase whitespace
+split to match MockBM25Index; the scoring model is the library's own BM25.
+
+Neither underlying library supports true incremental add/remove, so we keep
+tokenized docs + chunk_ids in memory and rebuild the model lazily on search.
+"""
+
+from __future__ import annotations
+
+import json
+from pathlib import Path
+
+from agentic_cli.file_utils import atomic_write_json
+
+
+def _tokenize(text: str) -> list[str]:
+    return text.lower().split()
+
+
+class _BM25BackendBase:
+    _INDEX_FILE: str = ""
+
+    def __init__(self) -> None:
+        self._chunk_ids: list[str] = []
+        self._tokenized: list[list[str]] = []
+        self._model = None
+
+    @property
+    def size(self) -> int:
+        return len(self._chunk_ids)
+
+    def add_documents(self, chunk_ids: list[str], texts: list[str]) -> None:
+        for cid, text in zip(chunk_ids, texts):
+            self._chunk_ids.append(cid)
+            self._tokenized.append(_tokenize(text))
+        self._model = None
+
+    def remove_documents(self, chunk_ids: list[str]) -> None:
+        remove_set = set(chunk_ids)
+        keep = [
+            (cid, toks)
+            for cid, toks in zip(self._chunk_ids, self._tokenized)
+            if cid not in remove_set
+        ]
+        if keep:
+            self._chunk_ids, self._tokenized = map(list, zip(*keep))
+        else:
+            self._chunk_ids, self._tokenized = [], []
+        self._model = None
+
+    def rebuild(self, chunk_ids: list[str], texts: list[str]) -> None:
+        self._chunk_ids = []
+        self._tokenized = []
+        self._model = None
+        self.add_documents(chunk_ids, texts)
+
+    def save(self, path: Path) -> None:
+        path.mkdir(parents=True, exist_ok=True)
+        atomic_write_json(
+            path / self._INDEX_FILE,
+            {"chunk_ids": self._chunk_ids, "tokenized": self._tokenized},
+        )
+
+    def load(self, path: Path) -> None:
+        index_path = path / self._INDEX_FILE
+        if not index_path.exists():
+            return
+        data = json.loads(index_path.read_text(encoding="utf-8"))
+        self._chunk_ids = data["chunk_ids"]
+        self._tokenized = data["tokenized"]
+        self._model = None
+
+
+class RankBM25Index(_BM25BackendBase):
+    """BM25 backed by rank_bm25.BM25Plus (pure Python).
+
+    BM25Plus rather than BM25Okapi because Okapi's IDF can go zero or
+    negative when a term appears in most of the corpus; BM25Plus adds a
+    delta offset that guarantees positive contributions on real matches.
+    """
+
+    _INDEX_FILE = "bm25_rank.json"
+
+    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
+        if not self._chunk_ids:
+            return []
+        query_tokens = _tokenize(query)
+        if not query_tokens:
+            return []
+        query_set = set(query_tokens)
+        if self._model is None:
+            from rank_bm25 import BM25Plus
+
+            self._model = BM25Plus(self._tokenized)
+        scores = self._model.get_scores(query_tokens)
+        scored: list[tuple[str, float]] = []
+        for cid, doc_tokens, score in zip(
+            self._chunk_ids, self._tokenized, scores
+        ):
+            if query_set.intersection(doc_tokens):
+                scored.append((cid, float(score)))
+        scored.sort(key=lambda x: x[1], reverse=True)
+        return scored[:top_k]
+
+
+class BM25sIndex(_BM25BackendBase):
+    """BM25 backed by bm25s (NumPy/SciPy-based)."""
+
+    _INDEX_FILE = "bm25s_sidecar.json"
+
+    def _build_model(self) -> None:
+        import bm25s
+
+        model = bm25s.BM25()
+        model.index(self._tokenized, show_progress=False)
+        self._model = model
+
+    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
+        if not self._chunk_ids:
+            return []
+        query_tokens = _tokenize(query)
+        if not query_tokens:
+            return []
+        if self._model is None:
+            self._build_model()
+        k = min(top_k, len(self._chunk_ids))
+        docs, scores = self._model.retrieve(
+            [query_tokens], k=k, show_progress=False
+        )
+        results: list[tuple[str, float]] = []
+        for idx, score in zip(docs[0], scores[0]):
+            if score > 0:
+                results.append((self._chunk_ids[int(idx)], float(score)))
+        return results
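
Review note (illustrative, not part of the patch): the changelog describes `bm25s` as preferred with `rank_bm25` as the fallback, and the module docstring mentions `create_bm25_index()` and `MockBM25Index`. The selection logic is not in this diff; one plausible shape is sketched below. `pick_bm25_backend` and its final fallback to the mock index are assumptions, not the project's actual factory.

```python
import importlib.util
from typing import Callable


def _installed(module_name: str) -> bool:
    """True when the top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


def pick_bm25_backend(is_available: Callable[[str], bool] = _installed) -> str:
    """Prefer bm25s, fall back to rank_bm25, otherwise use the mock index."""
    for name in ("bm25s", "rank_bm25"):
        if is_available(name):
            return name
    return "mock"


# Result depends on which optional packages are installed locally.
print(pick_bm25_backend())

# Injecting the availability check makes the fallback order easy to exercise.
print(pick_bm25_backend(lambda name: name == "rank_bm25"))  # prints rank_bm25
```

Probing with `find_spec` avoids importing either library until an index is actually built, which matches the lazy imports inside `search` and `_build_model` above.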