Merged
42 commits
e8dfc74
feat(kb): implement real BM25 backends (bm25s, rank_bm25)
shoom1 Apr 17, 2026
de3e9a9
docs(claude): note that docs/ is gitignored and must not be committed
shoom1 Apr 17, 2026
3836622
refactor(workflow): extract HITL confirmation to backend-neutral module
shoom1 Apr 17, 2026
2ac3a09
refactor(tools/memory): dedupe registry-bound and factory-bound entry…
shoom1 Apr 17, 2026
caa57d1
fix(kb): tighten concurrency contract in KnowledgeBaseManager
shoom1 Apr 17, 2026
132ccaa
feat(permissions): scaffold permissions package
shoom1 Apr 18, 2026
6ef9dac
feat(permissions): Capability, ResolvedCapability, EXEMPT sentinel
shoom1 Apr 18, 2026
ae2b5e9
feat(permissions): Rule, Effect, RuleSource, AskScope, CheckResult
shoom1 Apr 18, 2026
163b3eb
feat(permissions): PermissionContext + variable substitution
shoom1 Apr 18, 2026
569b751
feat(permissions): Matcher protocol + StringGlobMatcher
shoom1 Apr 18, 2026
78114e5
feat(permissions): PathMatcher with ** glob support
shoom1 Apr 18, 2026
ffccb77
feat(permissions): URLMatcher for http.* capabilities
shoom1 Apr 18, 2026
8d2ad3d
feat(permissions): ShellMatcher for shell.* capabilities
shoom1 Apr 18, 2026
67ae761
feat(permissions): matcher registry + capability-name glob
shoom1 Apr 18, 2026
a4b2a3a
feat(permissions): BUILTIN_RULES default policy
shoom1 Apr 18, 2026
ecf8d96
feat(permissions): load_rules from settings.json permissions section
shoom1 Apr 18, 2026
cb83c61
feat(permissions): append_project_rule with atomic JSON merge
shoom1 Apr 18, 2026
6c2886d
feat(permissions): build_request + parse_response for ask dialog
shoom1 Apr 18, 2026
dab2191
feat(permissions): PermissionEngine init + rule loading + disabled sh…
shoom1 Apr 18, 2026
d0ae502
feat(permissions): engine rule-based allow/deny path
shoom1 Apr 18, 2026
d243545
feat(permissions): engine ask flow (once/session/always/deny)
shoom1 Apr 18, 2026
96a9af4
test(permissions): serialise concurrent ask prompts
shoom1 Apr 18, 2026
9a6f1bf
refactor(permissions): drop try/except scaffolding from __init__.py n…
shoom1 Apr 18, 2026
230c59b
feat(tools): add ToolDefinition.capabilities field + register_tool kw…
shoom1 Apr 18, 2026
2628cf7
feat(tools): declare capabilities on every registered tool
shoom1 Apr 18, 2026
a720c04
feat(permissions): ADK PermissionPlugin
shoom1 Apr 18, 2026
de905c0
feat(permissions): LangGraph wrap_tool_for_permission
shoom1 Apr 18, 2026
110ae22
feat(permissions): construct PermissionEngine in BaseWorkflowManager …
shoom1 Apr 18, 2026
2de1afc
feat(permissions): add permissions + permissions_enabled settings (st…
shoom1 Apr 18, 2026
2bf7950
feat(permissions): ADK manager uses PermissionPlugin
shoom1 Apr 18, 2026
353edc4
feat(permissions): LangGraph builder uses wrap_tool_for_permission
shoom1 Apr 18, 2026
5b2965b
refactor(permissions): delete retired code (PermissionLevel, Confirma…
shoom1 Apr 18, 2026
7c84fc6
docs(permissions): drop transient 'once Task N' phrasing from adapter…
shoom1 Apr 18, 2026
af1c551
docs(permissions): update CLAUDE.md + README.md to describe the new p…
shoom1 Apr 18, 2026
4e929bb
fix(permissions): preserve '*' wildcard in matcher canonicalize + mat…
shoom1 Apr 18, 2026
fc74e9a
fix(permissions): register backend state tools as EXEMPT
shoom1 Apr 18, 2026
0227b2d
feat(permissions): allow memory.* and kb.* by default; register ADK t…
shoom1 Apr 18, 2026
c1c9b77
perf(workflow): offload _ensure_managers_initialized to a worker thread
shoom1 Apr 18, 2026
73ca202
feat(permissions): broaden filesystem.* grants to the parent directory
shoom1 Apr 18, 2026
74c0db1
feat(permissions): show directory-scope target in ask prompt
shoom1 Apr 18, 2026
a07aca7
Merge pull request #72 from shoom1/feature/permissions-system
shoom1 Apr 18, 2026
fffdee1
Release v0.5.1
shoom1 Apr 19, 2026
26 changes: 26 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,32 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.5.1] - 2026-04-18

### Added
- **Capability-based permission engine** (PR #72): Framework-independent engine that replaces the ConfirmationPlugin-based HITL. Tools declare capabilities (e.g. `filesystem.write(path=...)`, `http.read`, `shell.exec`) via `@register_tool(capabilities=...)`. Rules are evaluated from four sources — builtin defaults, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, and in-memory session. When no rule matches, the user is prompted with `Allow once / for session / always (save to project) / Deny`; "always" grants persist into project settings.
- **Matchers**: `PathMatcher` (with `**` glob), `URLMatcher`, `ShellMatcher`, and `StringGlobMatcher`.
- **ADK `PermissionPlugin`**: capability gating for ADK agents.
- **LangGraph `wrap_tool_for_permission`**: per-tool permission wrapping for LangGraph `ToolNode`.
- **`permissions` and `permissions_enabled` settings** for runtime control.
- **`EXEMPT` sentinel**: opts a tool out of the permission engine (used for backend state tools and ADK `transfer_to_agent`).
- **Real BM25 backends**: `bm25s` (preferred) and `rank_bm25` fallback.

### Changed
- **`@register_tool`** now requires the `capabilities=` kwarg.
- **Default grants broadened**: `filesystem.*` auto-extends to the parent directory; `memory.*` and `kb.*` are allowed by default.
- **Ask prompt** now shows the directory-scope target.
- **Workflow init**: `_ensure_managers_initialized` runs in a worker thread to avoid blocking the event loop.
- **HITL confirmation** extracted into a backend-neutral module during the permissions migration.
- **Memory tools**: deduped registry-bound and factory-bound entry points.

### Fixed
- Matcher canonicalize/matches preserves the `*` wildcard.
- `KnowledgeBaseManager` concurrency contract tightened.

### Removed
- **`PermissionLevel`** (SAFE / CAUTION / DANGEROUS), `ConfirmationPlugin`, `_wrap_for_confirmation`, `hitl_tools`, and the `hitl_enabled` setting — superseded by the permission engine.

## [0.5.0] - 2026-04-17

### Added
10 changes: 7 additions & 3 deletions CLAUDE.md
@@ -47,7 +47,7 @@ agentic-cli/
│ │ ├── persistence/ # Checkpointers, stores
│ │ └── tools/ # LangChain-compatible wrappers
│ ├── tools/
-│ │ ├── registry.py # ToolRegistry, @register_tool, ToolCategory, PermissionLevel
+│ │ ├── registry.py # ToolRegistry, @register_tool, ToolCategory
│ │ ├── executor.py # SafePythonExecutor
│ │ ├── knowledge_tools.py # kb_search, kb_ingest, kb_list, kb_read
│ │ ├── arxiv_tools.py # search_arxiv, fetch_arxiv_paper, analyze_arxiv_paper
@@ -62,7 +62,7 @@ agentic-cli/
│ │ ├── memory_tools.py # save_memory, search_memory + MemoryStore
│ │ ├── planning_tools.py # save_plan, get_plan + PlanStore
│ │ ├── task_tools.py # save_tasks, get_tasks + TaskStore
-│ │ ├── hitl_tools.py # request_approval + ApprovalManager, HITLConfig
+│ │ ├── reflection_tools.py # save_reflection + ToolReflectionStore
│ │ ├── shell/ # 8-layer shell security
│ │ └── webfetch/ # Fetcher, converter, validator, robots
│ ├── knowledge_base/
@@ -114,6 +114,9 @@ Workflow:
2. For features: create `feature/<name>` (or `fix/<name>` or `refactor/<name>`) from `develop`, work there, merge back to `develop`
3. When ready to release: merge `develop` → `main` and tag the release

### What NOT to commit
- `docs/` is gitignored on purpose (see `.gitignore`). It is a scratchpad for review notes, plans, and internal analysis. **Never `git add docs/…` or suggest committing anything under `docs/`.** If a document belongs in the repo, it lives elsewhere (README, CHANGELOG, top-level `*.md`).

## Development Principles

### Code Style
@@ -130,7 +133,8 @@ Workflow:

### Key Design Patterns
- **Tool error handling**: All tools return `{"success": bool, ...}` dicts. Never raise `ToolError`.
-- **Tool registration**: Use `@register_tool(category=..., permission_level=..., description=...)` decorator. Tools are auto-discovered via the global `ToolRegistry`.
+- **Tool registration**: Use `@register_tool(category=..., capabilities=..., description=...)` decorator. `capabilities=` is required — pass `EXEMPT` for tools that need no permission check or a list of `Capability(name, target_arg=...)` tuples the engine matches against rules. Tools are auto-discovered via the global `ToolRegistry`.
+- **Permissions**: `workflow/permissions/` holds a framework-independent engine that evaluates declared capabilities against rules from four sources (builtin, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, in-memory session). ADK + LangGraph gate tool calls via `workflow/adk/permission_plugin.py::PermissionPlugin` and `workflow/langgraph/permission_wrap.py::wrap_tool_for_permission`. See `docs/superpowers/specs/2026-04-18-permissions-system-design.md`.
- **Service registry**: Tools access services and shared state via `get_service(key)` from `workflow.service_registry`. A single ContextVar holds a `dict[str, Any]` set by the workflow manager during processing. Complex services (KBManager, SandboxManager, MemoryStore) are lazily created; simple state (plan string, task list) lives directly in the registry dict.
- **Manager detection**: Tools decorated with `@requires("kb_manager")` etc. are scanned by `BaseWorkflowManager._detect_required_managers()` which lazily creates only the needed services.
- **Atomic writes**: Use `atomic_write_json`/`atomic_write_text` from `persistence/_utils.py` for file persistence.
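
The atomic-write pattern named in the last bullet usually means writing to a temporary file in the target directory and then renaming it over the destination. The sketch below illustrates that idea only. It is not the project's `atomic_write_json`, and the name `atomic_write_json_sketch`, its signature, and the demo payload are assumptions.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json_sketch(path: Path, data: dict) -> None:
    """Hypothetical helper: write JSON so readers never see a partial file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the new file in as a single step.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "settings.json"
    # Placeholder payload; the real settings schema is not shown here.
    atomic_write_json_sketch(target, {"example": True})
    print(json.loads(target.read_text(encoding="utf-8")))
```

A crash before `os.replace` leaves the previous file untouched, which is the property that matters when several runs share one settings file.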
64 changes: 33 additions & 31 deletions README.md
@@ -38,7 +38,7 @@ Agentic CLI provides the core infrastructure for building interactive CLI applic
├─────────────────────────────┬───────────────────────────────────────┤
│ GoogleADKWorkflowManager │ LangGraphWorkflowManager │
│ (default) │ (optional: langgraph extra) │
-│ + ConfirmationPlugin │ + confirmation tool wrapper
+│ + PermissionPlugin │ + wrap_tool_for_permission
│ + LLMLoggingPlugin │ + native ToolNode │
│ + TaskProgressPlugin │ │
└─────────────────────────────┴───────────────────────────────────────┘
@@ -145,7 +145,7 @@ manager = GoogleADKWorkflowManager(
```

ADK integrations:
-- **ConfirmationPlugin** — intercepts `DANGEROUS` tools and prompts the user for approval
+- **PermissionPlugin** — evaluates each tool's declared capabilities against the permission engine and prompts the user when no rule matches
- **LLMLoggingPlugin** — structured logging of LLM requests/responses
- **TaskProgressPlugin** — streams the task checklist into its own thinking box

@@ -171,7 +171,7 @@ Features:
- **Explicit provider support**: Uses `langchain-google-genai` for Gemini (not VertexAI)
- **Thinking mode**: Native support for Claude and Gemini thinking/reasoning
- **Retry policies**: Automatic retry with exponential backoff
-- **Confirmation wrapper**: Wraps `DANGEROUS` tools to request HITL approval
+- **Permission wrapper**: Wraps each tool to gate execution through the permission engine
- **Event streaming**: Real-time workflow events via `WorkflowEvent`

Requires: `pip install agentic-cli[langgraph]`
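
As a rough illustration of what "wrapping each tool" can look like, the sketch below gates a plain callable behind a check function and returns a failure dict on denial, in line with the `{"success": bool, ...}` convention. `wrap_with_gate`, `allow_workspace`, and `delete_file` are hypothetical names. This is not the signature or behaviour of `wrap_tool_for_permission`.

```python
from functools import wraps
from typing import Any, Callable


def wrap_with_gate(
    tool: Callable[..., dict],
    check: Callable[[str, dict[str, Any]], bool],
) -> Callable[..., dict]:
    """Hypothetical wrapper: run `tool` only if `check` approves the call."""

    @wraps(tool)
    def gated(**kwargs: Any) -> dict:
        if not check(tool.__name__, kwargs):
            # Return a failure dict instead of raising, as tools do in this project.
            return {"success": False, "error": f"Permission denied for {tool.__name__}"}
        return tool(**kwargs)

    return gated


def delete_file(path: str) -> dict:
    """Stand-in tool for the demo; it does not touch the filesystem."""
    return {"success": True, "deleted": path}


def allow_workspace(name: str, args: dict[str, Any]) -> bool:
    """Toy policy: approve only paths under workspace/."""
    return str(args.get("path", "")).startswith("workspace/")


gated_delete = wrap_with_gate(delete_file, allow_workspace)
print(gated_delete(path="workspace/tmp.txt"))
print(gated_delete(path="/etc/passwd"))
```

Because the wrapper uses `functools.wraps`, the gated callable keeps the original name and docstring, which matters when a framework derives the tool schema from them.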
@@ -186,7 +186,7 @@ Requires: `pip install agentic-cli[langgraph]`
| State persistence | In-memory | Memory, PostgreSQL, or SQLite |
| Thinking support | Native (Gemini) | Native (Claude & Gemini) |
| Retry handling | Built-in | Built-in with backoff |
-| HITL confirmation | ConfirmationPlugin | Tool wrapper |
+| Permission gate | PermissionPlugin | wrap_tool_for_permission |
| Context trimming | Native | Native |

### Auto-selection via Settings
@@ -320,11 +321,12 @@ configs = [coordinator, researcher, analyst]
Tools are regular Python functions with type hints and docstrings. **All tools return `{"success": bool, ...}` dicts** — never raise exceptions.

```python
-from agentic_cli.tools import register_tool, ToolCategory, PermissionLevel
+from agentic_cli.tools import register_tool, ToolCategory
+from agentic_cli.workflow.permissions import Capability, EXEMPT

 @register_tool(
-    category=ToolCategory.DATA,
-    permission_level=PermissionLevel.SAFE,
+    category=ToolCategory.NETWORK,
+    capabilities=[Capability("http.read")],
     description="Search the database for matching records.",
 )
 def search_database(query: str, limit: int = 10) -> dict:
@@ -341,15 +342,14 @@ def search_database(query: str, limit: int = 10) -> dict:
     return {"success": True, "results": results, "count": len(results)}
```

-Registering via `@register_tool` is optional — you can pass raw callables into `AgentConfig.tools`. Registering gives the tool metadata for the registry, permission-aware HITL wrapping, and tool-summary formatting.
+Registering via `@register_tool` is optional — you can pass raw callables into `AgentConfig.tools`. Registering gives the tool metadata for the registry, capability-aware permission gating, and tool-summary formatting.

-### Permission Levels
+### Capabilities

-| Level | Behavior |
-|-------|----------|
-| `SAFE` | Runs silently |
-| `CAUTION` | Runs; result is surfaced prominently |
-| `DANGEROUS` | Intercepted by HITL — user must approve before execution |
+Tool access is gated by the **permission engine** (see the HITL section below). Each registered tool declares what it touches via `capabilities=`:
+
+- **`Capability("namespace.action", target_arg="...")`** — e.g. `Capability("filesystem.write", target_arg="path")`, `Capability("http.read", target_arg="url")`, `Capability("shell.exec", target_arg="command")`. The engine resolves `target_arg` against the actual tool-call arguments and matches against rules.
+- **`EXEMPT`** — opts the tool out of the engine entirely. Reserved for backend-internal tools (e.g. ADK `transfer_to_agent`, backend state tools).
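
The `target_arg` resolution described above can be pictured as follows: look up the named argument in the tool call, then glob-match the capability name and the resolved target against a rule. This is a minimal sketch under stated assumptions. `CapabilitySketch`, `resolve`, and `rule_matches` are hypothetical names, and `fnmatch` stands in for the project's matchers (the real `PathMatcher` handles `**`, which this sketch does not model).

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Any, Optional


@dataclass(frozen=True)
class CapabilitySketch:
    """Hypothetical stand-in for the project's Capability."""

    name: str
    target_arg: Optional[str] = None


def resolve(cap: CapabilitySketch, call_args: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Pair the capability name with the concrete value of its target argument."""
    if cap.target_arg is None:
        return cap.name, None
    return cap.name, str(call_args[cap.target_arg])


def rule_matches(name_glob: str, target_glob: str, resolved: tuple[str, Optional[str]]) -> bool:
    """Glob-match the capability name and, when present, its target."""
    name, target = resolved
    if not fnmatchcase(name, name_glob):
        return False
    return target is None or fnmatchcase(target, target_glob)


cap = CapabilitySketch("filesystem.write", target_arg="path")
resolved = resolve(cap, {"path": "src/app/main.py", "content": "..."})
print(resolved)
print(rule_matches("filesystem.*", "src/*", resolved))
print(rule_matches("filesystem.*", "docs/*", resolved))
```

Note that `fnmatch`'s `*` also matches `/`, so `src/*` covers nested paths here. A path-aware matcher would typically be stricter.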

### Built-in Tools

@@ -430,7 +430,7 @@ list_dir("src/", include_hidden=False)
diff_compare(source_a="old.txt", source_b="new.txt")
```

-**WRITE tools (caution)**
+**WRITE tools**

```python
from agentic_cli.tools import write_file, edit_file
@@ -534,15 +534,9 @@ Each tool keeps at most N reflections (FIFO eviction). Reflections can be inject

#### HITL (Human-in-the-Loop)

-Two mechanisms:
-
-1. **Automatic confirmation for `DANGEROUS` tools** — handled by ADK's `ConfirmationPlugin` and LangGraph's tool wrapper. No code changes needed; just mark the tool `DANGEROUS`.
-2. **Explicit `request_approval` tool** — for domain-level checkpoints:
+Tool calls are gated by the **permission engine** (`workflow/permissions/`). Each tool declares a list of capabilities (e.g. `filesystem.write(path=...)`); the engine evaluates them against rules from four sources (builtin defaults, user `~/.{app_name}/settings.json`, project `./.{app_name}/settings.json`, in-memory session). When no rule matches, the user is prompted with `Allow once / Allow for session / Allow always (save to project) / Deny`. Always-grants persist into the project settings file so the next run picks them up automatically.

-```python
-from agentic_cli.tools import hitl_tools
-# request_approval(message, options) -> {"success": True, "choice": "..."}
-```
+See `docs/superpowers/specs/2026-04-18-permissions-system-design.md` for the full design.
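
The evaluate-then-ask flow can be sketched as below. `EngineSketch` is a hypothetical, much-simplified stand-in for `PermissionEngine`. The rule shape and the assumption that any matching deny beats an allow are illustrative choices, since this text does not specify how the four sources are prioritised. The "always" branch only updates an in-memory list, whereas the real engine is described as saving to project settings.

```python
from fnmatch import fnmatchcase
from typing import Callable

Rule = tuple[str, str]  # (effect, capability glob); effect is "allow" or "deny"


class EngineSketch:
    """Hypothetical, simplified stand-in for PermissionEngine."""

    def __init__(self, builtin: list[Rule], user: list[Rule],
                 project: list[Rule], ask: Callable[[str], str]) -> None:
        self.builtin, self.user, self.project = builtin, user, project
        self.session: list[Rule] = []  # in-memory grants, lost on exit
        self.ask = ask  # returns "once", "session", "always" or "deny"

    def check(self, capability: str) -> bool:
        rules = self.builtin + self.user + self.project + self.session
        matched = [effect for effect, glob in rules if fnmatchcase(capability, glob)]
        if "deny" in matched:  # assumption: any matching deny wins
            return False
        if "allow" in matched:
            return True
        answer = self.ask(capability)  # no rule matched, so prompt the user
        if answer == "session":
            self.session.append(("allow", capability))
        elif answer == "always":
            # Assumption: only memory is updated here; no file is written.
            self.project.append(("allow", capability))
        return answer in {"once", "session", "always"}


prompts: list[str] = []


def fake_ask(capability: str) -> str:
    """Scripted answer for the demo: always grant for the session."""
    prompts.append(capability)
    return "session"


engine = EngineSketch(
    builtin=[("allow", "memory.*"), ("allow", "kb.*")],
    user=[("deny", "shell.exec")],
    project=[],
    ask=fake_ask,
)
print(engine.check("memory.write"))
print(engine.check("shell.exec"))
print(engine.check("http.read"))  # no rule matches, so fake_ask is called
print(engine.check("http.read"))  # covered by the session grant, no prompt
print(prompts)
```

In this sketch the second `http.read` call is answered from the session list, so `fake_ask` is consulted once in total.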

## CLI Commands

@@ -718,15 +712,24 @@ agentic-cli/
│ │ ├── adk/
│ │ │ ├── manager.py # GoogleADKWorkflowManager
│ │ │ ├── event_processor.py
-│ │ │ ├── plugins.py # ConfirmationPlugin, LLMLoggingPlugin
+│ │ │ ├── plugins.py # LLMLoggingPlugin
+│ │ │ ├── permission_plugin.py # PermissionPlugin (capability gating)
│ │ │ └── task_progress_plugin.py
-│ │ └── langgraph/
-│ │ ├── manager.py # LangGraphWorkflowManager
-│ │ ├── graph_builder.py
-│ │ ├── state.py
-│ │ └── persistence/ # Checkpointers and stores
+│ │ ├── langgraph/
+│ │ │ ├── manager.py # LangGraphWorkflowManager
+│ │ │ ├── graph_builder.py
+│ │ │ ├── permission_wrap.py # wrap_tool_for_permission
+│ │ │ ├── state.py
+│ │ │ └── persistence/ # Checkpointers and stores
+│ │ └── permissions/ # Framework-independent permission engine
+│ │ ├── capabilities.py # Capability, ResolvedCapability, EXEMPT
+│ │ ├── rules.py # Rule, Effect, RuleSource, CheckResult, AskScope
+│ │ ├── matchers.py # Path/URL/Shell/StringGlob matchers
+│ │ ├── store.py # PermissionContext, BUILTIN_RULES, JSON load/save
+│ │ ├── prompt.py # build_request + parse_response
+│ │ └── engine.py # PermissionEngine
│ ├── tools/
-│ │ ├── registry.py # ToolRegistry, ToolCategory, PermissionLevel
+│ │ ├── registry.py # ToolRegistry, ToolCategory, register_tool
│ │ ├── factories.py # Backend-aware tool factories
│ │ ├── executor.py # SafePythonExecutor
│ │ ├── arxiv_tools.py # search_arxiv, fetch_arxiv_paper, ingest_arxiv_paper
@@ -744,7 +747,6 @@ agentic-cli/
│ │ ├── pdf_utils.py # PDF text extraction helpers
│ │ ├── memory_tools.py # save/search/update/delete + MemoryStore
│ │ ├── reflection_tools.py # save_reflection + ToolReflectionStore
-│ │ ├── hitl_tools.py # request_approval
│ │ ├── _core/ # Shared planning/task logic
│ │ │ ├── planning.py
│ │ │ └── tasks.py
4 changes: 3 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "agentic-cli"
-version = "0.5.0"
+version = "0.5.1"
description = "A framework for building domain-specific agentic CLI applications"
readme = "README.md"
license = "MIT"
@@ -46,6 +46,8 @@ kb = [
"torch>=2.2.0",
"sentence-transformers>=2.0.0,<3.0.0",
"faiss-cpu>=1.7.0",
"bm25s>=0.2.0",
"rank-bm25>=0.2.2",
]
langgraph = [
"langgraph>=0.2.0",
2 changes: 1 addition & 1 deletion src/agentic_cli/__init__.py
@@ -92,4 +92,4 @@ def __getattr__(name: str):
"CLISettingsMixin",
]

-__version__ = "0.5.0"
+__version__ = "0.5.1"
137 changes: 137 additions & 0 deletions src/agentic_cli/knowledge_base/_bm25_backends.py
@@ -0,0 +1,137 @@
"""Real BM25 index backends (bm25s, rank_bm25).

Both implement the same interface as MockBM25Index so create_bm25_index()
can return any of them interchangeably. Tokenization is lowercase whitespace
split to match MockBM25Index; the scoring model is the library's own BM25.

Neither underlying library supports true incremental add/remove, so we keep
tokenized docs + chunk_ids in memory and rebuild the model lazily on search.
"""

from __future__ import annotations

import json
from pathlib import Path

from agentic_cli.file_utils import atomic_write_json


def _tokenize(text: str) -> list[str]:
    return text.lower().split()


class _BM25BackendBase:
    _INDEX_FILE: str = ""

    def __init__(self):
        self._chunk_ids: list[str] = []
        self._tokenized: list[list[str]] = []
        self._model = None

    @property
    def size(self) -> int:
        return len(self._chunk_ids)

    def add_documents(self, chunk_ids: list[str], texts: list[str]) -> None:
        for cid, text in zip(chunk_ids, texts):
            self._chunk_ids.append(cid)
            self._tokenized.append(_tokenize(text))
        self._model = None

    def remove_documents(self, chunk_ids: list[str]) -> None:
        remove_set = set(chunk_ids)
        keep = [
            (cid, toks)
            for cid, toks in zip(self._chunk_ids, self._tokenized)
            if cid not in remove_set
        ]
        if keep:
            self._chunk_ids, self._tokenized = map(list, zip(*keep))
        else:
            self._chunk_ids, self._tokenized = [], []
        self._model = None

    def rebuild(self, chunk_ids: list[str], texts: list[str]) -> None:
        self._chunk_ids = []
        self._tokenized = []
        self._model = None
        self.add_documents(chunk_ids, texts)

    def save(self, path: Path) -> None:
        path.mkdir(parents=True, exist_ok=True)
        atomic_write_json(
            path / self._INDEX_FILE,
            {"chunk_ids": self._chunk_ids, "tokenized": self._tokenized},
        )

    def load(self, path: Path) -> None:
        index_path = path / self._INDEX_FILE
        if not index_path.exists():
            return
        data = json.loads(index_path.read_text())
        self._chunk_ids = data["chunk_ids"]
        self._tokenized = data["tokenized"]
        self._model = None


class RankBM25Index(_BM25BackendBase):
    """BM25 backed by rank_bm25.BM25Plus (pure Python).

    BM25Plus rather than BM25Okapi because Okapi's IDF can go zero or
    negative when a term appears in most of the corpus; BM25Plus adds a
    delta offset that guarantees positive contributions on real matches.
    """

    _INDEX_FILE = "bm25_rank.json"

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        if not self._chunk_ids:
            return []
        query_tokens = _tokenize(query)
        if not query_tokens:
            return []
        query_set = set(query_tokens)
        if self._model is None:
            from rank_bm25 import BM25Plus

            self._model = BM25Plus(self._tokenized)
        scores = self._model.get_scores(query_tokens)
        scored: list[tuple[str, float]] = []
        for cid, doc_tokens, score in zip(
            self._chunk_ids, self._tokenized, scores
        ):
            if query_set.intersection(doc_tokens):
                scored.append((cid, float(score)))
        scored.sort(key=lambda x: x[1], reverse=True)
        return scored[:top_k]


class BM25sIndex(_BM25BackendBase):
    """BM25 backed by bm25s (NumPy/C-accelerated)."""

    _INDEX_FILE = "bm25s_sidecar.json"

    def _build_model(self) -> None:
        import bm25s

        model = bm25s.BM25()
        model.index(self._tokenized, show_progress=False)
        self._model = model

    def search(self, query: str, top_k: int = 10) -> list[tuple[str, float]]:
        if not self._chunk_ids:
            return []
        query_tokens = _tokenize(query)
        if not query_tokens:
            return []
        if self._model is None:
            self._build_model()
        k = min(top_k, len(self._chunk_ids))
        docs, scores = self._model.retrieve(
            [query_tokens], k=k, show_progress=False
        )
        results: list[tuple[str, float]] = []
        for idx, score in zip(docs[0], scores[0]):
            if score > 0:
                results.append((self._chunk_ids[int(idx)], float(score)))
        return results
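
The `RankBM25Index` docstring's point about Okapi IDF going to zero or below can be checked with the textbook formulas. The sketch below compares the classic Okapi IDF with the BM25+ style IDF for a 10-document corpus. The function names are illustrative, and library implementations may differ in details such as flooring or epsilon handling.

```python
import math


def idf_okapi(n_docs: int, doc_freq: int) -> float:
    """Classic Okapi IDF: log((N - n + 0.5) / (n + 0.5))."""
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_plus(n_docs: int, doc_freq: int) -> float:
    """BM25+ style IDF: log((N + 1) / n), positive whenever n <= N."""
    return math.log((n_docs + 1) / doc_freq)


N = 10  # corpus size for the demo
for n in (1, 5, 9):  # number of documents containing the term
    print(f"n={n}: okapi={idf_okapi(N, n):+.3f} plus={idf_plus(N, n):+.3f}")
```

With N = 10, the Okapi value is zero when the term appears in 5 documents and negative when it appears in 9, while the BM25+ value stays positive in all three cases.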