2 changes: 1 addition & 1 deletion .claude-plugin/marketplace.json
@@ -10,7 +10,7 @@
   "plugins": [
     {
       "name": "kbagent",
-      "version": "0.30.4",
+      "version": "0.30.5",
       "source": "./plugins/kbagent",
       "description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces",
       "category": "development"
2 changes: 1 addition & 1 deletion plugins/kbagent/.claude-plugin/plugin.json
@@ -1,6 +1,6 @@
 {
   "name": "kbagent",
-  "version": "0.30.4",
+  "version": "0.30.5",
   "description": "AI-friendly interface to Keboola Connection projects — explore configs, jobs, lineage, call MCP tools, manage dev branches, and debug SQL in workspaces",
   "author": {
     "name": "Keboola",
1 change: 1 addition & 0 deletions plugins/kbagent/skills/kbagent/references/gotchas.md
@@ -504,6 +504,7 @@ One project failing does not block others. Check the `errors` array:
 - Classes: `--deny-writes` blocks `cli:write` + `tool:write` (covers write+destructive+admin). `--deny-destructive` is narrower -- blocks only `cli:destructive` + `tool:destructive`; pure write ops like `storage create-bucket` stay allowed.
 - Blocked operation exits **6** with `error.code = PERMISSION_DENIED`. Read commands stay unaffected.
 - Safe to run under either flag without mutating the saved policy -- useful when your agent needs a one-shot read-only run on a machine with a write-enabled config.
+- `permissions check OPERATION` reflects the EFFECTIVE policy (persisted policy MERGED with session flags) **(since v0.30.5)**. Pre-0.30.5 it consulted only the persisted policy, so an agent doing self-introspection (`kbagent --deny-writes permissions check branch.create`) got `allowed: true` even though the session flag denied that op at execution time. If your agent uses `permissions check` to gate destructive actions and may run against pre-0.30.5 installs, also check the exit code at execution time (6 = denied) rather than trusting the dry probe alone.

 ## `storage create-table` native types + dev-branch materialize (since 0.25.0)

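The gotcha above recommends checking the exit code at execution time instead of trusting the `permissions check` probe alone. A minimal sketch of that pattern on the agent side follows. `run_guarded` and `EXIT_PERMISSION_DENIED` are hypothetical names for illustration and are not part of kbagent; only the exit codes (0 = allowed, 6 = denied) come from the notes above.

```python
import subprocess
from typing import Callable, Sequence

# Exit code documented above for a blocked operation (PERMISSION_DENIED).
EXIT_PERMISSION_DENIED = 6


def run_guarded(
    cmd: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> str:
    """Run a command and classify the outcome by its exit code.

    Hypothetical helper: returns "ok", "denied", or "error" based on what
    actually happened at execution time, not on an earlier dry probe.
    """
    result = runner(list(cmd), capture_output=True, text=True)
    if result.returncode == 0:
        return "ok"
    if result.returncode == EXIT_PERMISSION_DENIED:
        return "denied"
    return "error"
```

An agent would wrap the real write command in `run_guarded(...)` and treat `"denied"` as authoritative, which works the same way on pre-0.30.5 and later installs. The `runner` parameter exists only so the helper can be exercised without a kbagent binary.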
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "keboola-agent-cli"
-version = "0.30.4"
+version = "0.30.5"
 description = "AI-friendly CLI for managing Keboola projects"
 readme = "README.md"
 requires-python = ">=3.12"
12 changes: 12 additions & 0 deletions src/keboola_agent_cli/changelog.py
@@ -8,6 +8,18 @@

 # Ordered newest-first. Each value is a list of brief one-line descriptions.
 CHANGELOG: dict[str, list[str]] = {
+    "0.30.5": [
+        "Security (critical): `kbagent sync pull` no longer permits API-controlled `component_id` or `component_type` to escape the sync workspace via path traversal. `naming.config_path()` now passes both fields through a new `sanitize_path_segment()` that rejects `/`, `\\`, and parent-directory references (`..`) while preserving the dots, hyphens, and underscores in legitimate component IDs (`keboola.ex-db-mysql`, `kds-team.app-custom-python`). `services/sync_service.py:pull()` adds a defense-in-depth confinement check that raises ConfigError if a resolved config path is not contained in the branch directory. Issue #269 sec-01 / sec-07; threat actor: compromised stack or supply-chain attack on the project token. Pre-fix, `component_id = '../../../etc'` would write outside the project root.",
+        "Security (high): MCP HTTP transport subprocess no longer inherits Keboola tokens from the kbagent process environment. `mcp_transport.py:_start()` previously used `subprocess.Popen(cmd, ...)` with no `env=` argument, so when `KBAGENT_MCP_TRANSPORT=http` was set, the MCP server inherited `KBC_MASTER_TOKEN`, `KBC_MASTER_TOKEN_<ALIAS>`, `KBC_MANAGE_API_TOKEN`, and `KBC_TOKEN`. New `_build_minimal_env()` allow-lists only the env vars needed for binary discovery and locale handling (PATH, HOME, USER, LANG, LC_*, UV_CACHE_DIR, PYTHONPATH, ...) and explicitly drops every `KBC_*` token. Per-project Storage tokens still flow through HTTP request headers as before. Issue #269 sec-02 / sec-08; closes the gap left by v0.29.0's manage-token default-deny on the HTTP transport path.",
+        "Security (high): REPL history file (`~/.config/keboola-agent-cli/repl_history`) is now created with mode 0600. Pre-fix, `prompt_toolkit.FileHistory` created the file under the user's default umask (typically 022, yielding mode 0644), persisting any token typed at the prompt (e.g. `project add --token ...`) in plaintext readable by group/world. `_get_history_path()` now atomically pre-creates the file with 0o600 and tightens existing files via chmod. Issue #269 sec-04.",
+        "Security (high): `kbagent lineage show --format er` (and the lineage server's ER view) no longer emits XSS-vulnerable HTML. `services/deep_lineage_service.py:render_er_diagram()` previously did `name.replace('\"', \"'\")` which left `<`, `>`, and `&` untouched -- a Keboola table or config named `</div><script>alert(1)</script>` would inject the script into the generated HTML body where the browser parses it before Mermaid runs. (`--format html` flowchart was already safe; it routes through `render_mermaid()` which escapes labels.) Fix uses `html.escape(s, quote=True)` consistently for every API-derived string embedded into the Mermaid body and surrounding HTML. Mermaid renders the entities back to their characters in SVG text so visible output is unchanged. Issue #269 sec-05.",
+        "Security (high): `kbagent encrypt values --output-file PATH` now atomically creates the file with mode 0600. Pre-fix, `Path.write_text()` followed by `chmod(0o600)` left a race window where the encrypted secrets file was world-readable on systems with a permissive umask (e.g. the common 022, which yields mode 0644). Replaced with `os.open(path, O_WRONLY|O_CREAT|O_TRUNC, 0o600)` + `os.write()`. Issue #269 sec-06.",
+        "Security (medium): `max_parallel_workers` Pydantic field now requires `ge=1` in addition to `le=100`. Pre-fix, a config.json with `max_parallel_workers: 0` passed validation, then `ThreadPoolExecutor(max_workers=0)` crashed every multi-project operation with `ValueError`. `BaseService._resolve_max_workers()` also clamps to >= 1 defensively so a legacy on-disk config does not crash startup. Issue #269 sec-11.",
+        "Security (low): `kbagent permissions check OPERATION` now reflects the EFFECTIVE policy for the current invocation -- the persisted policy MERGED with `--deny-writes` / `--deny-destructive` session flags -- matching `permissions list` semantics. Pre-fix, `permissions check` consulted only the persisted policy, so an AI agent inspecting its own self-imposed firewall got a misleading `allowed` answer for write ops. Issue #269 sec-19.",
+        'Security (low): `_coerce_keboola_id()` and `load_branch_mapping()` now raise descriptive errors for malformed branch IDs in `branch-mapping.json`. Pre-fix, a hand-edited file with `"id": "not-a-number"` produced a raw `ValueError: invalid literal for int()` from deep inside the parser. New error names the offending file path and the bad value so users can fix it. Issue #269 sec-20.',
+        "Tests: 33 new regression tests across `test_sync_naming.py` (sanitize_path_segment + config_path traversal-resistance), `test_mcp_transport.py` (env scrubbing for KBC_*), `test_repl.py` (history file 0600 + tighten existing), `test_deep_lineage_service.py` (XSS escape in ER diagram for table and config names), `test_models.py` (max_parallel_workers ge=1), `test_sync_branch_mapping.py` (descriptive ValueError + path-prefixed wrap), `test_permissions_cli.py` (--deny-writes / --deny-destructive applied by `permissions check`). Total suite: 2830 passed.",
+        "Audit methodology: this release was driven by a three-stage automated audit (kbagent expert -> state-machine engineer -> security engineer), each as a sub-agent reading the previous output. The use-case map (28 KB) covered every command + 20 multi-command life situations + cross-cutting concerns. The state-machine doc (43 KB) traced every persistent / in-memory state with file:line references, command-by-command read/write/assert table, life-situation traces, and 9 forbidden state combinations. The security review (20 findings, 15 KB) prioritized 9 verified issues for this release; the remainder (sec-03 token-in-argv deprecation, sec-09 PTY-bypass design, sec-10 silent collisions, sec-12 cache atomicity, sec-13 @file restriction, sec-14 alias validation, sec-16 SRI on Mermaid CDN, sec-17 toggle staleness, sec-18 PATH for uv) are tracked in #269 as out-of-scope follow-ups.",
+    ],
     "0.30.4": [
         "Fix: `kbagent sync pull` against a linked dev branch now writes files under the linked branch's directory (`branch-<id>/...` or its sanitized name), not under `main/`. Pre-fix, `branch_link` persisted Keboola branch IDs as **strings** in `.keboola/branch-mapping.json` (`kbc_branch_id = str(branch_info['id'])` at five call-sites in `services/sync_service.py`), but every comparison against the manifest read those IDs as the **int** they're typed as in `ManifestBranch.id: int` and on the Storage API. Cross-type `int == str` is always False in Python, so `_find_branch_path` fell back to the default branch (`manifest.branches[0].path == 'main'`) and `_ensure_branch_registered` re-registered the 'unknown' branch on every pull, appending a duplicate `branches[]` entry with a mangled `branch-<id>-<id>` path until the manifest was hand-cleaned. The same comparison failed in `_ensure_branch_registered`'s `b.get('id') == branch_id` API-name lookup, so the branch's human-readable name from the API was never used and the path always fell through to the numeric `branch-<id>` fallback (the 'side observation' from issue #267). Fix is end-to-end `int`: `branch_link` writes `int(branch_info['id'])`, `BranchMappingEntry.keboola_id: int | None` (was `str | None`), and `from_dict` silently coerces legacy string IDs on load so existing user workspaces upgrade without manual editing. Bug A from issue #267, reported externally on v0.27.0 and reproduced on v0.30.3.",
         "Fix: `kbagent sync pull` no longer re-writes every previously-tracked config on every invocation in git-branching mode. The `branch_switched` guard at `services/sync_service.py:489-491` compared `existing_branch_ids[lookup_key]` (int from manifest) against the polluted str return of `_resolve_branch_id`; cross-type `!=` was always True, so the idempotency check was completely defeated and `files_written` ticked up on every pull even when nothing changed. The Bug A end-to-end int fix automatically restores correct behaviour here -- this is Bug C from issue #267, fixed transitively. Regression test pins `pull-pull-pull` against an unchanged remote and asserts manifest stability.",
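The path-traversal entry in the 0.30.5 changelog describes two layers: per-segment sanitization and a confinement check on the resolved path. The real `sanitize_path_segment()` and the check in `sync_service.py:pull()` are not shown in this diff, so the following is only a sketch of the general technique. `sanitize_segment` and `confined_join` are hypothetical names, and raising `ValueError` (rather than the project's `ConfigError`) and the exact rejection rules are assumptions.

```python
from pathlib import Path


def sanitize_segment(segment: str) -> str:
    """Reject path separators and parent-directory references.

    Illustrative stand-in only; the real function's rules may differ.
    Dots, hyphens, and underscores inside a name are left alone.
    """
    if not segment or segment in (".", ".."):
        raise ValueError(f"unsafe path segment: {segment!r}")
    if "/" in segment or "\\" in segment:
        raise ValueError(f"unsafe path segment: {segment!r}")
    return segment


def confined_join(base: Path, *segments: str) -> Path:
    """Join sanitized segments and verify the result stays under base."""
    base_resolved = base.resolve()
    target = base_resolved.joinpath(
        *(sanitize_segment(s) for s in segments)
    ).resolve()
    # Defense in depth: even if sanitization missed a case, refuse any
    # path that resolves outside the base directory.
    if not target.is_relative_to(base_resolved):
        raise ValueError(f"{target} escapes {base_resolved}")
    return target
```

With this sketch, `confined_join(branch_dir, "extractor", "keboola.ex-db-mysql")` succeeds, while a segment such as `'../../../etc'` is rejected before any path is built.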
5 changes: 4 additions & 1 deletion src/keboola_agent_cli/commands/context.py
@@ -807,7 +807,10 @@
 Remove all restrictions.

 kbagent permissions check OPERATION
-Check if operation is allowed. Exit 0=allowed, 6=denied.
+Check if operation is allowed. Exit 0=allowed, 6=denied. Reflects the
+EFFECTIVE policy: persisted policy MERGED with --deny-writes /
+--deny-destructive session flags (since 0.30.5; pre-0.30.5 consulted
+only the persisted policy and could mislead self-introspection).

 ## Tips for AI Agents

18 changes: 16 additions & 2 deletions src/keboola_agent_cli/commands/encrypt.py
@@ -5,6 +5,7 @@
 """

 import json
+import os
 import sys
 from pathlib import Path

@@ -104,8 +105,21 @@ def encrypt_values(
         raise typer.Exit(code=exit_code) from None

     if output_file:
-        output_file.write_text(json.dumps(result, indent=2), encoding="utf-8")
-        output_file.chmod(0o600)
+        # Atomic create with 0600 -- avoids the race window between
+        # write_text() (mode set by the user's umask, often 0644) and a
+        # subsequent chmod (issue #269 sec-06). On the chmod-after-write
+        # pattern, another local user could read the encrypted secrets in
+        # the brief window before the chmod ran.
+        payload = json.dumps(result, indent=2).encode("utf-8")
+        fd = os.open(
+            str(output_file),
+            os.O_WRONLY | os.O_CREAT | os.O_TRUNC,
+            0o600,
+        )
+        try:
+            os.write(fd, payload)
+        finally:
+            os.close(fd)
         if not formatter.json_mode:
             formatter.console.print(f"Encrypted values written to {output_file}")

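One detail of the `os.open(..., 0o600)` pattern in the hunk above: the mode argument applies only when the file is newly created. If `--output-file` points at a file that already exists with wider permissions, `O_TRUNC` empties it but leaves its mode bits as they were. The sketch below shows one possible way to cover that case on POSIX systems. It is not what the PR does; `write_private` is a hypothetical helper name.

```python
import os
from pathlib import Path


def write_private(path: Path, payload: bytes) -> None:
    """Write payload so the file ends up with mode 0600 (POSIX only)."""
    fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as fh:
        # The mode passed to os.open() is ignored for pre-existing files,
        # so tighten the descriptor before any secret bytes are written.
        os.fchmod(fh.fileno(), 0o600)
        fh.write(payload)
```

Using `os.fdopen` also delegates partial-write handling and closing the descriptor to the file object.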
18 changes: 17 additions & 1 deletion src/keboola_agent_cli/commands/permissions.py
@@ -358,13 +358,29 @@ def permissions_check(
 ) -> None:
     """Check if a specific operation is allowed.

+    Reflects the EFFECTIVE policy for this invocation: the persisted
+    policy merged with any top-level session flags like ``--deny-writes``
+    or ``--deny-destructive`` (issue #269 sec-19). Pre-fix, ``permissions
+    check`` only consulted the persisted policy, so an AI agent reading
+    its own self-imposed firewall flag would get a misleading answer.
+
     Exit code 0 = allowed, 6 = denied.
     """
+    from ..cli import apply_firewall_flags
+
     formatter = get_formatter(ctx)
     config_store: ConfigStore = get_service(ctx, "config_store")
     config = config_store.load()

-    engine = PermissionEngine(config.permissions)
+    deny_writes = bool(ctx.obj.get("deny_writes")) if ctx.obj else False
+    deny_destructive = bool(ctx.obj.get("deny_destructive")) if ctx.obj else False
+    effective_policy = apply_firewall_flags(
+        config.permissions,
+        deny_writes=deny_writes,
+        deny_destructive=deny_destructive,
+    )
+
+    engine = PermissionEngine(effective_policy)
     allowed = engine.is_allowed(operation)

     if formatter.json_mode:
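The implementation of `apply_firewall_flags` is not part of this diff. As a rough illustration of what "persisted policy merged with session flags" can mean, here is a toy model. The `Policy` dataclass and `merge_session_flags` are hypothetical and much simpler than the real permission model; only the class names (`cli:write`, `tool:write`, `cli:destructive`, `tool:destructive`) are taken from the gotchas notes in this PR.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Policy:
    """Hypothetical, simplified policy: a set of denied operation classes."""

    denied: frozenset[str] = frozenset()


def merge_session_flags(
    persisted: Policy, *, deny_writes: bool, deny_destructive: bool
) -> Policy:
    """Return the effective policy for one invocation."""
    extra: set[str] = set()
    if deny_writes:
        extra |= {"cli:write", "tool:write"}
    if deny_destructive:
        extra |= {"cli:destructive", "tool:destructive"}
    # Build a new object so the persisted policy is never mutated.
    return replace(persisted, denied=persisted.denied | extra)
```

The point of the sketch is the shape of the fix: the check runs against the merged result, and the saved policy stays untouched.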
24 changes: 22 additions & 2 deletions src/keboola_agent_cli/commands/repl.py
@@ -5,6 +5,8 @@
 completion, persistent history, and colored output.
 """

+import contextlib
+import os
 import shlex
 import sys
 from pathlib import Path
@@ -61,10 +63,28 @@ def get_completions(self, document, complete_event):


 def _get_history_path() -> Path:
-    """Return path for persistent REPL history file."""
+    """Return path for persistent REPL history file.
+
+    Ensures the file exists with 0600 permissions before returning so
+    ``prompt_toolkit.FileHistory`` does not create it world-readable
+    under a permissive umask. REPL command lines may include
+    ``project add --token TOKEN``, so the history must not leak to
+    group/other (issue #269 sec-04).
+    """
     config_dir = Path(platformdirs.user_config_dir("keboola-agent-cli"))
     config_dir.mkdir(parents=True, exist_ok=True)
-    return config_dir / "repl_history"
+    history_path = config_dir / "repl_history"
+    if not history_path.exists():
+        # Atomic create with 0600 so FileHistory.append never widens perms.
+        fd = os.open(str(history_path), os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
+        os.close(fd)
+    else:
+        # Pre-existing file might have been created by an older kbagent
+        # under a permissive umask -- tighten perms in place. Filesystems
+        # that do not support chmod (e.g. some network mounts) silently no-op.
+        with contextlib.suppress(OSError):
+            history_path.chmod(0o600)
+    return history_path


 def _run_repl(
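The hunk above checks `exists()` and then creates with `O_EXCL`. If two REPL sessions start at the same moment, the second `os.open` could in principle raise `FileExistsError` between the check and the create. A possible variation, shown here only as a sketch and not as the PR's code, lets `O_EXCL` itself decide which branch runs. `ensure_private_file` is a hypothetical name.

```python
import contextlib
import os
from pathlib import Path


def ensure_private_file(path: Path) -> Path:
    """Create path with mode 0600, or tighten it if it already exists."""
    try:
        # O_EXCL makes creation atomic: it fails instead of reusing a file.
        fd = os.open(str(path), os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except FileExistsError:
        # Either an older file or one created by a concurrent process.
        with contextlib.suppress(OSError):
            path.chmod(0o600)
    else:
        os.close(fd)
    return path
```

Calling it twice on the same path is harmless: the first call creates the file, the second only re-applies the mode.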
1 change: 1 addition & 0 deletions src/keboola_agent_cli/models.py
@@ -75,6 +75,7 @@ class AppConfig(BaseModel):
     default_project: str = Field(default="", description="Alias of the default project")
     max_parallel_workers: int = Field(
         default=10,
+        ge=1,
         le=100,
         description="Max concurrent threads for multi-project operations (env: KBAGENT_MAX_PARALLEL_WORKERS)",
     )
8 changes: 6 additions & 2 deletions src/keboola_agent_cli/services/base.py
@@ -94,7 +94,11 @@ def _resolve_max_workers(self) -> int:
         """Resolve max parallel workers: env var > config.json > default (10).

         Returns:
-            Positive integer for ThreadPoolExecutor max_workers.
+            Positive integer for ThreadPoolExecutor max_workers. Always >= 1
+            so a legacy config.json with ``max_parallel_workers: 0`` does not
+            crash multi-project ops with ``ValueError`` from the executor
+            (issue #269 sec-11). New configs are validated at the Pydantic
+            layer (``ge=1``); this clamp guards loaded-from-disk values.
         """
         env_val = os.environ.get(ENV_MAX_PARALLEL_WORKERS)
         if env_val is not None:
@@ -106,7 +110,7 @@ def _resolve_max_workers(self) -> int:
                 pass

         config = self._config_store.load()
-        return config.max_parallel_workers
+        return max(config.max_parallel_workers, 1)

     def _run_parallel(
         self,
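To see why the clamp matters: `ThreadPoolExecutor` rejects `max_workers=0` with a `ValueError` at construction time, before any task runs. The sketch below reproduces the clamp in isolation. `clamp_workers` and `run_squares` are hypothetical names used only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def clamp_workers(configured: int) -> int:
    """Mirror of the defensive clamp: never hand the executor less than 1."""
    return max(configured, 1)


def run_squares(values: list[int], configured_workers: int) -> list[int]:
    """Run a trivial parallel job with a possibly invalid worker count."""
    workers = clamp_workers(configured_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda v: v * v, values))
```

With a legacy value of `0`, `run_squares([1, 2, 3], 0)` still completes on a single worker instead of failing when the executor is constructed.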
17 changes: 11 additions & 6 deletions src/keboola_agent_cli/services/deep_lineage_service.py
@@ -8,6 +8,7 @@
 """

 import hashlib
+import html
 import json
 import logging
 import re
@@ -1153,7 +1154,11 @@ def render_er_diagram(
         if not t:
             continue
         entity_name = f"{t.project_alias}:{t.name}"
-        safe_name = entity_name.replace('"', "'")
+        # html.escape covers <>&" so an API-supplied table or config
+        # name like </div><script>...</script> cannot inject HTML into
+        # the generated lineage page (issue #269 sec-05). Mermaid
+        # renders HTML entities back to their characters in SVG text.
+        safe_name = html.escape(entity_name, quote=True)

         if not show_columns:
             # Compact: just entity name with row count as single attribute
@@ -1174,7 +1179,7 @@ def render_er_diagram(
                     if src_expr.endswith(f".{col}") and fqn in src_expr.replace(f".{col}", ""):
                         cfg = graph.configurations.get(cfg_fqn)
                         cfg_label = cfg.config_name if cfg else cfg_fqn.split("/")[-1]
-                        safe_label = cfg_label.replace('"', "'")
+                        safe_label = html.escape(cfg_label, quote=True)
                         comment = f' "to {out_col} via {safe_label}"'
                         break
                 if out_col == col:
@@ -1201,7 +1206,7 @@ def render_er_diagram(
     for cfg_fqn in config_fqns:
         cfg = graph.configurations.get(cfg_fqn)
         cfg_label = cfg.config_name if cfg else cfg_fqn.split("/")[-1]
-        safe_label = cfg_label.replace('"', "'")
+        safe_label = html.escape(cfg_label, quote=True)
         inputs = input_of.get(cfg_fqn, [])
         outputs = output_of.get(cfg_fqn, [])
         for inp_fqn in inputs:
@@ -1214,8 +1219,8 @@ def render_er_diagram(
                 if key in seen_rels:
                     continue
                 seen_rels.add(key)
-                inp_name = f"{inp_t.project_alias}:{inp_t.name}".replace('"', "'")
-                out_name = f"{out_t.project_alias}:{out_t.name}".replace('"', "'")
+                inp_name = html.escape(f"{inp_t.project_alias}:{inp_t.name}", quote=True)
+                out_name = html.escape(f"{out_t.project_alias}:{out_t.name}", quote=True)
                 lines.append(f' "{inp_name}" ||--o{{ "{out_name}" : "{safe_label}"')

         # If config has only inputs and no outputs (writer), show as relationship to config
@@ -1224,7 +1229,7 @@ def render_er_diagram(
                 inp_t = graph.tables.get(inp_fqn)
                 if not inp_t:
                     continue
-                inp_name = f"{inp_t.project_alias}:{inp_t.name}".replace('"', "'")
+                inp_name = html.escape(f"{inp_t.project_alias}:{inp_t.name}", quote=True)
                 lines.append(f' "{inp_name}" }}o--|| "{safe_label}" : "writes"')

     return "\n".join(lines)
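The difference between the old quote-swapping and `html.escape` is easy to see on the payload quoted in the changelog. The two helper names below are hypothetical and exist only for this comparison; `html.escape` is the standard-library call used in the hunks above.

```python
import html

# Example name taken from the changelog entry for sec-05.
PAYLOAD = "</div><script>alert(1)</script>"


def old_escape(name: str) -> str:
    """Pre-fix behaviour: only double quotes are replaced."""
    return name.replace('"', "'")


def new_escape(name: str) -> str:
    """Post-fix behaviour: &, <, > and both quote characters are escaped."""
    return html.escape(name, quote=True)
```

`old_escape(PAYLOAD)` returns the payload unchanged because it contains no double quotes, while `new_escape(PAYLOAD)` turns every `<` and `>` into an entity so the browser no longer parses a tag.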