Skip to content

Security audit: 9 verified findings (path traversal, env leak, file perms, XSS, etc.) #269

@padak

Description

@padak

Summary

Comprehensive security audit of kbagent CLI surface, performed across three sub-agents (use case map → state machine → security review). Result: 20 findings, of which 9 are verified bugs that warrant immediate fix in a single PR. The rest are either suspected (need design discussion), low-impact hardening, or false positives.

This is a follow-up to #267 (which fixed chained sync git-branching bugs). The audit specifically excluded those bugs and looked for OTHER issues across the entire CLI surface — auth, secrets, races, broken invariants, footguns, input validation, subprocess safety, fail-open paths, multi-project corruption, AI agent attack surface, doc drift.

The methodology and per-finding artefacts are kept locally:

  • Use case map: 3000+ words, all command groups + 20 life situations + cross-cutting concerns
  • State machine: every persistent + in-memory state, command-by-command transitions, 9 forbidden state combinations, concurrency model, 6 Mermaid diagrams
  • Security findings doc: 20 findings with file:line, repro, exploitability, suggested fix, confidence

In-scope for this PR (9 verified findings)

Critical

ID Title File:line
sec-01 Path traversal via API-controlled component_id in sync pull sync/naming.py:25-28

High

ID Title File:line
sec-02 + sec-08 KBC_MASTER_TOKEN_* and KBC_MANAGE_API_TOKEN inherited by MCP HTTP transport services/mcp_transport.py:115-119
sec-04 REPL history file stores tokens with default umask (typically 0644) commands/repl.py:63-67
sec-05 XSS in lineage HTML output via unescaped Keboola entity names services/deep_lineage_service.py:1156, 1177, 1204, 1217-1218, 1227
sec-06 encrypt values --output-file race: file is world-readable between write_text and chmod(0o600) commands/encrypt.py:107-108

Medium

ID Title File:line
sec-07 component_type from API also unsanitized (companion to sec-01) sync/naming.py:26-27
sec-11 max_parallel_workers accepts 0 (no ge=1) → ThreadPoolExecutor crash models.py:76-79

Low

ID Title File:line
sec-19 permissions check ignores --deny-writes/--deny-destructive session flags commands/permissions.py:367
sec-20 _coerce_keboola_id raises raw ValueError on non-numeric ID sync/branch_mapping.py:24

Detailed findings (in-scope)

sec-01 — Path traversal via API-controlled component_id in sync pull (CRITICAL)

naming.config_path() interpolates component_type and component_id directly from the Keboola API response into the filesystem path template:

return naming_template.format(
    component_type=component_type,        # NOT sanitized
    component_id=component_id,            # NOT sanitized
    config_name=sanitize_name(config_name),
)

Default template {component_type}/{component_id}/{config_name} exposes both first two segments to whatever the API returns. A compromised stack or supply-chain attack injecting component_id = "../../../.ssh" would direct file writes outside the sync workspace.

Repro:

>>> from pathlib import Path
>>> Path('/tmp/proj/main') / 'other/../../../etc/passwd/test'
# resolves to /private/tmp/etc/passwd/test — outside the workspace

No Path.is_relative_to() confinement check exists anywhere in sync_service.py before file writes.

Fix: Apply sanitize_name() to component_type and component_id in naming.config_path(), AND add a defensive confinement check (config_dir.resolve().is_relative_to(branch_dir.resolve())) before any file write.


sec-02 + sec-08 — Master / Manage tokens inherited by MCP HTTP transport (HIGH/MED)

services/mcp_transport.py:115-119:

self._process = subprocess.Popen(
    cmd,
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
)

No env= argument → MCP server subprocess inherits the full kbagent environment including KBC_MASTER_TOKEN, KBC_MASTER_TOKEN_*, and (when --allow-env-manage-token is set) KBC_MANAGE_API_TOKEN. The default-deny in v0.29.0 protected against passing manage token to API calls within kbagent itself, but the MCP HTTP transport bypasses that protection by inheriting all env vars.

The stdio transport at services/mcp_service.py:228-239 correctly builds a minimal explicit env (KBC_STORAGE_TOKEN/URL/BRANCH_ID only). The HTTP transport does not.

Fix: Build explicit minimal env dict in _start(), mirroring the stdio approach. Block-list all KBC_* env vars except the per-project ones explicitly required.


sec-04 — REPL history file stores tokens with default umask (HIGH)

commands/repl.py:63-67:

def _get_history_path() -> Path:
    config_dir = Path(platformdirs.user_config_dir("keboola-agent-cli"))
    config_dir.mkdir(parents=True, exist_ok=True)
    return config_dir / "repl_history"

prompt_toolkit.FileHistory then creates this file with the user's default umask (typically 0644). Every line typed at the REPL — including project add --token 901-xxx-Secret — is persisted in plaintext.

Fix: Pre-create the history file with os.open(path, O_WRONLY|O_CREAT, 0o600) before passing to FileHistory; or chmod(0o600) the existing file at startup if it exists.


sec-05 — XSS in lineage HTML output (HIGH)

services/deep_lineage_service.py:1156 and similar:

safe_name = entity_name.replace('"', "'")

This replaces double-quotes with single-quotes but does NOT escape <, >, or &. A Keboola table or config named </div><script>alert(1)</script> survives sanitization and is embedded into the generated HTML.

Fix: Use html.escape() consistently for all API-derived strings (entity_name, cfg_label, inp_name, out_name) before embedding into the Mermaid body or the surrounding HTML scaffold.


sec-06 — encrypt values --output-file race (HIGH)

commands/encrypt.py:107-108:

output_file.write_text(json.dumps(result, indent=2), encoding="utf-8")
output_file.chmod(0o600)

write_text creates the file with the default umask (typically 0644). On systems with shared multi-user filesystems, the race window between create and chmod allows other users to read the encrypted secrets file.

Fix: Use os.open(path, O_WRONLY|O_CREAT|O_TRUNC, 0o600) then os.write() — atomic permission set at creation time.


sec-07 — component_type from API also unsanitized (MEDIUM)

Same root cause as sec-01 but for component_type. component_type = "other/../transformations" would misdirect file writes. Fix together with sec-01.


sec-11 — max_parallel_workers accepts 0 (MEDIUM)

models.py:76-79:

max_parallel_workers: int = Field(default=10, le=100)

No ge=1 lower bound. Setting to 0 in config.json passes Pydantic validation, then ThreadPoolExecutor(max_workers=0) raises ValueError, crashing every multi-project operation.

Fix: Add ge=1 to the Field. Add a defensive clamp in _resolve_max_workers() so an old config.json with 0 doesn't break startup.


sec-19 — permissions check ignores session firewall flags (LOW)

commands/permissions.py:367:

engine = PermissionEngine(config.permissions)
allowed = engine.is_allowed(operation)

The companion permissions list command at line 115 correctly applies session flags via apply_firewall_flags(). permissions check does not — so kbagent --deny-writes permissions check branch.create returns ALLOWED even though actual enforcement would deny.

Fix: Apply apply_firewall_flags(config.permissions, deny_writes=ctx.obj.get('deny_writes'), deny_destructive=ctx.obj.get('deny_destructive')) consistently with permissions list.


sec-20 — _coerce_keboola_id raises raw ValueError (LOW)

sync/branch_mapping.py:24:

return int(raw)

A hand-edited branch-mapping.json with {"id": "not-a-number"} raises raw ValueError, crashing all sync operations. Should raise a descriptive ConfigError, or treat as None (production) with a warning.

Fix: Wrap in try/except ValueError → either descriptive error or warning + None.


Out of scope for this PR (deferred / suspected / discussion needed)

ID Severity Title Why deferred
sec-03 high Token in --token argv visible to ps Design change: deprecation cycle for the flag
sec-09 medium permissions reset PTY bypass Design change: requires second-factor (sudo-pattern)
sec-10 medium Branch name silent collisions UX improvement, not security
sec-12 medium version_cache.json non-atomic write race Low impact; needs atomic-write helper
sec-13 medium --input @file reads arbitrary path Design discussion: restrict to relative paths or whitelist?
sec-14 medium KBC_MASTER_TOKEN_{ALIAS} from unvalidated alias Design: validate alias on project add
sec-16 low Lineage HTML loads Mermaid from CDN (no SRI) Hardening; needs Mermaid version pin policy
sec-17 low KBAGENT_AUTO_UPDATE toggle stale cache Document or invalidate on transition
sec-18 low doctor --fix uses PATH for uv Low risk in practice; needs design
sec-15 n/a manifest apiHost not used (false positive) Confirmed safe — no SSRF

Each deferred finding will get its own follow-up issue if/when prioritized.


Methodology

This audit was generated by three specialized sub-agents in sequence:

  1. kbagent expert read the entire command surface (commands/, AGENT_CONTEXT, workflow docs) and produced a complete use case map: every command's pre-conditions, happy paths, flag combinations that materially change behavior, plus 20 multi-command "life situations" representing real user goals (new dev onboarding, dev-branch workflow, CI/CD token rotation, AI agent flow, etc.). Plus cross-cutting concerns (auth model, branch context, multi-project semantics, local state, env vars).

  2. state-machine engineer read the use case map and traced actual code paths. Output: every persistent and in-memory state with type, owner, persistence, lifetime, invariants; per-command read/write/assert table; per-life-situation transition trace with file:line references; 9 forbidden state combinations and their guards (or absence); concurrency / process model; 6 Mermaid state diagrams.

  3. security engineer read both prior outputs and audited 11 categories: auth/authz, secrets, races/TOCTOU, broken invariants, footguns/UX traps, input validation, subprocess safety, fail-open vs fail-closed, multi-project state corruption, AI agent attack surface, doc drift. Each finding has severity, category, file:line, repro, cost, suggested fix, confidence.

This is the kind of audit that surfaces issues the original developers and reviewers don't see because they're too close to the code. Bugs A–E from #267 were intentionally excluded — those are already fixed in #268.

The PR will follow the same protocol as #267 / #268: e2e verification (where applicable), regression tests, version bump, changelog, ruff lint/format, and CI checks.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions