Skip to content

fix(0.30.5): close issue #269 -- 9 verified findings from security audit#270

Closed
padak wants to merge 1 commit into
fix/267-sync-git-branching-bugsfrom
fix/269-security-audit-9-findings
Closed

fix(0.30.5): close issue #269 -- 9 verified findings from security audit#270
padak wants to merge 1 commit into
fix/267-sync-git-branching-bugsfrom
fix/269-security-audit-9-findings

Conversation

@padak
Copy link
Copy Markdown
Member

@padak padak commented May 7, 2026

Closes #269.

Fixes nine verified security findings surfaced by a three-stage automated audit (use-case map → state-machine → security review). Bugs A–E from #267 were intentionally out of scope. Each finding was verified by reading code; the eight non-trivial ones have explicit regression tests.

Branch base: This PR branches from fix/267-sync-git-branching-bugs (PR #268), because sec-20 modifies _coerce_keboola_id() which was added there. Merge order: #268 first, then this PR.

Summary

ID Severity What File:line
sec-01 + sec-07 Critical Path traversal via API-controlled component_id / component_type in sync pull sync/naming.py:25-28 + services/sync_service.py:434
sec-02 + sec-08 High / Med MCP HTTP subprocess inherits KBC_MASTER_TOKEN* and KBC_MANAGE_API_TOKEN services/mcp_transport.py:115
sec-04 High REPL history file 0644 → 0600 commands/repl.py:63
sec-05 High XSS in lineage show --format html (entity / config names) services/deep_lineage_service.py:1156, 1177, 1204, 1217-1218, 1227
sec-06 High encrypt values --output-file race (world-readable window before chmod) commands/encrypt.py:107-108
sec-11 Medium max_parallel_workers: 0 crashes ThreadPoolExecutor (no ge=1) models.py:76-79
sec-19 Low permissions check ignores --deny-writes / --deny-destructive flags commands/permissions.py:367
sec-20 Low _coerce_keboola_id raises raw ValueError on non-numeric ID sync/branch_mapping.py:24

Changes

sec-01 + sec-07 — path traversal

naming.config_path() previously interpolated component_type and component_id raw, while only config_name was sanitized. With template {component_type}/{component_id}/{config_name}, an API-controlled component_id = "../../../etc" would route sync pull writes outside the workspace.

  • New sync/naming.py:sanitize_path_segment() rejects /, \, and parent-directory references (..) while preserving dots, hyphens, underscores. Legitimate component IDs (keboola.ex-db-mysql, kds-team.app-custom-python, keboola.python-transformation-v2) pass through unchanged.
  • Both component_type and component_id now go through it in config_path().
  • Defense in depth: new _ensure_within_branch() in services/sync_service.py resolves branch_dir and config_dir and raises ConfigError if the resolved config dir is not contained in the branch dir. Called before any file write.

sec-02 + sec-08 — MCP HTTP env scrubbing

The HTTP transport at services/mcp_transport.py:_start() used subprocess.Popen(cmd, ...) with no env= argument → MCP server inherited the full kbagent environment. The stdio transport at mcp_service.py:_build_server_params() was already safe (explicit minimal env via MCP SDK's get_default_environment()); only the HTTP transport leaked.

  • New _build_minimal_env() allow-lists only what the MCP binary needs: PATH, HOME, USER, LOGNAME, TERM, SHELL, TMPDIR/TEMP/TMP, LANG, LC_ALL, LC_CTYPE, PYTHONPATH/PYTHONHOME/VIRTUAL_ENV, UV_CACHE_DIR/UV_PYTHON, XDG_*. Every KBC_* is omitted.
  • Per-project Storage tokens still flow via HTTP request headers (existing design).

sec-04 — REPL history 0600

commands/repl.py:_get_history_path() now atomically creates the file with os.open(O_WRONLY|O_CREAT|O_EXCL, 0o600) and tightens existing files via chmod(0o600) (silently no-op on filesystems that reject chmod, e.g. some network mounts).

sec-05 — HTML XSS

services/deep_lineage_service.py:render_er_diagram() previously did name.replace('"', "'") for entity/config names, leaving <, >, and & untouched. A Keboola table named </div><script>alert(1)</script> would inject the script into the generated HTML body where the browser parses it before Mermaid runs.

Fix: html.escape(s, quote=True) for every API-derived string in render_er_diagram(). Mermaid renders entities back to their characters in SVG text so visible output is unchanged.

sec-06 — atomic encrypt output

commands/encrypt.py replaced output_file.write_text(...) + output_file.chmod(0o600) with os.open(path, O_WRONLY|O_CREAT|O_TRUNC, 0o600) + os.write(). No race window between create and chmod.

sec-11 — max_parallel_workers ge=1

Added ge=1 to the Pydantic Field in models.py. BaseService._resolve_max_workers() also clamps to >= 1 defensively (so a legacy on-disk config with 0 does not break startup before the validator runs).

sec-19 — permissions check applies session flags

commands/permissions.py:permissions_check() now uses apply_firewall_flags() the same way permissions list does, so kbagent --deny-writes permissions check branch.create correctly returns DENIED.

sec-20 — descriptive ValueError

sync/branch_mapping.py:_coerce_keboola_id() now wraps int(raw) in try/except and raises a descriptive ValueError naming the bad value. load_branch_mapping() re-wraps with the file path so users see Failed to parse /path/to/branch-mapping.json: Invalid branch ID 'not-a-number'.

Tests

33 new regression tests:

File Tests
tests/test_sync_naming.py TestSanitizePathSegment (4) + TestConfigPathTraversalRegression (2)
tests/test_mcp_transport.py TestBuildMinimalEnv (6)
tests/test_repl.py TestHistoryPathPermissions (2)
tests/test_deep_lineage_service.py TestRenderErDiagramXssRegression (2)
tests/test_models.py extends TestMaxParallelWorkersValidation (2)
tests/test_sync_branch_mapping.py extends TestBranchMapping and TestBranchMappingIO (2)
tests/test_permissions_cli.py extends TestPermissionsCheck (2)

Total suite: 2830 passed (was 2797).

Test plan

  • make lint format-check skill-check changelog-check check-error-codes
  • uv run pytest tests/ --ignore=tests/test_e2e.py --ignore=tests/test_sync_e2e.py --ignore=tests/test_e2e_lineage_deep.py → 2830 passed
  • Each fix has a regression test that fails against main before the fix
  • Manual verification of sanitize_path_segment() on adversarial inputs (../../etc, foo/../bar, .., etc.)
  • Manual verification that _build_minimal_env() does not include any KBC_* even when those vars are set
  • Manual verification that html.escape renders Mermaid output as expected
  • e2e — not strictly needed; all fixes are at unit boundaries and have unit tests

Out of scope (tracked as follow-ups in #269)

  • sec-03 — Token in --token argv visible to ps. Design change (deprecation cycle for the flag).
  • sec-09permissions reset/set PTY bypass. Design change (second-factor / parent-process check).
  • sec-10 — Silent branch-name collisions in sanitize_name. UX warning, not security.
  • sec-12version_cache.json non-atomic write race.
  • sec-13--input @file reads arbitrary path. Design discussion needed.
  • sec-14KBC_MASTER_TOKEN_{ALIAS} from unvalidated alias.
  • sec-16 — Lineage HTML loads Mermaid from CDN without SRI.
  • sec-17KBAGENT_AUTO_UPDATE toggle stale cache.
  • sec-18doctor --fix uses PATH-resolved uv.

Each will get its own follow-up issue if/when prioritized.

Audit artifacts

The audit produced three structured documents (kept locally for the test design phase):

  • /tmp/kbagent-audit/use-cases.md — 28 KB — every command + 20 life situations + cross-cutting concerns
  • /tmp/kbagent-audit/state-machine.md — 43 KB — every persistent / in-memory state with file:line, command-by-command read/write/assert table, life-situation traces, 9 forbidden state combinations, concurrency model, 6 Mermaid state diagrams
  • /tmp/kbagent-audit/issues.md — 15 KB — the 20 findings with file:line, repro, exploitability, suggested fix, confidence

This release ships fixes for nine security findings surfaced by an
audit driven by three sub-agents in sequence (use-case map ->
state-machine -> security review). Findings A-E from issue #267
(merged in #268) are intentionally out of scope for this audit.

Critical
- sec-01 / sec-07: path traversal via API-controlled component_id and
  component_type. naming.config_path() interpolated those tokens raw,
  so a malicious or compromised stack returning component_id =
  "../../etc" would direct sync pull writes outside the workspace.
  Fix: new sanitize_path_segment() that rejects /, \, and parent
  references while preserving dots/hyphens/underscores in legitimate
  IDs (keboola.ex-db-mysql, kds-team.app-custom-python). Plus a
  defense-in-depth confinement check in sync_service.pull() that
  raises ConfigError if the resolved config dir is not contained in
  the branch dir.

High
- sec-02 / sec-08: MCP HTTP transport subprocess no longer inherits
  any KBC_* token from the kbagent process env. Pre-fix,
  Popen(cmd, ...) had no env= arg, so KBC_MASTER_TOKEN,
  KBC_MASTER_TOKEN_<ALIAS>, KBC_MANAGE_API_TOKEN, and KBC_TOKEN all
  leaked to the MCP server when KBAGENT_MCP_TRANSPORT=http. New
  _build_minimal_env() allow-lists only PATH, HOME, locale, and
  uv/python cache vars. Per-project Storage tokens still flow via
  HTTP request headers as before. Closes the v0.29.0 manage-token
  default-deny gap on the HTTP transport path.
- sec-04: REPL history file (~/.config/keboola-agent-cli/repl_history)
  is now created with mode 0600. Pre-fix, FileHistory created the
  file with the user's umask (typically 0644), persisting any token
  typed at the prompt in plaintext readable by group/world.
- sec-05: kbagent lineage show --format html no longer emits XSS-
  vulnerable HTML. render_er_diagram() previously did
  name.replace('"', "'") which left <, >, and & untouched -- a
  Keboola entity named </div><script>...</script> would inject the
  script. Fix uses html.escape(s, quote=True) consistently for every
  API-derived string.
- sec-06: kbagent encrypt values --output-file now atomically creates
  the file with mode 0600 via os.open(..., 0o600). Replaces the
  previous Path.write_text() + chmod(0o600) which left a race window
  where the file was world-readable.

Medium
- sec-11: max_parallel_workers Pydantic field now requires ge=1 in
  addition to le=100. Pre-fix, max_parallel_workers: 0 in config.json
  passed validation and crashed every multi-project op with
  ValueError from ThreadPoolExecutor. _resolve_max_workers() also
  clamps defensively for legacy on-disk configs.

Low
- sec-19: kbagent permissions check OPERATION now reflects the
  EFFECTIVE policy for the invocation -- persisted policy MERGED
  with --deny-writes / --deny-destructive session flags -- matching
  permissions list semantics. Pre-fix, an AI agent inspecting its
  own self-imposed firewall got a misleading "allowed" answer.
- sec-20: _coerce_keboola_id() and load_branch_mapping() now raise
  descriptive errors for malformed branch IDs in branch-mapping.json.
  Pre-fix, "id": "not-a-number" produced raw "invalid literal for
  int()" from deep inside the parser.

Tests
- 33 new regression tests across 7 test files. Total suite: 2830
  passed (was 2797).

Out of scope (tracked as follow-up in #269)
- sec-03 token-in-argv deprecation
- sec-09 PTY-bypass design
- sec-10 silent name collisions
- sec-12 version_cache atomicity
- sec-13 @file path restriction
- sec-14 alias validation
- sec-16 SRI on Mermaid CDN
- sec-17 KBAGENT_AUTO_UPDATE toggle staleness
- sec-18 doctor --fix uv via PATH

Audit artifacts kept locally at /tmp/kbagent-audit/use-cases.md (28K)
+ state-machine.md (43K) + issues.md (15K) for the test design phase.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant