Close must-have gaps in explorer command by jordanrburger · Pull Request #16 · keboola/cli

jordanrburger · 2026-02-28T23:07:22Z

Summary

Closes 6 must-have gaps identified in the PLAN-catalog.md spec for the kbagent explorer command:

YAML tier config (--tiers): Load project-to-tier mappings from a YAML file with tier descriptions and catalog description override
UNCLASSIFIED warning: Projects without a tier mapping now emit a TIER_UNCLASSIFIED warning instead of silently defaulting to L0
Schema validation: Catalog is validated against kbc-explorer/schema.json after assembly (non-blocking — errors are reported but don't prevent file writes)
Atomic file writes: All 4 output files use write-to-tmp + os.replace() for POSIX-atomic writes
Job limit default: Changed from 100 to 500 to match spec
Tests: 26 new tests covering pure functions (_assign_tier, _compute_job_stats, _type_icon, _build_mermaid) and service integration with mocked dependencies

Test plan

uv run pytest tests/test_explorer_service.py -v — 26/26 pass
uv run pytest tests/ -v — 566 pass, 3 skipped, no failures
uv run kbagent explorer --no-open — end-to-end with real projects
uv run kbagent explorer --tiers tiers.yaml --no-open — with tier config

Closes #15

🤖 Generated with Claude Code

- Add jsonschema + pyyaml dependencies - Schema validation against kbc-explorer/schema.json (non-blocking) - YAML tier config support (--tiers option) with project-to-tier mapping - UNCLASSIFIED tier warning instead of silent L0 default - Atomic file writes (write .tmp then os.replace) - Change job limit default from 100 to 500 - Add 26 tests covering pure functions and service with mocked deps Closes #15 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Document explorer.py, explorer_service.py, test_explorer_service.py in project structure, CLI commands list, and dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…or entries

padak · 2026-03-01T00:24:43Z

Thanks a lot for this contribution, Jordan! Great work on the explorer command foundation.

We've merged it and made a few follow-up fixes in 3ea241f:

CLI wiring — the explorer command wasn't registered in cli.py (add_typer + ExplorerService initialization were missing), so kbagent explorer wasn't showing up.
Orchestration parsing bug — _parse_orchestration assumed dependsOn contains {"phaseId": N} dicts, but the Keboola Flow API returns raw integers ([68783]), causing 'int' object has no attribute 'get' on every orchestration. Also, phases and tasks are separate arrays in the API (configuration.phases + configuration.tasks linked by task["phase"] == phase["id"]), not nested. All 150 orchestrations now parse correctly.
Orchestrations in catalog.json — index.html loads only catalog.js and reads CATALOG.orchestrations, but the service was writing orchestrations to separate orchestrations.js/orchestrations.json files that the HTML never loaded. Moved orchestrations inside the catalog object so the dashboard displays them.
Parallel orchestration fetching — _collect_orchestrations was sequential (one API call at a time). Now uses ThreadPoolExecutor for concurrent fetching, consistent with the rest of the codebase.
Context command — added explorer to kbagent context output so AI agents know about it.

Expand TestFullE2E from 23 to 36 progressive steps: - Add init command test (create .kbagent/ in sub-directory) - Add storage upload-table --incremental (verify 8 rows after appending 3) - Add storage load-file (upload CSV as file, load into table) - Add config update --merge (deep-merge partial JSON, verify preservation) - Add config new scaffold generation (keboola.ex-http) - Add full workspace lifecycle (create/list/detail/password/load/query/delete) - Add branch merge command (verify URL response, then delete) - Add permissions list/show/check commands - Add job list --component-id filter and job detail Add TestE2ESyncWorkflow: sync init/pull/status/diff/push --dry-run in temp git repo Add TestE2EToolCommands: tool list + tool call get_buckets (skip if no MCP) Update CLAUDE.md: add convention #16 requiring E2E tests for new commands, add test_e2e.py to project structure. Total: 12 tests, ~2.5 min runtime against real API.

* test: comprehensive E2E test suite for full CLI coverage Add tests/test_e2e.py with 9 tests covering 23 progressive steps: - Project CRUD (add/list/status/edit/remove) - Storage CRUD (bucket/table create, upload CSV, download round-trip, unload, delete) - Config operations (list/detail/search/update --set/--dry-run/--configuration/delete) - Component discovery (list with --type filter, detail via AI Service) - File lifecycle (upload/list/detail/download/tag/delete with dry-run) - Encrypt values (KBC::ProjectSecure verification) - Branch lifecycle (list/create/use/reset/delete) - Sharing list + lineage show (read-only) - Error handling (invalid token, nonexistent resources, correct exit codes) - JSON consistency (all read commands return valid JSON, token never leaked) Verbose output shows exact CLI commands and abbreviated JSON responses. Resources use e2e-{timestamp} prefix with guaranteed cleanup on failure. Also: add make test-e2e target, register @pytest.mark.e2e, fix help regex to support digits in target names, exclude e2e from default make test. * test: expand E2E coverage to all CLI commands (36 steps) Expand TestFullE2E from 23 to 36 progressive steps: - Add init command test (create .kbagent/ in sub-directory) - Add storage upload-table --incremental (verify 8 rows after appending 3) - Add storage load-file (upload CSV as file, load into table) - Add config update --merge (deep-merge partial JSON, verify preservation) - Add config new scaffold generation (keboola.ex-http) - Add full workspace lifecycle (create/list/detail/password/load/query/delete) - Add branch merge command (verify URL response, then delete) - Add permissions list/show/check commands - Add job list --component-id filter and job detail Add TestE2ESyncWorkflow: sync init/pull/status/diff/push --dry-run in temp git repo Add TestE2EToolCommands: tool list + tool call get_buckets (skip if no MCP) Update CLAUDE.md: add convention #16 requiring E2E tests for new commands, add test_e2e.py to project structure. Total: 12 tests, ~2.5 min runtime against real API. * docs: add E2E test scenarios reference (docs/e2e-scenarios.md) Documents all 12 E2E tests (36 main steps) with exact commands, assertions, and coverage. Includes table of commands NOT covered with reasons (manage token, interactive prompts, side effects). * test: add Snowflake transformation job run E2E (steps 28-32) Full transformation lifecycle: - Create output bucket + Snowflake SQL transformation config via API - SQL: SELECT id, name, CAST(value AS INT), CAST(value AS INT)*2 AS doubled_value - Run transformation with job run --wait --timeout 300 - Verify job detail: status=success, isFinished=true - Download output table, verify doubled_value == value * 2 for all rows - Clean up transformation config and output bucket via CLI Brings main E2E flow from 36 to 41 steps. All 12 tests pass (~3.5 min).

Adds `--new-alias NEW` to `kbagent project edit` so users can rename a project alias without going through `project remove` + `project add` (which forces token re-entry). Adds `--dry-run` for read-only preview. Mirrors the `kbagent config rename` precedent for the on-disk part of the cascade. Cascading scope: - config.json `projects` dict key (`pop(old)` + insert under `new`) - config.json `default_project` field (when it matched the old alias) - nested-layout sync directory `<cwd>/<old-alias>/` -> `<cwd>/<new-alias>/` (with -2 collision suffix, git-mv-with-shutil-fallback) - WARNS on `*.lineage.json` -- caches embed alias FQNs and are NOT auto-rewritten (partial rewrites are worse than no rewrite) `--dry-run` (PR keboola#266 review NIT, addressed) previews collision detection, planned disk-rename method (`git_mv` vs `shutil_move`), and the lineage- cache warning without mutating any state. Validation errors raise the same `ConfigError` exit-5 codes as the live path -- callers can rely on `--dry-run` as a 1:1 pre-flight. Combined with `--url` and/or `--token` in one call, those mutations target the new alias post-rename: `kbagent project edit --project foo --new-alias bar --token NEW` is one atomic operation. Service / Command: - `commands/project.py` -- new `--new-alias` and `--dry-run` Typer options; human formatter branches on `dry_run` / `old_alias` - `services/project_service.py` -- `edit_project` accepts `new_alias`, `search_root`, `dry_run`; new helpers `_rename_project_alias`, `_validate_alias_format`, `_rename_nested_sync_dir`, `_move_directory`, `_detect_lineage_cache_warning` for the live path; `_plan_project_alias_rename` and `_plan_nested_sync_dir` for the read-only dry-run path ConfigStore: - `config_store.py` -- new `rename_project(old, new)` method (atomic dict-key swap + `default_project` cascade in one save() call) Security hardening (from review iter 2): - Validator regex `[A-Za-z0-9_][A-Za-z0-9_.-]*` plus explicit `..` rejection; rejects path traversal, NUL bytes, leading dot/dash, whitespace, and characters outside the slug alphabet. Stricter than `project add`'s no-op check; rationale is the rename's filesystem interaction (alias becomes a directory name). - `search_root` resolved via `Path.resolve()` once before the disk rename to collapse symlinks; closes a malicious-cwd vector. - Disk rename failures (`OSError`) trigger a config rollback so config and disk never end up out of sync; rollback's own failure is suppressed via `contextlib.suppress` so the original error wins. - Lineage cache scan depth-capped at 2 levels (top + `*/` + `*/*/`) to bound cost when search_root is a deep tree. Tests: 38 new (32 service + 6 CLI). Pin alias-key swap, collision rejection, default_project cascade, sync-dir disk rename, no-sync-dir no-op, sync-dir collision -2 suffix, combined edit-and-rename, no-op on same-alias-only, parametrized 9-input path-traversal validator, legal slug shapes accepted, OS failure rolls config back, rollback failure surfaces original error, symlink target collision triggers suffix bump, dry-run no-mutation happy path, dry-run collision still raises, dry-run format validation still raises, dry-run predicts disk method without touching disk, dry-run human DRY-RUN label, dry-run JSON planned-block shape. E2E: `tests/test_e2e.py::_test_project_edit_and_remove` extended with a `--dry-run` preview (planned-block assertion) followed by a live `--new-alias` round-trip (rename + reverse-rename to baseline) before the existing `--url` step. Pinned by Padak's PR keboola#266 review BLOCKING -- every CLI command must have E2E coverage per CONTRIBUTING.md / convention keboola#16. Sync map updates: - AGENT_CONTEXT (commands/context.py) -- new flags mentioned - CLAUDE.md `## All CLI Commands` -- same wording - commands-reference.md -- expanded with cascade scope + dry-run - gotchas.md -- new `(since v0.30.7)` entry on lineage cache rebuild - keboola-expert.md -- VERSION GATE clause + tool selection matrix row Live-validated against project 1143 (`99_Playground_Max`, europe-west3.gcp.keboola.com): rename to `playground` + reverse rename to baseline; nested sync dir moved on disk; default_project cascaded; all error paths produce correct ConfigError exit-5 messages. Three review iterations on the original PR: self -> independent -> convergence; zero material findings on the convergence pass. Padak's post-merge review found 1 BLOCKING (E2E coverage) + 1 NIT (`--dry-run`); both addressed in this iteration. Rebased onto upstream/main after Padak shipped 0.30.4 (keboola#268), 0.30.5 (keboola#272), and 0.30.6 (keboola#273) since the original PR opened. This PR ships as 0.30.7.

…un (#266) Adds `--new-alias NEW` to `kbagent project edit` so users can rename a project alias without going through `project remove` + `project add` (which forces token re-entry). Adds `--dry-run` for read-only preview. Mirrors the `kbagent config rename` precedent for the on-disk part of the cascade. Cascading scope: - config.json `projects` dict key (`pop(old)` + insert under `new`) - config.json `default_project` field (when it matched the old alias) - nested-layout sync directory `<cwd>/<old-alias>/` -> `<cwd>/<new-alias>/` (with -2 collision suffix, git-mv-with-shutil-fallback) - WARNS on `*.lineage.json` -- caches embed alias FQNs and are NOT auto-rewritten (partial rewrites are worse than no rewrite) `--dry-run` (PR #266 review NIT, addressed) previews collision detection, planned disk-rename method (`git_mv` vs `shutil_move`), and the lineage- cache warning without mutating any state. Validation errors raise the same `ConfigError` exit-5 codes as the live path -- callers can rely on `--dry-run` as a 1:1 pre-flight. Combined with `--url` and/or `--token` in one call, those mutations target the new alias post-rename: `kbagent project edit --project foo --new-alias bar --token NEW` is one atomic operation. Service / Command: - `commands/project.py` -- new `--new-alias` and `--dry-run` Typer options; human formatter branches on `dry_run` / `old_alias` - `services/project_service.py` -- `edit_project` accepts `new_alias`, `search_root`, `dry_run`; new helpers `_rename_project_alias`, `_validate_alias_format`, `_rename_nested_sync_dir`, `_move_directory`, `_detect_lineage_cache_warning` for the live path; `_plan_project_alias_rename` and `_plan_nested_sync_dir` for the read-only dry-run path ConfigStore: - `config_store.py` -- new `rename_project(old, new)` method (atomic dict-key swap + `default_project` cascade in one save() call) Security hardening (from review iter 2): - Validator regex `[A-Za-z0-9_][A-Za-z0-9_.-]*` plus explicit `..` rejection; rejects path traversal, NUL bytes, leading dot/dash, whitespace, and characters outside the slug alphabet. Stricter than `project add`'s no-op check; rationale is the rename's filesystem interaction (alias becomes a directory name). - `search_root` resolved via `Path.resolve()` once before the disk rename to collapse symlinks; closes a malicious-cwd vector. - Disk rename failures (`OSError`) trigger a config rollback so config and disk never end up out of sync; rollback's own failure is suppressed via `contextlib.suppress` so the original error wins. - Lineage cache scan depth-capped at 2 levels (top + `*/` + `*/*/`) to bound cost when search_root is a deep tree. Tests: 38 new (32 service + 6 CLI). Pin alias-key swap, collision rejection, default_project cascade, sync-dir disk rename, no-sync-dir no-op, sync-dir collision -2 suffix, combined edit-and-rename, no-op on same-alias-only, parametrized 9-input path-traversal validator, legal slug shapes accepted, OS failure rolls config back, rollback failure surfaces original error, symlink target collision triggers suffix bump, dry-run no-mutation happy path, dry-run collision still raises, dry-run format validation still raises, dry-run predicts disk method without touching disk, dry-run human DRY-RUN label, dry-run JSON planned-block shape. E2E: `tests/test_e2e.py::_test_project_edit_and_remove` extended with a `--dry-run` preview (planned-block assertion) followed by a live `--new-alias` round-trip (rename + reverse-rename to baseline) before the existing `--url` step. Pinned by Padak's PR #266 review BLOCKING -- every CLI command must have E2E coverage per CONTRIBUTING.md / convention #16. Sync map updates: - AGENT_CONTEXT (commands/context.py) -- new flags mentioned - CLAUDE.md `## All CLI Commands` -- same wording - commands-reference.md -- expanded with cascade scope + dry-run - gotchas.md -- new `(since v0.30.7)` entry on lineage cache rebuild - keboola-expert.md -- VERSION GATE clause + tool selection matrix row Live-validated against project 1143 (`99_Playground_Max`, europe-west3.gcp.keboola.com): rename to `playground` + reverse rename to baseline; nested sync dir moved on disk; default_project cascaded; all error paths produce correct ConfigError exit-5 messages. Three review iterations on the original PR: self -> independent -> convergence; zero material findings on the convergence pass. Padak's post-merge review found 1 BLOCKING (E2E coverage) + 1 NIT (`--dry-run`); both addressed in this iteration. Rebased onto upstream/main after Padak shipped 0.30.4 (#268), 0.30.5 (#272), and 0.30.6 (#273) since the original PR opened. This PR ships as 0.30.7.

jordanrburger and others added 3 commits February 28, 2026 18:06

Add explorer command to CLAUDE.md documentation

b7a90cc

Document explorer.py, explorer_service.py, test_explorer_service.py in project structure, CLI commands list, and dependencies. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Resolve merge conflicts in CLAUDE.md - include both explorer and doct…

5ce2000

…or entries

padak merged commit b8232fb into main Mar 1, 2026

padak deleted the feat/explorer-must-have-gaps branch March 2, 2026 23:05

ottomansky mentioned this pull request May 2, 2026

feat(0.27.1): manage-token resolution hardening (multi-stack + AI-exfil) #238

Closed

8 tasks

padak mentioned this pull request May 7, 2026

feat(0.30.7): project edit --new-alias + --dry-run (cascading rename) #266

Merged

8 tasks

padak mentioned this pull request May 17, 2026

fix(0.41.11): cascade-delete semantic-layer model children (#306) #309

Merged

7 tasks

matyas-jirat-keboola mentioned this pull request May 20, 2026

feat(0.43.6): kbagent job run --mode run|debug #321

Merged

8 tasks

This was referenced May 27, 2026

RFC 0001: autonomous agent self-service provisioning (+ Phase 0 create_project) #352

Open

Fix Snowflake workspace login type #351

Merged

feat(dev-portal): kbagent Developer Portal support with no-bypass write safety #354

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Close must-have gaps in explorer command#16

Close must-have gaps in explorer command#16
padak merged 3 commits into
mainfrom
feat/explorer-must-have-gaps

jordanrburger commented Feb 28, 2026

Uh oh!

padak commented Mar 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jordanrburger commented Feb 28, 2026

Summary

Test plan

Uh oh!

padak commented Mar 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants