Skip to content

Close must-have gaps in explorer command#16

Merged
padak merged 3 commits into
mainfrom
feat/explorer-must-have-gaps
Mar 1, 2026
Merged

Close must-have gaps in explorer command#16
padak merged 3 commits into
mainfrom
feat/explorer-must-have-gaps

Conversation

@jordanrburger
Copy link
Copy Markdown
Collaborator

Summary

Closes 6 must-have gaps identified in the PLAN-catalog.md spec for the kbagent explorer command:

  • YAML tier config (--tiers): Load project-to-tier mappings from a YAML file with tier descriptions and catalog description override
  • UNCLASSIFIED warning: Projects without a tier mapping now emit a TIER_UNCLASSIFIED warning instead of silently defaulting to L0
  • Schema validation: Catalog is validated against kbc-explorer/schema.json after assembly (non-blocking — errors are reported but don't prevent file writes)
  • Atomic file writes: All 4 output files use write-to-tmp + os.replace() for POSIX-atomic writes
  • Job limit default: Changed from 100 to 500 to match spec
  • Tests: 26 new tests covering pure functions (_assign_tier, _compute_job_stats, _type_icon, _build_mermaid) and service integration with mocked dependencies

Test plan

  • uv run pytest tests/test_explorer_service.py -v — 26/26 pass
  • uv run pytest tests/ -v — 566 pass, 3 skipped, no failures
  • uv run kbagent explorer --no-open — end-to-end with real projects
  • uv run kbagent explorer --tiers tiers.yaml --no-open — with tier config

Closes #15

🤖 Generated with Claude Code

jordanrburger and others added 3 commits February 28, 2026 18:06
- Add jsonschema + pyyaml dependencies
- Schema validation against kbc-explorer/schema.json (non-blocking)
- YAML tier config support (--tiers option) with project-to-tier mapping
- UNCLASSIFIED tier warning instead of silent L0 default
- Atomic file writes (write .tmp then os.replace)
- Change job limit default from 100 to 500
- Add 26 tests covering pure functions and service with mocked deps

Closes #15

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Document explorer.py, explorer_service.py, test_explorer_service.py
in project structure, CLI commands list, and dependencies.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@padak padak merged commit b8232fb into main Mar 1, 2026
@padak
Copy link
Copy Markdown
Member

padak commented Mar 1, 2026

Thanks a lot for this contribution, Jordan! Great work on the explorer command foundation.

We've merged it and made a few follow-up fixes in 3ea241f:

  1. CLI wiring — the explorer command wasn't registered in cli.py (add_typer + ExplorerService initialization were missing), so kbagent explorer wasn't showing up.

  2. Orchestration parsing bug_parse_orchestration assumed dependsOn contains {"phaseId": N} dicts, but the Keboola Flow API returns raw integers ([68783]), causing 'int' object has no attribute 'get' on every orchestration. Also, phases and tasks are separate arrays in the API (configuration.phases + configuration.tasks linked by task["phase"] == phase["id"]), not nested. All 150 orchestrations now parse correctly.

  3. Orchestrations in catalog.jsonindex.html loads only catalog.js and reads CATALOG.orchestrations, but the service was writing orchestrations to separate orchestrations.js/orchestrations.json files that the HTML never loaded. Moved orchestrations inside the catalog object so the dashboard displays them.

  4. Parallel orchestration fetching_collect_orchestrations was sequential (one API call at a time). Now uses ThreadPoolExecutor for concurrent fetching, consistent with the rest of the codebase.

  5. Context command — added explorer to kbagent context output so AI agents know about it.

@padak padak deleted the feat/explorer-must-have-gaps branch March 2, 2026 23:05
padak added a commit that referenced this pull request Apr 13, 2026
Expand TestFullE2E from 23 to 36 progressive steps:
- Add init command test (create .kbagent/ in sub-directory)
- Add storage upload-table --incremental (verify 8 rows after appending 3)
- Add storage load-file (upload CSV as file, load into table)
- Add config update --merge (deep-merge partial JSON, verify preservation)
- Add config new scaffold generation (keboola.ex-http)
- Add full workspace lifecycle (create/list/detail/password/load/query/delete)
- Add branch merge command (verify URL response, then delete)
- Add permissions list/show/check commands
- Add job list --component-id filter and job detail

Add TestE2ESyncWorkflow: sync init/pull/status/diff/push --dry-run in temp git repo
Add TestE2EToolCommands: tool list + tool call get_buckets (skip if no MCP)

Update CLAUDE.md: add convention #16 requiring E2E tests for new commands,
add test_e2e.py to project structure.

Total: 12 tests, ~2.5 min runtime against real API.
padak added a commit that referenced this pull request Apr 14, 2026
* test: comprehensive E2E test suite for full CLI coverage

Add tests/test_e2e.py with 9 tests covering 23 progressive steps:
- Project CRUD (add/list/status/edit/remove)
- Storage CRUD (bucket/table create, upload CSV, download round-trip, unload, delete)
- Config operations (list/detail/search/update --set/--dry-run/--configuration/delete)
- Component discovery (list with --type filter, detail via AI Service)
- File lifecycle (upload/list/detail/download/tag/delete with dry-run)
- Encrypt values (KBC::ProjectSecure verification)
- Branch lifecycle (list/create/use/reset/delete)
- Sharing list + lineage show (read-only)
- Error handling (invalid token, nonexistent resources, correct exit codes)
- JSON consistency (all read commands return valid JSON, token never leaked)

Verbose output shows exact CLI commands and abbreviated JSON responses.
Resources use e2e-{timestamp} prefix with guaranteed cleanup on failure.

Also: add make test-e2e target, register @pytest.mark.e2e, fix help regex
to support digits in target names, exclude e2e from default make test.

* test: expand E2E coverage to all CLI commands (36 steps)

Expand TestFullE2E from 23 to 36 progressive steps:
- Add init command test (create .kbagent/ in sub-directory)
- Add storage upload-table --incremental (verify 8 rows after appending 3)
- Add storage load-file (upload CSV as file, load into table)
- Add config update --merge (deep-merge partial JSON, verify preservation)
- Add config new scaffold generation (keboola.ex-http)
- Add full workspace lifecycle (create/list/detail/password/load/query/delete)
- Add branch merge command (verify URL response, then delete)
- Add permissions list/show/check commands
- Add job list --component-id filter and job detail

Add TestE2ESyncWorkflow: sync init/pull/status/diff/push --dry-run in temp git repo
Add TestE2EToolCommands: tool list + tool call get_buckets (skip if no MCP)

Update CLAUDE.md: add convention #16 requiring E2E tests for new commands,
add test_e2e.py to project structure.

Total: 12 tests, ~2.5 min runtime against real API.

* docs: add E2E test scenarios reference (docs/e2e-scenarios.md)

Documents all 12 E2E tests (36 main steps) with exact commands,
assertions, and coverage. Includes table of commands NOT covered
with reasons (manage token, interactive prompts, side effects).

* test: add Snowflake transformation job run E2E (steps 28-32)

Full transformation lifecycle:
- Create output bucket + Snowflake SQL transformation config via API
- SQL: SELECT id, name, CAST(value AS INT), CAST(value AS INT)*2 AS doubled_value
- Run transformation with job run --wait --timeout 300
- Verify job detail: status=success, isFinished=true
- Download output table, verify doubled_value == value * 2 for all rows
- Clean up transformation config and output bucket via CLI

Brings main E2E flow from 36 to 41 steps. All 12 tests pass (~3.5 min).
ottomansky added a commit to ottomansky/keboola-agent-cli that referenced this pull request May 8, 2026
Adds `--new-alias NEW` to `kbagent project edit` so users can rename a
project alias without going through `project remove` + `project add`
(which forces token re-entry). Adds `--dry-run` for read-only preview.
Mirrors the `kbagent config rename` precedent for the on-disk part of
the cascade.

Cascading scope:
- config.json `projects` dict key (`pop(old)` + insert under `new`)
- config.json `default_project` field (when it matched the old alias)
- nested-layout sync directory `<cwd>/<old-alias>/` -> `<cwd>/<new-alias>/`
  (with -2 collision suffix, git-mv-with-shutil-fallback)
- WARNS on `*.lineage.json` -- caches embed alias FQNs and are NOT
  auto-rewritten (partial rewrites are worse than no rewrite)

`--dry-run` (PR keboola#266 review NIT, addressed) previews collision detection,
planned disk-rename method (`git_mv` vs `shutil_move`), and the lineage-
cache warning without mutating any state. Validation errors raise the
same `ConfigError` exit-5 codes as the live path -- callers can rely on
`--dry-run` as a 1:1 pre-flight.

Combined with `--url` and/or `--token` in one call, those mutations
target the new alias post-rename: `kbagent project edit --project foo
--new-alias bar --token NEW` is one atomic operation.

Service / Command:
- `commands/project.py` -- new `--new-alias` and `--dry-run` Typer
  options; human formatter branches on `dry_run` / `old_alias`
- `services/project_service.py` -- `edit_project` accepts `new_alias`,
  `search_root`, `dry_run`; new helpers `_rename_project_alias`,
  `_validate_alias_format`, `_rename_nested_sync_dir`, `_move_directory`,
  `_detect_lineage_cache_warning` for the live path; `_plan_project_alias_rename`
  and `_plan_nested_sync_dir` for the read-only dry-run path

ConfigStore:
- `config_store.py` -- new `rename_project(old, new)` method (atomic
  dict-key swap + `default_project` cascade in one save() call)

Security hardening (from review iter 2):
- Validator regex `[A-Za-z0-9_][A-Za-z0-9_.-]*` plus explicit `..`
  rejection; rejects path traversal, NUL bytes, leading dot/dash,
  whitespace, and characters outside the slug alphabet. Stricter than
  `project add`'s no-op check; rationale is the rename's filesystem
  interaction (alias becomes a directory name).
- `search_root` resolved via `Path.resolve()` once before the disk
  rename to collapse symlinks; closes a malicious-cwd vector.
- Disk rename failures (`OSError`) trigger a config rollback so config
  and disk never end up out of sync; rollback's own failure is
  suppressed via `contextlib.suppress` so the original error wins.
- Lineage cache scan depth-capped at 2 levels (top + `*/` + `*/*/`)
  to bound cost when search_root is a deep tree.

Tests: 38 new (32 service + 6 CLI). Pin alias-key swap, collision
rejection, default_project cascade, sync-dir disk rename, no-sync-dir
no-op, sync-dir collision -2 suffix, combined edit-and-rename, no-op
on same-alias-only, parametrized 9-input path-traversal validator,
legal slug shapes accepted, OS failure rolls config back, rollback
failure surfaces original error, symlink target collision triggers
suffix bump, dry-run no-mutation happy path, dry-run collision still
raises, dry-run format validation still raises, dry-run predicts disk
method without touching disk, dry-run human DRY-RUN label, dry-run
JSON planned-block shape.

E2E: `tests/test_e2e.py::_test_project_edit_and_remove` extended with
a `--dry-run` preview (planned-block assertion) followed by a live
`--new-alias` round-trip (rename + reverse-rename to baseline) before
the existing `--url` step. Pinned by Padak's PR keboola#266 review BLOCKING --
every CLI command must have E2E coverage per CONTRIBUTING.md /
convention keboola#16.

Sync map updates:
- AGENT_CONTEXT (commands/context.py) -- new flags mentioned
- CLAUDE.md `## All CLI Commands` -- same wording
- commands-reference.md -- expanded with cascade scope + dry-run
- gotchas.md -- new `(since v0.30.7)` entry on lineage cache rebuild
- keboola-expert.md -- VERSION GATE clause + tool selection matrix row

Live-validated against project 1143 (`99_Playground_Max`,
europe-west3.gcp.keboola.com): rename to `playground` + reverse rename
to baseline; nested sync dir moved on disk; default_project cascaded;
all error paths produce correct ConfigError exit-5 messages.

Three review iterations on the original PR: self -> independent ->
convergence; zero material findings on the convergence pass. Padak's
post-merge review found 1 BLOCKING (E2E coverage) + 1 NIT (`--dry-run`);
both addressed in this iteration.

Rebased onto upstream/main after Padak shipped 0.30.4 (keboola#268),
0.30.5 (keboola#272), and 0.30.6 (keboola#273) since the original PR opened.
This PR ships as 0.30.7.
padak pushed a commit that referenced this pull request May 11, 2026
…un (#266)

Adds `--new-alias NEW` to `kbagent project edit` so users can rename a
project alias without going through `project remove` + `project add`
(which forces token re-entry). Adds `--dry-run` for read-only preview.
Mirrors the `kbagent config rename` precedent for the on-disk part of
the cascade.

Cascading scope:
- config.json `projects` dict key (`pop(old)` + insert under `new`)
- config.json `default_project` field (when it matched the old alias)
- nested-layout sync directory `<cwd>/<old-alias>/` -> `<cwd>/<new-alias>/`
  (with -2 collision suffix, git-mv-with-shutil-fallback)
- WARNS on `*.lineage.json` -- caches embed alias FQNs and are NOT
  auto-rewritten (partial rewrites are worse than no rewrite)

`--dry-run` (PR #266 review NIT, addressed) previews collision detection,
planned disk-rename method (`git_mv` vs `shutil_move`), and the lineage-
cache warning without mutating any state. Validation errors raise the
same `ConfigError` exit-5 codes as the live path -- callers can rely on
`--dry-run` as a 1:1 pre-flight.

Combined with `--url` and/or `--token` in one call, those mutations
target the new alias post-rename: `kbagent project edit --project foo
--new-alias bar --token NEW` is one atomic operation.

Service / Command:
- `commands/project.py` -- new `--new-alias` and `--dry-run` Typer
  options; human formatter branches on `dry_run` / `old_alias`
- `services/project_service.py` -- `edit_project` accepts `new_alias`,
  `search_root`, `dry_run`; new helpers `_rename_project_alias`,
  `_validate_alias_format`, `_rename_nested_sync_dir`, `_move_directory`,
  `_detect_lineage_cache_warning` for the live path; `_plan_project_alias_rename`
  and `_plan_nested_sync_dir` for the read-only dry-run path

ConfigStore:
- `config_store.py` -- new `rename_project(old, new)` method (atomic
  dict-key swap + `default_project` cascade in one save() call)

Security hardening (from review iter 2):
- Validator regex `[A-Za-z0-9_][A-Za-z0-9_.-]*` plus explicit `..`
  rejection; rejects path traversal, NUL bytes, leading dot/dash,
  whitespace, and characters outside the slug alphabet. Stricter than
  `project add`'s no-op check; rationale is the rename's filesystem
  interaction (alias becomes a directory name).
- `search_root` resolved via `Path.resolve()` once before the disk
  rename to collapse symlinks; closes a malicious-cwd vector.
- Disk rename failures (`OSError`) trigger a config rollback so config
  and disk never end up out of sync; rollback's own failure is
  suppressed via `contextlib.suppress` so the original error wins.
- Lineage cache scan depth-capped at 2 levels (top + `*/` + `*/*/`)
  to bound cost when search_root is a deep tree.

Tests: 38 new (32 service + 6 CLI). Pin alias-key swap, collision
rejection, default_project cascade, sync-dir disk rename, no-sync-dir
no-op, sync-dir collision -2 suffix, combined edit-and-rename, no-op
on same-alias-only, parametrized 9-input path-traversal validator,
legal slug shapes accepted, OS failure rolls config back, rollback
failure surfaces original error, symlink target collision triggers
suffix bump, dry-run no-mutation happy path, dry-run collision still
raises, dry-run format validation still raises, dry-run predicts disk
method without touching disk, dry-run human DRY-RUN label, dry-run
JSON planned-block shape.

E2E: `tests/test_e2e.py::_test_project_edit_and_remove` extended with
a `--dry-run` preview (planned-block assertion) followed by a live
`--new-alias` round-trip (rename + reverse-rename to baseline) before
the existing `--url` step. Pinned by Padak's PR #266 review BLOCKING --
every CLI command must have E2E coverage per CONTRIBUTING.md /
convention #16.

Sync map updates:
- AGENT_CONTEXT (commands/context.py) -- new flags mentioned
- CLAUDE.md `## All CLI Commands` -- same wording
- commands-reference.md -- expanded with cascade scope + dry-run
- gotchas.md -- new `(since v0.30.7)` entry on lineage cache rebuild
- keboola-expert.md -- VERSION GATE clause + tool selection matrix row

Live-validated against project 1143 (`99_Playground_Max`,
europe-west3.gcp.keboola.com): rename to `playground` + reverse rename
to baseline; nested sync dir moved on disk; default_project cascaded;
all error paths produce correct ConfigError exit-5 messages.

Three review iterations on the original PR: self -> independent ->
convergence; zero material findings on the convergence pass. Padak's
post-merge review found 1 BLOCKING (E2E coverage) + 1 NIT (`--dry-run`);
both addressed in this iteration.

Rebased onto upstream/main after Padak shipped 0.30.4 (#268),
0.30.5 (#272), and 0.30.6 (#273) since the original PR opened.
This PR ships as 0.30.7.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Implement kbagent catalog generate command

2 participants