feat(agent-studio): Phase 1 scaffold — Playbooks + Blueprints#329
Draft
padak wants to merge 9 commits into
Adds docs/agents-review.md -- structured review of the Agent Teams PRD (docs/agents.md, merged in PR #305) with findings classified by severity (blocking / non-blocking / nit), a "Stable API surface" gap analysis against the current 24-router serve layer, and a recommended edit sequence. Companion document to the PRD, used as the basis for the upcoming personal-AI-agents feature work on this branch.
…ss tracker
Lays the documentation foundation for the Agent Studio Phase 1 effort:
- docs/agents-v2.md — Playbook-first PRD that supersedes the heavyweight Team/Role/WorkItem v1. Closes every blocking finding from docs/agents-review.md: budget caps in MVP, scoped per-run JWTs, stable API contract, body_hash + 5s undo on external_send, untrusted-content wrapping, expires_at + scope on approvals.
- docs/agent-studio-design-system.md — canonical NERD UI spec, single source of truth for visual contract. Light mode primary, dark secondary. Reference implementation runs at http://127.0.0.1:8001/ via `kbagent serve --ui`.
- docs/mockups/ — 6 light primary screens (conditioning approach via Playwright reference + nano-banana edit mode) + 6 dark secondary backups. README documents the regen workflow.
- docs/agent-studio-progress.md — persistent cross-session tracker for the Phase 1 build. New chat sessions can pick up from this file.
Customer-validated workflow (product-cost-allocation Solution) drove five v2 updates:
- §9.3 xlsx-renderer added to first-party tools
- §18 6th Solution product-cost-allocation (Finance Ops) with full spec
- §21 Phase 2 promoted basic view scoping (created_by + allowed_users)
- §21 Phase 1 acceptance criterion now includes the controller-handoff scenario
- §24 Open Q #5 split (view scoping = Phase 2 done, approval routing still Phase 5+)
- §26 Appendix E "Deployment Patterns" added (local / single-server shared-team / future SaaS)
…+ /v1 router + UI library page
First vertical slice of the Agent Studio Phase 1 plan from docs/agents-v2.md § 21. Goal of the slice: user opens `kbagent serve --ui`, clicks "Playbooks" in the sidebar, sees a library of Playbook cards loaded from real YAML files on disk, can create a new draft. No run loop yet, no Tool Broker yet — those land in their own slices.
Backend (src/keboola_agent_cli/agent_studio/):
- models/playbook.py: minimal Pydantic shape per § 7 (id, name, description, revision, enabled, status, timestamps + opaque placeholders for connections/skills/plugins/triggers so the on-disk YAML stays forward-compatible with later slices).
- storage.py: YAML load/save with 0600 file perms + 0700 dir, atomic temp-then-rename writes, corrupt-YAML tolerant list().
- routers/agent_studio_playbooks.py: GET list / GET detail / POST create / DELETE under `/v1/agent-studio/playbooks` (the stable surface defined in § 19.2). Server stamps id/timestamps so the client cannot smuggle them in.
- server/__init__.py: register the router in `create_app`.
Tests (tests/test_playbook_*.py — 27 new tests):
- test_playbook_model.py: Pydantic validation + status enum + Summary projection.
- test_playbook_storage.py: 0600 perms, 0700 dir, round-trip, atomic write, corrupt-YAML skip, deterministic sort.
- test_playbook_router.py: auth, CRUD round-trip, 404s, OpenAPI registration guard.
Frontend (web/frontend/src/):
- pages/Playbooks.tsx: library page wired to /v1/agent-studio/playbooks with the 3-col `.nerd-card` grid + TwoPathEmpty empty state + "New Playbook" modal. Renders the design system primitives only (.nerd-card, .nerd-btn, .nerd-input, .nerd-pill-*) — no new CSS.
- state.tsx: extend BuiltinPageId with "playbooks".
- layout/Sidebar.tsx: add Playbooks under AI / Tools (BookOpen icon).
- App.tsx: add `case "playbooks"` to the Router switch.
Also (related but coupled):
- web/frontend/index.html: anti-FOUC bootstrap defaults to light per the design system pivot (a user whose OS pref is dark still lands in dark).
- web/frontend/src/apps/: lands the in-flight dynamic-apps registry that Sidebar.tsx + App.tsx + state.tsx now depend on — required to compile, would otherwise break tsc with missing AppPageId / findApp / isAppPageId / slugFromAppPageId symbols. Sample app (morning-brief) keeps the registry exercised.
What's NOT in this commit (deliberately):
- Tool Broker, scoped JWTs, budget enforcer, approval queue, untrusted-content wrapping — all queued for follow-up slices.
- A "Blueprints" page — phase 2.
- Sample Playbook YAMLs — the empty-state TwoPathEmpty is the on-ramp; pre-shipping fake data felt worse for first-run UX.
- Migration of existing AgentTask state — `AgentTask` keeps the full bearer token (§ 23), Playbook runs will get scoped JWTs when the run loop wires up.
ruff check + ruff format + ty check + pytest all green (27 new tests + existing server smoke).
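A minimal sketch of the temp-then-rename write with 0600 file / 0700 dir permissions described for storage.py above. This is not the code from the PR: the function name `atomic_write_private` is hypothetical, it takes already-serialised bytes rather than calling a YAML library, and it assumes a POSIX filesystem.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_private(path: Path, data: bytes) -> None:
    """Write `data` to `path` via temp-then-rename (hypothetical helper name)."""
    directory = path.parent
    directory.mkdir(mode=0o700, parents=True, exist_ok=True)
    # mkdir's mode is filtered by the umask and ignored for an existing dir,
    # so set the directory permissions explicitly.
    os.chmod(directory, 0o700)
    # mkstemp creates the file readable and writable by the owner only (0600),
    # in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=directory, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file in atomically on POSIX: readers see either
        # the old content or the new content, never a partial write.
        os.replace(tmp_name, path)
    except BaseException:
        # Do not leave a half-written temp file behind.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

Writing the temp file next to the target, rather than in the system temp directory, is what keeps the rename atomic; a cross-filesystem move would fall back to copy-and-delete.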
The progress tracker now reflects what landed in the two prior commits (scaffold + design docs) and explicitly enumerates the next 9 slices that turn the scaffold into "Phase 1 acceptance criteria met" per docs/agents-v2.md § 21. Order of the next 9 is "what unblocks the most downstream work": the Playbook detail Drawer (frontend-only, easy first follow-up) ahead of the run loop ahead of Tool Broker → budget enforcer → approval queue → untrusted wrapping → skill loader → connection discovery → data-cleanup plugin.
Wires PlaybookCard onto a right-side Drawer that fetches the full
Playbook from GET /v1/agent-studio/playbooks/{id}. Body sections
mirror the Pydantic shape from docs/agents-v2.md § 7:
- Status pill + enabled/disabled pill
- Description (italic placeholder when null)
- Connections / Skills / Plugins lists rendered as outlined
mono pills, with an italic "None — set in a later slice"
empty state.
- Triggers list rendered as .nerd-code JSON blocks so the opaque
config shape (typed in Phase 2) is still legible.
- Created / Updated timestamps localised via toLocaleString,
keeping UTC ISO-8601 on disk for audit consumers.
Drawer actions surface a Delete button (red-on-hover) that pops a
two-step confirm modal — the modal calls out the on-disk path so
the user knows exactly what the destructive operation touches.
Creating a new Playbook now auto-opens its drawer (was: drop user
back on the library and make them find the new row).
No backend changes — GET detail + DELETE endpoints already exist
from the Phase 1 scaffold. tsc clean, 27 backend tests still green.
Progress tracker now points at slice 2 (run loop tied into server/agent_runner.py) as the next priority.
…s + UI Run button
Slice 2.a of the run loop. The "Run" button now produces a real
(stub) run record the user can see; real subprocess execution lands
in slice 2.b.
Backend:
- models/playbook_run.py: minimal PlaybookRun (id, playbook_id,
playbook_revision, status, started_at, ended_at, summary,
objective_override). Cost/token/workspace/SSE-log fields per § 7
arrive with the real run loop.
- storage.py: generalised the YAML load/save helpers (_safe_load is
now generic over the model type via PEP 695 `[T: BaseModel]`),
added runs_dir / list_runs (newest-first, optional playbook_id
filter) / get_run / save_run. Same 0600 file + 0700 dir perms.
- routers/agent_studio_playbooks.py: POST /{id}/run — stub that
creates a run, marks it `done` immediately with a clear summary,
propagates an optional objective_override from the body.
- routers/agent_studio_runs.py: GET /v1/agent-studio/runs
[?playbook_id=X] + GET /v1/agent-studio/runs/{run_id}.
- server/__init__.py: register the runs router.
- agent_studio + models __init__: export PlaybookRun (the exports
were missed in the scaffold commit because the Write hit a
not-yet-read guard; functionality was unaffected since storage +
routers import the concrete module path directly).
Frontend (Playbooks.tsx):
- Drawer header gains a Run button (keboola-hover) beside Delete.
- New Recent Runs section in the drawer body: status pill + short
run id + start time + computed duration, truncated to the last 5
with a "+ N earlier runs" marker pointing at the future Past Jobs
tab. Polls every 10s like the library.
- Running a Playbook invalidates both the run list and the library
query so the card status pill stays in sync.
Tests: +21 (model 4, storage 7, router 10). Full Playbook + run +
server smoke suite = 63 green. ruff + ty + tsc clean.
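The stub run flow in this slice can be sketched roughly as follows. The field names come from the commit message above; the field types, the helper names `create_stub_run` and `list_runs_newest_first`, the summary string, and the use of a plain dataclass instead of the Pydantic model are assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class PlaybookRun:
    # Field names follow the commit message; the types are assumptions.
    id: str
    playbook_id: str
    playbook_revision: int
    status: str
    started_at: datetime
    ended_at: Optional[datetime] = None
    summary: Optional[str] = None
    objective_override: Optional[str] = None


def create_stub_run(
    playbook_id: str,
    playbook_revision: int,
    objective_override: Optional[str] = None,
    now: Optional[datetime] = None,
) -> PlaybookRun:
    """Create a run and mark it done immediately; nothing is executed."""
    stamp = now or datetime.now(timezone.utc)
    return PlaybookRun(
        id=uuid.uuid4().hex,  # server-side id, never taken from the client
        playbook_id=playbook_id,
        playbook_revision=playbook_revision,
        status="done",
        started_at=stamp,
        ended_at=stamp,
        summary="Stub run: no execution performed.",  # hypothetical wording
        objective_override=objective_override,
    )


def list_runs_newest_first(
    runs: Iterable[PlaybookRun], playbook_id: Optional[str] = None
) -> List[PlaybookRun]:
    """Return runs newest-first, optionally filtered to one Playbook."""
    selected = [r for r in runs if playbook_id is None or r.playbook_id == playbook_id]
    return sorted(selected, key=lambda r: r.started_at, reverse=True)
```

Marking the run `done` at creation time keeps the list, detail, and drawer code paths exercised end to end before any subprocess execution exists.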
Infrastructure for use-case-specific apps that run inside `kbagent serve --ui`, alongside the morning-brief reference app.
- OpenAPI type pipeline: scripts/dump_openapi.py builds the FastAPI app in-memory (no uvicorn boot) and dumps the schema; `make web-gen-types` feeds it to openapi-typescript -> web/frontend/src/api/generated.ts. `make web-types-check` guards drift the same way skill-check does.
- web/frontend/src/api/ai.ts: askLocalAi() wraps POST /ai/chat/stream. Apps default to the user's local claude/codex/gemini install -- no master token, unlike hosted Kai (/kai/ask), which most users lack.
- apps/type-inspector: reference Inspector-archetype app. Profiles a Storage table per column (null %, distinct, inferred type, samples), asks the local AI for native-type proposals, approve/edit per column. The destructive table-swap step is a documented Playbook stub -- apps produce the typed column list; a Playbook executes the swap.
- build-app-over-kbagent-serve.md skill: the guide an AI agent reads to scaffold a new app -- apps/ convention, NERD UI primitives, typed client usage, local-AI invocation patterns, and the gotchas hit while building (response envelopes, vite-env.d.ts, --ui-dist override).
Tests: 18 vitest cases (profile + ai_parse). TypeScript clean.
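To illustrate the per-column profile that type-inspector computes (null %, distinct, inferred type, samples), here is a rough sketch. The app itself lives in the TypeScript frontend; this Python version, the name `profile_column`, the treatment of empty strings as null, and the type labels INTEGER / FLOAT / STRING are all assumptions rather than the app's actual rules.

```python
from typing import Callable, Dict, List, Optional, Sequence


def _all_parse(values: Sequence[str], caster: Callable[[str], object]) -> bool:
    """True when every value can be converted by `caster`."""
    try:
        for value in values:
            caster(value)
    except ValueError:
        return False
    return True


def profile_column(
    values: Sequence[Optional[str]], sample_size: int = 3
) -> Dict[str, object]:
    """Profile one column of raw string cells (hypothetical helper)."""
    # Assumption: None and "" both count as null, as in a CSV-backed table.
    non_null: List[str] = [v for v in values if v is not None and v != ""]
    total = len(values)
    null_pct = 100.0 * (total - len(non_null)) / total if total else 0.0
    # dict.fromkeys de-duplicates while keeping first-seen order.
    distinct = list(dict.fromkeys(non_null))
    if not non_null:
        inferred = "STRING"  # nothing to infer from; fall back to text
    elif _all_parse(non_null, int):
        inferred = "INTEGER"
    elif _all_parse(non_null, float):
        inferred = "FLOAT"
    else:
        inferred = "STRING"
    return {
        "null_pct": null_pct,
        "distinct": len(distinct),
        "inferred_type": inferred,
        "samples": distinct[:sample_size],
    }
```

A profile like this would be the input handed to the local AI for native-type proposals; the narrowest type that parses every non-null cell is only a starting guess for the user to approve or edit.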
Slice 1.4. Second visible surface, matching
docs/mockups/02-blueprints-catalog.png.
Backend:
- models/blueprint.py: Blueprint shape (id, name, category,
description, systems, connections, skills, plugins) +
BLUEPRINT_CATEGORIES tuple driving the filter chips.
- blueprints_catalog.py: static in-code seed of the 9 designed
cards. v2 §11/§12 wants these as YAML data files for a
marketplace eventually; in-code keeps Phase 1 dependency-free
and uncorruptable. list_blueprints(category) + get_blueprint(id).
- routers/agent_studio_blueprints.py:
GET /v1/agent-studio/blueprints[?category=X],
GET /{id}, POST /{id}/fork. Fork mints a draft Playbook prefilled
with the blueprint's connections/skills/plugins (the parts the
current Playbook model can carry; SOP/budget/approval arrive when
those substructures exist).
- server/__init__.py: register the blueprints router.
Frontend:
- pages/Blueprints.tsx: category filter row (active chip =
keboola-green outline) + search + 3-col card grid. "Use this
blueprint" forks then navigates to the Playbooks library.
- state.tsx / Sidebar.tsx / App.tsx: new "blueprints" PageId,
sidebar entry (LayoutGrid icon, under AI / Tools after Playbooks),
route.
- Playbooks.tsx: the empty-state "Browse Blueprints" button is now
wired to navigate (was disabled "Phase 2" placeholder).
Tests: +16 (catalogue 8, router/fork 8). Full agent-studio + smoke
suite = 79 green. ruff + ty + tsc + vite build all clean.
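The catalogue lookup and fork step above might look roughly like this. The Blueprint field names follow the commit message; the two catalogue entries, the dataclass shapes, the helper `fork_blueprint`, and the choice to copy the blueprint's name and description onto the draft are placeholders and assumptions, not the PR's code or its 9 seeded cards.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Blueprint:
    id: str
    name: str
    category: str
    description: str
    systems: Tuple[str, ...] = ()
    connections: Tuple[str, ...] = ()
    skills: Tuple[str, ...] = ()
    plugins: Tuple[str, ...] = ()


@dataclass
class Playbook:
    id: str
    name: str
    description: Optional[str]
    status: str
    connections: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    plugins: List[str] = field(default_factory=list)


# Placeholder entries; the real catalogue content is not reproduced here.
CATALOG: Tuple[Blueprint, ...] = (
    Blueprint("example-a", "Example A", "finance", "Placeholder card",
              connections=("conn-1",), skills=("skill-1",)),
    Blueprint("example-b", "Example B", "ops", "Placeholder card",
              plugins=("plugin-1",)),
)


def list_blueprints(category: Optional[str] = None) -> List[Blueprint]:
    """Return the whole catalogue, or only the cards in one category."""
    return [b for b in CATALOG if category is None or b.category == category]


def get_blueprint(blueprint_id: str) -> Optional[Blueprint]:
    return next((b for b in CATALOG if b.id == blueprint_id), None)


def fork_blueprint(blueprint: Blueprint) -> Playbook:
    """Mint a draft Playbook prefilled from the blueprint."""
    return Playbook(
        id=uuid.uuid4().hex,  # fresh id: the fork is independent of the card
        name=blueprint.name,
        description=blueprint.description,
        status="draft",
        # Copy into new lists so editing the draft never touches the catalogue.
        connections=list(blueprint.connections),
        skills=list(blueprint.skills),
        plugins=list(blueprint.plugins),
    )
```

Keeping the seed as immutable in-code tuples matches the "dependency-free and uncorruptable" rationale: there is no file to parse and no way for a fork to mutate a card.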
Draft. First reviewable chunk of Agent Studio — the Playbook-first agentic surface for `kbagent serve`. Ships the documentation foundation plus four vertical UI/API slices. Run loop, Tool Broker, and governance are deliberately out of scope here and tracked in #327.
Documentation foundation
- `docs/agents-v2.md` — Playbook-first PRD that supersedes the heavyweight Team/Role/WorkItem v1. Closes every blocking finding from `docs/agents-review.md` (budget caps in MVP, scoped per-run JWTs, stable `/v1` API contract, `body_hash` + 5s undo, untrusted wrapping, approval `expires_at`/`scope`).
- `docs/agent-studio-design-system.md` — canonical NERD UI spec (light mode primary, dark secondary). Single source of truth for the visual contract; the new pages reuse only existing `.nerd-*` primitives.
- `docs/mockups/` — 6 light primary screens + 6 dark backups.
- `docs/agent-studio-progress.md` — cross-session build tracker.
Code slices
1 — Scaffold
- `agent_studio/models/playbook.py`: minimal Playbook shape (§ 7).
- `agent_studio/storage.py`: YAML persistence under `<config_dir>/playbooks/`, `0600` files / `0700` dir, atomic temp-then-rename, corrupt-YAML-tolerant list.
- `/v1/agent-studio/playbooks` CRUD router, registered in `create_app`.
- `.nerd-card` grid, TwoPathEmpty empty state, New Playbook modal.
1.2 — Detail Drawer
- skills / plugins / triggers (JSON) / timestamps. Two-step Delete.
2.a — PlaybookRun stub
- `PlaybookRun` model + `runs/` storage.
- `POST /{id}/run` creates a run and marks it `done` immediately (no real execution yet — proves the data flow).
- `GET /runs[?playbook_id=X]` + `GET /runs/{id}`. Drawer gains a Run button + Recent Runs section.
2.5 — Blueprints catalogue
- `GET /blueprints[?category=X]`, `GET /{id}`, `POST /{id}/fork` (mints a draft Playbook). Blueprints page with category filter + search; "Use this blueprint" forks and navigates to the library.
Out of scope (tracked in #327)
Real run loop (subprocess via `agent_runner.py`), Tool Broker + scoped JWTs, budget enforcer, approval queue + `body_hash` + 5s undo, untrusted-content wrapping, skill loader, connection auto-discovery, `data-cleanup` native plugin. Also: live browser QA of the three new surfaces (so far verified via HTTP TestClient + tsc + vite build only).
Test plan
- router / run / blueprint / fork)
- `ruff check` + `ruff format --check` + `ty check` clean
- `tsc --noEmit` + `vite build` clean
Closes nothing yet — keeps #327 open as the umbrella for the rest of Phase 1.
apps/ scaffolding (use-case apps over kbagent serve)
Infrastructure for AI-generated, use-case-specific apps that run inside `kbagent serve --ui`, separate from the Playbook runtime:
- `scripts/dump_openapi.py` builds the FastAPI app in-memory and dumps the schema; `make web-gen-types` feeds it to openapi-typescript → `web/frontend/src/api/generated.ts`. `make web-types-check` guards drift.
- `apps/` convention — drop an app under `web/frontend/src/apps/<slug>/`, export an `AppManifest`; `_registry.tsx` (`import.meta.glob`) wires it into Router + Sidebar automatically. No manual routing.
- `api/ai.ts`: `askLocalAi()` wraps `POST /ai/chat/stream` (claude/codex/gemini). Apps default to the user's local CLI — no master token, unlike hosted Kai.
- `morning-brief` (Dashboard archetype — cross-project job cost outliers) and `type-inspector` (Inspector archetype — per-column profiling + AI type proposals + Playbook stub).
- `build-app-over-kbagent-serve.md` — guide for an AI agent to scaffold a new app (conventions, NERD UI primitives, gotchas).
Tests: morning-brief compute (8) + type-inspector profile/ai_parse (18).