feat(data-app): managed git-repo deploy end-to-end (--use-managed-git-repo, git-bind-credential, runs) #455
…-repo, git-bind-credential, runs)
Deploy a data app from a Keboola-MANAGED git repository, not just an
external one, and make the managed deploy path actually start on stacks
that do not inject managed-repo credentials.
New:
- `data-app create --use-managed-git-repo` provisions an empty
Keboola-hosted repo (POST useManagedGitRepo:true), writes no git block,
forces --no-deploy; mutually exclusive with --git-repo and all
--git-*/PAT flags.
- `data-app git-bind-credential` mints an http_token ON the app, encrypts
it under the project KMS, and writes parameters.dataApp.git (repository
+ placeholder username + encrypted #password + branch) so the runtime
can clone. The token is encrypted in place and never printed.
- `data-app runs` lists deployment attempts with failure_reason +
startup_logs (GET /apps/{id}/runs), incl. setup-phase failures with no
container logs; works on never-started/failed apps where `logs` 400s.
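The flag rules for `data-app create` described above can be sketched as a small validator. This is only an illustration: `plan_create`, `CreatePlan`, and `UsageError` are hypothetical names, not the service's actual `_validate_create_inputs`, and the error wording is assumed.

```python
from dataclasses import dataclass
from typing import Optional


class UsageError(ValueError):
    """Invalid flag combination (hypothetical error type for this sketch)."""


@dataclass(frozen=True)
class CreatePlan:
    use_managed_git_repo: bool
    git_repo: Optional[str]
    deploy: bool


def plan_create(
    git_repo: Optional[str] = None,
    use_managed_git_repo: bool = False,
    git_flags: Optional[dict] = None,
    deploy: bool = True,
) -> CreatePlan:
    """Validate the git-source flags and decide whether a deploy may follow."""
    # Only flags that were actually supplied count as "set".
    set_git_flags = sorted(k for k, v in (git_flags or {}).items() if v)

    if use_managed_git_repo:
        if git_repo:
            raise UsageError(
                "--use-managed-git-repo and --git-repo are mutually exclusive"
            )
        if set_git_flags:
            raise UsageError(
                "--use-managed-git-repo cannot be combined with: "
                + ", ".join(set_git_flags)
            )
        # A managed create writes no git block and forces --no-deploy.
        return CreatePlan(use_managed_git_repo=True, git_repo=None, deploy=False)

    if not git_repo:
        raise UsageError("no git source: pass --git-repo or --use-managed-git-repo")
    return CreatePlan(use_managed_git_repo=False, git_repo=git_repo, deploy=deploy)
```

Calling `plan_create(use_managed_git_repo=True)` yields a plan with `deploy=False` even though the default asks for a deploy, which mirrors the forced `--no-deploy` behavior.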
Fix:
- `data-app deploy` resolves configVersion by source location: pins the
latest Storage version when a git block is present (external AND
credential-wired managed), omits it only for a pure managed repo
(deploys from managedGitRepoId). Previously it always pinned, which
pointed managed deploys at a config snapshot with no git source and
made them silently revert to stopped.
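A minimal sketch of the configVersion rule, assuming the git block lives at `parameters.dataApp.git` as stated above. The function names and the `desiredState` payload key are assumptions for illustration; only `configVersion` is taken from this message.

```python
from typing import Any, Optional


def _get(mapping: Any, key: str) -> Any:
    """Safe lookup that tolerates missing or non-dict levels."""
    return mapping.get(key) if isinstance(mapping, dict) else None


def has_git_block(config: dict) -> bool:
    """True when the config carries a git source (external or credential-wired)."""
    git = _get(_get(_get(config, "parameters"), "dataApp"), "git")
    return isinstance(git, dict) and bool(git.get("repository"))


def resolve_config_version(config: dict, latest_version: int) -> Optional[int]:
    """Pin the latest Storage version only when a git block is present."""
    return latest_version if has_git_block(config) else None


def build_deploy_payload(config: dict, latest_version: int) -> dict:
    # "desiredState" is an assumed field name for this sketch.
    payload = {"desiredState": "running"}
    version = resolve_config_version(config, latest_version)
    if version is not None:
        payload["configVersion"] = version
    # Pure managed repo: no pin, so the deploy resolves from managedGitRepoId.
    return payload
```

The point of returning `None` rather than a sentinel version is that the key is then left out of the request entirely, which is what "omits it" means here.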
UX:
- `data-app deploy --wait` now auto-surfaces the latest run's
failure_reason on timeout/error (best-effort), with a
git-bind-credential hint for managed clone-auth failures.
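The best-effort diagnostic can be pictured roughly as follows. This is a sketch under assumptions: the runs list is taken to be newest-first, `DeployError` stands in for whatever the service raises, and the clone-auth marker is the `could not read Username` text mentioned elsewhere in this PR.

```python
from typing import Callable, List, Optional

# Assumed marker for a managed clone-auth failure.
CLONE_AUTH_MARKER = "could not read Username"


class DeployError(RuntimeError):
    """Stand-in for the error raised on deploy timeout or failure."""


def failure_diagnostic(list_runs: Callable[[], List[dict]]) -> Optional[str]:
    """Return a one-line hint from the latest run, or None if unavailable."""
    try:
        runs = list_runs()
    except Exception:
        # Best-effort: a failing diagnostic lookup must stay silent.
        return None
    if not runs or not isinstance(runs[0], dict):
        return None
    reason = runs[0].get("failure_reason")  # assumes newest run comes first
    if not reason:
        return None
    message = f"latest run failed: {reason}"
    if CLONE_AUTH_MARKER in reason:
        message += " (hint: run `data-app git-bind-credential` for this app)"
    return message


def wait_with_diagnostic(wait: Callable[[], dict], list_runs: Callable[[], List[dict]]) -> dict:
    """Run the wait step; on failure, append the diagnostic to the original error."""
    try:
        return wait()
    except DeployError as exc:
        extra = failure_diagnostic(list_runs)
        if extra:
            raise DeployError(f"{exc}; {extra}") from exc
        raise  # no diagnostic available: re-raise the original error untouched
```

The original error text always comes first in the combined message, so the diagnostic adds context without replacing what actually went wrong.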
Verified live: a tic-tac-toe python-js app deploys and serves from its
managed repo on us-east4.gcp. The cross-stack credential-injection gap is
tracked in #454.
Layered across data_science_client.py (create_app useManagedGitRepo,
list_app_runs), data_app_service.py (managed create branch,
bind_managed_credential, list_app_runs, deploy configVersion logic,
failure diagnostic), commands/data_app.py + commands/_data_app_git.py
(flags + runs + git-bind-credential), permissions.py registry. Tests +
full doc-sync + version bump 0.65.0 + changelog.
Devin Review found 4 potential issues.
🐛 3 issues in files not directly in the diff
🐛 REST API DataAppCreate model and endpoint omit use_managed_git_repo, breaking managed-repo creates via kbagent serve (src/keboola_agent_cli/server/routers/data_apps.py:38-57)
The DataAppCreate Pydantic model in server/routers/data_apps.py:38-57 does not include a use_managed_git_repo field, and the create endpoint at server/routers/data_apps.py:116-137 never passes use_managed_git_repo to registry.data_app.create_data_app(). This means REST API callers (Web UI, scheduled agents, CI pipelines) cannot create managed-repo data apps at all — the service layer's validation will reject the call because neither git_repo nor use_managed_git_repo is truthy. The changelog entry at src/keboola_agent_cli/changelog.py:59 explicitly claims "All of the above are mirrored on the kbagent serve REST API and the Python SDK." CONTRIBUTING.md mandates 1:1 CLI-to-REST parity.
⚠️ Missing REST endpoint for data-app runs command (src/keboola_agent_cli/server/routers/data_apps.py:384)
The new data-app runs CLI command (commands/data_app.py:767-825) lists deployment attempts with failure reasons, but there is no corresponding route in server/routers/data_apps.py. CONTRIBUTING.md mandates 1:1 CLI-to-REST parity: "every command in a group has a matching endpoint in that group's router... If you add a new command, add the corresponding route." The changelog at src/keboola_agent_cli/changelog.py:59 claims REST mirroring. REST callers (monitoring dashboards, scheduled agents) cannot query deploy failure reasons via the API.
⚠️ Missing REST endpoint for data-app git-bind-credential command (src/keboola_agent_cli/server/routers/data_apps.py:384)
The new data-app git-bind-credential CLI command (commands/_data_app_git.py:345-395) wires a managed-repo credential into a data app's config, but there is no corresponding route in server/routers/data_apps.py. CONTRIBUTING.md mandates 1:1 CLI-to-REST parity. The changelog at src/keboola_agent_cli/changelog.py:59 claims REST mirroring. This is a critical step in the managed-repo deploy flow — without a REST endpoint, external applications cannot complete the managed-repo onboarding sequence via the API.
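The 1:1 CLI-to-REST parity rule behind these three findings could be checked mechanically. The sketch below is hypothetical: `find_parity_gaps` and the way commands and routes are collected into plain name sets are assumptions, not an existing check in this repository.

```python
from typing import Dict, List, Set


def find_parity_gaps(cli_commands: Set[str], rest_operations: Set[str]) -> Dict[str, List[str]]:
    """Compare command names in a CLI group with operation names in its router."""
    return {
        # Commands with no matching endpoint (the kind of gap flagged above).
        "missing_routes": sorted(cli_commands - rest_operations),
        # Endpoints with no matching command.
        "orphan_routes": sorted(rest_operations - cli_commands),
    }


# Illustrative input only; the real command and route inventories are larger.
gaps = find_parity_gaps(
    cli_commands={"create", "deploy", "runs", "git-bind-credential"},
    rest_operations={"create", "deploy"},
)
```

With that input, `gaps["missing_routes"]` lists `git-bind-credential` and `runs`. A name-level check like this would not catch the first finding, where the route exists but a field is missing from the request model.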
# Mint a credential ON the app's managed repo. The one-time secret is
# consumed immediately by the encryption step below and never returned.
credential = ds_client.create_git_credential(
    app_id,
    type_="http_token",
    permissions=permissions,
    name="kbagent-managed-deploy",
)
secret = str(credential.get("secret") or "")
if not secret:
    raise KeboolaApiError(
        message=(
            "create_git_credential returned no one-time secret for the "
            "http_token; cannot wire the managed-repo credential."
        ),
        status_code=500,
        error_code=ErrorCode.API_ERROR,
        retryable=False,
    )

git_block = self._build_git_block(
    alias=alias,
    git_repo=https_url,
    git_branch=branch,
    git_public=False,
    git_username=MANAGED_GIT_CREDENTIAL_USERNAME,
    git_pat_plaintext=secret,
    git_pat_encrypted=None,
)
🚩 bind_managed_credential does not clean up the minted credential on subsequent failures
In bind_managed_credential (data_app_service.py:917-964), after minting a credential via create_git_credential (line 919), if the encryption step (_build_git_block, line 937) or the config update (update_config, line 958) fails, the minted credential remains on the app's managed repo but is never written to the config. The one-time secret is lost and cannot be retrieved again. The credential becomes orphaned. Unlike create_data_app which has cleanup-in-finally for the shell, bind_managed_credential has no rollback for the minted credential. In practice, re-running git-bind-credential would mint a fresh credential (the old orphan is harmless but leaks a credential slot). This is acceptable for an MVP but worth documenting as a known limitation.
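For reference, a compensating cleanup for the mint-then-write window might look like the sketch below. It is not what the PR implements (the PR documents the limitation instead). `delete_git_credential`, the `id` field on the minted credential, and the in-memory client are all assumptions made so the example runs offline.

```python
from typing import Callable, Dict


class InMemoryGitClient:
    """Stand-in for the real client; keeps credentials in a dict."""

    def __init__(self) -> None:
        self.credentials: Dict[str, str] = {}
        self._next_id = 1

    def create_git_credential(self, app_id: str, type_: str) -> dict:
        cred_id = f"cred-{self._next_id}"
        self._next_id += 1
        self.credentials[cred_id] = app_id
        return {"id": cred_id, "secret": "one-time-secret"}

    def delete_git_credential(self, app_id: str, credential_id: str) -> None:
        # Hypothetical call: assumed to exist for the purpose of this sketch.
        self.credentials.pop(credential_id, None)


def bind_with_rollback(
    client: InMemoryGitClient,
    app_id: str,
    write_git_block: Callable[[str], None],
) -> str:
    """Mint a credential, then encrypt and write; delete the credential on failure."""
    credential = client.create_git_credential(app_id, type_="http_token")
    try:
        # Covers both the encryption step and the config update.
        write_git_block(credential["secret"])
    except Exception:
        try:
            client.delete_git_credential(app_id, credential["id"])
        except Exception:
            pass  # cleanup is best-effort; the original failure matters more
        raise
    return credential["id"]
```

The inner `try` keeps a failed cleanup from hiding the encryption or write error that triggered it.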
Address review findings on the managed-repo feature:
- REST parity (CONTRIBUTING.md 1:1): the `kbagent serve` data-apps router
now exposes the full managed flow. `DataAppCreate` gains
`use_managed_git_repo` (and `git_repo` is optional); add
`GET /{p}/{app}/runs` and `POST /{p}/{app}/git-repo/bind-credential`.
Without these, REST callers (Web UI, scheduled agents, CI) could not
create or finish a managed-repo app, contradicting the changelog's
"mirrored on the serve REST API" claim.
- Changelog: drop the inaccurate "Python SDK" mirror claim (the SDK has
no data-app surface); state the exact REST routes instead.
- deploy_data_app + bind_managed_credential: parse `configuration`
defensively via a shared `_coerce_config_dict` helper (handles a
JSON-string echo), matching the existing pattern in get_data_app.
- patch_app docstring: document that configVersion is omitted for pure
managed-repo deploys (the §9-trio exception), so it does not read as a bug.
- bind_managed_credential docstring: document the mint-then-write orphan
window (no credential rollback; re-run mints fresh).
Tests: 3 new serve-router tests (create managed flag, runs, bind-credential).
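A rough picture of the defensive parsing mentioned for `_coerce_config_dict`, written here as a standalone `coerce_config_dict`. The fallback to an empty dict on unparseable input is an assumption of this sketch; the real helper may raise or log instead.

```python
import json
from typing import Any


def coerce_config_dict(configuration: Any) -> dict:
    """Return `configuration` as a dict, decoding a JSON-string echo if needed."""
    if isinstance(configuration, dict):
        return configuration
    if isinstance(configuration, (str, bytes)):
        try:
            parsed = json.loads(configuration)
        except ValueError:
            # Covers json.JSONDecodeError and bad byte encodings.
            return {}
        # A JSON list or scalar is not a usable config body.
        return parsed if isinstance(parsed, dict) else {}
    return {}
```

For example, `coerce_config_dict('{"parameters": {}}')` and `coerce_config_dict({"parameters": {}})` both return `{"parameters": {}}`, so callers can read `parameters.dataApp.git` without caring which form the API echoed.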
padak left a comment
Review of #455 — feat(data-app): managed git-repo deploy end-to-end (--use-managed-git-repo, git-bind-credential, runs)
Generated by `kbagent-pr-reviewer` subagent. Verdict and findings below
are advisory; the human author retains every veto. CI-coverable issues
(lint, format, tests) are confirmed via `make check`, not duplicated here.
Summary
This PR closes the managed-repo data-app deploy gap end-to-end: data-app create --use-managed-git-repo provisions an empty Keboola-hosted repo, data-app git-bind-credential wires the encrypted deploy credential into the Storage config, and data-app runs surfaces setup-phase failures that data-app logs cannot reach. The deploy configVersion logic is fixed so pure managed repos (no git block) omit the version pin. All three new commands are registered in OPERATION_REGISTRY, documented across every mandatory plugin-sync surface, and covered by service/CLI/router unit tests. make check passes with 4,160 tests green. The claimed live verification (tic-tac-toe on us-east4.gcp) is stated in the PR description and cannot be independently reproduced by this reviewer (no matching project credentials). One NON-BLOCKING file-size budget violation and two NON-BLOCKING gaps (missing --dry-run on a write command, missing E2E tests) are noted below. Verdict: APPROVE (zero BLOCKING findings).
Verdict
- Verdict: APPROVE
- Blocking findings: 0
- Non-blocking findings: 3
- Nits: 1
Blocking findings
(none)
Non-blocking findings
[NB-1] src/keboola_agent_cli/services/data_app_service.py — file exceeds hard ceiling (2,470 LOC; hard cap is 1,500)
data_app_service.py is at 2,470 LOC after this PR, adding a net ~388 lines on top of a pre-PR baseline of ~2,082. The hard ceiling per CONTRIBUTING.md "File-size budgets" is 1,500 LOC for services/. The PR is not the root cause (the file was already at 2,082 before this change) but it materially extends an already-overflowed file without splitting.
The PR should note this as a follow-up item. The file mixes orchestration (create_data_app, deploy_data_app, bind_managed_credential) with a set of pure helper/parser concerns (_validate_create_inputs, _build_git_block, _build_storage_config_body, _build_dry_run_payload, _coerce_config_dict, _deploy_failure_diagnostic, various _redact_* helpers). Extracting the helpers into a sibling _data_app_helpers.py would bring the service back under the soft ceiling. Not blocking because this is a pre-existing violation; the rule ("split before crossing the hard ceiling") explicitly applies to the NEXT PR that adds material.
[NB-2] src/keboola_agent_cli/commands/_data_app_git.py:344 — git-bind-credential has no --dry-run flag
git-bind-credential mints a credential, encrypts it, and writes parameters.dataApp.git in one shot. Per CONTRIBUTING.md UX checklist: "Write operations log what they did; Destructive operations have --dry-run and --yes flags." Although git-bind-credential is classified as write (not destructive), the operation is not idempotent in a straightforward sense: a second call mints a new credential, orphaning the previous one (this is documented in the service docstring but not reversible). A --dry-run mode that shows the planned git block (repository URL, branch, permissions) without minting or writing would let operators preview and confirm before committing the one-time credential. Compare git-credentials-create at line 244 of the same file which has a --yes confirmation gate.
This is a usability gap, not a bug; the existing confirm-by-default behavior of --json mode provides partial mitigation.
[NB-3] tests/test_e2e.py — no E2E tests for data-app runs, data-app git-bind-credential, or data-app create --use-managed-git-repo
CONTRIBUTING.md § "Tests (mandatory!)" states: "Every CLI command MUST have a corresponding E2E test in tests/test_e2e.py." None of the three new commands appear in test_e2e.py. The unit and service tests provide good layer coverage, but they mock all external API calls. The managed-repo flow depends on a concrete interaction with the Data Science API's useManagedGitRepo provision path, the git-credentials endpoint, and the git-clone → app_setup failure surface — none of which is exercised by existing E2E tests.
Deferred E2E coverage is an accepted pattern when live credentials are unavailable for CI, per CONTRIBUTING.md; the human reviewer should confirm the author has verified the flow manually (which the PR description asserts) and agree a tracking issue is sufficient.
Nits
[NIT-1] `src/keboola_agent_cli/services/data_app_service.py:896`: the service docstring for `bind_managed_credential` calls the orphaned credential "a harmless leaked slot." This is accurate for the current credential quota (no known hard limit), but the claim may become stale if Keboola ever enforces a per-app credential cap. Softening to "an orphaned credential slot" would be more forward-compatible.
Verification log
- `gh pr view 455 --json title,body,files,additions,deletions,state` → 22 files, +1446/-72, state=OPEN, title matches `feat(data-app):` conventional prefix ✓
- Branch check: `git rev-parse --abbrev-ref HEAD` → `feat/data-app-managed-git-repo` (matches PR head branch) ✓
- `make check` (uv sync --extra server + full suite) → 4,160 passed, 8 skipped, 0 failures ✓ (exit 0)
- Layer violation grep (typer in services, httpx in commands, formatter in clients) → all empty ✓
- Magic-number grep (`time.sleep|retries|timeout|interval = [0-9]+` not from `constants.`) → none ✓
- Raw error-code string grep (`error_code = "..."`) → none ✓
- Bare `except:` grep → none ✓
- `print()` in production code → none ✓
- Token in logged output grep → all matches are documentation strings and comment text about encryption, not actual raw token surfaces ✓
- `OPERATION_REGISTRY` check: `data-app.runs: read` at `permissions.py:157`, `data-app.git-bind-credential: write` at `permissions.py:175` ✓
- `CLAUDE.md` `## All CLI Commands`: `data-app runs`, `data-app git-bind-credential`, `--use-managed-git-repo` all present ✓
- `commands/context.py` `AGENT_CONTEXT`: `data-app runs`, `data-app git-bind-credential`, `--use-managed-git-repo` all added ✓
- `plugins/kbagent/agents/keboola-expert.md`: §2 matrix row updated for managed-repo create flow; §3 gotchas extended with `--use-managed-git-repo`, `git-bind-credential`, `runs` ✓; byte count 54,034 < 62,000 hard cap ✓
- `plugins/kbagent/skills/kbagent/references/commands-reference.md`: `data-app runs`, `data-app git-bind-credential`, updated `data-app create` and `data-app deploy` entries ✓
- `plugins/kbagent/skills/kbagent/references/gotchas.md`: new section `data-app create --use-managed-git-repo` tagged `(since v0.65.0)` ✓
- `plugins/kbagent/skills/kbagent/references/data-app-workflow.md`: managed-repo section added under git-credentials ✓
- `plugins/kbagent/skills/kbagent/SKILL.md`: new trigger keywords added; `data-app runs` and `data-app git-bind-credential` rows added to decision table ✓
- `src/keboola_agent_cli/server/routers/data_apps.py`: `GET /{project}/{app_id}/runs` and `POST /{project}/{app_id}/git-repo/bind-credential` routes added, matching 1:1 CLI parity requirement ✓
- `changelog.py` entry for `0.65.0` with 5 bullets covering all new behavior ✓; `pyproject.toml` version bumped to `0.65.0` ✓; `plugin.json` and `marketplace.json` auto-synced ✓
- `data_app_service.py` LOC: 2,470 (hard ceiling: 1,500; pre-PR baseline: ~2,082) → over budget, NON-BLOCKING per rule (existing violation)
- `tests/test_data_app_cli.py`: `TestDataAppRuns.test_runs_json`, `TestDataAppGitBindCredential.test_bind_credential_json`, managed-create CLI tests → present ✓
- `tests/test_data_app_service.py`: `TestDataAppBindManagedCredential.*`, `TestDataAppRuns.test_normalizes_runs_and_failure_reason`, deploy configVersion tests, diagnostic best-effort tests → present ✓
- `tests/test_data_app_git_repo.py`: `test_create_app_managed_sends_flag`, `test_create_app_external_omits_flag`, `test_list_app_runs_returns_array_with_failure_reason` → present ✓
- `tests/test_server_router_calls.py`: 3 new router parity tests for `runs`, `git-bind-credential`, and `use_managed_git_repo` on create → present ✓
- `tests/test_e2e.py` grep for `runs|git-bind-credential|use-managed-git-repo` → not found (NON-BLOCKING, see NB-3)
- Live behavior reproduction: not independently verified (no `us-east4.gcp` credentials in reviewer environment); PR description asserts tic-tac-toe verified live ✓ (unverified by reviewer)
Open questions for the author
NB-2 follow-up: Was `--dry-run` omitted from `git-bind-credential` intentionally (e.g., because the one-time secret would be consumed anyway on a real dry-run) or was it an oversight? If intentional, a brief code comment explaining the design decision would help future contributors.
…2/NB-3)
Address the two actionable non-blocking review findings on #455.
NB-2 -- `data-app git-bind-credential --dry-run`: validates the app and
resolves the managed repo URL, then previews what would be wired
(repository / branch / permissions) WITHOUT minting a credential or
editing the config. The http_token is one-time, so an aborted real run
would orphan it; dry-run lets callers inspect first. Threaded through the
service, CLI, and the serve REST endpoint (GitBindCredential.dry_run).
NB-3 -- E2E coverage for the new commands: add TestE2EDataAppManagedRepo
exercising `create --use-managed-git-repo` -> `git-repo` ->
`git-bind-credential --dry-run` -> `runs` -> `delete` against a real
project (creates one managed app, cleans it up), plus a usage-error case
for "no git source". Verified live via config-dir mode (2 passed). Bind
uses --dry-run in E2E so the run mints no un-rollbackable credential.
Tests: service dry-run (no mint / no write), CLI dry-run flag passthrough.
Docs + changelog updated for the --dry-run flag.
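The dry-run contract (preview only, no mint, no write) can be sketched with a recording stub. Everything here is illustrative: `get_managed_repo_url` and `write_git_block` are hypothetical method names, the `main` and `read` defaults are assumed, and the real service signature may differ.

```python
from typing import List


class RecordingClient:
    """Stub that records which calls were made; performs no real I/O."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def get_managed_repo_url(self, app_id: str) -> str:  # hypothetical
        self.calls.append("get_managed_repo_url")
        return f"https://git.example.invalid/{app_id}.git"

    def create_git_credential(self, app_id: str, type_: str, permissions: str, name: str) -> dict:
        self.calls.append("create_git_credential")
        return {"secret": "one-time-secret"}

    def write_git_block(self, app_id: str, plan: dict, secret: str) -> None:  # hypothetical
        self.calls.append("write_git_block")


def bind_managed_credential(
    client: RecordingClient,
    app_id: str,
    branch: str = "main",       # assumed default
    permissions: str = "read",  # assumed value
    dry_run: bool = False,
) -> dict:
    """Resolve the repo, then either preview the plan or mint and write."""
    plan = {
        "repository": client.get_managed_repo_url(app_id),
        "branch": branch,
        "permissions": permissions,
    }
    if dry_run:
        # Stop before the one-time credential is minted.
        return {"dry_run": True, **plan}
    credential = client.create_git_credential(
        app_id, type_="http_token", permissions=permissions, name="kbagent-managed-deploy"
    )
    client.write_git_block(app_id, plan, credential["secret"])
    # The secret is never part of the returned result.
    return {"dry_run": False, **plan}
```

Asserting on `client.calls` after a dry run is one way to express the "no mint / no write" service test listed above.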
Why
`data-app create` could only deploy from an external git repo (`--git-repo URL`). Keboola can also host a managed repo for an app (the model Kai uses in the UI), but kbagent had no way to create one, and even once you could, the managed deploy silently failed: the app reverted `running → stopped` and never built, with no obvious reason (`data-app logs` returns HTTP 400 on a never-started app).
This PR makes the whole managed-repo flow work end-to-end and self-diagnosing.
What
New
- `data-app create --use-managed-git-repo`: provisions an empty Keboola-hosted repo (POST `useManagedGitRepo:true`), writes no git block, forces `--no-deploy`. Mutually exclusive with `--git-repo` and all `--git-*`/PAT flags.
- `data-app git-bind-credential`: mints an `http_token` on the app, encrypts it under the project KMS, and writes `parameters.dataApp.git` (repository + placeholder username + encrypted `#password` + branch) so the runtime can clone. The token is encrypted in place and never printed.
- `data-app runs`: lists deployment attempts with `failure_reason` + `startup_logs` (`GET /apps/{id}/runs`), including setup-phase failures that produce no container logs. Works on never-started / failed apps where `data-app logs` 400s.
Fix
- `data-app deploy` now resolves `configVersion` by source location: it pins the latest Storage version when a git block is present (external and credential-wired managed repos) and omits it only for a pure managed repo (deploys from `managedGitRepoId`). Previously it always pinned, which pointed managed deploys at a config snapshot with no git source and made them silently revert.
UX
- `data-app deploy --wait` auto-surfaces the latest run's `failure_reason` on timeout/error (best-effort; never masks the original error), with an actionable `git-bind-credential` hint for managed clone-auth failures.
Full managed flow (verified live)
Verified end-to-end on `us-east4.gcp`: a python-js tic-tac-toe app deploys and serves (HTTP 200) from its managed repo.
Security / token handling
No raw token ever passes through the caller or lands in a config in plaintext. `git-bind-credential` mints → encrypts → writes only the `KBC::…` ciphertext (the same pattern external private repos already use); the placeholder `username` is non-secret (the git-service validates only the token).
Known platform dependency
On stacks that inject managed-repo credentials at deploy time, `git-bind-credential` is unnecessary; on `us-east4.gcp` it is required (otherwise the clone fails with `could not read Username`). Tracked in #454.
Implementation
3-layer: `data_science_client.py` (`create_app(useManagedGitRepo)`, `list_app_runs`), `data_app_service.py` (managed create branch, `bind_managed_credential`, `list_app_runs`, deploy configVersion logic, failure diagnostic), `commands/data_app.py` + `commands/_data_app_git.py` (flags + `runs` + `git-bind-credential`), `permissions.py` (`OPERATION_REGISTRY`). The managed-repo deploy recipe was reverse-engineered from `keboola/mcp-server` `feature_spec/managed_repo_data_apps_mvp/RFC.md`.
Tests & docs
- `git-bind-credential`, `runs`, deploy configVersion resolution, failure diagnostic + best-effort guard). Full suite green (4157 passed).
- `CLAUDE.md`, `context.py` (`AGENT_CONTEXT`), `commands-reference.md`, `gotchas.md`, `data-app-workflow.md`, `keboola-expert.md`, `SKILL.md`.
- `0.64.0 → 0.65.0` + changelog.
- `lint` / `format` / `ty` / `changelog-check` / `command-sync` / `version-check` / `skill-check` all green.