feat(data-app): managed git-repo deploy end-to-end (--use-managed-git-repo, git-bind-credential, runs) #455

Merged
padak merged 3 commits into main from feat/data-app-managed-git-repo
Jun 21, 2026

@padak padak commented Jun 20, 2026

Member

Why

data-app create could only deploy from an external git repo (--git-repo URL). Keboola can also host a managed repo for an app (the model Kai uses in the UI), but kbagent had no way to create one — and even once you could, the managed deploy silently failed: the app reverted running → stopped and never built, with no obvious reason (data-app logs returns HTTP 400 on a never-started app).

This PR makes the whole managed-repo flow work end-to-end and self-diagnosing.

What

New

  • data-app create --use-managed-git-repo — provisions an empty Keboola-hosted repo (POST useManagedGitRepo:true), writes no git block, forces --no-deploy. Mutually exclusive with --git-repo and all --git-*/PAT flags.
  • data-app git-bind-credential — mints an http_token on the app, encrypts it under the project KMS, and writes parameters.dataApp.git (repository + placeholder username + encrypted #password + branch) so the runtime can clone. The token is encrypted in place and never printed.
  • data-app runs — lists deployment attempts with failure_reason + startup_logs (GET /apps/{id}/runs), including setup-phase failures that produce no container logs. Works on never-started / failed apps where data-app logs 400s.
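
The mutual-exclusion rule for --use-managed-git-repo could be checked roughly as follows. This is a minimal sketch, not the PR's `_validate_create_inputs`: the function `validate_git_source`, its parameters, and the error wording are all hypothetical.

```python
from typing import Optional


def validate_git_source(
    git_repo: Optional[str],
    use_managed_git_repo: bool,
    other_git_flags: dict,
) -> str:
    """Return "external" or "managed", or raise ValueError on a bad combination.

    Hypothetical helper: names and messages are assumptions for illustration.
    `other_git_flags` maps a flag name (e.g. "--git-branch") to its value.
    """
    if use_managed_git_repo:
        # A managed repo excludes --git-repo and every --git-*/PAT flag.
        conflicting = [name for name, value in other_git_flags.items() if value]
        if git_repo:
            conflicting.insert(0, "--git-repo")
        if conflicting:
            raise ValueError(
                "--use-managed-git-repo cannot be combined with: "
                + ", ".join(conflicting)
            )
        return "managed"
    if not git_repo:
        # Neither source was given, so there is nothing to deploy from.
        raise ValueError("provide --git-repo or --use-managed-git-repo")
    return "external"
```

Collecting every conflicting flag before raising lets the caller fix the whole command line in one pass instead of discovering conflicts one at a time.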

Fix

  • data-app deploy now resolves configVersion by source location: it pins the latest Storage version when a git block is present (external and credential-wired managed repos) and omits it only for a pure managed repo (deploys from managedGitRepoId). Previously it always pinned, which pointed managed deploys at a config snapshot with no git source and made them silently revert.
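
The configVersion decision described above can be sketched like this. It assumes the git block lives at parameters.dataApp.git as stated for git-bind-credential; the function names and the shape of the deploy payload are hypothetical, not the service's actual code.

```python
from typing import Any, Optional


def resolve_config_version(
    configuration: dict, latest_version: int
) -> Optional[int]:
    """Pin the latest Storage version only when the config carries a git block."""
    params = configuration.get("parameters") or {}
    data_app = params.get("dataApp") or {}
    if data_app.get("git"):
        # External repo, or a managed repo with a bound credential.
        return latest_version
    # Pure managed repo: the platform deploys from managedGitRepoId instead.
    return None


def build_deploy_payload(configuration: dict, latest_version: int) -> dict:
    """Hypothetical payload builder: omit the key entirely rather than send null."""
    payload: dict = {}
    version = resolve_config_version(configuration, latest_version)
    if version is not None:
        payload["configVersion"] = version
    return payload
```

Omitting the key, instead of always pinning, is what keeps a pure managed deploy from pointing at a config snapshot that has no git source.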

UX

  • data-app deploy --wait auto-surfaces the latest run's failure_reason on timeout/error (best-effort; never masks the original error), with an actionable git-bind-credential hint for managed clone-auth failures.
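
One way to keep the diagnostic best-effort is sketched below. `DeployError`, `deploy_and_wait`, and the injected callables are stand-ins invented for this example, and the newest-first ordering of runs is an assumption.

```python
from typing import Callable, Optional


class DeployError(Exception):
    """Stand-in for the CLI's real error type (hypothetical)."""

    failure_reason: Optional[str] = None


def latest_failure_reason(fetch_runs: Callable[[], list]) -> Optional[str]:
    """Return the newest run's failure_reason, or None if anything goes wrong."""
    try:
        runs = fetch_runs()
        # Assumption: runs arrive newest first.
        reason = runs[0].get("failure_reason") if runs else None
    except Exception:
        # Best-effort: a broken diagnostic must not replace the deploy error.
        return None
    return str(reason) if reason else None


def deploy_and_wait(
    deploy: Callable[[], dict], fetch_runs: Callable[[], list]
) -> dict:
    """Run the deploy; on failure, attach the run diagnostic and re-raise."""
    try:
        return deploy()
    except DeployError as exc:
        exc.failure_reason = latest_failure_reason(fetch_runs)
        raise  # the original error always propagates unchanged
```

The diagnostic lookup swallows its own failures, so a timeout still surfaces as the same timeout even when the runs endpoint is unreachable.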

Full managed flow (verified live)

data-app create --use-managed-git-repo
  -> git-credentials-create --type http_token --permissions readWrite + git push
  -> data-app git-bind-credential
  -> data-app deploy

Verified end-to-end on us-east4.gcp: a python-js tic-tac-toe app deploys and serves (HTTP 200) from its managed repo.

Security / token handling

No raw token ever passes through the caller or lands in a config in plaintext. git-bind-credential mints → encrypts → writes only the KBC::… ciphertext (the same pattern external private repos already use); the placeholder username is non-secret (the git-service validates only the token).
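
The mint, encrypt, write sequence can be illustrated as below. This is a sketch under assumptions: `mint_secret` and `encrypt` are injected stand-ins for the real API calls, and the exact key spelling of the git block is inferred from the description above rather than copied from the code.

```python
from typing import Callable


def build_managed_git_block(
    repository: str,
    branch: str,
    username: str,
    mint_secret: Callable[[], str],
    encrypt: Callable[[str], str],
) -> dict:
    """Build a git block that holds only ciphertext, never the raw token."""
    secret = mint_secret()
    if not secret:
        raise RuntimeError("no one-time secret was returned")
    ciphertext = encrypt(secret)
    # Refuse to continue unless the value looks like a KBC:: ciphertext.
    if not ciphertext.startswith("KBC::"):
        raise RuntimeError("encryption did not return a KBC:: ciphertext")
    return {
        "repository": repository,
        "branch": branch,
        "username": username,  # non-secret placeholder
        "#password": ciphertext,
    }
```

The raw secret exists only as a local variable inside the function, so neither the caller nor the returned block ever holds it in plaintext.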

Known platform dependency

On stacks that inject managed-repo credentials at deploy time, git-bind-credential is unnecessary; on us-east4.gcp it is required (otherwise the clone fails with "could not read Username"). Tracked in #454.

Implementation

3-layer: data_science_client.py (create_app(useManagedGitRepo), list_app_runs), data_app_service.py (managed create branch, bind_managed_credential, list_app_runs, deploy configVersion logic, failure diagnostic), commands/data_app.py + commands/_data_app_git.py (flags + runs + git-bind-credential), permissions.py (OPERATION_REGISTRY). The managed-repo deploy recipe was reverse-engineered from keboola/mcp-server feature_spec/managed_repo_data_apps_mvp/RFC.md.

Tests & docs

  • New unit tests across all three layers (managed create, mutex validation, git-bind-credential, runs, deploy configVersion resolution, failure diagnostic + best-effort guard). Full suite green (4157 passed).
  • Full doc-sync: CLAUDE.md, context.py (AGENT_CONTEXT), commands-reference.md, gotchas.md, data-app-workflow.md, keboola-expert.md, SKILL.md.
  • Version bump 0.64.0 → 0.65.0 + changelog. lint / format / ty / changelog-check / command-sync / version-check / skill-check all green.


…-repo, git-bind-credential, runs)

Deploy a data app from a Keboola-MANAGED git repository, not just an
external one, and make the managed deploy path actually start on stacks
that do not inject managed-repo credentials.

New:
- `data-app create --use-managed-git-repo` provisions an empty
  Keboola-hosted repo (POST useManagedGitRepo:true), writes no git block,
  forces --no-deploy; mutually exclusive with --git-repo and all
  --git-*/PAT flags.
- `data-app git-bind-credential` mints an http_token ON the app, encrypts
  it under the project KMS, and writes parameters.dataApp.git (repository
  + placeholder username + encrypted #password + branch) so the runtime
  can clone. The token is encrypted in place and never printed.
- `data-app runs` lists deployment attempts with failure_reason +
  startup_logs (GET /apps/{id}/runs), incl. setup-phase failures with no
  container logs; works on never-started/failed apps where `logs` 400s.

Fix:
- `data-app deploy` resolves configVersion by source location: pins the
  latest Storage version when a git block is present (external AND
  credential-wired managed), omits it only for a pure managed repo
  (deploys from managedGitRepoId). Previously it always pinned, which
  pointed managed deploys at a config snapshot with no git source and
  made them silently revert to stopped.

UX:
- `data-app deploy --wait` now auto-surfaces the latest run's
  failure_reason on timeout/error (best-effort), with a
  git-bind-credential hint for managed clone-auth failures.

Verified live: a tic-tac-toe python-js app deploys and serves from its
managed repo on us-east4.gcp. The cross-stack credential-injection gap is
tracked in #454.

Layered across data_science_client.py (create_app useManagedGitRepo,
list_app_runs), data_app_service.py (managed create branch,
bind_managed_credential, list_app_runs, deploy configVersion logic,
failure diagnostic), commands/data_app.py + commands/_data_app_git.py
(flags + runs + git-bind-credential), permissions.py registry. Tests +
full doc-sync + version bump 0.65.0 + changelog.

@devin-ai-integration devin-ai-integration Bot left a comment


Devin Review found 4 potential issues.

🐛 3 issues in files not directly in the diff

🐛 REST API DataAppCreate model and endpoint omit use_managed_git_repo, breaking managed-repo creates via kbagent serve (src/keboola_agent_cli/server/routers/data_apps.py:38-57)

The DataAppCreate Pydantic model in server/routers/data_apps.py:38-57 does not include a use_managed_git_repo field, and the create endpoint at server/routers/data_apps.py:116-137 never passes use_managed_git_repo to registry.data_app.create_data_app(). This means REST API callers (Web UI, scheduled agents, CI pipelines) cannot create managed-repo data apps at all — the service layer's validation will reject the call because neither git_repo nor use_managed_git_repo is truthy. The changelog entry at src/keboola_agent_cli/changelog.py:59 explicitly claims "All of the above are mirrored on the kbagent serve REST API and the Python SDK." CONTRIBUTING.md mandates 1:1 CLI-to-REST parity.


⚠️ Missing REST endpoint for data-app runs command (src/keboola_agent_cli/server/routers/data_apps.py:384)

The new data-app runs CLI command (commands/data_app.py:767-825) lists deployment attempts with failure reasons, but there is no corresponding route in server/routers/data_apps.py. CONTRIBUTING.md mandates 1:1 CLI-to-REST parity: "every command in a group has a matching endpoint in that group's router... If you add a new command, add the corresponding route." The changelog at src/keboola_agent_cli/changelog.py:59 claims REST mirroring. REST callers (monitoring dashboards, scheduled agents) cannot query deploy failure reasons via the API.


⚠️ Missing REST endpoint for data-app git-bind-credential command (src/keboola_agent_cli/server/routers/data_apps.py:384)

The new data-app git-bind-credential CLI command (commands/_data_app_git.py:345-395) wires a managed-repo credential into a data app's config, but there is no corresponding route in server/routers/data_apps.py. CONTRIBUTING.md mandates 1:1 CLI-to-REST parity. The changelog at src/keboola_agent_cli/changelog.py:59 claims REST mirroring. This is a critical step in the managed-repo deploy flow — without a REST endpoint, external applications cannot complete the managed-repo onboarding sequence via the API.

View 1 additional finding in Devin Review.


Comment on lines +917 to +945
# Mint a credential ON the app's managed repo. The one-time secret is
# consumed immediately by the encryption step below and never returned.
credential = ds_client.create_git_credential(
    app_id,
    type_="http_token",
    permissions=permissions,
    name="kbagent-managed-deploy",
)
secret = str(credential.get("secret") or "")
if not secret:
    raise KeboolaApiError(
        message=(
            "create_git_credential returned no one-time secret for the "
            "http_token; cannot wire the managed-repo credential."
        ),
        status_code=500,
        error_code=ErrorCode.API_ERROR,
        retryable=False,
    )

git_block = self._build_git_block(
    alias=alias,
    git_repo=https_url,
    git_branch=branch,
    git_public=False,
    git_username=MANAGED_GIT_CREDENTIAL_USERNAME,
    git_pat_plaintext=secret,
    git_pat_encrypted=None,
)


🚩 bind_managed_credential does not clean up the minted credential on subsequent failures

In bind_managed_credential (data_app_service.py:917-964), after minting a credential via create_git_credential (line 919), if the encryption step (_build_git_block, line 937) or the config update (update_config, line 958) fails, the minted credential remains on the app's managed repo but is never written to the config. The one-time secret is lost and cannot be retrieved again. The credential becomes orphaned. Unlike create_data_app which has cleanup-in-finally for the shell, bind_managed_credential has no rollback for the minted credential. In practice, re-running git-bind-credential would mint a fresh credential (the old orphan is harmless but leaks a credential slot). This is acceptable for an MVP but worth documenting as a known limitation.
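
For reference, a rollback for the orphan window might look like the sketch below. It is purely illustrative: the PR documents the limitation instead of adding a rollback, and `bind_with_rollback`, `mint`, `write_config`, and `revoke` are hypothetical callables (whether the platform exposes a matching revoke call is an assumption).

```python
from typing import Callable, Tuple


def bind_with_rollback(
    mint: Callable[[], Tuple[str, str]],
    write_config: Callable[[str], None],
    revoke: Callable[[str], None],
) -> str:
    """Mint a credential, write it, and revoke it if the write fails."""
    credential_id, secret = mint()
    try:
        write_config(secret)
    except Exception:
        try:
            # Best-effort cleanup so the failed attempt leaves no orphan.
            revoke(credential_id)
        except Exception:
            pass  # never let cleanup mask the original failure
        raise
    return credential_id
```

The cleanup is itself wrapped so that a failing revoke cannot hide the error that triggered it.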


Address review findings on the managed-repo feature:

- REST parity (CONTRIBUTING.md 1:1): the `kbagent serve` data-apps router
  now exposes the full managed flow. `DataAppCreate` gains
  `use_managed_git_repo` (and `git_repo` is optional); add
  `GET /{p}/{app}/runs` and `POST /{p}/{app}/git-repo/bind-credential`.
  Without these, REST callers (Web UI, scheduled agents, CI) could not
  create or finish a managed-repo app, contradicting the changelog's
  "mirrored on the serve REST API" claim.
- Changelog: drop the inaccurate "Python SDK" mirror claim (the SDK has
  no data-app surface); state the exact REST routes instead.
- deploy_data_app + bind_managed_credential: parse `configuration`
  defensively via a shared `_coerce_config_dict` helper (handles a
  JSON-string echo), matching the existing pattern in get_data_app.
- patch_app docstring: document that configVersion is omitted for pure
  managed-repo deploys (the §9-trio exception), so it does not read as a bug.
- bind_managed_credential docstring: document the mint-then-write orphan
  window (no credential rollback; re-run mints fresh).
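
A defensive parse of `configuration` along the lines described could look like this. It is a sketch, not the PR's `_coerce_config_dict`: the name `coerce_config_dict` is a stand-in, and falling back to an empty dict on unparseable input is an assumption about the desired behavior.

```python
import json
from typing import Any


def coerce_config_dict(configuration: Any) -> dict:
    """Return the configuration as a dict, accepting a JSON-string echo."""
    if isinstance(configuration, dict):
        return configuration
    if isinstance(configuration, str):
        try:
            parsed = json.loads(configuration)
        except json.JSONDecodeError:
            return {}
        # A JSON string may decode to a list or scalar; only a dict is usable.
        return parsed if isinstance(parsed, dict) else {}
    # None or any other type: treat as an empty configuration.
    return {}
```

With this in place, lookups such as `coerce_config_dict(raw).get("parameters")` behave the same whether the API echoed an object or a serialized string.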

Tests: 3 new serve-router tests (create managed flag, runs, bind-credential).

@padak padak left a comment

Member Author


Review of #455 — feat(data-app): managed git-repo deploy end-to-end (--use-managed-git-repo, git-bind-credential, runs)

Generated by kbagent-pr-reviewer subagent. Verdict and findings below
are advisory; the human author retains every veto. CI-coverable issues
(lint, format, tests) are confirmed via make check, not duplicated here.

Summary

This PR closes the managed-repo data-app deploy gap end-to-end: data-app create --use-managed-git-repo provisions an empty Keboola-hosted repo, data-app git-bind-credential wires the encrypted deploy credential into the Storage config, and data-app runs surfaces setup-phase failures that data-app logs cannot reach. The deploy configVersion logic is fixed so pure managed repos (no git block) omit the version pin. All three new commands are registered in OPERATION_REGISTRY, documented across every mandatory plugin-sync surface, and covered by service/CLI/router unit tests. make check passes with 4,160 tests green. The claimed live verification (tic-tac-toe on us-east4.gcp) is stated in the PR description and cannot be independently reproduced by this reviewer (no matching project credentials). One NON-BLOCKING file-size budget violation and two NON-BLOCKING gaps (missing --dry-run on a write command, missing E2E tests) are noted below. Verdict: APPROVE (zero BLOCKING findings).

Verdict

  • Verdict: APPROVE
  • Blocking findings: 0
  • Non-blocking findings: 3
  • Nits: 1

Blocking findings

(none)

Non-blocking findings

[NB-1] src/keboola_agent_cli/services/data_app_service.py — file exceeds hard ceiling (2,470 LOC; hard cap is 1,500)

data_app_service.py is at 2,470 LOC after this PR, adding a net ~388 lines on top of a pre-PR baseline of ~2,082. The hard ceiling per CONTRIBUTING.md "File-size budgets" is 1,500 LOC for services/. The PR is not the root cause (the file was already at 2,082 before this change) but it materially extends an already-overflowed file without splitting.

The PR should note this as a follow-up item. The file mixes orchestration (create_data_app, deploy_data_app, bind_managed_credential) with a set of pure helper/parser concerns (_validate_create_inputs, _build_git_block, _build_storage_config_body, _build_dry_run_payload, _coerce_config_dict, _deploy_failure_diagnostic, various _redact_* helpers). Extracting the helpers into a sibling _data_app_helpers.py would bring the service back under the soft ceiling. No blocking because this is a pre-existing violation; the rule explicitly says "split before crossing the hard ceiling" applies to the NEXT PR that adds material.

[NB-2] src/keboola_agent_cli/commands/_data_app_git.py:344 - git-bind-credential has no --dry-run flag

git-bind-credential mints a credential, encrypts it, and writes parameters.dataApp.git in one shot. Per CONTRIBUTING.md UX checklist: "Write operations log what they did; Destructive operations have --dry-run and --yes flags." Although git-bind-credential is classified as write (not destructive), the operation is not idempotent in a straightforward sense: a second call mints a new credential, orphaning the previous one (this is documented in the service docstring but not reversible). A --dry-run mode that shows the planned git block (repository URL, branch, permissions) without minting or writing would let operators preview and confirm before committing the one-time credential. Compare git-credentials-create at line 244 of the same file which has a --yes confirmation gate.

This is a usability gap, not a bug; the existing confirm-by-default behavior of --json mode provides partial mitigation.

[NB-3] tests/test_e2e.py — no E2E tests for data-app runs, data-app git-bind-credential, or data-app create --use-managed-git-repo

CONTRIBUTING.md § "Tests (mandatory!)" states: "Every CLI command MUST have a corresponding E2E test in tests/test_e2e.py." None of the three new commands appear in test_e2e.py. The unit and service tests provide good layer coverage, but they mock all external API calls. The managed-repo flow depends on a concrete interaction with the Data Science API's useManagedGitRepo provision path, the git-credentials endpoint, and the git-clone / app_setup failure surface, none of which is exercised by existing E2E tests.

Deferred E2E coverage is an accepted pattern when live credentials are unavailable for CI, per CONTRIBUTING.md; the human reviewer should confirm the author has verified the flow manually (which the PR description asserts) and agree a tracking issue is sufficient.

Nits

  • [NIT-1] src/keboola_agent_cli/services/data_app_service.py:896 — the service docstring for bind_managed_credential calls the orphaned credential "a harmless leaked slot." This is accurate for the current credential quota (no known hard limit), but the claim may become stale if Keboola ever enforces a per-app credential cap. Softening to "an orphaned credential slot" would be more forward-compatible.

Verification log

  • gh pr view 455 --json title,body,files,additions,deletions,state → 22 files, +1446/-72, state=OPEN, title matches feat(data-app): conventional prefix ✓
  • Branch check: git rev-parse --abbrev-ref HEAD → feat/data-app-managed-git-repo (matches PR head branch) ✓
  • make check (uv sync --extra server + full suite) → 4,160 passed, 8 skipped, 0 failures ✓ (exit 0)
  • Layer violation grep (typer in services, httpx in commands, formatter in clients) → all empty ✓
  • Magic-number grep (time.sleep|retries|timeout|interval = [0-9]+ not from constants.) → none ✓
  • Raw error-code string grep (error_code = "...") → none ✓
  • Bare except: grep → none ✓
  • print() in production code → none ✓
  • Token in logged output grep → all matches are documentation strings and comment text about encryption, not actual raw token surfaces ✓
  • OPERATION_REGISTRY check: data-app.runs: read at permissions.py:157, data-app.git-bind-credential: write at permissions.py:175
  • CLAUDE.md ## All CLI Commands: data-app runs, data-app git-bind-credential, --use-managed-git-repo all present ✓
  • commands/context.py AGENT_CONTEXT: data-app runs, data-app git-bind-credential, --use-managed-git-repo all added ✓
  • plugins/kbagent/agents/keboola-expert.md: §2 matrix row updated for managed-repo create flow; §3 gotchas extended with --use-managed-git-repo, git-bind-credential, runs ✓; byte count 54,034 < 62,000 hard cap ✓
  • plugins/kbagent/skills/kbagent/references/commands-reference.md: data-app runs, data-app git-bind-credential, updated data-app create and data-app deploy entries ✓
  • plugins/kbagent/skills/kbagent/references/gotchas.md: new section data-app create --use-managed-git-repo tagged (since v0.65.0)
  • plugins/kbagent/skills/kbagent/references/data-app-workflow.md: managed-repo section added under git-credentials ✓
  • plugins/kbagent/skills/kbagent/SKILL.md: new trigger keywords added; data-app runs and data-app git-bind-credential rows added to decision table ✓
  • src/keboola_agent_cli/server/routers/data_apps.py: GET /{project}/{app_id}/runs and POST /{project}/{app_id}/git-repo/bind-credential routes added, matching 1:1 CLI parity requirement ✓
  • changelog.py entry for 0.65.0 with 5 bullets covering all new behavior ✓; pyproject.toml version bumped to 0.65.0 ✓; plugin.json and marketplace.json auto-synced ✓
  • data_app_service.py LOC: 2,470 (hard ceiling: 1,500; pre-PR baseline: ~2,082) → over budget, NON-BLOCKING per rule (existing violation)
  • tests/test_data_app_cli.py: TestDataAppRuns.test_runs_json, TestDataAppGitBindCredential.test_bind_credential_json, managed-create CLI tests → present ✓
  • tests/test_data_app_service.py: TestDataAppBindManagedCredential.*, TestDataAppRuns.test_normalizes_runs_and_failure_reason, deploy configVersion tests, diagnostic best-effort tests → present ✓
  • tests/test_data_app_git_repo.py: test_create_app_managed_sends_flag, test_create_app_external_omits_flag, test_list_app_runs_returns_array_with_failure_reason → present ✓
  • tests/test_server_router_calls.py: 3 new router parity tests for runs, git-bind-credential, and use_managed_git_repo on create → present ✓
  • tests/test_e2e.py grep for runs|git-bind-credential|use-managed-git-repo → not found (NON-BLOCKING, see NB-3)
  • Live behavior reproduction: not independently verified (no us-east4.gcp credentials in reviewer environment); PR description asserts tic-tac-toe verified live ✓ (unverified by reviewer)

Open questions for the author

  • NB-2 follow-up: Was --dry-run omitted from git-bind-credential intentionally (e.g., because the one-time secret would be consumed anyway on a real dry-run) or was it an oversight? If intentional, a brief code comment explaining the design decision would help future contributors.

…2/NB-3)

Address the two actionable non-blocking review findings on #455.

NB-2 -- `data-app git-bind-credential --dry-run`: validates the app and
resolves the managed repo URL, then previews what would be wired
(repository / branch / permissions) WITHOUT minting a credential or
editing the config. The http_token is one-time, so an aborted real run
would orphan it; dry-run lets callers inspect first. Threaded through the
service, CLI, and the serve REST endpoint (GitBindCredential.dry_run).
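
The dry-run behavior described here can be sketched as follows. The function `bind_or_preview` and its injected `mint` / `write_config` callables are hypothetical; the real service method and its return shape may differ.

```python
from typing import Callable


def bind_or_preview(
    repository: str,
    branch: str,
    permissions: str,
    dry_run: bool,
    mint: Callable[[], str],
    write_config: Callable[[str], None],
) -> dict:
    """Preview the planned git block, or mint and write it for real."""
    plan = {
        "repository": repository,
        "branch": branch,
        "permissions": permissions,
    }
    if dry_run:
        # Return before any side effect: no credential minted, no config edit.
        return {"dry_run": True, **plan}
    secret = mint()
    write_config(secret)
    return {"dry_run": False, **plan}
```

Returning before the mint call is the important part: because the http_token is one-time, a preview that minted first would already have created the credential it was meant to avoid.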

NB-3 -- E2E coverage for the new commands: add TestE2EDataAppManagedRepo
exercising `create --use-managed-git-repo` -> `git-repo` ->
`git-bind-credential --dry-run` -> `runs` -> `delete` against a real
project (creates one managed app, cleans it up), plus a usage-error case
for "no git source". Verified live via config-dir mode (2 passed). Bind
uses --dry-run in E2E so the run mints no un-rollbackable credential.

Tests: service dry-run (no mint / no write), CLI dry-run flag passthrough.
Docs + changelog updated for the --dry-run flag.
@padak padak merged commit bab5acb into main Jun 21, 2026
4 checks passed
@padak padak deleted the feat/data-app-managed-git-repo branch June 21, 2026 18:17