Skip to content

API: Add Workspace.spec.setupCommand for pre-agent setup#1056

Merged
gjkim42 merged 1 commit intomainfrom
kelos-task-1051
Apr 30, 2026
Merged

API: Add Workspace.spec.setupCommand for pre-agent setup#1056
gjkim42 merged 1 commit intomainfrom
kelos-task-1051

Conversation

@kelos-bot
Copy link
Copy Markdown

@kelos-bot kelos-bot Bot commented Apr 29, 2026

What type of PR is this?

/kind feature

What this PR does / why we need it:

Adds Workspace.spec.setupCommand: an optional exec-form slice that runs in
the agent container after the workspace is prepared and before the agent
starts. The controller JSON-encodes the command into KELOS_SETUP_COMMAND,
and each agent's kelos_entrypoint.sh decodes the array and execs it
directly via node (no shell wrapping). A non-zero exit aborts the Task,
and ---KELOS_SETUP_COMMAND_{START,DONE,FAILED}--- banners on stderr
distinguish setup failures from agent failures when tailing pod logs.

This lets workspaces declare repo-specific bootstrapping (npm ci, pip
install, go mod download, code generation) once instead of paying the
typing/wall-clock cost on every Task.

Implementation notes:

  • Exec form, no implicit sh -c — semantics match Kubernetes
    lifecycle.postStart.exec.command. Use ["sh", "-c", "<script>"] for
    shell pipelines.
  • MinItems=1 rejects an explicit empty array at admission.
  • KELOS_SETUP_COMMAND is a statically reserved env-var name: the
    controller drops user-supplied PodOverrides.Env entries with that name
    unconditionally, even when the workspace does not set setupCommand,
    so the entrypoint cannot be tricked into executing forged commands.
  • Setup runs after clone → Ref/Branch checkout → remotes → Files → setup
    → agent, as documented in the godoc.

Which issue(s) this PR is related to:

Fixes #1051

Special notes for your reviewer:

The setup runs inside the agent container (not as a separate init
container) so it shares the agent toolchain and credentials, matching the
issue's scope decision. All five reference agent images (claude-code,
codex, gemini, opencode, cursor) already install Node.js v22, so
the entrypoint uses node -e with child_process.spawnSync for portable
JSON decoding and exec.

Test coverage:

  • Unit tests in internal/controller/job_builder_test.go cover the happy
    path, the no-setup-command case, the PodOverrides.Env override-drop
    case (workspace defines setupCommand), and the new static-list
    override-drop case (workspace does not define setupCommand).
  • E2E tests in test/e2e/setup_command_test.go exercise the success
    path (setup writes a file the agent reads) and the failure path
    (non-zero setup exit transitions the Task to Failed without running
    the agent).

Does this PR introduce a user-facing change?

Yes — a new optional CRD field.

Added `Workspace.spec.setupCommand`: an optional exec-form command run in the agent container before the agent starts (e.g. `["sh", "-c", "npm ci"]`). A non-zero exit fails the Task. Useful for repo-specific dependency installs and code generation that depend on the cloned lockfile.

@github-actions github-actions Bot added needs-triage needs-kind Indicates an issue or PR lacks a kind/* label needs-priority needs-actor needs-release-note Indicates a PR lacks a release-note block kind/feature Categorizes issue or PR as related to a new feature release-note and removed needs-kind Indicates an issue or PR lacks a kind/* label needs-release-note Indicates a PR lacks a release-note block labels Apr 29, 2026
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 12 files

@gjkim42
Copy link
Copy Markdown
Collaborator

gjkim42 commented Apr 29, 2026

/kelos review

Copy link
Copy Markdown
Author

@kelos-bot kelos-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖 Kelos Reviewer Agent @gjkim42

Review Summary

Verdict: COMMENT (would be APPROVE — self-author blocks the approval event)
Overall correctness: patch is correct
Scope: Adds Workspace.spec.setupCommand ([]string, exec-form, MinItems=1), JSON-encoded into KELOS_SETUP_COMMAND and executed by each reference entrypoint via node spawnSync after init containers and before the agent starts; non-zero exit aborts the Task with banner markers on stderr.

Findings Overview

Priority Count File:Line Summary
P0 0 none
P1 0 none
P2 1 docs/agent-image-interface.md:49-51 Reserved-name claim is only enforced when the workspace populates the env var; PodOverrides.Env can still set KELOS_SETUP_COMMAND when setupCommand is unset
P3 0 none

Findings

Correctness / Defense-in-depth

  • [P2] docs/agent-image-interface.md:49-51 (and internal/controller/job_builder.go:599-611) — see inline comment. The PR's new "reserved names ... are dropped" claim is conditional on the controller having already populated the env var; when Workspace.spec.setupCommand is unset, a user-supplied PodOverrides.Env: KELOS_SETUP_COMMAND flows through to the entrypoint and is executed. Not a privilege escalation today (task.Spec.Image already grants arbitrary code execution as the agent UID), but the new field turns an env var into a code-execution surface, so the documented invariant is worth tightening with a static reserved-name list.

Suggestions (optional)

  • [P3] internal/controller/job_builder_test.go — consider adding a complementary case to TestBuildAgentJob_SetupCommandPodOverrideIsDropped where the workspace has no setupCommand but PodOverrides.Env sets KELOS_SETUP_COMMAND. Today that injection would silently succeed; a test pinning the desired behavior makes the invariant unambiguous.
  • [P3] The node -e setup-runner block is duplicated verbatim across all five entrypoints. Acceptable and consistent with how KELOS_MCP_SERVERS decoding is handled today, but a small /kelos/kelos-run-setup helper colocated with kelos-capture would centralize future fixes (e.g., signal forwarding, timeout) without touching five files.

Key takeaways

  • API shape, ordering docs, MinItems=1, and built-in-wins test coverage all match the API review feedback.
  • The reserved-name documentation introduced in this PR overstates the controller's protection; one static-list change in job_builder.go would make the docs literally true.

Comment thread docs/agent-image-interface.md Outdated

> The names listed in this table are reserved. `PodOverrides.Env` entries that
> reuse a reserved name are dropped so the built-in value always wins; do not
> set them manually.
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] This new note ("PodOverrides.Env entries that reuse a reserved name are dropped so the built-in value always wins") overstates the actual filter at internal/controller/job_builder.go:599-611, which only drops collisions against names already populated on mainContainer.Env. When Workspace.spec.setupCommand is empty the controller does not append KELOS_SETUP_COMMAND, so a PodOverrides.Env: KELOS_SETUP_COMMAND=["sh","-c","curl ..."] entry passes through to the entrypoint and gets executed before the agent. For pre-existing reserved names (e.g. KELOS_MODEL) the asymmetry is harmless — the user value just becomes the effective config — but KELOS_SETUP_COMMAND turns an env var into a code-execution surface. Not a privilege escalation today (task.Spec.Image already grants the same blast radius), but worth closing the gap by maintaining a static reserved-name set that is filtered unconditionally, so the documented invariant is literally true and future image-pinning admission policies don't leave this path open.

Copy link
Copy Markdown
Collaborator

@gjkim42 gjkim42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add an e2e test for this new feature.

/kelos pick-up

@kelos-bot kelos-bot Bot force-pushed the kelos-task-1051 branch from 58bb6b7 to 1d1a5cc Compare April 30, 2026 10:29
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented Apr 30, 2026

Greptile Summary

This PR adds Workspace.spec.setupCommand, an optional exec-form slice that runs in the agent container between workspace preparation and agent start. The controller JSON-encodes the array into KELOS_SETUP_COMMAND, which all five reference entrypoints decode via node -e / spawnSync and execute directly; a non-zero exit fails the Task. The implementation includes a reservedEnvNames guard that unconditionally drops KELOS_SETUP_COMMAND from PodOverrides.Env to prevent command-injection via that path, unit tests for all four env-var cases, and two new e2e tests for the success and failure paths.

Confidence Score: 5/5

Safe to merge — no new blocking issues found; prior review threads cover the only meaningful feedback (signal-kill diagnostic and e2e CreateSecret idempotency).

All remaining open items are P2 (signal-name propagation for better diagnostics, e2e CreateSecret cleanup) and were already flagged in prior threads. The security-critical path — reservedEnvNames blocking KELOS_SETUP_COMMAND injection via PodOverrides — is correctly implemented and tested for all four cases. No new P0/P1 defects found.

No files require special attention beyond the prior thread items.

Important Files Changed

Filename Overview
internal/controller/job_builder.go Adds KELOS_SETUP_COMMAND env var injection from workspace.SetupCommand and the reservedEnvNames guard that blocks PodOverrides from forging it; ordering relative to builtinNames is correct.
claude-code/kelos_entrypoint.sh Adds the setup command block (node spawnSync, banners, exit propagation) identically across all five entrypoints; signal-kill masking already flagged in prior review thread.
api/v1alpha1/workspace_types.go Adds SetupCommand []string with +optional / +kubebuilder:validation:MinItems=1 markers; deepcopy generated correctly.
internal/controller/job_builder_test.go Four new unit tests cover: happy path, no-setupCommand, PodOverride-override-drop (with workspace setupCommand), and static-list override-drop (without workspace setupCommand) — full coverage of the new logic paths.
test/e2e/setup_command_test.go Two e2e tests for success and failure paths; duplicate CreateSecret("claude-credentials") across both It blocks was flagged in a prior review thread.
docs/agent-image-interface.md Documents KELOS_SETUP_COMMAND in the reserved-names table and adds a new section 6 describing pre-agent setup command semantics.

Sequence Diagram

sequenceDiagram
    participant K as Kubernetes API
    participant C as Controller (JobBuilder)
    participant I as Init Containers
    participant E as Entrypoint (kelos_entrypoint.sh)
    participant S as setupCommand process
    participant A as Agent (claude/codex/…)

    K->>C: Reconcile Task + Workspace
    C->>C: json.Marshal(setupCommand) → KELOS_SETUP_COMMAND
    C->>C: Drop KELOS_SETUP_COMMAND from PodOverrides.Env (reservedEnvNames)
    C->>K: Create Job with KELOS_SETUP_COMMAND env var

    I->>I: git clone → checkout Ref → configure remotes → write Files

    E->>E: Check KELOS_SETUP_COMMAND
    alt setupCommand defined
        E->>E: Print KELOS_SETUP_COMMAND_START
        E->>S: node spawnSync(cmd[0], cmd.slice(1))
        alt exit 0
            S-->>E: status=0
            E->>E: Print KELOS_SETUP_COMMAND_DONE
            E->>A: exec agent
        else exit non-zero
            S-->>E: status≠0
            E->>E: Print KELOS_SETUP_COMMAND_FAILED exit=N
            E->>K: exit(N) → Task → Failed
        end
    else no setupCommand
        E->>A: exec agent directly
    end
Loading

Reviews (2): Last reviewed commit: "feat(api): add Workspace.spec.setupComma..." | Re-trigger Greptile

Comment thread claude-code/kelos_entrypoint.sh
Comment thread test/e2e/setup_command_test.go
@gjkim42
Copy link
Copy Markdown
Collaborator

gjkim42 commented Apr 30, 2026

/kelos squash-commits

@kelos-bot kelos-bot Bot force-pushed the kelos-task-1051 branch from 1d1a5cc to 6d81934 Compare April 30, 2026 10:36
@kelos-bot
Copy link
Copy Markdown
Author

kelos-bot Bot commented Apr 30, 2026

Squash complete.

🤖 Kelos Agent @gjkim42

Rebased on origin/main and squashed 2 commits into 1:

  • feat(api): add Workspace.spec.setupCommand for pre-agent setup

Force-pushed to the branch. PR description and release note were already accurate — no changes needed.

@gjkim42 gjkim42 enabled auto-merge April 30, 2026 10:46
@gjkim42 gjkim42 added this pull request to the merge queue Apr 30, 2026
Merged via the queue into main with commit c80aaeb Apr 30, 2026
19 checks passed
@gjkim42 gjkim42 deleted the kelos-task-1051 branch April 30, 2026 11:17
kelos-bot Bot pushed a commit that referenced this pull request Apr 30, 2026
Add a project convention and reviewer checklist item that documentation
should describe only what the code actually does, not aspirational
behavior.

Evidence:
- PR #1035 (kelos-bot[bot] P1): docs claimed HMAC signature validation
  per source, but the GenericSource branch in handler.go never reads
  the webhook secret, never inspects X-Hub-Signature-256, and never
  calls validateHMACSignature — giving users a false sense of security.
- PR #1056 (kelos-bot[bot] P2): doc note claimed PodOverrides.Env
  entries reusing reserved names are dropped so the built-in always
  wins, but the actual filter only drops collisions against names
  already populated on mainContainer.Env, leaving KELOS_SETUP_COMMAND
  passable when the field is empty.

Applied to AGENTS.md (symlinked to CLAUDE.md), agentconfig.yaml,
kelos-workers.yaml, and a new "Documentation accuracy" subsection in
the kelos-reviewer.yaml checklist.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

generated-by-kelos kind/api Categorizes issue or PR as related to API changes kind/feature Categorizes issue or PR as related to a new feature needs-actor ok-to-test priority/important-soon release-note triage-accepted

Projects

None yet

Development

Successfully merging this pull request may close these issues.

API: Add Workspace.spec.setupCommand for pre-agent setup

1 participant