
Fix debug package cleanup findings #1010

Merged
simple-agent-manager[bot] merged 2 commits into main from sam/debug-package-cleanups on May 14, 2026

@simple-agent-manager simple-agent-manager Bot commented May 14, 2026

Summary

  • Make VM-agent host credential-helper writes retry-safe by replacing existing regular helper files and refusing non-regular paths.
  • Remove avoidable cloud-init warning sources by omitting empty SSH keys, pre-seeding valid iptables rules.v4/rules.v6 files, and deferring Docker metadata blocking until Docker is ready.
  • Start the metadata-block systemd service during VM-agent provisioning (enable --now) so the DOCKER-USER rule is applied after Docker restart instead of only enabled for later boots.
  • Add bounded ACP heartbeat error-body logging on the VM agent and API-side update failure logging; classify Durable Object code-update resets as transient.

Agent Preflight

  • Preflight completed before code changes

Classification

  • external-api-change
  • cross-component-change
  • business-logic-change
  • public-surface-change
  • docs-sync-change
  • security-sensitive-change
  • ui-change
  • infra-change

External References

N/A: no external API behavior or third-party contract changed. This work used local debug packages, production Cloudflare observability already queried in this task, and repo-local postmortems.

Codebase Impact Analysis

Touches apps/api/src/routes/projects/node-acp-heartbeat.ts, packages/vm-agent/internal/bootstrap, packages/vm-agent/internal/server/acp_heartbeat.go, packages/vm-agent/internal/provision/provision.go, and packages/cloud-init/src/template.ts. The behavior changes are limited to bootstrap credential helper retry safety, cloud-init boot warning cleanup, metadata-block enforcement after Docker restart, and ACP heartbeat diagnostics.
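
As an illustration of the transient classification mentioned in the summary, the sketch below treats a Durable Object code-update reset as retryable by matching on the error text. The PR does not show where or how the match is made, so the function name, the substring approach, and the marker string (taken from the error Cloudflare commonly reports for this condition) are all assumptions.

```go
package main

import "strings"

// transientMarkers lists error substrings treated as retryable.
// The value below is an assumption, not copied from this PR.
var transientMarkers = []string{
	"Durable Object reset because its code was updated",
}

// isTransientHeartbeatError is a hypothetical classifier: it reports whether
// an error message looks like a condition that should clear on retry.
func isTransientHeartbeatError(msg string) bool {
	lower := strings.ToLower(msg)
	for _, marker := range transientMarkers {
		if strings.Contains(lower, strings.ToLower(marker)) {
			return true
		}
	}
	return false
}
```

A caller would typically log transient failures at a lower severity and retry on the next heartbeat interval, while still surfacing anything that does not match.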

Documentation & Specs

N/A: no public API, user docs, env vars, or config surface changed. The task file tasks/active/2026-05-14-debug-package-cleanups.md records the findings, validation notes, and acceptance criteria.

Constitution & Risk Check

Checked that there are no hardcoded secrets and no plaintext credentials embedded in cloud-init, that heartbeat response logging is bounded, that suspicious non-regular credential-helper paths are refused, and the timing of metadata API isolation. Staging VM verification is required before merge because cloud-init and VM-agent bootstrap/provisioning behavior changed.

Validation

  • pnpm --filter @simple-agent-manager/cloud-init test
  • pnpm --filter @simple-agent-manager/api test -- tests/unit/task-callback-auth-routing.test.ts
  • pnpm --filter @simple-agent-manager/cloud-init typecheck
  • pnpm --filter @simple-agent-manager/api typecheck
  • pnpm --filter @simple-agent-manager/api lint (existing warnings only)
  • pnpm typecheck
  • pnpm lint (existing warnings only)
  • pnpm test
  • git diff --check
  • Deploy Staging run 25864333930 passed on commit 00f7bb4 before the metadata-block follow-up.
  • Live staging VM pre-follow-up: workspace 01KRKD7B1M85C0JTFCCFNXNHJ9 reached running on node 01KRKD798WTAP73K3TE04CKGTZ; heartbeat became healthy at 2026-05-14T14:13:28.438Z. Debug package confirmed credential-helper and iptables placeholder fixes, and revealed the metadata-block service needed to be started after Docker restart.
  • Deploy Staging run 25865919906 passed on amended commit 0fd2250, including smoke tests.
  • Final live staging VM verification: workspace 01KRKEZ67RMRK4SG413GRE2E2P reached running on node 01KRKEYZR2XH1476WDS7KD37TK; node health became healthy with heartbeat 2026-05-14T14:45:55.575Z. Debug package manifest reported agent 0fd2250-dirty and showed:
    • sam-firewall: Metadata API blocked for containers (DOCKER-USER chain)
    • /etc/iptables/rules.v4 and /etc/iptables/rules.v6 written by cloud-init before persistence
    • credential helper wrote successfully, then post-build copy skipped because the bind-mounted helper was already present
    • no prior ssh_authorized_keys schema warning, no missing iptables persistence warning, no DOCKER-USER chain not available after 30s, and no credential-helper file exists failure
  • Cleaned up both disposable staging verification workspaces/nodes after collecting evidence. Filed the unrelated, non-fatal GHCR cache push permission warning as issue #1011 (Investigate GHCR devcontainer cache push permission warning).

Local Go validation note: this workspace image does not include go or gofmt, so VM-agent Go tests/formatting could not run locally. CI should cover the Go toolchain path.

Specialist Review Evidence

| Reviewer | Status | Outcome |
| --- | --- | --- |
| go-specialist | PASS | Reviewed VM-agent credential-helper and ACP heartbeat changes. No blocking findings. Local Go tooling unavailable, so CI must cover Go execution. |
| cloudflare-specialist | PASS | Reviewed Worker heartbeat route and cloud-init rules persistence cleanup. Staging debug package then found metadata-block timing; follow-up defers early cloud-init application and starts the service after Docker restart. |
| security-auditor | PASS | Reviewed credential-helper replacement, cloud-init contents, and heartbeat logging. No secret embedding or high-risk credential exposure found; logged heartbeat bodies are bounded. Metadata API block now applies after Docker restart. |
| test-engineer | PASS | Focused tests added/updated for cloud-init, API diagnostics, Go credential-helper retry paths, and ACP transient classification; full TypeScript suite passed. |
| task-completion-validator | PASS | Implementation checklist and acceptance criteria are covered; CI passed and final staging VM boot/debug-package verification passed on the amended commit. |

Co-Authored-By: Claude noreply@anthropic.com

@simple-agent-manager simple-agent-manager Bot force-pushed the sam/debug-package-cleanups branch from 00f7bb4 to 0fd2250 Compare May 14, 2026 14:23
@simple-agent-manager simple-agent-manager Bot merged commit 33158a1 into main May 14, 2026
23 checks passed
@simple-agent-manager simple-agent-manager Bot deleted the sam/debug-package-cleanups branch May 14, 2026 14:49