
feat(operator): oabctl Phase 1 — apply, get, delete #851

Merged: thepagent merged 7 commits into main from feat/oabctl-phase1 on May 19, 2026

@chaodu-agent
Collaborator

Summary

Implements oabctl CLI provisioner under operator/ as defined in the merged ADR (docs/adr/ecs-control-plane.md).

What's included

  • oabctl apply -f <file|dir> — validate manifest, render config.toml to S3 (immutable path per generation), register ECS task definition, create/update ECS service
  • oabctl get oabservice [name] — real-time ECS DescribeServices, displays status table
  • oabctl delete oabservice <name> — scale to 0, delete service, cleanup S3 manifests/config
  • entrypoint.sh — ECS task wrapper: restore bootstrap → overwrite with rendered config → start OAB
  • Manifest schema: oab.dev/v1 OABService with validation
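The "immutable path per generation" idea in the apply step can be sketched as a pure key-building function. This is a minimal illustration only: the `services/<name>/generations/<n>/config.toml` layout and the `config_key` name are assumptions, not the key layout used by this PR.

```rust
/// Build the S3 object key for a rendered config.toml.
///
/// ASSUMPTION: the `services/<name>/generations/<n>/config.toml` layout is
/// hypothetical and chosen for illustration; the PR's real layout may differ.
pub fn config_key(name: &str, generation: u64) -> String {
    // The generation number is part of the key, so each apply writes a new
    // object instead of overwriting the one a running task may still read.
    format!("services/{}/generations/{}/config.toml", name, generation)
}
```

Because the generation is embedded in the key, two applies of the same service never target the same object, which is what makes each artifact immutable in practice.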

Project structure

operator/
├── Cargo.toml
├── Cargo.lock
├── entrypoint.sh
└── src/
    ├── main.rs      (CLI entrypoint, clap)
    ├── manifest.rs  (schema types + validation)
    ├── apply.rs     (S3 + ECS provisioning)
    ├── get.rs       (ECS describe)
    └── delete.rs    (teardown)

Build

cd operator && cargo build --release

Phase 1 scope (per ADR)

  • Pre-created bot token in SSM (no auto-registration)
  • Single namespace, single region
  • Discord channel type only
  • Immutable config artifact per generation

cc @pahud

Implements the CLI provisioner from ADR docs/adr/ecs-control-plane.md:
- oabctl apply -f <file|dir>: validate manifest, render config to S3,
  register task def, create/update ECS service
- oabctl get oabservice: list services via ECS DescribeServices
- oabctl delete oabservice <name>: teardown ECS service + S3 cleanup
- entrypoint.sh: wrapper script for ECS tasks (bootstrap + config download)

Schema: oab.dev/v1 OABService with capacityProvider, cpu, memory,
bootstrapFrom, networking, config, secrets fields.
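As a rough sketch of what schema types plus validation could look like, the block below mirrors a subset of the fields named in the commit message. The field types, the `OabService` and `validate` names, and the specific rules are assumptions; `networking`, `config`, and `secrets` are left out to keep the example short.

```rust
/// Hypothetical, partial mirror of an `oab.dev/v1` `OABService` manifest.
/// Field names follow the commit message; the Rust types are assumptions.
#[derive(Debug, Clone)]
pub struct OabService {
    pub api_version: String,
    pub kind: String,
    pub name: String,
    pub capacity_provider: String,
    pub cpu: u32,    // CPU units (assumed integer)
    pub memory: u32, // MiB (assumed integer)
    pub bootstrap_from: Option<String>,
}

/// Collect every validation problem instead of stopping at the first one,
/// so the caller can report them all in a single run.
pub fn validate(m: &OabService) -> Result<(), Vec<String>> {
    let mut errors = Vec::new();
    if m.api_version != "oab.dev/v1" {
        errors.push(format!("unsupported apiVersion: {}", m.api_version));
    }
    if m.kind != "OABService" {
        errors.push(format!("unsupported kind: {}", m.kind));
    }
    if m.name.trim().is_empty() {
        errors.push("metadata.name must not be empty".to_string());
    }
    if m.cpu == 0 {
        errors.push("cpu must be greater than 0".to_string());
    }
    if m.memory == 0 {
        errors.push("memory must be greater than 0".to_string());
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}
```

Returning all errors at once is a design choice for CLI ergonomics; a real implementation might prefer a typed error enum.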
@chaodu-agent requested a review from thepagent as a code owner on May 19, 2026 02:46
@github-actions (bot) added the closing-soon (PR missing Discord Discussion URL; will auto-close in 3 days) and pending-screening labels on May 19, 2026
@github-actions

⚠️ This PR is missing a Discord Discussion URL in the body.

All PRs must reference a prior Discord discussion to ensure community alignment before implementation.

Please edit the PR description to include a link like:

Discord Discussion URL: https://discord.com/channels/...

This PR will be automatically closed in 3 days if the link is not added.

@shaun-agent
Contributor

shaun-agent commented May 19, 2026

OpenAB PR Screening

This is auto-generated by the OpenAB project-screening flow for context collection and reviewer handoff.
Click 👍 if you find this useful. Human review will be done within 24 hours. We appreciate your support and contribution 🙏

Screening report posted and project item moved to `PR-Screening`.

GitHub comment: #851 (comment)
Project action: https://github.com/orgs/openabdev/projects/1, item PVTI_lADOEFbZWM4BUUALzgtIkTI now has status PR-Screening.

Intent

PR #851 is trying to add the first usable oabctl operator CLI for provisioning OpenAB services on ECS from an OABService manifest. The operator-visible problem is that deployers need a repeatable path to validate desired service state, render config artifacts, create/update ECS services, inspect live status, and tear services down without hand-driving AWS resources.

Feat

Feature work. It adds an operator/ Rust CLI with apply, get oabservice, and delete oabservice, plus an ECS entrypoint wrapper, manifest schema/validation, S3 config artifact handling, ECS task/service registration, and CI coverage for the new operator crate.

Who It Serves

Primary beneficiary: deployers and agent runtime operators. Secondary beneficiaries are maintainers, because this creates a concrete Phase 1 control-plane surface that reviewers can harden incrementally.

Rewritten Prompt

Implement Phase 1 of the OpenAB ECS operator as a Rust CLI under operator/. Add an oabctl binary that can:

  • apply -f <file|dir>: parse and validate oab.dev/v1 OABService manifests, render config.toml, upload immutable generation-scoped artifacts to S3, register an ECS task definition, and create or update the target ECS service.
  • get oabservice [name]: describe matching ECS services in real time and print a useful status table.
  • delete oabservice <name>: scale the service down, delete the ECS service, and clean up generated S3 manifests/config artifacts.
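For the `get oabservice` status table, a minimal formatting sketch might look like the following. The `ServiceRow` struct and `render_table` function are hypothetical names; the columns are modeled on the `status`, `desiredCount`, and `runningCount` fields that ECS DescribeServices returns, and the PR's actual columns may differ.

```rust
/// One table row, assumed to be filled from an ECS DescribeServices response.
/// ASSUMPTION: this struct and its column choice are illustrative only.
pub struct ServiceRow {
    pub name: String,
    pub status: String, // e.g. ACTIVE, DRAINING, INACTIVE
    pub desired: i32,
    pub running: i32,
}

/// Render a left-aligned text table whose NAME column grows to fit
/// the longest service name.
pub fn render_table(rows: &[ServiceRow]) -> String {
    let name_w = rows
        .iter()
        .map(|r| r.name.len())
        .chain(std::iter::once("NAME".len()))
        .max()
        .unwrap_or(4);
    let mut out = format!(
        "{:<w$}  {:<8}  {:>7}  {:>7}\n",
        "NAME", "STATUS", "DESIRED", "RUNNING",
        w = name_w
    );
    for r in rows {
        out.push_str(&format!(
            "{:<w$}  {:<8}  {:>7}  {:>7}\n",
            r.name, r.status, r.desired, r.running,
            w = name_w
        ));
    }
    out
}
```

With no rows the function still prints the header line, which keeps the output predictable for scripts that parse it.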

Keep Phase 1 scoped to pre-created SSM bot tokens, one namespace, one AWS region, Discord channel services only, and immutable config artifacts. Include validation errors, predictable resource naming, least-surprise AWS failure handling, and CI that builds/checks the operator crate.

Merge Pitch

This should move forward because it turns the accepted ECS control-plane ADR into an executable slice with clear maintainer review boundaries: manifest shape, AWS provisioning behavior, teardown semantics, and CI integration. The risk profile is medium-high because it touches real cloud resource creation/deletion, credential references, generated config, and lifecycle cleanup. The likely reviewer concern is not whether the CLI shape is useful; it is whether apply/delete are idempotent, safe on partial failure, auditable, and constrained enough to avoid deleting or mutating resources outside its ownership model.
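One way to address the ownership concern raised above is to refuse deletion unless the resource carries tags that the tool itself wrote. The sketch below assumes hypothetical tag keys (`oab.dev/managed-by`, `oab.dev/service`) and a hypothetical `may_delete` helper; nothing in the PR confirms that these tags exist.

```rust
use std::collections::HashMap;

/// ASSUMPTION: hypothetical tag keys, not taken from the PR.
pub const MANAGED_BY_KEY: &str = "oab.dev/managed-by";
pub const SERVICE_KEY: &str = "oab.dev/service";

/// Return true only when the resource is tagged as created by oabctl
/// for exactly this service name. Anything else is left untouched.
pub fn may_delete(tags: &HashMap<String, String>, name: &str) -> bool {
    tags.get(MANAGED_BY_KEY).map(String::as_str) == Some("oabctl")
        && tags.get(SERVICE_KEY).map(String::as_str) == Some(name)
}
```

A guard like this fails closed: untagged or differently tagged resources are never deleted, at the cost of requiring manual cleanup for anything created before tagging was introduced.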

Best-Practice Comparison

OpenClaw is relevant as a control-plane comparison. This PR aligns with explicit delivery/provisioning intent and isolated ECS executions, but reviewers should check for durable job/state tracking, retry/backoff, and run logs. Phase 1 appears artifact-driven through S3 generations, which is good, but AWS mutations still need clear ownership tags and recovery behavior.

Hermes Agent is partially relevant. Its atomic persisted state and self-contained scheduled prompts map to the same reliability concerns: generated state should be persisted atomically, each provisioned service should be reconstructable from manifest plus generation artifact, and repeated runs should be safe. The gateway daemon tick model is less applicable here because this is an operator CLI, not a scheduler.

Implementation Options

Conservative: merge only the manifest schema, CLI skeleton, validation, get, and CI first. Hold apply and delete until AWS mutation semantics, tagging, and tests are reviewed in a follow-up PR.

Balanced: merge this Phase 1 as a complete experimental operator behind explicit docs and strong constraints, after focused review of idempotency, ownership tags, S3 key layout, ECS update behavior, and delete safety. Add targeted unit tests for manifest validation and command planning, plus a dry-run or plan-style path if feasible.
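The dry-run or plan-style path mentioned in the balanced option could be built on a pure function that compares desired and observed state before any AWS call is made. This is a sketch under assumptions: the `Spec` fields, the `Action` enum, and the `plan` function are hypothetical and not part of the PR.

```rust
/// ASSUMPTION: a reduced view of service state used only for comparison.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Spec {
    pub capacity_provider: String,
    pub cpu: u32,
    pub memory: u32,
}

/// What apply would do, decided without touching AWS.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Create,
    Update,
    NoOp,
}

/// Decide the action from the desired spec and what is currently deployed.
/// `observed` is `None` when the service does not exist yet.
pub fn plan(desired: &Spec, observed: Option<&Spec>) -> Action {
    match observed {
        None => Action::Create,
        Some(current) if current == desired => Action::NoOp,
        Some(_) => Action::Update,
    }
}
```

Because `plan` has no side effects, it can be unit-tested without AWS credentials, and running apply twice with an unchanged manifest yields `NoOp`, which is the idempotency property reviewers are asked to check.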

Ambitious: expand the PR into a more production-grade controller surface before merge: persisted operation records, structured run logs, retry/backoff, lock/lease protection, explicit rollback handling, and integration tests against localstack or a dedicated AWS test environment.

Comparison Table

| Option | Speed | Complexity | Reliability | Maintainability | User Impact | Fit for OpenAB Now |
| --- | --- | --- | --- | --- | --- | --- |
| Conservative | Fast | Low | Medium | High | Delays full provisioning | Good if reviewer bandwidth is thin |
| Balanced | Medium | Medium | Good if safety checks land | Good | Delivers usable Phase 1 | Best fit |
| Ambitious | Slow | High | Highest | Medium until patterns settle | Strongest operator story | Better as follow-up phases |

Recommendation

Take the balanced path. Keep the PR scoped as Phase 1, but require focused review on AWS mutation safety before merge: idempotent apply, bounded delete, resource ownership tags, S3 artifact naming/cleanup, actionable errors, and CI that proves the operator crate builds. Split durable operation history, retry/backoff, run logs, locking, and integration-test infrastructure into follow-up issues or PRs so this does not turn into an unreviewable control-plane rewrite.

- #1: Read generation from S3 manifest, increment on each apply (immutable config path)
- #2: Remove launch_type (conflicts with capacity_provider_strategy)
- #3: Add generation field to Metadata struct
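The three fixes above can be illustrated with a short sketch. It assumes a `Metadata` struct with an optional `generation` field, a counter that starts at 1 when no generation has been stored, and an enum that makes launch type and capacity provider strategy mutually exclusive; the names and the starting value are assumptions rather than the PR's code.

```rust
/// ASSUMPTION: simplified manifest metadata as stored in S3.
#[derive(Debug, Default)]
pub struct Metadata {
    pub name: String,
    /// Absent on manifests written before the field existed.
    pub generation: Option<u64>,
}

/// Compute the generation for the next apply from the stored manifest.
/// A missing manifest or missing field is treated as "no generation yet",
/// so the first apply yields 1 (an assumed starting value).
pub fn next_generation(stored: Option<&Metadata>) -> u64 {
    stored.and_then(|m| m.generation).map_or(1, |g| g + 1)
}

/// ECS expects either a launch type or a capacity provider strategy on a
/// service, not both. Modeling the choice as an enum makes the conflicting
/// combination unrepresentable.
#[derive(Debug, PartialEq, Eq)]
pub enum Placement {
    LaunchType(String),
    CapacityProviderStrategy(Vec<String>),
}
```

Deriving the next generation from the stored manifest, rather than from local state, means any machine that runs apply picks up where the previous apply left off.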
@thepagent merged commit 4968732 into main on May 19, 2026
5 checks passed
