
feat(ecs): Batch 3 — services with rolling deployments #724

Merged
vieiralucas merged 2 commits into main from worktree-worktree-ecs-batch3 on Apr 24, 2026

@vieiralucas vieiralucas commented Apr 24, 2026

Summary

Batch 3 of the ECS rollout. Adds the full services surface on top of the Batch 2 task execution: `CreateService`, `UpdateService`, `DeleteService`, `DescribeServices`, `ListServices`, `ListServicesByNamespace` with desired-count enforcement and rolling deployments.

Highlights:

  • `CreateService` spawns tasks to match `desiredCount` and tags each with `startedBy=ecs-svc/<name>` so the tasks reconcile back to the service.
  • `UpdateService` scaling: scale-up spawns new tasks; scale-down synchronously flips excess tasks to `desiredStatus=STOPPED` and SIGTERMs them via the runtime.
  • `UpdateService` rolling deployment: a new `taskDefinition` marks the previous PRIMARY deployment ACTIVE, creates a new PRIMARY, and drains tasks on the old revision while new ones spin up. `minimumHealthyPercent`, `maximumPercent`, and the deployment circuit breaker in `deploymentConfiguration` are honoured.
  • `DeleteService` refuses while `desiredCount > 0` unless `force=true`; forced path scales to 0 and stops every task under the service.
  • `DescribeServices` derives `runningCount` / `pendingCount` from live task state on each read.
  • Snapshot schema bumped to v3 to persist services.
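A minimal sketch of the service-to-task linkage and the read-time count derivation described above, assuming a simplified in-memory task list. The `Task` and `LastStatus` types and the `started_by_tag` / `derive_counts` helpers are hypothetical illustrations, not fakecloud's actual code, and treating both PROVISIONING and PENDING as "pending" is an assumption.

```rust
// Hypothetical, simplified task model; fakecloud's real types are not shown in this PR.
#[derive(Debug, Clone, PartialEq)]
pub enum LastStatus {
    Provisioning,
    Pending,
    Running,
    Stopped,
}

#[derive(Debug, Clone)]
pub struct Task {
    /// The `startedBy` value recorded when the task was launched.
    pub started_by: Option<String>,
    pub last_status: LastStatus,
}

/// Builds the `startedBy` value that links a task back to its service.
pub fn started_by_tag(service_name: &str) -> String {
    format!("ecs-svc/{service_name}")
}

/// Recomputes (runningCount, pendingCount) from live task state on every read
/// instead of storing counters on the service record.
/// Assumption: PROVISIONING and PENDING both count toward pendingCount.
pub fn derive_counts(service_name: &str, tasks: &[Task]) -> (usize, usize) {
    let tag = started_by_tag(service_name);
    let mut running = 0;
    let mut pending = 0;
    for task in tasks
        .iter()
        .filter(|t| t.started_by.as_deref() == Some(tag.as_str()))
    {
        match task.last_status {
            LastStatus::Running => running += 1,
            LastStatus::Provisioning | LastStatus::Pending => pending += 1,
            LastStatus::Stopped => {}
        }
    }
    (running, pending)
}
```

Deriving the counts on read avoids keeping a counter in sync with the container runtime; the trade-off is a scan over the cluster's tasks on each `DescribeServices` call.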

Deferred to Batch 4: task sets (EXTERNAL deployment controller), container instances, capacity providers, task protection, `ExecuteCommand`, EventBridge state-change events.

Test plan

  • `cargo test -p fakecloud-e2e --test ecs` — 18 scenarios pass (4 new for services: create/describe/list round-trip, scale up+down, rolling deployment, force-delete)
  • `cargo test -p fakecloud-conformance --test ecs` — 31 conformance tests pass (all 6 new actions have live Smithy-shape checksums)
  • `cargo clippy --workspace --all-targets -- -D warnings` — clean
  • `cargo fmt --all --check` — clean
  • CI green on all checks
  • Cubic review clean

Summary by cubic

Adds ECS services with rolling deployments on top of task execution, with desired-count enforcement and live running/pending counts. Also fixes an over-spawn issue when scaling down during a rolling deployment. Snapshot schema bumped to v3.

  • New Features

    • Added CreateService, UpdateService, DeleteService, DescribeServices, ListServices, ListServicesByNamespace.
    • Scaling: spawns tasks on scale-up; on scale-down flips excess tasks to desiredStatus=STOPPED and SIGTERMs them via the runtime.
    • Rolling deployments: new taskDefinition creates a new PRIMARY, marks the previous PRIMARY ACTIVE, and drains old tasks; honors minimumHealthyPercent, maximumPercent, and the deployment circuit breaker.
    • Service linkage and reads: tasks tagged startedBy=ecs-svc/<name>; DescribeServices derives runningCount/pendingCount from live tasks; ListServices supports cluster/launchType/schedulingStrategy filters.
  • Bug Fixes

    • Prevented over-spawning in UpdateService when scaling down and changing taskDefinition in the same call by computing replacements from the effective desired count (not from the total stop set).
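The `minimumHealthyPercent` / `maximumPercent` bounds that gate each rolling-deployment step can be sketched as below. The rounding follows what the AWS ECS documentation describes (the healthy floor rounds up, the total ceiling rounds down); whether fakecloud rounds the same way is an assumption. `StepLimits` and `step_limits` are hypothetical names, and the circuit breaker is not modeled.

```rust
/// How far one rolling-deployment step may go. Hypothetical helper; names are illustrative.
#[derive(Debug, PartialEq)]
pub struct StepLimits {
    /// Old-revision tasks that may be stopped without dropping below the healthy floor.
    pub max_to_stop: u64,
    /// New-revision tasks that may be started without exceeding the total ceiling.
    pub max_to_start: u64,
}

pub fn step_limits(
    desired_count: u64,
    minimum_healthy_percent: u64,
    maximum_percent: u64,
    healthy_running: u64,
    total_tasks: u64,
) -> StepLimits {
    // Floor on RUNNING tasks: ceil(desired * min% / 100) in integer arithmetic.
    let min_healthy = (desired_count * minimum_healthy_percent + 99) / 100;
    // Ceiling on total tasks (old + new): floor(desired * max% / 100).
    let max_total = desired_count * maximum_percent / 100;
    StepLimits {
        // saturating_sub yields 0 instead of underflowing when already at a bound.
        max_to_stop: healthy_running.saturating_sub(min_healthy),
        max_to_start: max_total.saturating_sub(total_tasks),
    }
}
```

For example, with `desiredCount=4`, `minimumHealthyPercent=50`, and `maximumPercent=200`, four healthy old-revision tasks allow up to 2 stops and up to 4 new starts in the first step.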

Written for commit f7995e3. Summary will update on new commits.

- CreateService spawns tasks to match desiredCount, tagging each with
  `startedBy=ecs-svc/<name>` so the tasks reconcile back to the service
  for later scale/drain.
- UpdateService supports two independent mutations:
  * Scale: new desiredCount spawns additional tasks or flips excess
    tasks to `desiredStatus=STOPPED` and SIGTERMs them via the runtime.
  * Rolling deployment: new taskDefinition marks the previous PRIMARY
    deployment as ACTIVE, creates a new PRIMARY, and drains tasks on
    the old revision while new ones come up. Deployment circuit
    breaker + minimumHealthyPercent/maximumPercent are honoured.
- DeleteService refuses while desiredCount>0 unless force=true; the
  forced path scales to 0 and stops every running task under the
  service before removing it.
- DescribeServices derives runningCount/pendingCount from live task
  state on each read so scale/drain are reflected immediately.
- ListServices + ListServicesByNamespace with cluster/launchType/
  schedulingStrategy filters.
- Snapshot schema bumped to v3 to persist services.
- Conformance tests for all 6 new actions with live checksums.
- E2E scenarios for create/describe/list, scale up + down, rolling
  deployment with a new task definition, and force-delete behavior.
- Website + README updated with services section + new op count.
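The scale and delete rules in the list above can be sketched together, since a forced delete is just a scale to 0. `scale_plan`, `delete_service`, and `DeleteError` are hypothetical; choosing the tail of the task list as the excess to stop is an arbitrary assumption (real ECS applies its own task-selection rules), and the actual error type returned by fakecloud is not shown in this PR.

```rust
#[derive(Debug, PartialEq)]
pub enum DeleteError {
    /// Mirrors the refusal described above; the real error type/message may differ.
    ServiceHasDesiredTasks { desired_count: u32 },
}

/// Returns (task ids to stop, number of tasks to spawn) to move from the
/// current task set to `target`. Hypothetical helper.
pub fn scale_plan(task_ids: &[String], target: usize) -> (Vec<String>, usize) {
    if task_ids.len() > target {
        // Scale down: the excess would be flipped to desiredStatus=STOPPED and SIGTERMed.
        (task_ids[target..].to_vec(), 0)
    } else {
        // Scale up: spawn the shortfall.
        (Vec::new(), target - task_ids.len())
    }
}

/// Refuses while desiredCount > 0 unless `force` is set; the forced path
/// scales to 0 and returns every task id under the service to stop.
pub fn delete_service(
    desired_count: u32,
    task_ids: &[String],
    force: bool,
) -> Result<Vec<String>, DeleteError> {
    if desired_count > 0 && !force {
        return Err(DeleteError::ServiceHasDesiredTasks { desired_count });
    }
    let (to_stop, to_spawn) = scale_plan(task_ids, 0);
    // Scaling to 0 never spawns anything.
    debug_assert_eq!(to_spawn, 0);
    Ok(to_stop)
}
```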

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 0.38168% with 783 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/fakecloud-ecs/src/service.rs | 0.00% | 778 Missing ⚠️ |
| crates/fakecloud-ecs/src/state.rs | 37.50% | 5 Missing ⚠️ |



@cubic-dev-ai cubic-dev-ai Bot left a comment


1 issue found across 6 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="crates/fakecloud-ecs/src/service.rs">

<violation number="1" location="crates/fakecloud-ecs/src/service.rs:2195">
P1: Rolling deployment over-spawns tasks when combined with scale-down. `stop.len()` counts tasks stopped for both scaling *and* TD-drain, so new-task spawns exceed `effective_desired`. Use `effective_desired` as the target instead of `stop.len()`.</violation>
</file>


… overlap

Addresses Cubic P1 on PR #724. When UpdateService mutated both
desiredCount (down) and taskDefinition in the same call, the rolling
deployment code used `stop.len()` as the replacement target — which
conflated scale-down stops with TD-drain stops and ended up spawning
more tasks than the new desiredCount. Key the replacement count off
`effective_desired - kept_on_new_td - already_spawned` instead so the
post-deploy count matches intent.
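The before/after replacement arithmetic can be illustrated with a small sketch. The expression `effective_desired - kept_on_new_td - already_spawned` comes from the commit message above; the function names and the use of `saturating_sub` (to clamp at 0 rather than underflow) are assumptions for illustration.

```rust
/// Pre-fix behaviour: the replacement target was `stop.len()`, which counts
/// scale-down stops and task-definition drain stops together.
pub fn replacements_before_fix(scale_down_stops: usize, td_drain_stops: usize) -> usize {
    scale_down_stops + td_drain_stops
}

/// Post-fix behaviour: key the replacement count off the effective desired count.
/// `saturating_sub` keeps the result at 0 if the new revision already has
/// enough tasks, instead of underflowing.
pub fn replacements_after_fix(
    effective_desired: usize,
    kept_on_new_td: usize,
    already_spawned: usize,
) -> usize {
    effective_desired
        .saturating_sub(kept_on_new_td)
        .saturating_sub(already_spawned)
}
```

As a worked case: a service runs 4 tasks on the old revision, and one `UpdateService` call sets `desiredCount=2` and a new `taskDefinition`. If the stop set holds 2 scale-down stops plus 2 drain stops, the pre-fix code spawns 4 replacements, overshooting the new desired count; the fixed expression spawns 2.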
@vieiralucas vieiralucas merged commit 8795a2e into main Apr 24, 2026
48 checks passed
@vieiralucas vieiralucas deleted the worktree-worktree-ecs-batch3 branch April 24, 2026 05:44
