fix(ecs): surface failure reason on /logs when task fails to start#829
Merged
vieiralucas merged 1 commit intomainfrom Apr 28, 2026
Merged
fix(ecs): surface failure reason on /logs when task fails to start#829vieiralucas merged 1 commit intomainfrom
vieiralucas merged 1 commit intomainfrom
Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
e2814f9 to
5dc1243
Compare
Tasks that died before the container started (secret resolution miss,
image pull error, `docker run` failure) went through `finalize_failure`,
which did not set `task.captured_logs`. The HTTP introspection endpoint
`/_fakecloud/ecs/tasks/{id}/logs` therefore returned an empty string,
leaving E2E assertions like `ecs_task_resolves_secretsmanager_secret`
with `"secret not injected: "` and no diagnostic.
- `finalize_failure` now writes `[task failed to start]: <reason>` into
`task.captured_logs`, mirroring the assignment `finalize_stopped`
already does on the success path.
- The `run_task` error handler also `eprintln!`s the reason so nextest's
captured-output for a failed E2E surfaces it directly.
- Add unit test `finalize_failure_writes_reason_into_captured_logs`.
This is a diagnostic surface fix only; the underlying intermittent
`resolve_secret` miss in CI (if real and not just disk pressure
side-effect) is left for a follow-up once the diagnostics expose its
root cause.
5dc1243 to
2e33b98
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Stabilize CI runner flakes (3/3) — make ECS task failures diagnosable.
The ECS secrets E2E test `ecs_task_resolves_secretsmanager_secret` has been failing intermittently in CI with `secret not injected: ` (note the empty string after the colon). Empty captured-logs means `finalize_failure` ran instead of `finalize_stopped` — i.e. the task died before the container started — but `finalize_failure` was not setting `task.captured_logs`, so the HTTP introspection endpoint `/_fakecloud/ecs/tasks/{id}/logs` returned an empty string and the assertion had nothing to attribute the failure to.
This PR is a diagnostic surface fix:
The actual intermittency (real `resolve_secret` race vs disk-pressure side-effect of PRs #827/#828) is intentionally left untouched here; once those land and CI logs show the real reason text on a failure, we'll know which root cause to fix.
Sibling PRs: #827 (nextest retries + disk diagnostics), #828 (linker OOM via test profile debuginfo).
Test plan
Summary by cubic
Show ECS task start failure reasons on
/_fakecloud/ecs/tasks/{id}/logsand stderr so failures no longer appear as empty logs.task.captured_logsinfinalize_failureto "[task failed to start]: ".run_taskfor clearer nextest output.finalize_failure_writes_reason_into_captured_logs.Written for commit 2e33b98. Summary will update on new commits. Review in cubic