Problem
Per the Audit-Workflows daily report (discussion #34537), a new failure mode appeared on 2026-05-24: two unrelated workflows died because the awf-squid sidecar container failed its healthcheck before the agent could start.
dependency failed to start: container awf-squid is unhealthy
[ERROR] Failed to start containers: docker compose up -d --pull never exit 1
Affected runs:
- Lockfile Statistics (§26371995223)
- PR Sous Chef (§26370977767)
Both workflows lost their entire run with zero useful output — a flaky sidecar shouldn't be cause for a hard fail.
Suggested Fix
In the awf bootstrap script (look in pkg/agentdrain/ or the docker-compose templates), add a short retry around the docker compose up -d step:
- 2-3 retries
- 5-10 second backoff
- If still failing, capture
docker logs awf-squid into the run output so root cause is visible
If the squid healthcheck itself is the problem, also worth investigating whether the healthcheck command is too strict for cold-start conditions.
Impact
Makes the agent runtime resilient to a transient sidecar bring-up failure. Today this surfaced 2 lost runs; if the pattern recurs at scale it could be much more.
Effort
~1-2 hours — add retry + log capture, test against a run that previously failed.
Suggested Agent
Any infrastructure-leaning coding agent.
Source: DeepReport Intelligence Briefing 2026-05-25
Generated by 🔬 DeepReport - Intelligence Gathering Agent · opus47 7.7M · ◷
Problem
Per the Audit-Workflows daily report (discussion #34537), a new failure mode appeared on 2026-05-24: two unrelated workflows died because the
awf-squidsidecar container failed its healthcheck before the agent could start.Affected runs:
Both workflows lost their entire run with zero useful output — a flaky sidecar shouldn't be cause for a hard fail.
Suggested Fix
In the
awfbootstrap script (look inpkg/agentdrain/or the docker-compose templates), add a short retry around thedocker compose up -dstep:docker logs awf-squidinto the run output so root cause is visibleIf the squid healthcheck itself is the problem, also worth investigating whether the healthcheck command is too strict for cold-start conditions.
Impact
Makes the agent runtime resilient to a transient sidecar bring-up failure. Today this surfaced 2 lost runs; if the pattern recurs at scale it could be much more.
Effort
~1-2 hours — add retry + log capture, test against a run that previously failed.
Suggested Agent
Any infrastructure-leaning coding agent.
Source: DeepReport Intelligence Briefing 2026-05-25