feat: Retry Iron Loop executor on API overload (529) with configurable backoff

## Problem

When the Anthropic API returns HTTP 529 (overloaded) during an Iron Loop executor run,
the agent terminates mid-execution. Depending on when the overload hits, this leaves
the plan in one of two states:

1. **Pre-run overload** (no tools executed yet): plan stays `in-progress`, status file
   shows `working`, but nothing was written. Safe to auto-retry.
2. **Mid-run overload** (some steps completed, files written to disk): same `.status`
   state, but the implementation is partially on disk. Auto-retry here risks duplicate
   writes or inconsistent state — a human should review before continuing.

Currently there is no recovery path. The operator must:
- Notice the plan is stuck (no dashboard signal)
- Inspect what was written
- Decide whether to retry or finish manually

This creates real friction on long executor runs (6–10 Iron Loop steps), where a single
529 can halt a run that has been in progress for hours.

---

## Proposed solution (three-layer change)

### Layer 1 — executor agent definition

In `agents/iron-loop/iron-loop-executor.md`, add an instruction to the agent:

> If the API returns overloaded (529) before any tool writes have been made in the
> current step, write `status: "overload-retry"` to the plan's `.status` file and call
> `ScheduleWakeup` with the configured interval (`retry.overload_interval_seconds`).
> If writes have already occurred, write `status: "overload-partial"` instead, so a
> human gate can review.
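
The branching in that instruction can be sketched as a small function. This is only an illustration: `buildOverloadStatus`, `writesMade`, and `intervalSeconds` are hypothetical names, not part of the existing codebase, and writing `retry_at` alongside the status is an assumption based on the countdown idea in Layer 3.

```javascript
// Hypothetical helper: none of these names exist in the current codebase.
// Builds the record the executor would write to the plan's .status file
// after a 529, depending on whether the current step already wrote files.
function buildOverloadStatus({ writesMade, intervalSeconds, now = new Date() }) {
  if (writesMade) {
    // A partial implementation is on disk: stop and wait for human review.
    return { status: "overload-partial" };
  }
  // Nothing written yet: safe to retry after the configured interval.
  const retryAt = new Date(now.getTime() + intervalSeconds * 1000);
  return { status: "overload-retry", retry_at: retryAt.toISOString() };
}

const at = new Date("2025-01-01T00:00:00.000Z");
console.log(buildOverloadStatus({ writesMade: false, intervalSeconds: 600, now: at }));
// { status: 'overload-retry', retry_at: '2025-01-01T00:10:00.000Z' }
console.log(buildOverloadStatus({ writesMade: true, intervalSeconds: 600, now: at }));
// { status: 'overload-partial' }
```

Whether the agent or the state layer ends up owning this decision is exactly open question 1 below; the sketch only pins down the two outcomes.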

### Layer 2 — state layer (`actions.js` / `background.js`)

- Add `overload-retry` and `overload-partial` to the status enum.
- In `advanceAgent()`: if the current plan's status is `overload-retry`, treat it like
  `working` and resume from the last completed step marker, rather than skipping to the
  next plan.
- `overload-partial` should behave like a human gate: block `advanceAgent()` until a
  human clears it.
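
A rough sketch of how that branching might look, kept separate from the real `advanceAgent()` since I have not confirmed its current shape. `decideAdvance`, the `action` values, and `lastCompletedStep` are all hypothetical stand-ins for whatever the state layer actually uses.

```javascript
// Hypothetical sketch: the real advanceAgent() in actions.js may differ.
// Returns what the state layer should do with the current plan.
function decideAdvance(plan) {
  if (plan.status === "overload-partial") {
    // Behaves like a human gate: do nothing until someone clears it.
    return { action: "block", reason: "awaiting human review" };
  }
  if (plan.status === "working" || plan.status === "overload-retry") {
    // Stay on this plan. `lastCompletedStep` is an assumed step marker;
    // null means no marker was found and the caller must pick a start step.
    const fromStep =
      plan.lastCompletedStep == null ? null : plan.lastCompletedStep + 1;
    return { action: "resume-current", fromStep };
  }
  // Any other status: move on to the next plan as today.
  return { action: "next-plan" };
}

console.log(decideAdvance({ status: "overload-retry", lastCompletedStep: 9 }));
// { action: 'resume-current', fromStep: 10 }
console.log(decideAdvance({ status: "overload-partial" }).action);
// block
```

Keeping the decision in a pure function like this would also make the new statuses easy to unit test without touching the filesystem.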

### Layer 3 — dashboard display (`menu-screens.js`)

For the two new statuses, replace the spinning `◐` with a human-readable indicator:

| Status             | Display              |
|--------------------|----------------------|
| `overload-retry`   | `⏳ retry in Xm`     |
| `overload-partial` | `⚠ partial — review` |

The "retry in Xm" countdown could be derived from a `retry_at` timestamp in the `.status`
file, which the executor would need to write alongside `overload-retry`.
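
As a sketch of the countdown, assuming `retry_at` is stored as an ISO 8601 string (that format is my assumption, and `formatRetryCountdown` is a hypothetical name, not an existing function in `menu-screens.js`):

```javascript
// Hypothetical formatter for the `overload-retry` row on the dashboard.
// `retryAt` is assumed to be an ISO 8601 timestamp read from the .status file.
function formatRetryCountdown(retryAt, now = new Date()) {
  const msLeft = Date.parse(retryAt) - now.getTime();
  if (Number.isNaN(msLeft)) {
    // Missing or unparseable timestamp: show a generic label, no countdown.
    return "⏳ retry pending";
  }
  // Round up so "30 seconds left" shows as 1m; never show a negative value.
  const minutes = Math.max(0, Math.ceil(msLeft / 60000));
  return `⏳ retry in ${minutes}m`;
}

const shownAt = new Date("2025-01-01T00:00:00.000Z");
console.log(formatRetryCountdown("2025-01-01T00:10:00.000Z", shownAt));
// ⏳ retry in 10m
console.log(formatRetryCountdown(undefined, shownAt));
// ⏳ retry pending
```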

### Config schema (`.ctoc/settings.yaml`)

```yaml
retry:
  overload_interval_seconds: 600   # default: 10 min
```
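
Reading the setting could look roughly like this. The sketch assumes the YAML has already been parsed into a plain object by whatever loader the project uses; `resolveOverloadInterval` is a hypothetical name, and falling back to the default on invalid values is a suggestion rather than existing behavior.

```javascript
// Default taken from the proposed schema above (10 minutes).
const DEFAULT_OVERLOAD_INTERVAL_SECONDS = 600;

// Hypothetical helper. `settings` is the already-parsed content of
// .ctoc/settings.yaml; parsing itself is out of scope for this sketch.
function resolveOverloadInterval(settings) {
  const value = settings?.retry?.overload_interval_seconds;
  // Accept only positive finite numbers; anything else uses the default.
  if (Number.isFinite(value) && value > 0) {
    return value;
  }
  return DEFAULT_OVERLOAD_INTERVAL_SECONDS;
}

console.log(resolveOverloadInterval({ retry: { overload_interval_seconds: 120 } }));
// 120
console.log(resolveOverloadInterval({}));
// 600
```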

---

## Open questions for maintainers

1. **Preferred layer for retry logic:** Should the executor agent drive the retry
   (writing `overload-retry` + calling ScheduleWakeup itself), or should this be
   handled entirely in the state layer with the agent just exiting cleanly?

2. **Step-level resume vs full restart:** The executor processes steps 7–15
   sequentially. Is there an existing step-marker mechanism that would allow resuming
   from step N, or would a restart from step 7 be the safer default?

3. **`ScheduleWakeup` availability:** Is `ScheduleWakeup` available inside executor
   agent context, or does the resume need to go through a different scheduling
   mechanism (e.g., a cron entry in `.ctoc/`)?

4. **Scope appetite:** Happy to submit a focused PR for just Layer 1 (agent instruction
   only, no state-layer changes) as a first step if that's easier to review.

---

Happy to contribute this — just want to confirm the preferred approach before writing
code. Flagging the mid-run vs pre-run distinction upfront because it's the design
decision with the most downstream impact.
