
Wiki Writer runs failing with "transient" errors that are not transient, but timeouts #278

@mrjf

Description

● Read TypeScript session SDK index (shell)
│ cat clients/typescript-session-sdk/src/index.ts 2>/dev/null | head -50
└ 7 lines...
● Request failed due to a transient API error. Retrying...
● Get repo remote URL and default branch (shell)
│ cat .git/config | grep -A3 '[remote'
└ 5 lines...
● Get current branch (shell)
│ cat .git/HEAD
└ 2 lines...
● Request failed due to a transient API error. Retrying...
● Request failed due to a transient API error. Retrying...

The agent loaded 3,000+ lines of source material into context, then asked the model to generate all wiki pages at once. This produces a response that's likely 20-30K+ tokens, exceeding either the output token limit or the per-request timeout on the Copilot API proxy. The API returns a timeout/error, Copilot CLI logs it as "transient," retries, and hits the same limit again.
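To make the failure mode concrete, here is a minimal sketch of the size check the agent never does. The numbers are assumptions for illustration (the real Copilot proxy limits aren't documented in this issue); it uses the common rough heuristic of ~4 characters per token.

```typescript
// Assumed numbers, NOT the actual Copilot API limits.
const CHARS_PER_TOKEN = 4;       // rough average for English/markdown text
const MAX_OUTPUT_TOKENS = 8192;  // hypothetical per-response output cap

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Would generating all of these pages in one response blow the output cap?
function fitsInOneResponse(pages: string[]): boolean {
  const total = pages.reduce((sum, p) => sum + estimateTokens(p), 0);
  return total <= MAX_OUTPUT_TOKENS;
}
```

By this estimate, a single response covering every page of a large wiki is far over the cap, which is consistent with the repeated "transient" failures above.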

What to fix in agentic-wiki-writer:

  1. Strengthen the batching language — the current prompt says "generate them in batches to avoid API timeouts" but the agent treats it as optional. Make it a hard constraint, e.g.:
    ▎ MANDATORY: Never generate more than 4 pages per push-wiki call. Never spawn a sub-agent to "generate all pages." Process pages sequentially, 4 at a time.
  2. Prohibit sub-agents for generation — the agent spawned a background agent to generate everything, which bypassed the batching logic. Add:
    ▎ Do NOT use sub-agents or background agents for page generation. Generate pages directly in the main conversation loop.
  3. Reduce context loading. The prompt's summary/caching mechanism (summary--{path}.md) is designed to avoid this, but on first run there are no cached summaries. Consider adding:
    ▎ For large source files (>500 lines), read only the sections relevant to the current batch of pages, not the entire file.
  4. Add explicit output size guidance — tell the agent each page should be kept under a reasonable size (e.g., 2-3KB) and that the total push-wiki JSON payload should stay under 30KB per call.
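Items 1 and 4 above can be enforced mechanically rather than left to the prompt. Here is a sketch of a batching helper; `WikiPage`, `MAX_PAGES_PER_CALL`, and `MAX_PAYLOAD_BYTES` are hypothetical names based on the constraints proposed in this issue, not the agent's real API:

```typescript
interface WikiPage { path: string; content: string; }

const MAX_PAGES_PER_CALL = 4;         // hard cap from fix #1
const MAX_PAYLOAD_BYTES = 30 * 1024;  // ~30KB JSON payload cap from fix #4

// Split pages into batches that respect both the page-count and payload-size
// limits, so each push-wiki call stays well under the timeout threshold.
function batchPages(pages: WikiPage[]): WikiPage[][] {
  const batches: WikiPage[][] = [];
  let current: WikiPage[] = [];
  let currentBytes = 2; // account for the surrounding "[]" of the JSON array

  for (const page of pages) {
    const pageBytes = JSON.stringify(page).length + 1; // +1 for the comma
    const wouldOverflow =
      current.length >= MAX_PAGES_PER_CALL ||
      currentBytes + pageBytes > MAX_PAYLOAD_BYTES;
    if (wouldOverflow && current.length > 0) {
      batches.push(current);
      current = [];
      currentBytes = 2;
    }
    current.push(page);
    currentBytes += pageBytes;
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Processing `batchPages(allPages)` sequentially in the main loop (no sub-agents, per fix #2) makes the "4 pages at a time" rule a property of the code path rather than a suggestion the model can ignore.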

The root issue is that the prompt's phased generation instructions aren't forceful enough, and the agent chose to be "clever" by spawning a sub-agent to do everything at once — hitting the exact timeout the batching was designed to prevent.

Finally, can we improve this error message so it says "timeout" when a timeout is the actual issue?
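A sketch of what a clearer retry message could look like. The error shape here is an assumption (the real Copilot CLI error objects may carry different fields); the point is to branch on timeout indicators before falling back to the generic "transient" wording:

```typescript
// Hypothetical error shape; adapt to whatever the CLI's HTTP client surfaces.
function describeApiError(err: { code?: string; status?: number }): string {
  // Timeouts get their own message so logs point at the real cause.
  if (err.code === "ETIMEDOUT" || err.status === 504) {
    return "Request timed out (response may exceed the per-request time or output limit). Retrying; a smaller batch may help.";
  }
  if (err.status !== undefined && err.status >= 500) {
    return `Request failed with server error ${err.status}. Retrying...`;
  }
  return "Request failed due to a transient API error. Retrying...";
}
```

With this, the transcript above would have read "Request timed out..." on every retry, making the batching bug obvious immediately.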
