Summary
The default agents.defaults.model.fallbacks chain ships with google/gemma-4-31b-it (contextWindow: 8192 per modelsConfig), but the agent refuses to use any model whose context window is below 16000 tokens. So when fallback fires (e.g. the primary returns 429 RESOURCE_EXHAUSTED), the chain advances to gemma-4-31b-it, which is immediately rejected. The whole chain is then exhausted, and the user sees a generic All models failed error with no usable recovery.
Reproduction
Easy to reproduce on any container running the default config while the primary key is rate-limited:
$ docker exec openclaw-demo-typhon openclaw agent --to "+177..." --message "Show me my holdings" --timeout 90
[model-fallback/decision] decision=candidate_failed requested=google/gemini-flash-latest \
candidate=google/gemini-flash-latest reason=rate_limit \
next=google/gemma-4-31b-it \
detail=Resource has been exhausted (...). RESOURCE_EXHAUSTED
[agent/embedded] low context window: google/gemma-4-31b-it ctx=8192 (warn<32000) source=modelsConfig
[agent/embedded] blocked model (context window too small): google/gemma-4-31b-it ctx=8192 (min=16000) source=modelsConfig
[model-fallback/decision] decision=candidate_failed requested=google/gemini-flash-latest \
candidate=google/gemma-4-31b-it reason=unknown \
next=none \
detail=Model context window too small (8192 tokens; source=modelsConfig). Minimum is 16000.
FallbackSummaryError: All models failed (2): ...
So the user-visible behavior on hitting the primary rate limit is the agent dying immediately rather than recovering on the configured fallback.
Why this matters
The fallback chain's purpose is to absorb provider failures. Right now the default chain has a single fallback (gemma-4-31b-it), and it can't absorb anything because of the ctx-window enforcement. Effectively the agent has no fallback at all. Worse, the configured fallback leaves a misleading log trail that obscures the real root cause: the primary's rate-limit error gets buried two levels down in the FallbackSummaryError aggregate.
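To make the failure mode concrete, here is a minimal TypeScript model of the fallback walk as the logs above describe it: the context-window guard rejects a candidate before any request is sent, so a rate-limited primary followed by an 8192-token fallback always ends with two failures. This is a sketch, not OpenClaw's code. Candidate, callProvider, and runWithFallback are hypothetical names, and the primary's context window is an arbitrary placeholder above the 16000 floor.

```typescript
// Hypothetical model of the fallback walk described in this issue.
// Candidate, MIN_CONTEXT_TOKENS, callProvider, and runWithFallback are
// illustrative names, not OpenClaw's real internals.

interface Candidate {
  id: string;
  contextWindow: number; // tokens, as reported by modelsConfig
}

type Outcome =
  | { ok: true; model: string }
  | { ok: false; failures: string[] };

// Floor taken from the log line "blocked model (context window too small) ... (min=16000)".
const MIN_CONTEXT_TOKENS = 16000;

// Stand-in for a real provider call: fails with "rate_limit" for any
// model listed in rateLimited, succeeds otherwise.
function callProvider(c: Candidate, rateLimited: Set<string>): string | null {
  return rateLimited.has(c.id) ? "rate_limit" : null;
}

// Walks the chain in order, recording why each candidate failed.
function runWithFallback(chain: Candidate[], rateLimited: Set<string>): Outcome {
  const failures: string[] = [];
  for (const c of chain) {
    // The context-window guard rejects the candidate before any request is sent.
    if (c.contextWindow < MIN_CONTEXT_TOKENS) {
      failures.push(`${c.id}: context window too small`);
      continue;
    }
    const err = callProvider(c, rateLimited);
    if (err === null) {
      return { ok: true, model: c.id };
    }
    failures.push(`${c.id}: ${err}`);
  }
  return { ok: false, failures };
}

// Default chain from this issue. The primary's window is a placeholder;
// any value >= 16000 behaves the same here.
const defaultChain: Candidate[] = [
  { id: "google/gemini-flash-latest", contextWindow: 1048576 },
  { id: "google/gemma-4-31b-it", contextWindow: 8192 },
];

const result = runWithFallback(defaultChain, new Set(["google/gemini-flash-latest"]));
// result.ok is false with two failures, mirroring "All models failed (2)".
console.log(result);
```

Swapping the second entry for any model with at least 16000 tokens of context (for example groq/llama-3.3-70b-versatile) makes the same walk succeed on the fallback.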
Suggested fix (any of)
1. Filter the default fallback chain at config-load time: drop any model whose contextWindow is below the agent's enforced minimum, and emit a clear log line so operators know.
2. Replace gemma-4-31b-it in the shipped default with a fallback that has ≥16000 ctx. groq/llama-3.3-70b-versatile is already a configured provider in the same default config and has 131K context, so it would be a drop-in.
3. Soften the minimum to match the smallest-ctx model in the default chain, and add a per-call check that skips that model for prompts that would not fit in its context window.
(2) is probably cleanest for ergonomics; (1) is the most defensible / future-proof.
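Options (1) and (3) could look roughly like the TypeScript sketch below. It is not based on OpenClaw's source: ModelEntry, filterFallbacks, estimateTokens, and fitsContext are hypothetical names, the four-characters-per-token estimate and the 1024-token output reserve are assumptions, and the models map stands in for whatever modelsConfig actually loads. filterFallbacks implements (1), dropping and logging any fallback below the 16000-token floor once at load time; fitsContext is the per-call check from (3), which would let a small-window model stay in the chain for prompts that fit.

```typescript
// Hypothetical sketch of suggested fixes (1) and (3). ModelEntry,
// filterFallbacks, estimateTokens, and fitsContext are illustrative names,
// not OpenClaw APIs.

interface ModelEntry {
  id: string;
  contextWindow: number; // tokens, as reported by modelsConfig
}

// The agent's enforced floor, per the "(min=16000)" log line.
const MIN_CONTEXT_TOKENS = 16000;

// Fix (1): drop under-sized fallbacks once at config-load time and log each drop.
// Ids with no models entry are kept; the runtime guard still applies to them.
function filterFallbacks(
  fallbacks: string[],
  models: Map<string, ModelEntry>,
  log: (msg: string) => void = console.warn,
): string[] {
  return fallbacks.filter((id) => {
    const entry = models.get(id);
    if (entry === undefined) {
      return true;
    }
    if (entry.contextWindow < MIN_CONTEXT_TOKENS) {
      log(
        `dropping fallback ${id}: contextWindow ${entry.contextWindow} < enforced minimum ${MIN_CONTEXT_TOKENS}`,
      );
      return false;
    }
    return true;
  });
}

// Fix (3): check each call against the candidate's own window instead of a global floor.
// ~4 characters per token is a crude assumption; a real tokenizer would be more accurate.
function estimateTokens(prompt: string): number {
  return Math.ceil(prompt.length / 4);
}

// reservedForOutput (assumed 1024) leaves room for the model's reply.
function fitsContext(entry: ModelEntry, prompt: string, reservedForOutput = 1024): boolean {
  return estimateTokens(prompt) + reservedForOutput <= entry.contextWindow;
}

// Example: gemma-4-31b-it (8192 ctx) and groq/llama-3.3-70b-versatile (131K ctx, per this issue).
const models = new Map<string, ModelEntry>([
  ["google/gemma-4-31b-it", { id: "google/gemma-4-31b-it", contextWindow: 8192 }],
  ["groq/llama-3.3-70b-versatile", { id: "groq/llama-3.3-70b-versatile", contextWindow: 131072 }],
]);

// Logs one drop line for gemma and keeps only the groq model.
const kept = filterFallbacks(["google/gemma-4-31b-it", "groq/llama-3.3-70b-versatile"], models);
console.log(kept);
```

Applied to the shipped default, whose only fallback is gemma-4-31b-it, (1) would leave the chain with no fallbacks at all. On its own it turns a confusing failure into an explicitly logged one; pairing it with (2) is what restores a working fallback.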
Adjacent
Versions
- openclaw-demo:latest container, OpenClaw 2026.4.24
- Default config shipped with the container (no operator override)