feat: Model fallback when configured model is unavailable #29191

@lpcox

Problem

When a workflow specifies a model (or the auto-selector picks one) that is not available in the current environment, the entire job fails with:

CAPIError: 400 The requested model is not supported.

This was recently observed in sweagentd#11264, where the auto model selector chose gpt-5.2 (not yet supported), causing all CCA jobs for the GitHub org to fail.

The same class of failure can occur in agentic workflows when:

  • A workflow specifies a model in frontmatter (model: gpt-5.2) that becomes unavailable
  • A model is deprecated or removed between workflow authoring and execution
  • Organization-level model policies restrict certain models
  • Model availability in the region where the workflow runs differs from the region where it was authored

Proposed Solution

Add model fallback logic to the workflow engine:

1. Pre-flight model validation

Before starting the agent loop, query the /models endpoint (or equivalent) to check whether the configured model is available. If it is not, select a fallback.
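A minimal sketch of the pre-flight check, written in Python purely for illustration. It assumes the engine can obtain the list of available model IDs through some callable; `list_models` and `check_model_available` are hypothetical names standing in for a /models query (or equivalent), not an existing gh-aw API.

```python
from __future__ import annotations

from typing import Callable, Iterable


def check_model_available(
    configured: str, list_models: Callable[[], Iterable[str]]
) -> tuple[bool, set[str]]:
    """Query available models once, before the agent loop starts.

    Returns (is_available, available_ids) so the caller can choose a fallback
    from the same snapshot without querying again.
    `list_models` is a hypothetical stand-in for a /models request.
    """
    available = set(list_models())
    return configured in available, available


if __name__ == "__main__":
    # Fake lister standing in for the real endpoint (assumption).
    ok, available = check_model_available(
        "gpt-5.2", lambda: ["gpt-4.1", "claude-sonnet-4-20250514"]
    )
    print(ok, sorted(available))
```

Returning the snapshot alongside the boolean means the fallback choice is made against the same list the check used, rather than a second query that might disagree.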

2. Fallback strategy

```yaml
# In workflow frontmatter
model: gpt-5.2
model-fallback:
  - gpt-4.1                   # first fallback
  - claude-sonnet-4-20250514  # second fallback
  - auto                      # ultimate fallback: let the engine pick from available models
```

If model-fallback is not specified, a sensible default chain could be used (e.g., the closest capability-equivalent model that is available).
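The fallback chain could be resolved roughly as follows. This is a sketch under stated assumptions: `resolve_model`, `auto_preference`, and `DEFAULT_FALLBACKS` are hypothetical names; the default chain used when `model-fallback` is omitted is assumed to be just `auto`; and when `auto` has no preference list to consult it picks the alphabetically first available model, a deterministic placeholder for a real capability ranking.

```python
from __future__ import annotations

from typing import Iterable, Optional

# Assumed default when `model-fallback` is omitted (placeholder, not a decided policy).
DEFAULT_FALLBACKS = ("auto",)


def resolve_model(
    configured: str,
    fallbacks: Optional[Iterable[str]],
    available: set[str],
    auto_preference: Iterable[str] = (),
) -> Optional[str]:
    """Return the first model in the chain that is available, or None.

    The chain is `configured` followed by `fallbacks` in order. The special
    entry "auto" takes the first available model from `auto_preference`,
    otherwise the alphabetically first available model.
    """
    preferences = tuple(auto_preference)
    chain = [configured, *(DEFAULT_FALLBACKS if fallbacks is None else fallbacks)]
    for candidate in chain:
        if candidate == "auto":
            # "auto" only ever chooses from models known to be available.
            for preferred in preferences:
                if preferred in available:
                    return preferred
            if available:
                return sorted(available)[0]
            continue
        if candidate in available:
            return candidate
    return None
```

With the frontmatter example above and `available = {"gpt-4.1", "claude-sonnet-4-20250514"}`, `resolve_model("gpt-5.2", ["gpt-4.1", "claude-sonnet-4-20250514", "auto"], available)` returns `"gpt-4.1"`. Returning `None` when the whole chain is exhausted leaves the caller free to fail with an actionable error instead of guessing.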

3. Logging

When a fallback is triggered, emit a warning in the workflow logs:

[WARN] Configured model "gpt-5.2" is not available. Falling back to "gpt-4.1".

This ensures visibility without failing the job.
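A sketch of the warning emitter, reproducing the message format proposed above. The function names are hypothetical. The optional `annotate` mode uses the GitHub Actions `::warning::` workflow command, which surfaces the message as a warning annotation on the run; whether the engine should use it is a design choice, not part of this proposal.

```python
from __future__ import annotations

import sys
from typing import Optional, TextIO


def fallback_message(configured: str, selected: str) -> str:
    """Build the warning text in the format proposed in this issue."""
    return f'Configured model "{configured}" is not available. Falling back to "{selected}".'


def emit_fallback_warning(
    configured: str,
    selected: str,
    stream: Optional[TextIO] = None,
    annotate: bool = False,
) -> str:
    """Write the warning as a single log line and return it; the job keeps running."""
    message = fallback_message(configured, selected)
    # "::warning::" is the GitHub Actions workflow command for a warning annotation.
    line = f"::warning::{message}" if annotate else f"[WARN] {message}"
    print(line, file=stream if stream is not None else sys.stderr)
    return line
```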

Alternatives Considered

  • Fail fast with an actionable error: instead of falling back, fail with a message listing the available models. This keeps today's outcome (the job still fails) and only improves the error message.
  • Compile-time validation: gh aw compile --validate could check model availability, but this only helps at authoring time — models can become unavailable later.
  • Engine-level fallback: Let each engine (Copilot, Claude, Codex) handle fallback internally. Downside: inconsistent behavior across engines.

Related

  • sweagentd#11264 — CCA jobs failing with model_not_supported using auto selector (closed/resolved)
  • Model availability can vary by org, region, and over time as models are deprecated or added
