Foundry Local loses usable local models after catalog refresh failures #471

@jessehouwing

Product and version

  • Extension: Windows AI Studio / Foundry Toolkit 1.4.3 (VS Code extension)
  • OS: Windows
  • Hardware: Intel Core Ultra 7, NVIDIA RTX GPU, Intel Arc GPU, Intel NPU

Summary

When a Foundry catalog refresh is throttled or fails, the model dropdown falls back to an older built-in list that excludes models already downloaded locally.
After that, requests fail with "Model not found" even though the local runtime endpoints still expose at least one usable local model.

Impact

  • Previously available local models disappear from the UI after catalog failures.
  • Inference requests fail with Model not found errors.
  • Users cannot reliably continue offline/local-only workflows during catalog outages or throttling.
  • The experience appears broken even when local model assets are present and usable.

Observed behavior

  1. Catalog refresh repeatedly returns 429 TooManyRequests / QuotaExceeded from the Azure Foundry catalog.
  2. The model picker appears to reset to fallback/built-in entries, to a stale selection, or both.
  3. Requests fail with:
    • Model qwen3.5-2b-cuda-gpu:2 not found
  4. Local service remains reachable and returns model data:
    • GET http://localhost:5272/v1/models returns qwen2.5-coder-1.5b-instruct-cuda-gpu:4
    • GET http://localhost:5272/openai/models returns qwen2.5-coder-1.5b-instruct-cuda-gpu:4 and qwen3.5-2b-cuda-gpu:2
    • GET http://localhost:5272/foundry/list returns []
  5. Some newer models also show a separate parsing/runtime incompatibility (for example, qwen3.5 config parsing/model type), but the core issue is that the fallback does not preserve known downloaded models.
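
The endpoint comparison in step 4 can be scripted so the mismatch is easy to capture in a log. The sketch below is illustrative only: the response shapes are assumptions (an OpenAI-style `{ data: [{ id }] }` envelope or a bare array), and `extractModelIds` and `unionModelIds` are hypothetical helper names, not part of Foundry Local or the toolkit. Fetching the three URLs is left out; the helpers operate on already-parsed JSON bodies.

```typescript
// Sketch only: the response shapes handled here are assumptions,
// not documented Foundry Local behavior.

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Accepts a bare array (id strings or { id } objects) or an
// OpenAI-style envelope { data: [...] }; anything else yields no ids.
function extractModelIds(body: unknown): string[] {
  let items: unknown[] = [];
  if (Array.isArray(body)) {
    items = body;
  } else if (isRecord(body)) {
    const data = body["data"];
    if (Array.isArray(data)) {
      items = data;
    }
  }
  const ids: string[] = [];
  for (const item of items) {
    if (typeof item === "string") {
      ids.push(item);
    } else if (isRecord(item)) {
      const id = item["id"];
      if (typeof id === "string") {
        ids.push(id);
      }
    }
  }
  return ids;
}

// Union of ids across endpoints, de-duplicated, first-seen order preserved.
function unionModelIds(perEndpoint: string[][]): string[] {
  const all = perEndpoint.reduce<string[]>((acc, ids) => acc.concat(ids), []);
  return all.filter((id, index) => all.indexOf(id) === index);
}

// Hypothetical bodies carrying the ids reported in step 4.
const v1Models = extractModelIds({ data: [{ id: "qwen2.5-coder-1.5b-instruct-cuda-gpu:4" }] });
const openaiModels = extractModelIds([
  "qwen2.5-coder-1.5b-instruct-cuda-gpu:4",
  "qwen3.5-2b-cuda-gpu:2",
]);
const foundryList = extractModelIds([]);
const locallyVisible = unionModelIds([v1Models, openaiModels, foundryList]);
```

With the ids from this report, `locallyVisible` contains both models even though /foundry/list is empty, which is the set the picker would be expected to keep.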

Expected behavior

  • If catalog refresh fails, the fallback should include all successfully downloaded local models discovered from local storage and/or local runtime endpoints.
  • The model picker should remain functional with local-only models, without requiring successful catalog calls.
  • A previously selected model should be invalidated only if the local runtime confirms it is unavailable.
  • The user should receive a clear warning that the catalog is unavailable and local fallback is being used.
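
The selection rule above could look roughly like this. It is a minimal sketch under stated assumptions: `LocalProbe`, `SelectionDecision`, and `decideSelectionDuringCatalogOutage` are hypothetical names, and the probe is assumed to report whether the local runtime answered at all. The key point is that an unreachable runtime is not treated as proof that the model is gone.

```typescript
// Hypothetical result of probing the local runtime; not a real toolkit type.
interface LocalProbe {
  reachable: boolean; // false when the local service could not be contacted
  modelIds: string[]; // ids the runtime reported; empty when unreachable
}

interface SelectionDecision {
  keepSelection: boolean;
  warning: string;
}

// Intended for the catalog-failure path only.
function decideSelectionDuringCatalogOutage(
  selectedId: string,
  probe: LocalProbe
): SelectionDecision {
  // Invalidate only on positive evidence: the runtime answered
  // and does not list the selected id.
  const confirmedMissing =
    probe.reachable && probe.modelIds.indexOf(selectedId) === -1;
  return {
    keepSelection: !confirmedMissing,
    // Wording taken from the banner text proposed in this report.
    warning: "Catalog unavailable, showing local models only.",
  };
}

const outageDecision = decideSelectionDuringCatalogOutage("qwen3.5-2b-cuda-gpu:2", {
  reachable: true,
  modelIds: ["qwen2.5-coder-1.5b-instruct-cuda-gpu:4", "qwen3.5-2b-cuda-gpu:2"],
});
```

Here `outageDecision.keepSelection` is true because the runtime still lists the selected id.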

Requested fix

Please add an automatic fallback merge strategy:

  1. On catalog failure, build model list from local sources first:
    • downloaded model registry on disk
    • local runtime model endpoints
  2. Merge with cached catalog entries when available, instead of replacing local list.
  3. Never drop downloaded local models from picker due to remote catalog errors.
  4. If selected model id is missing from catalog but present locally, allow local resolution and execution.
  5. Add a visible status banner:
    • Catalog unavailable, showing local models only.
  6. Add telemetry for fallback path usage and count of locally recovered models.
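
To make the requested merge concrete, here is one possible shape for it. Everything in this sketch is an assumption for illustration: the `PickerEntry` and `FallbackResult` types, the `buildFallbackList` function, and the choice to count every local model as "locally recovered" are hypothetical, not the toolkit's existing code.

```typescript
// Hypothetical shapes for illustration; not the toolkit's actual types.
type ModelSource = "local" | "cached-catalog";

interface PickerEntry {
  id: string;
  source: ModelSource;
}

interface FallbackResult {
  entries: PickerEntry[];
  banner: string;
  telemetry: { fallbackUsed: boolean; locallyRecoveredCount: number };
}

function buildFallbackList(
  registryIds: string[], // ids found in the on-disk download registry
  runtimeIds: string[], // ids reported by local runtime endpoints
  cachedCatalogIds: string[] // last successful catalog snapshot, may be empty
): FallbackResult {
  // Local sources first, de-duplicated, so a downloaded model is never dropped.
  const combined = registryIds.concat(runtimeIds);
  const localIds = combined.filter((id, i) => combined.indexOf(id) === i);

  // Cached catalog entries are appended only when not already local.
  const cachedOnly = cachedCatalogIds.filter(
    (id, i) => localIds.indexOf(id) === -1 && cachedCatalogIds.indexOf(id) === i
  );

  const entries = localIds
    .map((id): PickerEntry => ({ id: id, source: "local" }))
    .concat(cachedOnly.map((id): PickerEntry => ({ id: id, source: "cached-catalog" })));

  return {
    entries: entries,
    // Banner wording from this report; it may need adjusting when
    // cached catalog entries are shown alongside local ones.
    banner: "Catalog unavailable, showing local models only.",
    // Assumption: "recovered" means every local model kept in the picker.
    telemetry: { fallbackUsed: true, locallyRecoveredCount: localIds.length },
  };
}

const fallback = buildFallbackList(
  ["qwen2.5-coder-1.5b-instruct-cuda-gpu:4"],
  ["qwen2.5-coder-1.5b-instruct-cuda-gpu:4", "qwen3.5-2b-cuda-gpu:2"],
  ["qwen3.5-2b-cuda-gpu:2", "example-catalog-only-model"]
);
```

In this example both downloaded models are tagged as local and listed first, and `example-catalog-only-model` (a made-up id) is the only entry sourced from the cache.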

Representative errors seen

  • Received too many requests in a short amount of time. Retry again after 1 seconds.
  • Failed: Fetching model list from Foundry Catalog
  • Model qwen3.5-2b-cuda-gpu:2 not found
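
The throttling message carries a wait hint that a refresh loop could honor instead of retrying immediately. The sketch below assumes the message format stays as quoted above; `parseRetryAfterSeconds` and `nextRefreshDelayMs` are hypothetical helpers, and the base delay and cap are arbitrary illustrative defaults.

```typescript
// Extracts the wait hint from a throttling message like the first one above.
// Returns null when the message carries no such hint.
function parseRetryAfterSeconds(message: string): number | null {
  const match = /retry again after (\d+) seconds?/i.exec(message);
  return match ? parseInt(match[1], 10) : null;
}

// Delay before the next catalog refresh: exponential backoff capped at capMs,
// but never shorter than the wait the service asked for.
function nextRefreshDelayMs(
  message: string,
  attempt: number, // 0 for the first retry
  baseMs: number = 1000, // illustrative default
  capMs: number = 60000 // illustrative default
): number {
  const hintSeconds = parseRetryAfterSeconds(message);
  const hintMs = hintSeconds === null ? 0 : hintSeconds * 1000;
  const backoffMs = Math.min(capMs, baseMs * Math.pow(2, attempt));
  return Math.max(hintMs, backoffMs);
}

const throttled =
  "Received too many requests in a short amount of time. Retry again after 1 seconds.";
const firstDelayMs = nextRefreshDelayMs(throttled, 0);
```

Local model discovery does not depend on this delay, so the picker can stay populated from local sources while catalog retries are spaced out.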

Repro steps

  1. Download one or more local models in Foundry Toolkit.
  2. Open model picker and trigger refresh while catalog is throttled (429).
  3. Observe model list reset/fallback behavior.
  4. Attempt inference with previously selected local model id.
  5. Observe Model not found despite local assets being present.

Workaround today

  • Manually reselect a known-good local model id that still resolves in local runtime.
  • Avoid repeated catalog refresh attempts while 429 is active.
  • Neither workaround is obvious, and both are unreliable for normal users.

Metadata

    Labels

    needs attention: The issue needs contributor's attention
