[SDKs] Self-heal catalog cache on BYOM cache-miss by bmehta001 · Pull Request #760 · microsoft/Foundry-Local

bmehta001 · 2026-06-03T00:35:36Z

The SDKs cache the catalog in memory with a 6-hour TTL. When a user manually drops a model directory (BYOM) into the cache while the SDK is running, the SDK's in-memory map is stale: Core returns the BYOM id from get_cached_models IPC and get_model_list, but the SDK filters it out because its modelIdToModelVariant map does not yet know the id.

Add a force-refresh path on cache-miss in the four user-facing entry points:

get_cached_models / get_loaded_models: when any id in the fresh list returned by Core does not resolve against the local map, force a catalog refresh and re-resolve only the unresolved ids.
get_model(alias) / get_model_variant(id): on miss, force a refresh and retry the lookup once.

Added calls to await UpdateModels(ct).ConfigureAwait(false); in the C# SDK for consistency with the other SDKs, though it is not completely necessary for the BYOM feature

Validated against a real BYOM directory dropped into the cache at runtime

The v1 SDKs (js, python, cs, rust) cache the catalog in memory with a 6-hour TTL. When a user manually drops a model directory (BYOM) into the cache while the SDK is running, the SDK's in-memory map is stale: Core returns the BYOM id from `get_cached_models` IPC and `get_model_list`, but the SDK filters it out because its `modelIdToModelVariant` map does not yet know the id. Add a force-refresh path on cache-miss in the four user-facing entry points: - get_cached_models / get_loaded_models: when any id in the fresh list returned by Core does not resolve against the local map, force a catalog refresh and re-resolve only the unresolved ids. - get_model(alias) / get_model_variant(id): on miss, force a refresh and retry the lookup once. No new public APIs and no new IPC commands. Relies on the FLCore `GetModelsAsync` override that surfaces BYOM stubs. Validated end-to-end via the Python SDK against a real BYOM directory dropped into the cache at runtime: get_cached_models now returns the BYOM, get_model resolves the alias, and Core's view is consistent. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

vercel · 2026-06-03T00:35:43Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
foundry-local	Ready	Preview, Comment	Jun 3, 2026 2:27am

Copilot

Pull request overview

Adds “self-healing” catalog refresh behavior across the Rust, Python, JS, and C# SDKs so that when the on-disk BYOM cache changes while the SDK is running (and Core returns new IDs), the SDK refreshes its in-memory catalog on cache-miss and retries resolution.

Changes:

Add a forced refresh + retry path to get_model(alias) / get_model_variant(id) when the initial lookup misses.
Add forced refresh-on-miss when resolving IDs returned by Core for get_cached_models / get_loaded_models, via new helper resolvers.
Add force/force: true parameters to the various “update models” caching methods to bypass the 6-hour TTL when self-healing.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

File	Description
sdk/rust/src/catalog.rs	Adds forced-refresh retry on alias/id miss and a helper to resolve cached/loaded model IDs.
sdk/python/src/catalog.py	Adds `force` refresh support and self-heal retry logic for lookups and cached/loaded model resolution.
sdk/js/src/catalog.ts	Adds `force` refresh support and self-heal retry logic plus ID resolution helper for cached/loaded APIs.
sdk/cs/src/Catalog.cs	Adds forced-refresh retry logic and shared ID-resolution helper; updates `UpdateModels` signature to support forced refresh.

PR #760 review feedback: the previous resolve_model_ids implementation in all four SDKs (Rust, Python, JS, C#) split the input model_ids into `resolved` and `unresolved` lists in a single pass, then on cache-miss forced a refresh and appended the newly-resolvable IDs after the first batch. This reordered the output relative to the input model_ids list, which is a behavioral regression vs. pre-PR get_cached_models / get_loaded_models that returned variants in Core's ID order (minus unknowns). Switch all four SDKs to the same shape: 1. First pass: check whether any id is missing (short-circuit on first miss); this needs no scratch storage. 2. If anything is missing, force-refresh once. 3. Final pass: iterate model_ids in input order and collect variants present in the catalog (filter-map style). The single final pass preserves input ordering on both the warm path (no refresh) and the self-heal path (after refresh). The JS dedupe behavior via Set<IModel> is preserved. Side benefit: -32 net lines across the four files; the implementation now matches the structure the reviewer suggested. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

Addresses PR #760 review comments: - Python get_model/get_model_variant and C# GetModelImplAsync/GetModelVariantImplAsync now short-circuit on null/empty/whitespace input, mirroring the validation already present in Rust and JS. Without this, those entry points triggered the wasteful self-heal forced refresh just to return None on bad input. - Adds Python unit tests for self-heal on get_model and get_cached_models (mock _MockCoreInterop returns no models on warm fetch, then BYOM on forced refresh) and a no-refresh-on-empty-input test that warms the catalog and asserts the forced refresh is not triggered by '', ' ', or None. - Adds matching JS self-heal unit tests for getModel/getModelVariant and getCachedModels using the same mock pattern. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

bmehta001 · 2026-06-03T08:20:03Z

+        // Self-heal: the alias may belong to a BYOM model added to the cache
+        // directory after our last catalog refresh.
+        self.invalidator.invalidate();
+        self.update_models().await?;


Real but narrow race: window is ~one HTTP call to the local Core, and only manifests if two concurrent callers both miss on the same just-added BYOM at nearly the same instant. Worst case is a one-shot Unknown alias error that a caller-side retry resolves immediately.

Self-heal-on-cache-miss is scoped here as best-effort recovery for the single-caller BYOM scenario, not as a strict contract for concurrent BYOM discovery. Promoting it to a coalesced refresh_now (per-request timestamp under refresh_gate) is a clean follow-up but out of scope for this PR.

bmehta001 · 2026-06-03T08:20:05Z

+        // Self-heal: the id may belong to a BYOM model added to the cache
+        // directory after our last catalog refresh.
+        self.invalidator.invalidate();
+        self.update_models().await?;


Real but narrow race: window is ~one HTTP call to the local Core, and only manifests if two concurrent callers both miss on the same just-added BYOM at nearly the same instant. Worst case is a one-shot Unknown alias error that a caller-side retry resolves immediately.

Self-heal-on-cache-miss is scoped here as best-effort recovery for the single-caller BYOM scenario, not as a strict contract for concurrent BYOM discovery. Promoting it to a coalesced refresh_now (per-request timestamp under refresh_gate) is a clean follow-up but out of scope for this PR.

bmehta001 · 2026-06-03T08:20:08Z

+        if needs_refresh {
+            self.invalidator.invalidate();
+            self.update_models().await?;
+        }


Real but narrow race: window is ~one HTTP call to the local Core, and only manifests if two concurrent callers both miss on the same just-added BYOM at nearly the same instant. Worst case is a one-shot Unknown alias error that a caller-side retry resolves immediately.

Self-heal-on-cache-miss is scoped here as best-effort recovery for the single-caller BYOM scenario, not as a strict contract for concurrent BYOM discovery. Promoting it to a coalesced refresh_now (per-request timestamp under refresh_gate) is a clean follow-up but out of scope for this PR.

Copilot AI review requested due to automatic review settings June 3, 2026 00:35

Copilot started reviewing on behalf of bmehta001 June 3, 2026 00:35 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Comment thread sdk/rust/src/catalog.rs Outdated

Comment thread sdk/python/src/catalog.py

Comment thread sdk/js/src/catalog.ts Outdated

Comment thread sdk/cs/src/Catalog.cs Outdated

vercel Bot deployed to Preview June 3, 2026 01:29 View deployment

bmehta001 self-assigned this Jun 3, 2026

bmehta001 requested a review from Copilot June 3, 2026 02:02

Copilot started reviewing on behalf of bmehta001 June 3, 2026 02:02 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Comment thread sdk/python/src/catalog.py

Comment thread sdk/python/src/catalog.py

Comment thread sdk/cs/src/Catalog.cs

Comment thread sdk/cs/src/Catalog.cs

Comment thread sdk/js/src/catalog.ts

Comment thread sdk/python/src/catalog.py

vercel Bot deployed to Preview June 3, 2026 02:27 View deployment

bmehta001 requested a review from Copilot June 3, 2026 07:44

Copilot started reviewing on behalf of bmehta001 June 3, 2026 07:45 View session

Copilot AI reviewed Jun 3, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[SDKs] Self-heal catalog cache on BYOM cache-miss#760

[SDKs] Self-heal catalog cache on BYOM cache-miss#760
bmehta001 wants to merge 3 commits into
mainfrom
bhamehta/byom-sdk-self-heal

bmehta001 commented Jun 3, 2026

Uh oh!

vercel Bot commented Jun 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

bmehta001 Jun 3, 2026

Uh oh!

bmehta001 Jun 3, 2026

Uh oh!

bmehta001 Jun 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bmehta001 commented Jun 3, 2026

Uh oh!

vercel Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

bmehta001 Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

bmehta001 Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

bmehta001 Jun 3, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel Bot commented Jun 3, 2026 •

edited

Loading