Progressive disclosure for catalogs:  load-on-demand schemas

**Is your feature request related to a problem? Please describe.**

Right now `PromptBuilder` inlines the entire A2UI schema into the system prompt on every turn. Every catalog item's full schema goes in, combined into one big `oneOf`. As catalogs grow, and especially for custom catalogs, that creates two problems.

First, token cost. The whole schema is re-sent on every request, so even a modest 16-item custom catalog already runs well over 10k prompt tokens, and it scales linearly with the catalog. We pay that on every turn.

Second, and more important, the model still uses components that aren't in the catalog. Even with everything inlined it will invent something like `Column` for a single-item custom catalog, and the SDK throws `CatalogItemNotFoundException` (#771). That issue was closed by removing the hardcoded standard-catalog examples from the prompt strings, but there's still nothing structural keeping the model to items that actually exist. The reporter put it well: using the SDK with a custom catalog from scratch is "hard / not possible."

Underneath both is the same thing: the model gets every schema at once, with no index in between and no real contract about which components it's actually allowed to use.

**Describe the solution you'd like**

An opt-in catalog mode built on progressive disclosure, with two tiers:

- A manifest that's always in the prompt: just each item's name and a short description. No full schemas, so it stays cheap at any catalog size.
- An on-demand body: a `loadCatalogItems` tool the model calls to pull the exact schema and examples for the components it's about to use, before it emits any A2UI. Load before use.

The host registers `loadCatalogItems` and resolves it against the in-process catalog. If the model asks for a name that doesn't exist, it gets a structured error back and can self-correct on the next turn instead of emitting something unrenderable. The current full-schema behavior stays the default; this is opt-in.

Why I think this helps:

- Fewer tokens: the per-turn prompt only carries the manifest, so input drops sharply. Full schemas are paid once, on demand, and only for what's actually used.
- More purposeful context, not just less of it. With "load before use" the prompt only ever holds the components actually in play, instead of every schema at once. That's cheaper, but the bigger win is signal: a focused, high-signal context is easier for the model to reason over than a large one padded with schemas it will never use. There's good research behind this (see Additional context). On top of that it makes the #771 failure structural to avoid, since the model has to name a real component and receive its real schema before it can use it, rather than being kept in line by prompt wording alone.
- It scales to large and custom catalogs, which is exactly where the current approach hurts most.

This does depend on the catalog id being available in the prompt, since `createSurface` needs it.

**Describe alternatives you've considered**

- Keep inlining the full schema and lean on prompt wording to keep the model in bounds (today's approach, and #771's fix). Doesn't scale on tokens, and gives no guarantee against invented components.
- Trim the inlined schema heuristically, e.g. only the "likely" items. Fragile, guesses intent, and still paid on every turn.
- Retrieval / RAG over catalog items. Heavier infrastructure than a deterministic tool call for an in-process catalog.
- A static, name-only allow-list in the prompt. Tells the model the names but not how to use them, so the schema still has to live somewhere.

**Additional context**

- This mirrors Anthropic's Agent Skills progressive disclosure: name and description are always loaded, and the full body is loaded on demand.
- Why "purposeful loading" helps beyond saving tokens: there's evidence that excess or irrelevant context measurably degrades LLM reasoning ([Shi et al., 2023](https://arxiv.org/abs/2302.00093)), and that models use information worse as it gets buried in a longer context — the "lost in the middle" effect ([Liu et al., 2023](https://arxiv.org/abs/2307.03172)). The practitioner takeaway, from both Anthropic and Google, is to keep prompts high-signal and pull detail in just-in-time via tools rather than front-loading everything ([Anthropic](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents), [Gemini function calling](https://ai.google.dev/gemini-api/docs/function-calling)).
- Initial validation (small, directional): on the simple_chat custom catalog with `gemini-flash-latest`, incremental used ~60% fewer tokens at the same expectation pass-rate. There is a latency spike from the extra `loadCatalogItems` round-trip, so it trades latency for tokens. A fuller eval across more prompts and models would firm this up.
- Related issues: #771 (renders items not in the catalog), #554 (schema descriptions, which the manifest would rely on), #900 (surfacing the catalog id, which incremental needs).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Progressive disclosure for catalogs: load-on-demand schemas #946

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Progressive disclosure for catalogs: load-on-demand schemas #946

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions