Progressive disclosure for catalogs: load-on-demand schemas #946

@leoafarias

Is your feature request related to a problem? Please describe.

Right now PromptBuilder inlines the entire A2UI schema into the system prompt on every turn. Every catalog item's full schema goes in, combined into one big oneOf. As catalogs grow, and especially for custom catalogs, that creates two problems.
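To make the current behavior concrete, here is a minimal sketch of what "everything combined into one big oneOf" amounts to. It is written in Python for brevity and is not the SDK's actual code: the catalog contents and the `inline_schema` and `build_system_prompt` helpers are hypothetical stand-ins for what PromptBuilder does.

```python
import json

# Hypothetical two-item catalog; names and schemas are illustrative only.
CATALOG = {
    "RecipeCard": {
        "type": "object",
        "properties": {"heading": {"type": "string"}},
        "required": ["heading"],
    },
    "RatingBar": {
        "type": "object",
        "properties": {"stars": {"type": "integer"}},
        "required": ["stars"],
    },
}


def inline_schema(catalog):
    """Combine every item's full schema into a single oneOf."""
    return {"oneOf": [{"title": name, **schema} for name, schema in catalog.items()]}


def build_system_prompt(catalog):
    """Assumed prompt shape: the whole combined schema is embedded on every turn."""
    return "Generate UI that matches this schema:\n" + json.dumps(inline_schema(catalog))
```

Every item added to the catalog adds its full schema to the string returned by `build_system_prompt`, which is why the prompt grows with catalog size.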

First, token cost. The whole schema is re-sent on every request, so even a modest 16-item custom catalog already runs well over 10k prompt tokens, and it scales linearly with the catalog. We pay that on every turn.

Second, and more important, the model still uses components that aren't in the catalog. Even with everything inlined it will invent something like Column for a single-item custom catalog, and the SDK throws CatalogItemNotFoundException (#771). That issue was closed by removing the hardcoded standard-catalog examples from the prompt strings, but there's still nothing structural keeping the model to items that actually exist. The reporter put it well: using the SDK with a custom catalog from scratch is "hard / not possible."

Underneath both is the same thing: the model gets every schema at once, with no index in between and no real contract about which components it's actually allowed to use.

Describe the solution you'd like

An opt-in catalog mode built on progressive disclosure, with two tiers:

  • A manifest that's always in the prompt: just each item's name and a short description. No full schemas, so it stays cheap at any catalog size.
  • An on-demand body: a loadCatalogItems tool the model calls to pull the exact schema and examples for the components it's about to use, before it emits any A2UI. Load before use.
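As a rough illustration of the first tier, the manifest could be rendered along these lines. This is a sketch under assumptions, in Python rather than the SDK's language: `CatalogItem`, `build_manifest`, and the sample items are hypothetical names, and the exact manifest format is open.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    """Hypothetical catalog entry: a name, a short description, and the full schema."""
    name: str
    description: str
    schema: dict = field(default_factory=dict)


def build_manifest(items):
    """Render one line per item with name and description only, never the schema."""
    return "\n".join(f"- {item.name}: {item.description}" for item in items)


# Illustrative items; not taken from any real catalog.
ITEMS = [
    CatalogItem("RecipeCard", "A card showing a recipe title and summary.",
                {"type": "object", "properties": {"heading": {"type": "string"}}}),
    CatalogItem("RatingBar", "A 1-5 star rating input.",
                {"type": "object", "properties": {"stars": {"type": "integer"}}}),
]

manifest = build_manifest(ITEMS)
```

The manifest costs one short line per item regardless of how large each item's schema is.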

The host registers loadCatalogItems and resolves it against the in-process catalog. If the model asks for a name that doesn't exist, it gets a structured error back and can self-correct on the next turn instead of emitting something unrenderable. The current full-schema behavior stays the default; this is opt-in.
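A minimal sketch of how the host side of loadCatalogItems might resolve names, assuming the catalog is a simple name-to-schema mapping. The Python function name, the response fields (`ok`, `items`, `error`), and the error code are all hypothetical; the issue does not prescribe a wire format.

```python
def load_catalog_items(catalog, names):
    """Resolve requested names against an in-process catalog.

    Returns the schemas when every name exists; otherwise returns a
    structured error listing the unknown names and the valid ones, so
    the model can correct itself on the next turn.
    """
    missing = [name for name in names if name not in catalog]
    if missing:
        return {
            "ok": False,
            "error": {
                "code": "unknown_catalog_items",  # hypothetical error code
                "unknown": missing,
                "available": sorted(catalog),
            },
        }
    return {"ok": True, "items": {name: catalog[name] for name in names}}


# Illustrative single-item custom catalog.
CATALOG = {"RecipeCard": {"type": "object", "properties": {"heading": {"type": "string"}}}}

ok_result = load_catalog_items(CATALOG, ["RecipeCard"])
bad_result = load_catalog_items(CATALOG, ["Column"])
```

In this sketch a request for `Column` against a single-item catalog yields an error that names the one component that does exist, instead of an unrenderable surface.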

Why I think this helps:

  • Fewer tokens: the per-turn prompt only carries the manifest, so input drops sharply. Full schemas are paid once, on demand, and only for what's actually used.
  • More purposeful context, not just less of it. With "load before use" the prompt only ever holds the components actually in play, instead of every schema at once. That's cheaper, but the bigger win is signal: a focused, high-signal context is easier for the model to reason over than a large one padded with schemas it will never use. There's good research behind this (see Additional context). On top of that, it makes the #771 failure structurally harder to hit, since the model has to name a real component and receive its real schema before it can use it, rather than being kept in line by prompt wording alone.
  • It scales to large and custom catalogs, which is exactly where the current approach hurts most.

This does depend on the catalog id being available in the prompt, since createSurface needs it.
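Putting the pieces together, the per-turn prompt in this mode might look roughly like the sketch below. It is in Python and the wording, function name, and sample values are assumptions for illustration only; the one fixed requirement from above is that the catalog id is present so createSurface can reference it.

```python
def build_incremental_prompt(catalog_id, manifest):
    """Assemble a prompt that carries the catalog id and manifest, not full schemas."""
    return "\n".join([
        f"Catalog id (use this in createSurface): {catalog_id}",
        "Available components (name: description):",
        manifest,
        "Before emitting any A2UI, call loadCatalogItems with the names you plan to use.",
    ])


# Hypothetical catalog id and manifest text.
prompt = build_incremental_prompt(
    "example.custom_catalog",
    "- RecipeCard: A card showing a recipe title and summary.",
)
```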

Describe alternatives you've considered

  • Keep inlining the full schema and lean on prompt wording to keep the model in bounds (today's approach, and the fix applied for #771). Doesn't scale on tokens, and gives no guarantee against invented components.
  • Trim the inlined schema heuristically, e.g. only the "likely" items. Fragile, guesses intent, and still paid on every turn.
  • Retrieval / RAG over catalog items. Heavier infrastructure than a deterministic tool call for an in-process catalog.
  • A static, name-only allow-list in the prompt. Tells the model the names but not how to use them, so the schema still has to live somewhere.

Additional context

  • This mirrors Anthropic's Agent Skills progressive disclosure: name and description are always loaded, and the full body is loaded on demand.
  • Why "purposeful loading" helps beyond saving tokens: there's evidence that excess or irrelevant context measurably degrades LLM reasoning (Shi et al., 2023), and that models use information worse as it gets buried in a longer context, the "lost in the middle" effect (Liu et al., 2023). The practitioner takeaway, from both Anthropic and Google, is to keep prompts high-signal and pull detail in just-in-time via tools rather than front-loading everything (Anthropic, Gemini function calling).
  • Initial validation (small, directional): on the simple_chat custom catalog with gemini-flash-latest, the incremental mode used ~60% fewer tokens at the same expectation pass rate. There is a latency spike from the extra loadCatalogItems round-trip, so it trades latency for tokens. A fuller eval across more prompts and models would firm this up.
  • Related issues: #771 (Gemini trying to render items that are not in the catalog), #554 (add descriptions to schemas for all catalog items, which the manifest would rely on), #900 (PromptBuilder doesn't surface the Catalog's id, which the incremental mode needs).
