
feat(agent_sdk): add strict_output mode for reliable A2UI-first output #1465

Open
jswortz wants to merge 2 commits into google:main from jswortz:feat/strict-output-mode


@jswortz jswortz commented May 20, 2026

Summary

  • Add strict_output: bool = False parameter to A2uiSchemaManager.generate_system_prompt() that enforces A2UI-first output ordering when enabled
  • Add STRICT_WORKFLOW_RULES constant with anti-markdown rules and component usage guidance
  • Add 14 test cases validating strict mode behavior and backward compatibility

Addresses #1415

Motivation

LLMs frequently ignore A2UI schema instructions and fall back to markdown formatting — even with explicit "Your final output MUST be A2UI JSON" in the role description. The current DEFAULT_WORKFLOW_RULES says "you can provide conversational text between or around blocks", which gives the LLM permission to skip A2UI entirely.

The restaurant_finder sample works around this by manually hardcoding "Your final output MUST be a a2ui UI JSON response" in ROLE_DESCRIPTION. This shouldn't be necessary — the SDK should offer a built-in enforcement mode.

Approach

When strict_output=True, generate_system_prompt() uses STRICT_WORKFLOW_RULES instead of DEFAULT_WORKFLOW_RULES. The strict rules add:

  1. Output ordering: A2UI JSON blocks MUST appear before any conversational text. Text is limited to 1-2 brief sentences after the A2UI block.

  2. Anti-markdown rules: explicit bans on markdown tables, bullet/numbered lists, and markdown headers as section dividers, with an A2UI component alternative specified for each.

  3. Component wrapping: data blocks must be wrapped in Card components, and sections are separated by Divider components.

  4. Minimum diversity: at least 3 different component types per response.

The DEFAULT_WORKFLOW_RULES and all existing behavior are unchanged when strict_output=False (the default).
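
The selection described above can be sketched as follows. This is a minimal stand-in, not the SDK's implementation: `build_system_prompt` and the two placeholder rule strings are hypothetical, and the real `generate_system_prompt()` also handles schemas, examples, and other parameters.

```python
# Hypothetical stand-ins for the SDK constants; the real rule text is longer.
DEFAULT_WORKFLOW_RULES = "You can provide conversational text between or around blocks."
STRICT_WORKFLOW_RULES = "A2UI JSON blocks MUST appear before any conversational text."


def build_system_prompt(role_description: str, strict_output: bool = False) -> str:
    """Assemble a prompt, choosing the rule set from the strict_output flag."""
    workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
    return f"{role_description}\n\n{workflow}"


# The default call keeps the existing rules; strict mode swaps them out.
default_prompt = build_system_prompt("You are an assistant.")
strict_prompt = build_system_prompt("You are an assistant.", strict_output=True)
```

Because the flag defaults to False, callers that never pass it get the same rule set as before.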

Changes

| File | Change |
| --- | --- |
| agent_sdks/python/src/a2ui/schema/constants.py | Add STRICT_WORKFLOW_RULES constant |
| agent_sdks/python/src/a2ui/schema/manager.py | Add strict_output parameter to generate_system_prompt() |
| agent_sdks/python/tests/schema/test_schema_manager.py | Add 14 test cases for strict mode |

Usage

from a2ui.schema.manager import A2uiSchemaManager
from a2ui.basic_catalog.provider import BasicCatalog

manager = A2uiSchemaManager(
    version='0.8',
    catalogs=[BasicCatalog.get_config('0.8')],
)

# Before: manual enforcement in role_description (fragile)
prompt = manager.generate_system_prompt(
    role_description="You are an assistant. Your final output MUST be A2UI JSON.",
    include_schema=True,
)

# After: SDK-level enforcement (reliable)
prompt = manager.generate_system_prompt(
    role_description="You are an assistant.",
    include_schema=True,
    strict_output=True,
)

Test plan

  • All existing test_schema_manager.py tests pass (backward compatibility)
  • strict_output=False (default) produces identical output to before
  • strict_output=True includes anti-markdown rules, output ordering, component guidance
  • Strict mode works with both v0.8 and v0.9 catalogs
  • Strict mode preserves role_description, ui_description, workflow_description
  • STRICT_WORKFLOW_RULES constant contains correct A2UI tags and top-down ordering
  • ./scripts/fix_format.sh passes (Pyink formatting)

Production validation

This enforcement pattern has been battle-tested across 3 production agents deployed on Vertex AI Agent Engine (Gemini 3.5 Flash), serving a grocery retail workshop demo with:

  • Main assistant agent (SOP checklists, product cards, KPI dashboards)
  • Analytics dashboard agent (BigQuery results as Row/Card grids)
  • A/B test simulator agent (Tabs-based strategy comparison surfaces)

Before adding anti-markdown rules, ~30-40% of responses fell back to markdown. After, the rate dropped to <5%.

jswortz added 2 commits May 20, 2026 00:36
Addresses google#1415 — LLMs frequently ignore A2UI instructions and fall back
to markdown. Adds a `strict_output=True` parameter to
`generate_system_prompt()` that enforces A2UI-first output ordering,
bans markdown alternatives, and requires component diversity.

Changes:
- Add STRICT_WORKFLOW_RULES constant to constants.py
- Add strict_output parameter to generate_system_prompt() in manager.py
- Add 14 test cases for strict mode in test_schema_manager.py

google-cla Bot commented May 20, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a strict_output mode for generating system prompts, which enforces specific formatting rules for LLM responses, such as prioritizing A2UI JSON blocks and prohibiting standard markdown formatting like tables, lists, and headers. The changes include the definition of STRICT_WORKFLOW_RULES, updates to the generate_system_prompt method in A2uiSchemaManager, and a comprehensive suite of unit tests. Feedback provided by the reviewer suggests adhering to the Liskov Substitution Principle by updating the base class signature, refactoring constants to reduce duplication, and refining code logic and test assertions for better clarity and robustness.

Comment on lines +212 to 213
strict_output: bool = False,
) -> str:
Contributor

high

Adding the strict_output parameter to this method without also adding it to the generate_system_prompt method in the InferenceStrategy abstract base class violates the Liskov Substitution Principle. While Python's dynamic typing allows this, it's a good practice to maintain compatible signatures between a base class and its subclasses.

To ensure consistency and adhere to object-oriented design principles, please consider adding strict_output: bool = False to the abstract method in InferenceStrategy.
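
A minimal sketch of what a matching signature could look like. The classes below are simplified stand-ins: the real `InferenceStrategy` and `A2uiSchemaManager` take more parameters, and `SchemaManagerSketch` and `render` are hypothetical names used only for illustration.

```python
from abc import ABC, abstractmethod


class InferenceStrategy(ABC):
    """Simplified stand-in for the abstract base class named in the review."""

    @abstractmethod
    def generate_system_prompt(
        self,
        role_description: str,
        strict_output: bool = False,  # declared on the base so overrides stay compatible
    ) -> str:
        ...


class SchemaManagerSketch(InferenceStrategy):
    """Hypothetical subclass; not the real A2uiSchemaManager."""

    def generate_system_prompt(
        self,
        role_description: str,
        strict_output: bool = False,
    ) -> str:
        mode = "strict" if strict_output else "default"
        return f"{role_description} [{mode} rules]"


def render(strategy: InferenceStrategy) -> str:
    # Code typed against the base class can pass strict_output without a type error.
    return strategy.generate_system_prompt("Test agent", strict_output=True)
```

With the parameter declared on the base class, a call such as `render(SchemaManagerSketch())` type-checks against the abstract interface rather than relying on the concrete subclass.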

Comment on lines +74 to +78
- The JSON part MUST be a single, raw JSON object (usually a list of A2UI messages) and MUST validate against the provided A2UI JSON SCHEMA.
- Top-Down Component Ordering: Within the `components` list of a message:
    - The 'root' component MUST be the FIRST element.
    - Parent components MUST appear before their child components.
    This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives.
Contributor

medium

This section, along with the JSON validation rule on line 74, is duplicated from DEFAULT_WORKFLOW_RULES. To improve maintainability and prevent future inconsistencies, it would be beneficial to extract these common rules into a shared constant.

For example, you could create a _COMMON_WORKFLOW_RULES constant and include it in both DEFAULT_WORKFLOW_RULES and STRICT_WORKFLOW_RULES. This would make the rule sets easier to manage and update.

Comment on lines +240 to +241
base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
workflow = base_rules
Contributor

medium

The base_rules variable is only used once to assign to workflow. You can simplify these two lines by assigning the result of the conditional expression directly to workflow.

Suggested change
- base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
- workflow = base_rules
+ workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES

prompt = manager.generate_system_prompt(
    role_description="Test agent",
)
assert "NEVER use markdown tables" not in prompt
Contributor

medium

This assertion correctly verifies that strict rules are not present in the default case. To make this test more robust, you could also assert that a rule specific to DEFAULT_WORKFLOW_RULES is present, confirming that the correct rule set was chosen.

Suggested change
- assert "NEVER use markdown tables" not in prompt
+ assert "NEVER use markdown tables" not in prompt
+ assert "Between or around these blocks" in prompt


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a strict_output mode for system prompt generation, adding STRICT_WORKFLOW_RULES to enforce A2UI-specific formatting and component diversity. The A2uiSchemaManager was updated to support this parameter, accompanied by comprehensive unit tests. Feedback includes a critical observation regarding a signature mismatch with the InferenceStrategy base class, a suggestion to simplify variable assignment in the prompt generation logic, and a recommendation to reconsider the mandatory nature of the component diversity constraint.

include_schema: bool = False,
include_examples: bool = False,
validate_examples: bool = False,
strict_output: bool = False,
Contributor

high

The addition of the strict_output parameter to generate_system_prompt creates a signature mismatch with the abstract base class InferenceStrategy. This violates the Liskov Substitution Principle and will cause issues with static type checkers (like MyPy or Pyright) when the manager is used polymorphically as an InferenceStrategy. Please update the base class in agent_sdks/python/src/a2ui/inference_strategy.py to include this parameter as well to maintain a consistent interface.


Comment on lines +240 to +241
base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
workflow = base_rules
Contributor

medium

The intermediate variable base_rules is redundant as it is only used once to initialize workflow. You can simplify this by assigning the result of the conditional expression directly to workflow.

Suggested change
- base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
- workflow = base_rules
+ workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES

- NEVER use inline code blocks for data display — use Card with Text children.
- Wrap blocks of related data in Card components — avoid bare Text at the root level.
- Use Divider components to visually separate major sections.
- Use at least 3 different component types per response for visual variety.
Contributor

medium

The requirement to use at least 3 different component types per response is a very specific heuristic. While it may encourage visual variety, it could be overly restrictive for simple responses or specific agent tasks, potentially leading to unnecessary UI complexity or 'hallucinated' components just to meet the constraint. Consider making this a recommendation rather than a mandatory constraint, or lowering the threshold.

Collaborator

+1. The requirement of at least 3 different component types per response seems arbitrary and could lead to unnecessarily complex layouts. I'd suggest moving it to ui_description instead of making it the default here.
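
One way to read this suggestion: keep the built-in strict rules minimal and let an agent opt in to the diversity heuristic through its own ui_description. The sketch below is hypothetical; `compose_prompt`, `BASE_STRICT_RULES`, and `DIVERSITY_HINT` are illustrative names, not SDK API, and the rule wording is placeholder text.

```python
# Placeholder rule text; the diversity heuristic is phrased as a preference, not a mandate.
BASE_STRICT_RULES = "A2UI JSON blocks MUST appear before any conversational text."
DIVERSITY_HINT = "Prefer at least 3 different component types when the content warrants it."


def compose_prompt(role_description: str, ui_description: str = "") -> str:
    """Join the non-empty prompt sections; agent-specific hints live in ui_description."""
    sections = [role_description, BASE_STRICT_RULES, ui_description]
    return "\n\n".join(section for section in sections if section)


# A simple agent skips the hint; a dashboard-style agent opts in.
simple_agent = compose_prompt("You answer yes/no questions.")
dashboard_agent = compose_prompt("You build KPI dashboards.", ui_description=DIVERSITY_HINT)
```

Agents with simple responses are then never pushed toward extra components just to satisfy a count.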

@ditman

This comment was marked as resolved.

Collaborator

nan-yu commented May 26, 2026

Currently, strict_output is a simple boolean, which limits extensibility. I'd recommend using an enum to make this more future-proof and clearer about the semantic meaning.

from enum import Enum

class A2UIOutputMode(Enum):
    """How the LLM should prioritize A2UI vs. text."""
    A2UI_FIRST = "a2ui_first"  # A2UI blocks first, minimal text after
    TEXT_FIRST = "text_first"  # Text and A2UI can be interleaved (default)
    # Future: STRICT_A2UI, A2UI_ONLY for even stricter modes

See a related comment on #1466 (comment) for more details.
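
A rough sketch of how such an enum might drive rule selection. Only the enum itself comes from the comment above; `RULES_BY_MODE`, `rules_for`, and the rule strings are assumptions made for illustration.

```python
from enum import Enum


class A2UIOutputMode(Enum):
    """How the LLM should prioritize A2UI vs. text (as proposed in the comment)."""

    A2UI_FIRST = "a2ui_first"
    TEXT_FIRST = "text_first"


# Placeholder rule text keyed by mode; a new mode only needs a new entry here.
RULES_BY_MODE = {
    A2UIOutputMode.TEXT_FIRST: "Text and A2UI blocks may be interleaved.",
    A2UIOutputMode.A2UI_FIRST: "A2UI blocks come first, with minimal text after.",
}


def rules_for(output_mode: A2UIOutputMode = A2UIOutputMode.TEXT_FIRST) -> str:
    """Return the workflow rules for the requested output mode."""
    return RULES_BY_MODE[output_mode]
```

Compared with a boolean, adding a stricter mode later means adding an enum member and a dictionary entry rather than a second flag.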

This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives.

Formatting constraints (MANDATORY):
- NEVER use markdown tables — use Row with Card children for tabular data.
Collaborator

Are these markdown constraints too broad? Not all agents need to avoid markdown entirely. I'd suggest putting them in an additional workflow_description or ui_description instead of the default.


assert "LocalText" in catalog.catalog_schema["components"]


# --- Tests for strict_output parameter ---
Collaborator

FYI, those unit tests should be migrated to conformance tests, which are shared across multiple agent SDK languages.



4 participants