feat(agent_sdk): add strict_output mode for reliable A2UI-first output #1465
jswortz wants to merge 2 commits into
Conversation
Addresses google#1415: LLMs frequently ignore A2UI instructions and fall back to markdown. Adds a `strict_output=True` parameter to `generate_system_prompt()` that enforces A2UI-first output ordering, bans markdown alternatives, and requires component diversity.
Changes:
- Add STRICT_WORKFLOW_RULES constant to constants.py
- Add strict_output parameter to generate_system_prompt() in manager.py
- Add 14 test cases for strict mode in test_schema_manager.py
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Code Review
This pull request introduces a strict_output mode for generating system prompts, which enforces specific formatting rules for LLM responses, such as prioritizing A2UI JSON blocks and prohibiting standard markdown formatting like tables, lists, and headers. The changes include the definition of STRICT_WORKFLOW_RULES, updates to the generate_system_prompt method in A2uiSchemaManager, and a comprehensive suite of unit tests. Feedback provided by the reviewer suggests adhering to the Liskov Substitution Principle by updating the base class signature, refactoring constants to reduce duplication, and refining code logic and test assertions for better clarity and robustness.
    strict_output: bool = False,
) -> str:
Adding the strict_output parameter to this method without also adding it to the generate_system_prompt method in the InferenceStrategy abstract base class violates the Liskov Substitution Principle. While Python's dynamic typing allows this, it's a good practice to maintain compatible signatures between a base class and its subclasses.
To ensure consistency and adhere to object-oriented design principles, please consider adding strict_output: bool = False to the abstract method in InferenceStrategy.
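The signature alignment requested above can be sketched as follows. This is a simplified illustration, not the SDK's code: the real `generate_system_prompt()` takes more parameters, and the `SchemaManager` subclass and `build_prompt` helper are hypothetical names used only for the example.

```python
from abc import ABC, abstractmethod


class InferenceStrategy(ABC):
    """Simplified stand-in for the SDK's abstract base class."""

    @abstractmethod
    def generate_system_prompt(
        self,
        role_description: str,
        strict_output: bool = False,  # new keyword, defaulted for compatibility
    ) -> str:
        ...


class SchemaManager(InferenceStrategy):
    """Hypothetical subclass whose signature matches the base class."""

    def generate_system_prompt(
        self,
        role_description: str,
        strict_output: bool = False,
    ) -> str:
        mode = "strict" if strict_output else "default"
        return f"{role_description}\n[workflow rules: {mode}]"


def build_prompt(strategy: InferenceStrategy, strict: bool) -> str:
    # Callers typed against the base class can now pass the flag safely.
    return strategy.generate_system_prompt("Test agent", strict_output=strict)
```

Because the keyword is declared with the same default on both classes, code that only knows about `InferenceStrategy` can pass `strict_output` without a type checker flagging an incompatible override.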
- The JSON part MUST be a single, raw JSON object (usually a list of A2UI messages) and MUST validate against the provided A2UI JSON SCHEMA.
- Top-Down Component Ordering: Within the `components` list of a message:
  - The 'root' component MUST be the FIRST element.
  - Parent components MUST appear before their child components.
  This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives.
This section, along with the JSON validation rule on line 74, is duplicated from DEFAULT_WORKFLOW_RULES. To improve maintainability and prevent future inconsistencies, it would be beneficial to extract these common rules into a shared constant.
For example, you could create a _COMMON_WORKFLOW_RULES constant and include it in both DEFAULT_WORKFLOW_RULES and STRICT_WORKFLOW_RULES. This would make the rule sets easier to manage and update.
base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
workflow = base_rules
The base_rules variable is only used once to assign to workflow. You can simplify these two lines by assigning the result of the conditional expression directly to workflow.
Suggested change:
- base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
- workflow = base_rules
+ workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
prompt = manager.generate_system_prompt(
    role_description="Test agent",
)
assert "NEVER use markdown tables" not in prompt
This assertion correctly verifies that strict rules are not present in the default case. To make this test more robust, you could also assert that a rule specific to DEFAULT_WORKFLOW_RULES is present, confirming that the correct rule set was chosen.
Suggested change:
- assert "NEVER use markdown tables" not in prompt
+ assert "NEVER use markdown tables" not in prompt
+ assert "Between or around these blocks" in prompt
Code Review
This pull request introduces a strict_output mode for system prompt generation, adding STRICT_WORKFLOW_RULES to enforce A2UI-specific formatting and component diversity. The A2uiSchemaManager was updated to support this parameter, accompanied by comprehensive unit tests. Feedback includes a critical observation regarding a signature mismatch with the InferenceStrategy base class, a suggestion to simplify variable assignment in the prompt generation logic, and a recommendation to reconsider the mandatory nature of the component diversity constraint.
include_schema: bool = False,
include_examples: bool = False,
validate_examples: bool = False,
strict_output: bool = False,
The addition of the strict_output parameter to generate_system_prompt creates a signature mismatch with the abstract base class InferenceStrategy. This violates the Liskov Substitution Principle and will cause issues with static type checkers (like MyPy or Pyright) when the manager is used polymorphically as an InferenceStrategy. Please update the base class in agent_sdks/python/src/a2ui/inference_strategy.py to include this parameter as well to maintain a consistent interface.
base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
workflow = base_rules
The intermediate variable base_rules is redundant as it is only used once to initialize workflow. You can simplify this by assigning the result of the conditional expression directly to workflow.
Suggested change:
- base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
- workflow = base_rules
+ workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
| - NEVER use inline code blocks for data display — use Card with Text children. | ||
| - Wrap blocks of related data in Card components — avoid bare Text at the root level. | ||
| - Use Divider components to visually separate major sections. | ||
| - Use at least 3 different component types per response for visual variety. |
The requirement to use at least 3 different component types per response is a very specific heuristic. While it may encourage visual variety, it could be overly restrictive for simple responses or specific agent tasks, potentially leading to unnecessary UI complexity or 'hallucinated' components just to meet the constraint. Consider making this a recommendation rather than a mandatory constraint, or lowering the threshold.
+1. The "at least 3 different component types per response" requirement seems arbitrary and could lead to unnecessarily complex layouts. I'd suggest moving it to ui_description instead of making it a default here.
Currently,
class A2UIOutputMode(Enum):
    """How the LLM should prioritize A2UI vs. text."""
    A2UI_FIRST = "a2ui_first"  # A2UI blocks first, minimal text after
    TEXT_FIRST = "text_first"  # Text and A2UI can be interleaved (default)
    # Future: STRICT_A2UI, A2UI_ONLY for even stricter modes
See a related comment on #1466 (comment) for more details.
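As a rough sketch of how an enum like the one in this comment could replace a boolean flag: the `select_workflow_rules` helper, the `_RULES_BY_MODE` mapping, and the rule strings below are all hypothetical and only illustrate the dispatch, not the SDK's actual API.

```python
from enum import Enum


class A2UIOutputMode(Enum):
    """How the LLM should prioritize A2UI vs. text."""

    A2UI_FIRST = "a2ui_first"
    TEXT_FIRST = "text_first"


# Placeholder rule text, not the SDK's actual constants.
_RULES_BY_MODE = {
    A2UIOutputMode.TEXT_FIRST: "Text and A2UI blocks can be interleaved.",
    A2UIOutputMode.A2UI_FIRST: "A2UI blocks first, minimal text after.",
}


def select_workflow_rules(
    output_mode: A2UIOutputMode = A2UIOutputMode.TEXT_FIRST,
) -> str:
    """Return the rule set for a mode (hypothetical helper)."""
    return _RULES_BY_MODE[output_mode]
```

A mapping keyed by enum member leaves room for additional modes later without adding another boolean parameter per mode.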
| This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives. | ||
| Formatting constraints (MANDATORY): | ||
| - NEVER use markdown tables — use Row with Card children for tabular data. |
Are those markdown constraints too broad? Not all agents need to avoid markdown entirely. I'd suggest putting them in an additional workflow_description or ui_description instead of the default.
assert "LocalText" in catalog.catalog_schema["components"]

# --- Tests for strict_output parameter ---
FYI, those unit tests should be migrated to conformance tests, which are shared across multiple agent SDK languages.
Summary
- `strict_output: bool = False` parameter to `A2uiSchemaManager.generate_system_prompt()` that enforces A2UI-first output ordering when enabled
- `STRICT_WORKFLOW_RULES` constant with anti-markdown rules and component usage guidance
Addresses #1415
Motivation
LLMs frequently ignore A2UI schema instructions and fall back to markdown formatting, even with an explicit "Your final output MUST be A2UI JSON" in the role description. The current `DEFAULT_WORKFLOW_RULES` says "you can provide conversational text between or around blocks", which gives the LLM permission to skip A2UI entirely.
The `restaurant_finder` sample works around this by manually hardcoding `"Your final output MUST be a a2ui UI JSON response"` in `ROLE_DESCRIPTION`. This shouldn't be necessary: the SDK should offer a built-in enforcement mode.
Approach
When `strict_output=True`, `generate_system_prompt()` uses `STRICT_WORKFLOW_RULES` instead of `DEFAULT_WORKFLOW_RULES`. The strict rules add:
- Output ordering: A2UI JSON blocks MUST appear before any conversational text. Text is limited to 1-2 brief sentences after the A2UI block.
- Anti-markdown rules: explicit bans on markdown tables, bullet/numbered lists, and markdown headers as section dividers, with A2UI component alternatives specified for each.
- Component wrapping: data blocks must be wrapped in Card components. Sections are separated by Divider.
- Minimum diversity: at least 3 different component types per response.
`DEFAULT_WORKFLOW_RULES` and all existing behavior are unchanged when `strict_output=False` (the default).
Changes
- `agent_sdks/python/src/a2ui/schema/constants.py`: `STRICT_WORKFLOW_RULES` constant
- `agent_sdks/python/src/a2ui/schema/manager.py`: `strict_output` parameter to `generate_system_prompt()`
- `agent_sdks/python/tests/schema/test_schema_manager.py`: 14 test cases for strict mode
Usage
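A minimal sketch of how the flag might be used, assuming a simplified stand-in for `A2uiSchemaManager.generate_system_prompt()`. The function below is not the SDK's implementation, and both rule constants hold placeholder text; only the selection between the two rule sets mirrors the change in this PR.

```python
# Placeholder rule text; the real constants live in a2ui/schema/constants.py.
DEFAULT_WORKFLOW_RULES = (
    "You can provide conversational text between or around blocks."
)
STRICT_WORKFLOW_RULES = (
    "A2UI JSON blocks MUST appear before any conversational text.\n"
    "NEVER use markdown tables."
)


def generate_system_prompt(role_description: str, strict_output: bool = False) -> str:
    """Simplified stand-in: pick a rule set and append it to the role text."""
    workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
    return f"{role_description}\n\n{workflow}"


# Default call: behaves as before, with no strict rules in the prompt.
default_prompt = generate_system_prompt("Test agent")

# Opt-in call: the strict rule set replaces the default one.
strict_prompt = generate_system_prompt("Test agent", strict_output=True)
```

Keeping the flag off by default means existing callers see the same prompt as before.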
Test plan
- `test_schema_manager.py` tests pass (backward compatibility)
- `strict_output=False` (default) produces identical output to before
- `strict_output=True` includes anti-markdown rules, output ordering, component guidance
- `STRICT_WORKFLOW_RULES` constant contains correct A2UI tags and top-down ordering
- `./scripts/fix_format.sh` passes (Pyink formatting)
Production validation
This enforcement pattern has been battle-tested across 3 production agents deployed on Vertex AI Agent Engine (Gemini 3.5 Flash), serving a grocery retail workshop demo with:
Before adding anti-markdown rules, ~30-40% of responses fell back to markdown. After, the rate dropped to <5%.