feat(agent_sdk): add strict_output mode for reliable A2UI-first output #1465
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -64,3 +64,25 @@ | |
| - Parent components MUST appear before their child components. | ||
| This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives. | ||
| """ | ||
|
|
||
| STRICT_WORKFLOW_RULES = f""" | ||
| The generated response MUST follow these rules: | ||
| - Your primary output format is A2UI JSON blocks, NOT conversational text. | ||
| - Each response MUST contain at least one A2UI JSON block wrapped in `{A2UI_OPEN_TAG}` and `{A2UI_CLOSE_TAG}` tags. | ||
| - Output A2UI JSON block(s) FIRST. After the A2UI block(s), you may include at most 1-2 brief sentences of contextual summary. | ||
| - NEVER output conversational text or explanations without an accompanying A2UI JSON block. | ||
| - The JSON part MUST be a single, raw JSON object (usually a list of A2UI messages) and MUST validate against the provided A2UI JSON SCHEMA. | ||
| - Top-Down Component Ordering: Within the `components` list of a message: | ||
| - The 'root' component MUST be the FIRST element. | ||
| - Parent components MUST appear before their child components. | ||
| This specific ordering allows the streaming parser to yield and render the UI incrementally as it arrives. | ||
|
|
||
| Formatting constraints (MANDATORY): | ||
| - NEVER use markdown tables — use Row with Card children for tabular data. | ||
|
Collaborator
Are these markdown constraints too broad? Not all agents need to avoid markdown entirely. I'd suggest moving them into an additional workflow_description or ui_description instead of the default. |
||
| - NEVER use markdown bullet or numbered lists — use List with Card children instead. | ||
| - NEVER use markdown headers (##, ###) as section dividers — use Text with usageHint "h2" or "h3" inside a Column. | ||
| - NEVER use inline code blocks for data display — use Card with Text children. | ||
| - Wrap blocks of related data in Card components — avoid bare Text at the root level. | ||
| - Use Divider components to visually separate major sections. | ||
| - Use at least 3 different component types per response for visual variety. | ||
|
Contributor
The requirement to use at least 3 different component types per response is a very specific heuristic. While it may encourage visual variety, it could be overly restrictive for simple responses or specific agent tasks, potentially leading to unnecessary UI complexity or 'hallucinated' components just to meet the constraint. Consider making this a recommendation rather than a mandatory constraint, or lowering the threshold.
Collaborator
+1. The |
||
| """ | ||
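The review comments on this constant suggest keeping the formatting constraints out of the default strict rules. Below is a minimal sketch of that alternative, under stated assumptions: `MARKDOWN_FREE_FORMATTING_RULES` and `compose_workflow` are hypothetical names that do not exist in the PR, the base rules are a placeholder, and the component-variety rule is softened to a recommendation purely for illustration. The helper mirrors how `generate_system_prompt` appends `workflow_description` to the base rules.

```python
# Hypothetical constant: the markdown-related constraints from the diff,
# kept separate so that individual agents can opt in.
MARKDOWN_FREE_FORMATTING_RULES = """
Formatting constraints:
- NEVER use markdown tables; use Row with Card children for tabular data.
- NEVER use markdown bullet or numbered lists; use List with Card children instead.
- Prefer several different component types per response where it aids clarity.
""".strip()


def compose_workflow(base_rules: str, workflow_description: str = "") -> str:
    """Stand-in for the append step in generate_system_prompt()."""
    workflow = base_rules
    if workflow_description:
        workflow += f"\n{workflow_description}"
    return workflow


# Placeholder base rules; the real constants live in a2ui.schema.constants.
base_rules = "The generated response MUST follow these rules:\n- ..."
workflow = compose_workflow(base_rules, MARKDOWN_FREE_FORMATTING_RULES)
print(workflow)
```

With this shape, an agent that wants markdown-free output passes the constant as `workflow_description`, and agents that tolerate markdown simply omit it.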
| Original file line number | Diff line number | Diff line change | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -209,11 +209,36 @@ def generate_system_prompt( | |||||||||||||
| include_schema: bool = False, | ||||||||||||||
| include_examples: bool = False, | ||||||||||||||
| validate_examples: bool = False, | ||||||||||||||
| strict_output: bool = False, | ||||||||||||||
|
Contributor
The addition of the |
||||||||||||||
| ) -> str: | ||||||||||||||
|
Comment on lines +212 to 213
Contributor
Adding the To ensure consistency and adhere to object-oriented design principles, please consider adding |
||||||||||||||
| """Assembles the final system instruction for the LLM.""" | ||||||||||||||
| """Assembles the final system instruction for the LLM. | ||||||||||||||
|
|
||||||||||||||
| Args: | ||||||||||||||
| role_description: A description of the agent's role. | ||||||||||||||
| workflow_description: Additional workflow rules appended to the base | ||||||||||||||
| rules. | ||||||||||||||
| ui_description: A description of the UI the agent should generate. | ||||||||||||||
| client_ui_capabilities: A dictionary of client UI capabilities, used | ||||||||||||||
| for catalog selection. | ||||||||||||||
| allowed_components: An optional list of component names to include in | ||||||||||||||
| the prompt. If None, all components are included. | ||||||||||||||
| allowed_messages: An optional list of message names to include in the | ||||||||||||||
| prompt. If None, all messages are included. | ||||||||||||||
| include_schema: Whether to include the JSON schema in the prompt. | ||||||||||||||
| include_examples: Whether to include examples in the prompt. | ||||||||||||||
| validate_examples: Whether to validate examples against the schema. | ||||||||||||||
| strict_output: When True, enforces A2UI-first output ordering, bans | ||||||||||||||
| markdown formatting alternatives, and requires minimum component | ||||||||||||||
| diversity. Recommended for agents whose primary purpose is visual | ||||||||||||||
| UI generation. Defaults to False. | ||||||||||||||
|
|
||||||||||||||
| Returns: | ||||||||||||||
| The assembled system prompt string. | ||||||||||||||
| """ | ||||||||||||||
| parts = [role_description] | ||||||||||||||
|
|
||||||||||||||
| workflow = DEFAULT_WORKFLOW_RULES | ||||||||||||||
| base_rules = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES | ||||||||||||||
| workflow = base_rules | ||||||||||||||
|
Comment on lines +240 to +241
Contributor
The
Suggested change
Comment on lines +240 to +241
Contributor
The intermediate variable
Suggested change
|
||||||||||||||
| if workflow_description: | ||||||||||||||
| workflow += f"\n{workflow_description}" | ||||||||||||||
| parts.append(f"## Workflow Description:\n{workflow}") | ||||||||||||||
|
|
||||||||||||||
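To make the control flow of this hunk easier to follow in isolation, here is a minimal standalone sketch of the selection-and-append logic. The rule strings are placeholders and `select_workflow` is a hypothetical helper name, not part of the SDK. It folds the `base_rules` assignment into a single conditional expression, which is one possible simplification and not necessarily what the truncated suggestions proposed.

```python
# Placeholder rule sets; the real constants live in a2ui.schema.constants.
DEFAULT_WORKFLOW_RULES = "default rules (placeholder)"
STRICT_WORKFLOW_RULES = "strict rules (placeholder)"


def select_workflow(workflow_description: str = "", strict_output: bool = False) -> str:
    """Pick the base rule set, then append any caller-supplied rules."""
    workflow = STRICT_WORKFLOW_RULES if strict_output else DEFAULT_WORKFLOW_RULES
    if workflow_description:
        workflow += f"\n{workflow_description}"
    return f"## Workflow Description:\n{workflow}"


print(select_workflow(strict_output=True))
print(select_workflow("Always confirm before booking."))
```

Because `strict_output` defaults to `False`, existing callers keep receiving the default rules unchanged, which is the behavior the tests in this PR check.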
| Original file line number | Diff line number | Diff line change | ||||||
|---|---|---|---|---|---|---|---|---|
|
|
@@ -19,6 +19,7 @@ | |||||||
| from a2ui.basic_catalog.constants import BASIC_CATALOG_NAME | ||||||||
| from a2ui.schema.constants import ( | ||||||||
| DEFAULT_WORKFLOW_RULES, | ||||||||
| STRICT_WORKFLOW_RULES, | ||||||||
| INLINE_CATALOG_NAME, | ||||||||
| VERSION_0_8, | ||||||||
| VERSION_0_9, | ||||||||
|
|
@@ -125,3 +126,151 @@ def open_side_effect(path, *args, **kwargs): | |||||||
| assert len(manager._supported_catalogs) >= 1 | ||||||||
| catalog = manager._supported_catalogs[0] | ||||||||
| assert "LocalText" in catalog.catalog_schema["components"] | ||||||||
|
|
||||||||
|
|
||||||||
| # --- Tests for strict_output parameter --- | ||||||||
|
Collaborator
FYI, these unit tests should be migrated to conformance tests, which are shared across multiple agent SDK languages. |
||||||||
|
|
||||||||
|
|
||||||||
| class TestStrictOutput: | ||||||||
| """Tests for the strict_output parameter on generate_system_prompt().""" | ||||||||
|
|
||||||||
| @pytest.fixture | ||||||||
| def manager(self): | ||||||||
| return A2uiSchemaManager( | ||||||||
| VERSION_0_8, | ||||||||
| catalogs=[BasicCatalog.get_config(VERSION_0_8)], | ||||||||
| ) | ||||||||
|
|
||||||||
| def test_default_uses_default_rules(self, manager): | ||||||||
| """Default behavior (strict_output not set) uses DEFAULT_WORKFLOW_RULES.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| ) | ||||||||
| assert "NEVER use markdown tables" not in prompt | ||||||||
|
Contributor
This assertion correctly verifies that strict rules are not present in the default case. To make this test more robust, you could also assert that a rule specific to
Suggested change
|
||||||||
|
|
||||||||
| def test_strict_output_false_matches_default(self, manager): | ||||||||
| """Explicitly passing strict_output=False matches default behavior.""" | ||||||||
| default_prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| ) | ||||||||
| explicit_prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=False, | ||||||||
| ) | ||||||||
| assert default_prompt == explicit_prompt | ||||||||
|
|
||||||||
| def test_strict_uses_strict_rules(self, manager): | ||||||||
| """strict_output=True uses STRICT_WORKFLOW_RULES.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert ( | ||||||||
| "Your primary output format is A2UI JSON blocks, NOT conversational text" | ||||||||
| in prompt | ||||||||
| ) | ||||||||
|
|
||||||||
| def test_strict_contains_anti_markdown_rules(self, manager): | ||||||||
| """strict_output=True includes anti-markdown formatting rules.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert "NEVER use markdown tables" in prompt | ||||||||
| assert "NEVER use markdown bullet" in prompt | ||||||||
| assert "NEVER use markdown headers" in prompt | ||||||||
|
|
||||||||
| def test_strict_contains_output_ordering(self, manager): | ||||||||
| """strict_output=True mandates A2UI-first output ordering.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert "Output A2UI JSON block(s) FIRST" in prompt | ||||||||
| assert "1-2 brief sentences" in prompt | ||||||||
|
|
||||||||
| def test_strict_contains_component_guidance(self, manager): | ||||||||
| """strict_output=True includes component usage guidance.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert "Row with Card children" in prompt | ||||||||
| assert "List with Card children" in prompt | ||||||||
| assert "Divider" in prompt | ||||||||
| assert "at least 3 different component types" in prompt | ||||||||
|
|
||||||||
| def test_strict_preserves_role_description(self, manager): | ||||||||
| """strict_output=True still includes the role description.""" | ||||||||
| role = "You are a helpful analytics dashboard agent." | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description=role, | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert role in prompt | ||||||||
|
|
||||||||
| def test_strict_preserves_ui_description(self, manager): | ||||||||
| """strict_output=True still includes the UI description.""" | ||||||||
| ui_desc = "Render sales data as Card grids." | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| ui_description=ui_desc, | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert ui_desc in prompt | ||||||||
|
|
||||||||
| def test_strict_preserves_workflow_description(self, manager): | ||||||||
| """strict_output=True appends workflow_description after strict rules.""" | ||||||||
| workflow = "Always confirm before booking." | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| workflow_description=workflow, | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert workflow in prompt | ||||||||
| assert "NEVER use markdown tables" in prompt | ||||||||
|
|
||||||||
| def test_strict_with_schema(self, manager): | ||||||||
| """strict_output=True works with include_schema=True.""" | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| include_schema=True, | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert "NEVER use markdown tables" in prompt | ||||||||
|
|
||||||||
| def test_strict_output_differs_from_default(self, manager): | ||||||||
| """strict_output=True produces different output from default.""" | ||||||||
| default_prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| ) | ||||||||
| strict_prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert default_prompt != strict_prompt | ||||||||
|
|
||||||||
| def test_strict_rules_constant_contains_a2ui_tags(self): | ||||||||
| """STRICT_WORKFLOW_RULES references the correct A2UI tags.""" | ||||||||
| assert "<a2ui-json>" in STRICT_WORKFLOW_RULES | ||||||||
| assert "</a2ui-json>" in STRICT_WORKFLOW_RULES | ||||||||
|
|
||||||||
| def test_strict_rules_has_top_down_ordering(self): | ||||||||
| """STRICT_WORKFLOW_RULES preserves the top-down ordering requirement.""" | ||||||||
| assert "root" in STRICT_WORKFLOW_RULES | ||||||||
| assert ( | ||||||||
| "Parent components MUST appear before their child" in STRICT_WORKFLOW_RULES | ||||||||
| ) | ||||||||
|
|
||||||||
| def test_strict_works_with_v09(self): | ||||||||
| """strict_output=True works with v0.9 catalogs.""" | ||||||||
| manager = A2uiSchemaManager( | ||||||||
| VERSION_0_9, | ||||||||
| catalogs=[BasicCatalog.get_config(VERSION_0_9)], | ||||||||
| ) | ||||||||
| prompt = manager.generate_system_prompt( | ||||||||
| role_description="Test agent", | ||||||||
| strict_output=True, | ||||||||
| ) | ||||||||
| assert "NEVER use markdown tables" in prompt | ||||||||
| assert "Your primary output format is A2UI JSON blocks" in prompt | ||||||||
This section, along with the JSON validation rule on line 74, is duplicated from DEFAULT_WORKFLOW_RULES. To improve maintainability and prevent future inconsistencies, it would be beneficial to extract these common rules into a shared constant.
For example, you could create a _COMMON_WORKFLOW_RULES constant and include it in both DEFAULT_WORKFLOW_RULES and STRICT_WORKFLOW_RULES. This would make the rule sets easier to manage and update.