Python: accept AG-UI state data URI parameters #6905

Open
VectorPeak wants to merge 2 commits into microsoft:main from VectorPeak:codex-agui-state-data-uri-params

VectorPeak commented Jul 3, 2026

Contributor

Motivation & Context

What Problem This Solves

AG-UI state extraction already gates state carrier content through Content.from_uri(...), content.type == "data", and content.media_type == "application/json". However, after those structured checks passed, the extractor still re-checked the raw URI string against one exact prefix:

data:application/json;base64,

That made the raw URI metadata stricter than the parsed content model. A valid JSON data URI with media type parameters, such as:

data:application/json;charset=utf-8;base64,...

can still parse as data content with media_type == "application/json", but previously failed state extraction because the raw metadata segment did not exactly match the hard-coded prefix.

The mismatch meant a final AG-UI message could carry valid JSON/base64 state, pass the structured content gates, and still leave state unset. The fix stays narrow: accept parameterized application/json data URIs only when the metadata still includes base64, then decode and parse that payload as JSON rather than broadening state extraction to arbitrary URI or non-JSON content.
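The gap between the exact-prefix check and the parsed content model can be illustrated with a minimal sketch. The helper names below (`matches_exact_prefix`, `parse_data_uri_metadata`, `is_json_base64_state_uri`) are hypothetical and are not taken from `_client.py`; lowercasing the metadata before comparison is also an assumption made for this example.

```python
from __future__ import annotations


def matches_exact_prefix(uri: str) -> bool:
    """The stricter check described above: one hard-coded prefix."""
    return uri.startswith("data:application/json;base64,")


def parse_data_uri_metadata(uri: str) -> tuple[str, list[str]] | None:
    """Split a data URI's metadata into (media_type, parameters).

    Hypothetical helper for illustration. Returns None when the string is
    not a data URI or has no comma separating metadata from payload.
    """
    if not uri.startswith("data:") or "," not in uri:
        return None
    metadata = uri[len("data:"):].split(",", 1)[0]
    # Assumption: compare case-insensitively by lowercasing each segment.
    media_type, *parameters = (part.strip().lower() for part in metadata.split(";"))
    return media_type, parameters


def is_json_base64_state_uri(uri: str) -> bool:
    """Accept application/json data URIs whose metadata includes base64."""
    parsed = parse_data_uri_metadata(uri)
    if parsed is None:
        return False
    media_type, parameters = parsed
    return media_type == "application/json" and "base64" in parameters
```

Under these assumptions, `data:application/json;charset=utf-8;base64,...` fails `matches_exact_prefix` but passes `is_json_base64_state_uri`, while a URI without the `base64` marker or with a non-JSON media type is rejected by both.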

Changes

This PR parses the data URI metadata segment instead of matching one exact string prefix.

The state extraction path remains intentionally narrow:

  • The content must still be data content.
  • The parsed media type must still be application/json.
  • The URI metadata must still include the base64 marker.
  • The payload is decoded with strict base64 validation before JSON parsing.
  • Invalid base64 and invalid JSON continue to fall through without extracting state.

The reviewer follow-up tightened malformed payload behavior by using base64.b64decode(..., validate=True) and catching binascii.Error in the existing warning/fallthrough path.
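A minimal sketch of the strict decode step follows. The function name `decode_state_payload` is hypothetical, and the exact set of exceptions handled by the real warning/fallthrough path is an assumption; only `base64.b64decode(..., validate=True)` and `binascii.Error` are named in this PR.

```python
from __future__ import annotations

import base64
import binascii
import json
import logging
from typing import Any

logger = logging.getLogger(__name__)


def decode_state_payload(encoded: str) -> Any | None:
    """Decode a base64 JSON payload, returning None on malformed input.

    Hypothetical helper: a JSON `null` payload is indistinguishable from a
    failure here, which is acceptable for this illustration only.
    """
    try:
        # validate=True rejects characters outside the base64 alphabet
        # instead of silently discarding them.
        decoded = base64.b64decode(encoded, validate=True)
        return json.loads(decoded.decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Assumed to mirror the warning/fallthrough behavior described above.
        logger.warning("Ignoring malformed state payload: %s", exc)
        return None
```

Per the standard library documentation, the default `validate=False` mode discards non-alphabet characters before decoding, so strict validation is what keeps a corrupted payload from being quietly repaired into something that parses.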

Evidence

Regression coverage was added for a parameterized JSON state URI:

data:application/json;charset=utf-8;base64,<encoded-json-state>

The test verifies that this form now populates the AG-UI state field and removes the state carrier message from the returned message list, matching the existing non-parameterized JSON/base64 behavior.

Additional regression coverage was added for malformed base64:

data:application/json;base64,not-valid-base64!

That test verifies malformed base64 does not become state and the original messages are preserved.

This evidence is limited to the focused AG-UI regression tests in python/packages/ag-ui/tests/ag_ui/test_ag_ui_client.py; it does not claim a full repository CI or typecheck run.

Possible call chain / impact

final AG-UI message content
  -> Content.from_uri(...)
  -> content.type == "data"
  -> content.media_type == "application/json"
  -> AG-UI _extract_state_from_messages(...)
  -> data URI metadata parse
  -> validated base64 decode
  -> JSON decode
  -> state extraction and message-list cleanup

The main impact is that AG-UI state extraction now accepts JSON/base64 state data URIs with media type parameters such as charset=utf-8. Non-JSON data content, missing base64 metadata, invalid base64, and invalid JSON still take the fallback path and do not become AG-UI state.

This should be a narrow parsing fix for valid JSON/base64 data URI metadata rather than a broad expansion of what can be treated as AG-UI state.
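The call chain above can be sketched end to end with plain dictionaries standing in for AG-UI messages. This is an illustration under stated assumptions, not the code in `_client.py`: the `extract_state` name, the `"uri"` key, and the dict-based message shape are all hypothetical, and the real path goes through `Content.from_uri(...)` first.

```python
from __future__ import annotations

import base64
import binascii
import json
from typing import Any

Message = dict[str, Any]  # hypothetical stand-in for a real AG-UI message


def extract_state(messages: list[Message]) -> tuple[list[Message], Any | None]:
    """Return (messages, state); only the final message is inspected."""
    if not messages:
        return messages, None
    uri = messages[-1].get("uri")
    if not isinstance(uri, str) or not uri.startswith("data:") or "," not in uri:
        return messages, None

    # Parse the metadata segment instead of matching one exact prefix.
    metadata, payload = uri[len("data:"):].split(",", 1)
    media_type, *parameters = (p.strip().lower() for p in metadata.split(";"))
    if media_type != "application/json" or "base64" not in parameters:
        return messages, None

    try:
        decoded = base64.b64decode(payload, validate=True)
        state = json.loads(decoded.decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, json.JSONDecodeError):
        # Malformed payloads fall through; the message list is preserved.
        return messages, None

    # Successful extraction removes the state carrier message.
    return messages[:-1], state
```

For example, a list ending in a `data:application/json;charset=utf-8;base64,...` carrier yields the decoded state and the list without that carrier, while a list ending in `data:application/json;base64,not-valid-base64!` comes back unchanged with no state.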

Description & Review Guide

  • What are the major changes?

    • Parse the data URI metadata segment instead of matching one exact prefix.
    • Preserve the requirement that AG-UI state data must be application/json and base64 encoded.
    • Add regression coverage for a JSON state data URI with charset=utf-8.
  • What is the impact of these changes?

    • Parameterized JSON state data URIs now populate the AG-UI state field and are removed from the message list like the existing non-parameterized form.
    • Non-JSON data content and malformed state payloads continue to fall through without becoming state.
  • What do you want reviewers to focus on?

    • Whether the metadata parsing keeps the state extraction narrow enough while accepting valid media type parameters.

Related Issue

Fixes #6902

Contribution Checklist

  • The code builds clean without any errors or warnings
  • All unit tests pass, and I have added new tests where possible
  • The PR follows the Contribution Guidelines
  • This PR is linked to an issue and there is no other open PR for this issue (see Related Issue above).
  • This is not a breaking change. If it is a breaking change, add the breaking change label (or add "[BREAKING]" to the title prefix, before or after any language prefix) - a workflow keeps the label and title prefix in sync automatically.

Copilot AI review requested due to automatic review settings July 3, 2026 12:13
giles17 added the python label Jul 3, 2026

Copilot AI left a comment

Contributor

Pull request overview

Updates the Python AG-UI client’s state extraction to accept standards-valid, parameterized JSON data URIs (e.g. including charset=utf-8) while still requiring application/json and base64 encoding, matching the broader data-URI parsing behavior elsewhere in the framework.

Changes:

  • Parse the data URI metadata segment (instead of matching a single hard-coded prefix) to support media type parameters.
  • Keep AG-UI state extraction constrained to application/json + base64.
  • Add a regression test covering data:application/json;charset=utf-8;base64,... state extraction.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/packages/ag-ui/agent_framework_ag_ui/_client.py | Broadened AG-UI state data-URI parsing to accept parameterized JSON data URIs. |
| python/packages/ag-ui/tests/ag_ui/test_ag_ui_client.py | Added regression coverage for parameterized JSON data URIs (charset=utf-8). |

Comment on lines +299 to 303
if prefix.startswith("data:") and media_type == "application/json" and "base64" in parameters:
    import base64

    encoded_data = uri.split(",", 1)[1]  # type: ignore[union-attr]
    decoded_bytes = base64.b64decode(encoded_data)
    state = json.loads(decoded_bytes.decode("utf-8"))

Contributor Author

Addressed in 829a628 by validating base64 with base64.b64decode(..., validate=True) and catching binascii.Error in the existing warning/fallthrough path. I also added a malformed-base64 regression test that verifies the original message list is preserved and state is None.

Validation run locally:

  • uv run pytest packages/ag-ui/tests/ag_ui/test_ag_ui_client.py -q
  • uv run ruff check packages/ag-ui/agent_framework_ag_ui/_client.py packages/ag-ui/tests/ag_ui/test_ag_ui_client.py
  • git diff --check -- python/packages/ag-ui/agent_framework_ag_ui/_client.py python/packages/ag-ui/tests/ag_ui/test_ag_ui_client.py

@VectorPeak

Contributor Author

@copilot-pull-request-reviewer I addressed the review feedback in 829a628 by switching to strict base64 validation, catching binascii.Error in the existing warning/fallthrough path, and adding malformed-base64 regression coverage.

Local validation completed:

  • uv run pytest packages/ag-ui/tests/ag_ui/test_ag_ui_client.py -q
  • uv run ruff check packages/ag-ui/agent_framework_ag_ui/_client.py packages/ag-ui/tests/ag_ui/test_ag_ui_client.py
  • git diff --check -- python/packages/ag-ui/agent_framework_ag_ui/_client.py python/packages/ag-ui/tests/ag_ui/test_ag_ui_client.py

Could you please re-review when you get a chance?

Labels

python Usage: [Issues, PRs], Target: Python

Development

Successfully merging this pull request may close these issues.

Python: AG-UI state extraction ignores parameterized JSON data URIs

3 participants