Extend Activity Schema to Support Multimodal Interactions with Streaming #423
Closed
Copilot AI added a commit that referenced this pull request on Feb 26, 2026
…ing (PR #423) Co-authored-by: gurubhg <89311725+gurubhg@users.noreply.github.com>
tracyboehrer approved these changes on Feb 26, 2026
This PR implements the approved proposal from issue #416 to extend the Activity Protocol schema for multimodal interactions with streaming support for voice/audio.

Changes:
- Added Reserved Events for Media Streaming (Media.Start, Media.Chunk, Media.End, Voice.Message)
- Extended streaminfo entity to support media streaming with streamState property
- Added Session Lifecycle Commands (session.init, session.update, session.end) for multimodal interactions
- Bumped version to Provisional 3.4

Key design decisions (per AP Core Committee):
- No new activity types: uses existing event, command, commandResult
- No new schema fields: uses existing value, valueType, entities
- 100% backward compatible
- Uses streaminfo entity for stream metadata and sequencing
- Uses Media.* prefix for media streaming events

Related: #416
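As a rough illustration of the "no new schema fields" decision, the sketch below builds a Media.Chunk event using only the existing type, name, value, valueType, and entities fields. The streaminfo property names `streamId` and `streamSequence`, the base64 payload encoding, and the `audio/example` valueType string are assumptions made for this example and are not taken from this PR.

```python
import base64
import json


def media_event(name, stream_id, sequence, stream_type, payload=None, value_type=None):
    """Build an event activity dict for a Media.* event (illustrative shape only)."""
    activity = {
        "type": "event",  # existing activity type, no new type introduced
        "name": name,     # e.g. "Media.Start", "Media.Chunk", "Media.End"
        "entities": [
            {
                "type": "streaminfo",
                "streamId": stream_id,       # assumed property name
                "streamSequence": sequence,  # assumed property name
                "streamType": stream_type,   # "streaming" or "final"
            }
        ],
    }
    if payload is not None:
        # Assumption: binary media is carried as base64 text in `value`.
        activity["value"] = base64.b64encode(payload).decode("ascii")
        activity["valueType"] = value_type
    return activity


chunk = media_event("Media.Chunk", "stream-1", 2, "streaming", b"\x00\x01", "audio/example")
print(json.dumps(chunk, indent=2))
```

The helper only assembles a dictionary; a real sender would also fill in the routing fields (conversation, from, recipient) that every activity carries.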
… changes)

Per discussion on #416, the existing streaminfo entity properties are sufficient for media streaming:
- streamType uses existing values: 'streaming', 'final' (not new 'audio'/'video')
- valueType on the event activity identifies the media type
- No need for new streamState property

This ensures zero schema changes to streaminfo entity while supporting multimodal media streaming.
Added separate examples in streaminfo section:
- Text Streaming: Existing example using typing/message activities
- Voice/Media Streaming: New example showing Media.Start, Media.Chunk, Media.End, and Voice.Message events with streaminfo entities

Both examples demonstrate consistent use of streamType values (streaming, final) while different activity types and valueType distinguish the modality.
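The point about streamType staying the same while the activity type and valueType carry the modality can be sketched as follows. This is a minimal illustration, not the spec's example: the `classify` helper, the sample activities, and the `audio/example` valueType are hypothetical.

```python
ALLOWED_STREAM_TYPES = {"streaming", "final"}


def classify(activity):
    """Return (modality, streamType) for an activity carrying a streaminfo entity."""
    info = next(
        (e for e in activity.get("entities", []) if e.get("type") == "streaminfo"),
        None,
    )
    if info is None:
        raise ValueError("activity has no streaminfo entity")
    if info.get("streamType") not in ALLOWED_STREAM_TYPES:
        raise ValueError(f"unexpected streamType: {info.get('streamType')!r}")
    # Modality comes from the activity itself, never from streamType.
    is_media = activity.get("type") == "event" and activity.get("name", "").startswith("Media.")
    modality = activity.get("valueType", "media") if is_media else "text"
    return modality, info["streamType"]


text_update = {
    "type": "typing",
    "text": "Hel",
    "entities": [{"type": "streaminfo", "streamType": "streaming"}],
}
audio_chunk = {
    "type": "event",
    "name": "Media.Chunk",
    "valueType": "audio/example",  # hypothetical media type string
    "entities": [{"type": "streaminfo", "streamType": "streaming"}],
}

print(classify(text_update))  # ('text', 'streaming')
print(classify(audio_chunk))  # ('audio/example', 'streaming')
```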
Based on comprehensive review of proposal #416:
1. Added Implementation Note for Voice.Message explaining:
   - Why event is used instead of message (SDK validation limitation)
   - Protocol does allow value/valueType on message (A2005)
   - Reference to future APv4 vision (#377)
2. Added Error Handling section (A5260-A5262):
   - Handling Media.Chunk without Media.Start
   - Stream error signaling via streamResult
   - Resilience requirements for missing chunks
3. Added Note clarifying session.* commands are reserved protocol commands (not subject to application/* namespace requirement per A6301)

These additions address gaps identified during comprehensive review and capture the open discussion points from the proposal.
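The three error-handling cases listed above could look roughly like this on the receiving side. This is a sketch under assumptions: the exact rules live in A5260-A5262, which this page does not reproduce, so the `MediaStreamReceiver` class, the `"error"` value for streamResult, and the choice to report gaps rather than fail are all illustrative.

```python
class MediaStreamReceiver:
    """Track Media.* events per stream and tolerate missing chunks (illustrative)."""

    def __init__(self):
        # stream_id -> {"chunks": {sequence: payload}, "ended": bool}
        self.streams = {}

    def handle(self, name, stream_id, sequence=None, payload=None):
        if name == "Media.Start":
            self.streams[stream_id] = {"chunks": {}, "ended": False}
            return {"ok": True}

        stream = self.streams.get(stream_id)
        if stream is None or stream["ended"]:
            # Media.Chunk or Media.End without an active Media.Start.
            # Assumption: the failure is signalled with streamResult "error".
            return {"ok": False, "streamResult": "error", "reason": "no active stream"}

        if name == "Media.Chunk":
            stream["chunks"][sequence] = payload
            return {"ok": True}

        if name == "Media.End":
            stream["ended"] = True
            seqs = sorted(stream["chunks"])
            # Report gaps between the first and last received chunk instead of failing.
            missing = (
                [s for s in range(seqs[0], seqs[-1] + 1) if s not in stream["chunks"]]
                if seqs
                else []
            )
            return {"ok": True, "missing": missing}

        return {"ok": False, "streamResult": "error", "reason": f"unknown event {name}"}


rx = MediaStreamReceiver()
print(rx.handle("Media.Chunk", "s1", 1, b"a"))  # rejected: no Media.Start yet
rx.handle("Media.Start", "s1")
rx.handle("Media.Chunk", "s1", 1, b"a")
rx.handle("Media.Chunk", "s1", 3, b"c")         # chunk 2 never arrives
print(rx.handle("Media.End", "s1"))             # {'ok': True, 'missing': [2]}
```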
Added the detailed client-server interaction example from proposal #416:
- Session handshake with embedded readiness state
- Media streaming events (start, chunk, end)
- Optional state updates (thinking, speaking) with threshold notes
- Final Voice.Message delivery
- Explanatory notes about optional steps

This provides a complete reference for implementers to understand the end-to-end flow of a voice streaming session.
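One way to read the end-to-end flow is as an ordered trace of command and event names. The checker below is a minimal sketch that assumes a session opens with session.init, closes with session.end, and keeps Media.Chunk between Media.Start and Media.End; these ordering rules and the `validate_trace` helper are assumptions for illustration, not requirements quoted from the spec.

```python
def validate_trace(names):
    """Return True if a list of command/event names follows the assumed session flow."""
    if len(names) < 2 or names[0] != "session.init" or names[-1] != "session.end":
        return False
    streaming = False
    for name in names[1:-1]:
        if name == "Media.Start":
            if streaming:
                return False  # nested start on the same trace
            streaming = True
        elif name == "Media.Chunk":
            if not streaming:
                return False  # chunk outside a stream
        elif name == "Media.End":
            if not streaming:
                return False  # end without a start
            streaming = False
        elif name in ("session.update", "Voice.Message"):
            continue          # optional state updates and the final message
        else:
            return False      # name not part of this sketch
    return not streaming      # every started stream must have ended


full_flow = [
    "session.init",
    "Media.Start",
    "Media.Chunk",
    "Media.Chunk",
    "Media.End",
    "session.update",  # optional, e.g. a "thinking" or "speaking" state
    "Voice.Message",
    "session.end",
]
print(validate_trace(full_flow))                                       # True
print(validate_trace(["session.init", "Media.Chunk", "session.end"]))  # False
```

Because session.update is treated as optional, removing it from `full_flow` still yields a valid trace.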
- Split large JSON code blocks with comments into separate blocks
- Added descriptive headers before each example
- Used proper code block language specifiers (json, text)
- Reorganized multimodal interaction flow into numbered steps
- Added blockquotes for explanatory notes
- Removed invalid JSON comments (JSON doesn't support //)

This improves readability when the spec is rendered in GitHub, documentation sites, and other markdown viewers.
24e75dc to 6117896
Copilot AI added a commit that referenced this pull request on Mar 2, 2026
…hanges Co-authored-by: gurubhg <89311725+gurubhg@users.noreply.github.com>
gurubhg added a commit that referenced this pull request on Mar 12, 2026
…Streaming (#468)

* Initial plan
* Update schema spec to align with final multimodal proposal from issue #416
* Move Voice message from Event activity to Message activity section
* Cross-link multimodal sections; clarify voice message context and full interaction flow
* Fix bidirectionality: Voice message and Media streaming events work in both directions
* Fix Event activity intro: Media streaming events are bidirectional, not client-to-Agent only
* spec: enrich Appendix I with PR-#423-style summaries for multimodal changes
* revert: restore Appendix I to original content (no spec file changes)
* spec: rewrite Appendix I entry for PR #468 in structured format (intro + Changes + Key design decisions + Related)
* spec: restore Appendix I to original content and update PR description to match PR #423 format for PR #468 changes
* Fix casing of 'streamInfo' to 'streaminfo'

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: gurubhg <89311725+gurubhg@users.noreply.github.com>
Co-authored-by: tracyboehrer <tracyboehrer@users.noreply.github.com>