feat(pkg-py): segment-based message storage for mixed content type streaming #213
Draft
cpsievert wants to merge 10 commits into
Replace separate `_current_stream_message`/deps tracking with a unified list of `ContentSegment` objects that preserve per-chunk content type (markdown vs. html) and HTML dependencies. This enables correct round-tripping of mixed-content streams through bookmark save/restore.

Key changes:
- `ContentSegment` dataclass and `StoredContentSegment` TypedDict for the runtime and serialized segment representations
- `BookmarkMessageDict` includes optional `segments` for bookmark state
- `_restore_bookmark_message` replays multi-segment messages as streaming sequences to preserve content type boundaries on restore
- `_send_append_message` accepts an explicit `content_type` to override inference from the message object
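A minimal sketch of what the runtime and serialized segment types could look like. The class names come from the commit message; the field names (`content`, `content_type`, `html_deps`), the `Literal` of two content types, and the hypothetical `serialize_segment` helper are assumptions, and real HTML dependencies would be `HTMLDependency` objects rather than plain dicts.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Literal, TypedDict

# Assumption: only these two content types exist at this point in the PR.
ContentType = Literal["markdown", "html"]


@dataclass
class ContentSegment:
    """Runtime segment: a run of streamed text sharing one content type."""

    content: str
    content_type: ContentType
    # Assumption: stand-in for HTMLDependency objects in the real code.
    html_deps: list[Any] = field(default_factory=list)


class StoredContentSegment(TypedDict):
    """Serialized (JSON-friendly) segment, as kept in bookmark state."""

    content: str
    content_type: ContentType
    html_deps: list[dict[str, Any]]


def serialize_segment(seg: ContentSegment) -> StoredContentSegment:
    # Hypothetical helper: assumes deps are already plain dicts, so a
    # shallow dict() copy per dep is enough to detach them from the segment.
    return {
        "content": seg.content,
        "content_type": seg.content_type,
        "html_deps": [dict(d) for d in seg.html_deps],
    }
```

Keeping the runtime dataclass separate from the TypedDict lets the in-memory form hold live dependency objects while the bookmark form stays plain data.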
Add an optional `segments` field to `MessagePayload` on both the JS and Python sides. The JS client uses `segments` directly when present, falling back to synthesizing a single segment from `content` + `content_type`. On the Python side, `_restore_bookmark_message` collapses from a streaming replay (`chunk_start`/`chunk`/`chunk_end`) to a single message send. Segment `html_deps` are hoisted to the envelope. Redundant top-level deps storage on `StoredMessage` is removed when segments carry them.
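The client-side fallback and the dependency hoisting can be sketched as follows. The real client is JavaScript; this is only a Python rendering of the described logic, and the function names, the `"markdown"` default, and the payload key names are assumptions. Deps are concatenated in segment order without deduplication.

```python
from __future__ import annotations

from typing import Any


def segments_from_payload(payload: dict[str, Any]) -> list[dict[str, Any]]:
    """Hypothetical: use segments if present, else synthesize one."""
    segments = payload.get("segments")
    if segments is not None:
        # New-style payload: take the segments exactly as sent.
        return list(segments)
    # Old-style payload: build a single segment from the top-level fields.
    return [
        {
            "content": payload.get("content", ""),
            "content_type": payload.get("content_type", "markdown"),
        }
    ]


def hoist_deps(segments: list[dict[str, Any]]) -> list[Any]:
    """Hypothetical: gather every segment's html_deps for the envelope."""
    deps: list[Any] = []
    for seg in segments:
        deps.extend(seg.get("html_deps", []))
    return deps
```

Hoisting means the deps can be rendered once for the whole message, while each segment still records where its deps came from.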
… types

Replace `deepcopy` with a targeted `copy_segments` that copies only what's needed (segment dataclass fields, not `HTMLDependency` objects). Also break `BookmarkMessageDict` inheritance from `ChatMessageDict`, since the two have different dep semantics, and fix the `append_to_segments` early-return guard.
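A minimal sketch of the two helpers, assuming consecutive chunks of the same content type merge into one segment and a type change starts a new one. The signatures, the merge rule, and the guard condition (skip chunks with neither text nor deps) are assumptions; the commit does not say what the original guard got wrong.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any


@dataclass
class ContentSegment:
    content: str
    content_type: str
    html_deps: list[Any] = field(default_factory=list)


def copy_segments(segments: list[ContentSegment]) -> list[ContentSegment]:
    # New segment instances and new dep lists, but the dependency objects
    # themselves are shared rather than deep-copied.
    return [replace(s, html_deps=list(s.html_deps)) for s in segments]


def append_to_segments(
    segments: list[ContentSegment],
    chunk: str,
    content_type: str,
    deps: list[Any] | None = None,
) -> None:
    # Assumed guard: nothing to record if there is no text and no deps.
    if not chunk and not deps:
        return
    last = segments[-1] if segments else None
    if last is not None and last.content_type == content_type:
        # Same content type as the current run: extend it in place.
        last.content += chunk
        last.html_deps.extend(deps or [])
    else:
        # First chunk or a content type boundary: open a new segment.
        segments.append(ContentSegment(chunk, content_type, list(deps or [])))
```

Copying the dep list but not its elements is enough to keep a snapshot independent of later appends, without paying for a deep copy of dependency objects.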
`StoredMessage` now stores only `role` + `segments`; `content` and `html_deps` are computed properties. The wire-protocol `MessagePayload` carries `segments` exclusively (no top-level `content`/`content_type`). The bookmark format uses `segments`, with a legacy shim for old bookmarks that lack the key.
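A sketch of a segments-only `StoredMessage` with derived properties, plus a legacy shim for bookmarks saved before `segments` existed. The hypothetical `message_from_bookmark` loader, the legacy key names, and the `"markdown"` default are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContentSegment:
    content: str
    content_type: str
    html_deps: list[Any] = field(default_factory=list)


@dataclass
class StoredMessage:
    role: str
    segments: list[ContentSegment] = field(default_factory=list)

    @property
    def content(self) -> str:
        # Derived on demand: segment text concatenated in order.
        return "".join(s.content for s in self.segments)

    @property
    def html_deps(self) -> list[Any]:
        # Derived on demand: every segment's deps, in segment order.
        return [d for s in self.segments for d in s.html_deps]


def message_from_bookmark(data: dict[str, Any]) -> StoredMessage:
    """Hypothetical loader showing the legacy shim."""
    raw = data.get("segments")
    if raw is None:
        # Old bookmark: flat content with a single content_type.
        raw = [
            {
                "content": data.get("content", ""),
                "content_type": data.get("content_type", "markdown"),
                "html_deps": data.get("html_deps", []),
            }
        ]
    segments = [
        ContentSegment(s["content"], s["content_type"], list(s.get("html_deps", [])))
        for s in raw
    ]
    return StoredMessage(role=data["role"], segments=segments)
```

Because `content` and `html_deps` are derived, they cannot drift out of sync with the segments.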
Remove the `deps` parameter from both methods — deps are now always attached directly to segments at the point they're created, rather than threaded through storage helpers.
Update R's `chat_append_message()` to send a `segments` array in the `chunk_start` and complete-message payloads, matching the Python wire format. Rebuild JS with the simplified content model (no top-level `contentType`).
Summary
- Replace the flat stream state (`_current_stream_message` + `_current_stream_deps`) with a segment list that preserves per-chunk content type boundaries during streaming
- Add `segments` to the wire protocol (`MessagePayload`) so the client can faithfully restore mixed-content messages from bookmarks
- Add `_chat_segments.py` with dedicated helpers for append, copy, serialize, and content/dep extraction

Motivation
When an assistant response streams a mix of content types, the Python server previously concatenated everything into a single flat string tagged with one `content_type`. This worked fine for live streaming (the client tracked segments internally), but broke on bookmark restore: the server had lost the content type boundaries, so it sent the entire message back tagged as the first content type.

The bug specifically surfaces when a message starts with HTML content and then switches to markdown. On restore, the whole message gets tagged as HTML (the first content type), causing the trailing markdown to be rendered as raw HTML instead of being markdown-processed. (The reverse, markdown first followed by HTML, is less visibly broken because HTML passed through a markdown renderer is often passable.)
The deeper issue is that the server's internal state didn't reflect what the client already knew: a message is an ordered list of typed segments, not a flat string with a single type. This PR aligns the two, making bookmark save/restore correct by construction.
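To make the failure concrete, here is a minimal sketch contrasting the two storage models on an HTML-then-markdown stream. The chunk contents and both function names are hypothetical; only the behavior (flat string tagged with the first type vs. ordered typed segments) comes from the description above.

```python
from __future__ import annotations

# Hypothetical stream: HTML first, then markdown (the visibly broken case).
CHUNKS: list[tuple[str, str]] = [
    ("<div class='card'>Result</div>", "html"),
    ("\n\n**Summary:** all *good*", "markdown"),
]


def store_flat(chunks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Old model: one string, tagged with the first chunk's content type.
    text = "".join(chunk for chunk, _ in chunks)
    return [(text, chunks[0][1])]


def store_segments(chunks: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # New model: start a new segment whenever the content type changes.
    segments: list[tuple[str, str]] = []
    for text, ctype in chunks:
        if segments and segments[-1][1] == ctype:
            segments[-1] = (segments[-1][0] + text, ctype)
        else:
            segments.append((text, ctype))
    return segments
```

`store_flat(CHUNKS)` yields a single `"html"` entry, so on restore the `**Summary:**` text would skip markdown processing; `store_segments(CHUNKS)` keeps the boundary and restores each part with its own type.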
This also lays groundwork for an upcoming PR that introduces an additional content type, which will make mixed-type messages more common and the restore bug more noticeable.
Test plan