Replace pickle with msgspec for IPC serialization#8713
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
48c1137 to
00ec803
Compare
00ec803 to
3d8ac65
Compare
3d8ac65 to
696273c
Compare
There was a problem hiding this comment.
Pull request overview
Replaces Python pickle serialization with msgspec.msgpack for ZeroMQ-based IPC between host and kernel, adding typed decoders on pull channels and tests to validate round-trip behavior and intended schema evolution.
Changes:
- Switch
PushQueuesend path frompickle.dumpstomsgspec.msgpack.encode. - Add per-channel
msgspec.msgpack.Decoder[T]construction for pull channels and use it in the receiver thread. - Add IPC msgpack serialization tests covering union dispatch, nested request objects, primitives, and schema-evolution scenarios.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/_ipc/test_kernel_communication.py | Adds msgpack encode/decode tests for each IPC channel type, including forward/backward compatibility scenarios. |
| marimo/_ipc/queue_proxy.py | Replaces pickle with msgpack on send, and decodes received frames via a per-channel decoder. |
| marimo/_ipc/connection.py | Introduces typed decoders on Channel.Pull and wires pull channels into the receiver thread with their known message types. |
| def test_unknown_fields_are_ignored(self) -> None: | ||
| """Decoder silently drops fields it doesn't recognize. | ||
|
|
||
| This matters when the sender is newer than the receiver (e.g. | ||
| a field was added to a command). msgspec must not reject the | ||
| message — it should decode the known fields and discard the rest. | ||
| """ |
There was a problem hiding this comment.
test_unknown_fields_are_ignored assumes msgspec will silently drop unknown struct fields, but msgspec Struct decoding typically raises a validation error on unknown fields unless the struct is configured to allow them. If forward-compatibility is required (new sender -> old receiver), consider explicitly allowing unknown fields on the relevant msgspec.Struct base (e.g., the Command hierarchy) and keep this test; otherwise, update the test to assert the expected decode failure so it matches actual behavior.
| Channel.Push(context) if sys.platform == "win32" else None | ||
| ), | ||
| input=Channel.Push(context, maxsize=1), | ||
| stream=Channel.Pull(context), | ||
| stream=Channel.Pull(context, msg_type=bytes), | ||
| ) |
There was a problem hiding this comment.
Channel.Pull(..., msg_type=bytes) populates the stream queue with plain bytes, but the stream channel/queue is typed elsewhere as KernelMessage (a NewType over bytes). This can cause type-checking drift and makes it unclear what consumers should expect. Consider aligning the annotation/queue type with the decoder output (either change the stream channel type to bytes or cast/normalize decoded values to KernelMessage).
| assert ch.decoder is not None, ( | ||
| "Pull channel must have a decoder" | ||
| ) | ||
| ch.queue.put(ch.decoder.decode(msg)) |
There was a problem hiding this comment.
Using assert ch.decoder is not None for a required runtime invariant can be bypassed when Python is run with optimizations (-O), leading to a less clear failure later (AttributeError on None.decode). Prefer an explicit runtime check that raises a meaningful exception (or logs and skips) instead of an assert here.
| assert ch.decoder is not None, ( | |
| "Pull channel must have a decoder" | |
| ) | |
| ch.queue.put(ch.decoder.decode(msg)) | |
| decoder = ch.decoder | |
| if decoder is None: | |
| LOGGER.error( | |
| "Pull channel %r has no decoder; dropping message", | |
| ch, | |
| ) | |
| continue | |
| ch.queue.put(decoder.decode(msg)) |
| def Pull( | ||
| cls, context: zmq.Context[zmq.Socket[bytes]], *, maxsize: int = 0 | ||
| cls, | ||
| context: zmq.Context[zmq.Socket[bytes]], | ||
| *, | ||
| msg_type: type[T], | ||
| maxsize: int = 0, | ||
| ) -> Channel[T]: |
There was a problem hiding this comment.
Channel.Pull restricts msg_type to type[T], but callers pass typing unions like CommandMessage/BatchableCommand (and other non-type decode annotations are likely). With strict mypy this will be flagged as an incompatible argument type. Consider widening the parameter type to a msgspec-supported “decode type” (or typing.Any) and adjusting the decoder attribute annotation accordingly so unions are accepted without casts.
373a8a9 to
933737b
Compare
Pickle serialization across the ZeroMQ IPC boundary is fragile because any change to the command types (renaming, adding/removing fields) can silently break deserialization between the host and kernel processes. msgspec.msgpack gives us a proper wire format with well-defined schema evolution rules tied to the existing `Command` struct definitions. Each Pull channel now constructs a `msgspec.msgpack.Decoder[T]` from its known message type at creation time. The receiver thread uses this decoder to deserialize incoming messages back into their correct types, leveraging the discriminated union tags already on `Command` subclasses.
933737b to
ccadefe
Compare
Pickle serialization across the ZeroMQ IPC boundary is fragile because any change to the command types (renaming, adding/removing fields) can silently break deserialization between the host and kernel processes.
msgspec.msgpackgives us a proper (binary) wire format with well-defined schema evolution rules tied to the existingCommandstruct definitions.Each Pull channel now constructs a
msgspec.msgpack.Decoder[T]from its known message type at creation time. The receiver thread uses this decoder to deserialize incoming messages back into their correct types, leveraging the discriminated union tags already onCommandsubclasses.