feat(python, rust!): Read/write support for IPC streams in DataFrames #10606
Adds support for reading and writing Arrow IPC streams (see "Streaming format" in https://arrow.apache.org/docs/python/ipc.html) to py-polars DataFrames. py-polars had support for IPC files (formerly "Feather" files), but did not have support for IPC streams (which are essentially just streams of RecordBatches). IPC streams are useful for reading from and writing to network streams of RecordBatches, without random access / seek support.
This feature does not add IPC stream support to LazyFrames. I don't think it makes much sense to copy the existing `scan_ipc` and `sink_ipc` for streams, because those only support files on disk, whereas IPC streams make the most sense in streaming contexts (such as network I/O). I've created an issue for this; see #10605.

This PR also includes a potentially breaking Rust change: the `with_compression` argument type of `IpcStreamWriter` is changed to match that of `IpcWriter` (`polars_io::ipc::IpcCompression` instead of a type from arrow2).