Add auto-segmentation to the serial profile (#730)
Closed
Cyan4973 wants to merge 1 commit into
@Cyan4973 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103759746.
Cyan4973 added a commit to Cyan4973/openzl that referenced this pull request May 5, 2026
2df5a92 to e01f152
e01f152 to f0ab248
f0ab248 to 725875f
Summary:

The `serial` profile can no longer ingest inputs larger than 4 GiB, likely due to the internal LZ engine being updated from Zstandard to the new implementation.

This change applies the same approach used by numeric profiles: large inputs are automatically split into smaller, more manageable chunks. The default chunk size is 16 MiB, and users can override it with `--chunk-size`.

A new standard segmenter, `SEGM_serial`, is added and modeled directly on `SEGM_numFromSerial`. It splits serial input by byte size, using a default of 16 MiB. The chunk size can also be configured through the new public local integer parameter `ZL_SEGMENT_SERIAL_CHUNK_BYTE_SIZE_PARAM`.

Each chunk is forwarded to a successor graph. If no successor is provided, the segmenter falls back to `ZL_GRAPH_COMPRESS_GENERIC`.

As with the other segmenters, when `formatVersion < ZL_CHUNK_VERSION_MIN`, the segmenter emits a single chunk so the resulting frame remains decodable by older clients.

This also adds:
- A new public macro, `ZL_SEGMENT_SERIAL`
- Two builder helpers, `ZL_Compressor_buildSerialSegmenter[2]`, in `include/openzl/codecs/zl_segmenters.h`
- A typed C++ wrapper, `graphs::SegmentSerial`, modeled on `graphs::SDDL2`

The CLI `serial` profile now reads `--chunk-size` and wraps its existing `ACE+LZ` graph with the new segmenter. As a result, it now sets `supportsChunkSize_ = true`.

Finally, the new standard graph ID, `ZL_StandardGraphID_segment_serial`, is appended to the public enum before `_public_end`, preserving all existing wire-format IDs.

Reviewed By: kevinjzhang

Differential Revision: D103759746
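
For readers unfamiliar with byte-size segmentation, the chunking rule described in the summary can be sketched roughly as follows. This is a standalone illustration, not the code of `SEGM_serial`: the function names, the `SKETCH_` macros, the placeholder version number, and the convention that a chunk size of 0 means "use the default" are all assumptions made for this example, and no OpenZL API is called.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration only; not OpenZL's real constants. */
#define SKETCH_DEFAULT_CHUNK_BYTES ((uint64_t)16 << 20) /* 16 MiB, as stated in the summary */
#define SKETCH_CHUNK_VERSION_MIN 2u /* placeholder value, NOT the real ZL_CHUNK_VERSION_MIN */

/* Returns the size of the next chunk, given the bytes not yet consumed.
 * Assumption: chunkBytes == 0 selects the default chunk size. */
static uint64_t sketch_next_chunk_size(uint64_t remaining,
                                       uint64_t chunkBytes,
                                       unsigned formatVersion)
{
    /* Older formats cannot represent multiple chunks: emit everything at once. */
    if (formatVersion < SKETCH_CHUNK_VERSION_MIN) {
        return remaining;
    }
    if (chunkBytes == 0) {
        chunkBytes = SKETCH_DEFAULT_CHUNK_BYTES;
    }
    return remaining < chunkBytes ? remaining : chunkBytes;
}

/* Walks the input and counts how many chunks would be emitted. */
static uint64_t sketch_count_chunks(uint64_t inputSize,
                                    uint64_t chunkBytes,
                                    unsigned formatVersion)
{
    uint64_t count = 0;
    uint64_t remaining = inputSize;
    while (remaining > 0) {
        uint64_t chunk = sketch_next_chunk_size(remaining, chunkBytes, formatVersion);
        assert(chunk > 0 && chunk <= remaining);
        /* A real segmenter would forward this chunk to its successor graph here. */
        remaining -= chunk;
        count++;
    }
    return count;
}
```

Under these assumptions, a 5 GiB input with the default chunk size is split into 320 chunks of 16 MiB each, while the same input with a format version below the minimum is emitted as a single chunk.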
725875f to e7aad0d
This pull request has been merged in c57e2f4.