Add auto-segmentation to the serial profile (#730)

Closed
Cyan4973 wants to merge 1 commit into facebook:dev from Cyan4973:export-D103759746

@Cyan4973 Cyan4973 commented May 5, 2026

Summary:

The serial profile can no longer ingest inputs larger than 4 GiB, likely due to the internal LZ engine being updated from Zstandard to the new implementation.

This change applies the same approach used by numeric profiles: large inputs are automatically split into smaller, more manageable chunks. The default chunk size is 16 MiB, and users can override it with --chunk-size.

A new standard segmenter, SEGM_serial, is added and modeled directly on SEGM_numFromSerial. It splits serial input into chunks of a fixed byte size (16 MiB by default); every chunk except possibly the last has that size. Programmatic users can set the chunk size through the new public local integer parameter ZL_SEGMENT_SERIAL_CHUNK_BYTE_SIZE_PARAM.
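
For readers who want the splitting rule spelled out, here is a minimal sketch of fixed-size byte chunking. It is not the code from this diff: the names `EXAMPLE_DEFAULT_CHUNK_BYTES`, `example_chunk_count`, and `example_chunk_size` are hypothetical, and the handling of empty input (zero chunks) is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant mirroring the 16 MiB default stated above. */
#define EXAMPLE_DEFAULT_CHUNK_BYTES ((size_t)16 << 20)

/* Number of chunks needed to cover `total` bytes (ceiling division).
 * An empty input yields zero chunks in this sketch. */
static size_t example_chunk_count(size_t total, size_t chunkBytes)
{
    assert(chunkBytes > 0);
    return total / chunkBytes + (total % chunkBytes != 0);
}

/* Size of chunk `idx`: every chunk is full except possibly the last. */
static size_t example_chunk_size(size_t total, size_t chunkBytes, size_t idx)
{
    assert(chunkBytes > 0);
    assert(idx < example_chunk_count(total, chunkBytes));
    size_t const offset    = idx * chunkBytes;
    size_t const remaining = total - offset;
    return remaining < chunkBytes ? remaining : chunkBytes;
}
```

Under this rule a 40-byte input with a 16-byte chunk size produces chunks of 16, 16, and 8 bytes, and a 5 GiB input at the 16 MiB default produces 320 chunks.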

Each chunk is forwarded to a successor graph. If no successor is provided, the segmenter falls back to ZL_GRAPH_COMPRESS_GENERIC.

As with the other segmenters, when formatVersion < ZL_CHUNK_VERSION_MIN, the segmenter emits a single chunk so the resulting frame remains decodable by older clients.
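
The compatibility rule can be pictured as a small gate in front of the chunking logic. The sketch below is illustrative only: `EXAMPLE_CHUNK_VERSION_MIN` is a placeholder value (the real value of ZL_CHUNK_VERSION_MIN is not given here), and `example_effective_chunk_size` is a hypothetical helper, not a function from this diff.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder threshold; the actual ZL_CHUNK_VERSION_MIN value is not
 * stated in this description. */
#define EXAMPLE_CHUNK_VERSION_MIN 21u

/* Returns the chunk size to use for this input. For format versions that
 * predate chunking, the whole input becomes a single chunk so that older
 * decoders can still read the frame. */
static size_t example_effective_chunk_size(
        unsigned formatVersion,
        size_t inputBytes,
        size_t requestedChunkBytes)
{
    assert(requestedChunkBytes > 0);
    if (formatVersion < EXAMPLE_CHUNK_VERSION_MIN) {
        return inputBytes; /* one chunk spanning the entire input */
    }
    return requestedChunkBytes;
}
```

The point of the design is that the choice depends only on the target format version, so the same compressor configuration can serve both old and new decoders.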

This also adds:

  • A new public macro, ZL_SEGMENT_SERIAL
  • Two builder helpers, ZL_Compressor_buildSerialSegmenter[2], in include/openzl/codecs/zl_segmenters.h
  • A typed C++ wrapper, graphs::SegmentSerial, modeled on graphs::SDDL2

The CLI serial profile now reads --chunk-size and wraps its existing ACE+LZ graph with the new segmenter. As a result, it now sets supportsChunkSize_ = true.

Finally, the new standard graph ID, ZL_StandardGraphID_segment_serial, is appended to the public enum before _public_end, preserving all existing wire-format IDs.
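
To illustrate why appending before the end sentinel keeps existing IDs stable, here is a toy before/after pair of enums. All enumerator names and values are hypothetical and do not reflect the actual contents of the public enum; the sketch also assumes the sentinel itself is never written to a frame.

```c
#include <assert.h>

/* Hypothetical enum before the change. */
enum ExampleGraphID_v1 {
    example_v1_store = 0,
    example_v1_compress_generic,
    example_v1_public_end /* sentinel, assumed never serialized */
};

/* Hypothetical enum after the change: the new ID is appended just
 * before the sentinel, so earlier enumerators keep their values. */
enum ExampleGraphID_v2 {
    example_v2_store = 0,
    example_v2_compress_generic,
    example_v2_segment_serial, /* new entry */
    example_v2_public_end
};

/* An ID is a valid public ID if it lies below the sentinel. */
static int example_is_public(int id)
{
    assert(id >= 0);
    return id < (int)example_v2_public_end;
}
```

Only the sentinel moves: the new entry takes the value the sentinel used to have, and every previously assigned ID is unchanged. Inserting the new entry anywhere earlier would instead shift all later IDs.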

Reviewed By: kevinjzhang

Differential Revision: D103759746

meta-codesync Bot commented May 5, 2026

@Cyan4973 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D103759746.

@meta-cla meta-cla Bot added the cla signed label May 5, 2026
@Cyan4973 Cyan4973 self-assigned this May 5, 2026
@meta-codesync meta-codesync Bot changed the title from "Add auto-segmentation to the serial profile" to "Add auto-segmentation to the serial profile (#730)" May 5, 2026
Cyan4973 added a commit to Cyan4973/openzl that referenced this pull request May 5, 2026
@Cyan4973 Cyan4973 force-pushed the export-D103759746 branch 2 times, most recently from 2df5a92 to e01f152 May 5, 2026 21:19
Cyan4973 added a commit to Cyan4973/openzl that referenced this pull request May 5, 2026
Cyan4973 added a commit to Cyan4973/openzl that referenced this pull request May 5, 2026
@Cyan4973 Cyan4973 force-pushed the export-D103759746 branch from e01f152 to f0ab248 May 5, 2026 21:19
@meta-codesync meta-codesync Bot changed the title from "Add auto-segmentation to the serial profile (#730)" to "Add auto-segmentation to the serial profile" May 5, 2026
@Cyan4973 Cyan4973 force-pushed the export-D103759746 branch from f0ab248 to 725875f May 5, 2026 23:27
@meta-codesync meta-codesync Bot changed the title from "Add auto-segmentation to the serial profile" to "Add auto-segmentation to the serial profile (#730)" May 5, 2026
@Cyan4973 Cyan4973 force-pushed the export-D103759746 branch from 725875f to e7aad0d May 5, 2026 23:28
@meta-codesync meta-codesync Bot closed this in c57e2f4 May 6, 2026

meta-codesync Bot commented May 6, 2026

This pull request has been merged in c57e2f4.
