Explore using PyBytesWriter API for compression libraries output buffers #139877

@emmatyping

Feature or enhancement

The new PyBytesWriter API is fast and easy to use. I expect it would improve both the maintainability and the speed of output buffer management in the compression modules.

I have some perf recordings showing that a large portion (>50%!) of decompression time, across a mix of data sizes (1K, 1M, 1G), is spent in _BlocksOutputBuffer_Finish, re-assembling the output buffer.
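To make the cost concrete, here is a simplified Python model of the two strategies. It is only a sketch: the function names are hypothetical, it is not the CPython C code, and it assumes (as I understand the current implementation) that the blocks buffer keeps a list of separate blocks and copies them into one result at finish time, whereas a writer-style buffer grows a single allocation in place.

```python
# Simplified model only; names are hypothetical and this is not CPython's C code.

def collect_in_blocks(chunks):
    """Block-list strategy: keep every chunk as its own block, join at the end."""
    blocks = []
    for chunk in chunks:
        blocks.append(bytes(chunk))
    # The final join allocates a new object and copies every block into it.
    # This models the re-assembly step done at finish time.
    return b"".join(blocks)


def collect_in_single_buffer(chunks, initial_capacity=32):
    """Single-buffer strategy: write into one growable buffer, trim at the end."""
    buf = bytearray(initial_capacity)
    used = 0
    for chunk in chunks:
        needed = used + len(chunk)
        if needed > len(buf):
            # Grow geometrically so the number of resizes stays small.
            new_capacity = max(needed, 2 * len(buf))
            buf.extend(bytes(new_capacity - len(buf)))
        buf[used:needed] = chunk
        used = needed
    # Trim the unused tail instead of re-assembling separate blocks.
    del buf[used:]
    return bytes(buf)
```

Both functions return the same bytes for the same input. In pure Python the final `bytes(buf)` still copies, so this shows the shape of the two approaches rather than their relative speed.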

I also made a very hacky modification to pycore_blocks_output_buffer.h to use PyBytesWriter() and found that it greatly reduced decompression time. The two tests below decompress zstd-compressed enwiki content:

| test | main | PyBytesWriter() |
| --- | --- | --- |
| decompress 1M | 2.15ms | 1.65ms |
| decompress 1G | 2.2s | 1.73s |

Those are roughly 1.27-1.30x speedups (21-23% less time)!

I think this is enough to motivate a refactor of this code to use PyBytesWriter() and benchmark against the current implementation across compression modules and data sizes.
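A benchmark across modules and sizes could start from something like the sketch below. It is a rough harness, not the one used for the numbers above: the helper names are hypothetical, the payload is synthetic repeated text rather than enwiki content, and `compression.zstd` is only picked up when the running Python provides it.

```python
import importlib
import time


def available_codecs():
    """Return {name: (compress, decompress)} for the modules that import."""
    codecs = {}
    for name in ("zlib", "bz2", "lzma", "compression.zstd"):
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # optional module, or zstd on an older Python
        codecs[name] = (module.compress, module.decompress)
    return codecs


def make_payload(size):
    """Hypothetical test data: repeated text, far more compressible than enwiki."""
    seed = b"The quick brown fox jumps over the lazy dog. "
    return (seed * (size // len(seed) + 1))[:size]


def best_time(func, data, repeats=5):
    """Best wall-clock time of func(data) over several runs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(data)
        best = min(best, time.perf_counter() - start)
    return best


def bench_decompress(sizes=(1_000, 1_000_000), repeats=5):
    """Time one-shot decompression for every available codec and size."""
    results = {}
    for name, (compress, decompress) in available_codecs().items():
        for size in sizes:
            payload = make_payload(size)
            compressed = compress(payload)
            # Sanity check: the round trip must reproduce the payload.
            assert decompress(compressed) == payload
            results[(name, size)] = best_time(decompress, compressed, repeats)
    return results


if __name__ == "__main__":
    for (name, size), seconds in bench_decompress().items():
        print(f"{name:16s} {size:>12,d} bytes  {seconds * 1e3:10.3f} ms")
```

Running the same script on a build of main and on a build with the PyBytesWriter() change would give a per-module, per-size comparison. Realistic input data and a dedicated tool such as pyperf would make the numbers more trustworthy.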

cc @vstinner for viz

Labels: extension-modules (C modules in the Modules dir), performance (Performance or resource usage), type-feature (A feature request or enhancement)
