Decompression fails where no content size is included in the frame (e.g. streaming) #53

Closed
starrify opened this issue Oct 5, 2020 · 1 comment


starrify commented Oct 5, 2020

Examples for reproducing the issue:

$ man zstd > tmpmanual
$ zstd tmpmanual --content-size -o with-size.zst
tmpmanual            : 34.74%   ( 26286 =>   9133 bytes, with-size.zst)        
$ zstd tmpmanual --no-content-size -o no-size.zst
tmpmanual            : 34.74%   ( 26286 =>   9132 bytes, no-size.zst)          
$ man zstd | zstd -o no-size-2.zst
/*stdin*\            : 34.70%   ( 26286 =>   9120 bytes, no-size-2.zst)        
$ python -c 'import zstd; zstd.uncompress(open("with-size.zst", "rb").read())'
<string>:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
$ python -c 'import zstd; zstd.uncompress(open("no-size.zst", "rb").read())'
<string>:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
Traceback (most recent call last):
  File "<string>", line 1, in <module>
zstd.Error: Input data invalid or missing content size in frame header.
$ python -c 'import zstd; zstd.uncompress(open("no-size-2.zst", "rb").read())'
<string>:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
Traceback (most recent call last):
  File "<string>", line 1, in <module>
zstd.Error: Input data invalid or missing content size in frame header.

Test environment:

$ uname -a
Linux GLaDOS 5.8.10-arch1-1 #1 SMP PREEMPT Thu, 17 Sep 2020 18:01:06 +0000 x86_64 GNU/Linux
$ python --version
Python 3.8.5
$ python -c 'import zstd; print(zstd.version())'
1.4.5.1
$ zstd --version
*** zstd command line interface 64-bits v1.4.5, by Yann Collet ***

Proposal: Enhance the API so that it also works for data whose frame header does not include the content size field.

By the way, the other Python binding library, python-zstandard, also fails for the same reason when using its simple decompression API (zstandard.ZstdDecompressor().decompress()), but its other APIs (e.g. the streaming API) can properly handle the case where there is no content size.

sergey-dryabzhinsky (Owner) commented

Yes, this module is simple and dumb.
It was never meant to support streaming compression.
And I'll keep it this way.

starrify added a commit to starrify/scrapy that referenced this issue Oct 5, 2020
yaniv-aknin added a commit to yaniv-aknin/fafdata that referenced this issue Feb 3, 2023
Around November 2022, FAF switched to use the Rust based replay server,
which uses zstd streaming decompression [0].

The Python zstd module doesn't and won't support streaming [1], but
zstandard does.

[0] https://faforever.zulipchat.com/#narrow/stream/203478-general/topic/.28no.20topic.29/near/325517709
[1] sergey-dryabzhinsky/python-zstd#53