ZSTD_decompress(NULL, 0, ...) returns -ZSTD_error_dstSize_tooSmall #1385
Comments
We might have tightened the rules regarding the output buffer at some point during development. But if this is a feature that must work, we can add it to the list of properties to respect.
@Cyan4973 Indeed, the library should avoid performing operations on a … Alternatively, I could propose to address this in arrow-cpp. However, I don't think it is feasible to prevent callers from supplying …

```diff
diff --git a/cpp/src/arrow/util/compression_zstd.cc b/cpp/src/arrow/util/compression_zstd.cc
index 4064f29c..c5b5d1c0 100644
--- a/cpp/src/arrow/util/compression_zstd.cc
+++ b/cpp/src/arrow/util/compression_zstd.cc
@@ -206,8 +206,13 @@ Status ZSTDCodec::MakeDecompressor(std::shared_ptr<Decompressor>* out) {
 
 Status ZSTDCodec::Decompress(int64_t input_len, const uint8_t* input, int64_t output_len,
                              uint8_t* output_buffer) {
+  void *safe_output_buffer = static_cast<void*>(output_buffer);
+  int dummy {};
+  if ((output_len == 0) && (output_buffer == NULL)) {
+    safe_output_buffer = static_cast<void*>(&dummy);
+  }
   int64_t decompressed_size =
-      ZSTD_decompress(output_buffer, static_cast<size_t>(output_len), input,
+      ZSTD_decompress(safe_output_buffer, static_cast<size_t>(output_len), input,
                       static_cast<size_t>(input_len));
   if (decompressed_size != output_len) {
     return Status::IOError("Corrupt ZSTD compressed data.");
```

What do you think?
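The workaround in this patch can be factored into a small reusable helper. The sketch below is only an illustration, assuming a hypothetical function name `NonNullDst` that is not part of arrow-cpp or zstd: when the caller passes a null pointer together with zero capacity, it substitutes the address of a private dummy byte, so the decompressor never sees `NULL`. Any other combination is passed through unchanged, so a genuine `NULL` with non-zero capacity is still left for the library to reject.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not an arrow-cpp or zstd API): returns a pointer that is
// safe to hand to a decompressor as its destination buffer.
// If the caller has no buffer at all (dst == nullptr) and expects no output
// (capacity == 0), a pointer to a private dummy byte is returned instead.
// Assuming the decompressor honors the capacity, nothing is written through
// it, because the capacity stays 0.
inline void* NonNullDst(void* dst, std::size_t capacity) {
  static std::uint8_t dummy = 0;
  if (dst == nullptr && capacity == 0) {
    return &dummy;
  }
  // Every other combination is returned unchanged, so a null pointer with a
  // non-zero capacity still reaches the library and can be rejected there.
  return dst;
}
```

With such a helper, the patched call would read `ZSTD_decompress(NonNullDst(output_buffer, static_cast<size_t>(output_len)), ...)`. Using a function-local `static` instead of the patch's stack variable is just a design choice here; since nothing is written to it, sharing it across calls is harmless.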
I believe your proposed solution for … I'll discuss the merits of allowing … That being said, it will only be part of the next version, v1.3.8, which is several weeks away. For a quicker fix, one can use the …
fix #1385: decompressing into `NULL` used to be an automatic error. It is now allowed, as long as the content of the frame is empty. This seems to simplify things for `arrow`. Maybe some other projects rely on this behavior?
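The rule in this commit message can be stated as a simple predicate. The sketch below is an illustrative model written for this explanation, not zstd's internal code; the name `DecompressionAccepted` and the assumption that the decoded content size is known up front are both hypothetical. A `NULL` destination is acceptable only when the frame decodes to zero bytes; a non-`NULL` destination just has to be large enough.

```cpp
#include <cstddef>

// Illustrative model of the rule described in the commit message; NOT zstd source.
// Returns true when decompressing a frame whose decoded content is
// `content_size` bytes into (dst, capacity) should be accepted.
bool DecompressionAccepted(const void* dst, std::size_t capacity,
                           std::size_t content_size) {
  if (dst == nullptr) {
    // Decompressing into NULL is allowed only when there is nothing to write.
    return content_size == 0;
  }
  // Otherwise the destination simply has to be large enough.
  return content_size <= capacity;
}
```

Before this change, the `dst == nullptr` branch would correspond to an unconditional rejection, which is the "automatic error" the commit message refers to.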
For a bit more context, the failing piece of code was decompressing into a …
Upstream issue: facebook/zstd#1385

Author: Antoine Pitrou <antoine@python.org>

Closes #2909 from pitrou/ARROW-3707-zstd-null-pointer and squashes the following commits:

- 9fb0676 <Antoine Pitrou> Use cmake to build zstd
- 8a2488d <Antoine Pitrou> ARROW-3707: Fix test regression with zstd 1.3.7
I'm maintaining an arrow-cpp (https://github.com/apache/arrow) package in nixpkgs. arrow-cpp can use zstd as one of its compression backends. Since we upgraded zstd from 1.3.5 to 1.3.6, one of the tests in arrow-cpp has been failing. After some debugging, I found that the failure comes from calling `ZSTD_decompress` with `dstCapacity=0` and `dst=NULL`, which no longer works in zstd 1.3.6+. I've come up with a minimal reproducing example:

On zstd 1.3.5:

On zstd 1.3.7:
I'm not very familiar with arrow-cpp's codebase, but from what I understand, `dstCapacity=0` together with `dst=NULL` can occur at runtime in arrow-cpp, for example when reading Parquet files with empty columns. All the other decompressors (GZIP, ZLIB, LZ4, SNAPPY, BROTLI) also seem to handle these zero-length output buffers starting at `NULL`, since they pass the same test. I was wondering whether it is possible to address this issue in zstd.
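The "empty column" case can be exercised with a single codec-agnostic check, which is roughly what a shared test across codecs does. This is a minimal sketch under stated assumptions: `DecompressFn` is a hypothetical signature loosely modeled on `ZSTD_decompress` (it returns bytes written, or -1 on error, unlike zstd's `size_t` error codes), and the two stand-in decompressors are invented for illustration. One rejects a null destination unconditionally, as zstd 1.3.6+ is reported to do here; the other accepts it when nothing needs to be written.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decompressor signature, loosely modeled on ZSTD_decompress:
// returns the number of bytes written, or -1 on error.
using DecompressFn = long long (*)(void* dst, std::size_t capacity,
                                   const void* src, std::size_t src_size);

// Stand-in mimicking the reported behavior: a null destination is always an
// error, even when nothing has to be written.
long long RejectsNullDst(void* dst, std::size_t, const void*, std::size_t) {
  return dst == nullptr ? -1 : 0;
}

// Stand-in that pretends the frame decodes to zero bytes, so nothing is
// written and a null destination is fine.
long long AcceptsNullDst(void*, std::size_t, const void*, std::size_t) {
  return 0;
}

// Decompresses an "empty column" (zero expected output bytes, no buffer
// allocated) with every codec and returns the names of those that fail.
std::vector<std::string> CodecsFailingEmptyOutput(
    const std::vector<std::pair<std::string, DecompressFn>>& codecs) {
  std::vector<std::string> failing;
  for (const auto& entry : codecs) {
    void* dst = nullptr;               // No allocation for a zero-length column.
    const std::size_t capacity = 0;
    const unsigned char src[1] = {0};  // Placeholder compressed bytes.
    if (entry.second(dst, capacity, src, sizeof src) != 0) {
      failing.push_back(entry.first);
    }
  }
  return failing;
}
```

Checking that the call returns exactly 0 (zero bytes produced) mirrors the `decompressed_size != output_len` test in arrow-cpp's `ZSTDCodec::Decompress`.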