Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZSTD_decompress(NULL, 0, ...) returns -ZSTD_error_dstSize_tooSmall #1385

Closed
veprbl opened this issue Oct 23, 2018 · 4 comments · Fixed by #1390
Closed

ZSTD_decompress(NULL, 0, ...) returns -ZSTD_error_dstSize_tooSmall #1385

veprbl opened this issue Oct 23, 2018 · 4 comments · Fixed by #1390
Assignees

Comments

@veprbl
Copy link

veprbl commented Oct 23, 2018

I'm maintaining an arrow-cpp (https://github.com/apache/arrow) package in nixpkgs. arrow-cpp can use zstd as one of the compression backends. Since we made an upgrade for zstd from 1.3.5 to 1.3.6 one of the tests in arrow-cpp started to fail. After some debugging I found that the source of failure is the call to ZSTD_decompress with dstCapacity=0 and dst=NULL that is not working in zstd 1.3.6+. I've come up with a minimal reproducing example:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <zstd.h>

#define MAX_UNCOMPRESSED_SIZE 4096
#define MAX_COMPRESSED_SIZE 4096

#define CHECK_ERROR(code) \
  if (ZSTD_isError(code)) { \
    printf("Error: %s\n", ZSTD_getErrorName(code)); \
    exit(EXIT_FAILURE); \
  }

int main(void) {
  char uncompressed[MAX_UNCOMPRESSED_SIZE];
  char compressed[MAX_COMPRESSED_SIZE];
  int compression_level = 1;
  size_t input_len = 0;
  memset(uncompressed, 0, sizeof(uncompressed));
  size_t compressed_size = ZSTD_compress(compressed, MAX_COMPRESSED_SIZE, uncompressed, input_len, compression_level);
  CHECK_ERROR(compressed_size);
  printf("compressed_size = %zu\n", compressed_size);

  {
    printf("test1\n");
    memset(uncompressed, 0, sizeof(uncompressed));
    size_t uncompressed_size = ZSTD_decompress(uncompressed, MAX_UNCOMPRESSED_SIZE, compressed, compressed_size);
    CHECK_ERROR(uncompressed_size);
    printf("uncompressed_size = %zu\n", uncompressed_size);
  }
  {
    printf("test2\n");
    memset(uncompressed, 0, sizeof(uncompressed));
    size_t uncompressed_size = ZSTD_decompress(uncompressed, 0, compressed, compressed_size);
    CHECK_ERROR(uncompressed_size);
    printf("uncompressed_size = %zu\n", uncompressed_size);
  }
  {
    printf("test3\n");
    size_t uncompressed_size = ZSTD_decompress(NULL, 0, compressed, compressed_size);
    CHECK_ERROR(uncompressed_size);
    printf("uncompressed_size = %zu\n", uncompressed_size);
  }
  return EXIT_SUCCESS;
}

On zstd 1.3.5:

compressed_size = 9
test1
uncompressed_size = 0
test2
uncompressed_size = 0
test3
uncompressed_size = 0

On zstd 1.3.7:

compressed_size = 9
test1
uncompressed_size = 0
test2
uncompressed_size = 0
test3
Error: Destination buffer is too small

I'm not very familiar with arrow-cpp's codebase, but, from what I understand, the situation when dstCapacity=0 and dst=NULL is possible at runtime in arrow-cpp, for example, when reading "parquet" files with empty columns. It also seems that all other decompressors (GZIP, ZLIB, LZ4, SNAPPY, BROTLI) can handle these zero-length output buffers starting at NULL, as they pass the same test. I was wondering if it is possible to address this issue in zstd.

@Cyan4973
Copy link
Contributor

Cyan4973 commented Oct 23, 2018

We might have tightened the rules regarding output buffer at some point during development.
Operations involving a potentially NULL pointer tend to generate warnings with advanced test tools, such as sanitizers or static analyzers. So I suspect one such warning invited closing a suspicious use case.

But if this is a feature that must work, we can add that to the list of properties to respect.

@veprbl
Copy link
Author

veprbl commented Oct 23, 2018

@Cyan4973 Indeed, the library should avoid performing operations on a NULL pointer. However, I naively think that, if the function is given a zero dstCapacity, then the effective interval of allowed memory is now an empty set [NULL, NULL), so no memory should be accessed. Perhaps the constraint could be relaxed just for this particular case of dstCapacity = 0 without triggering any code analyzers.

Alternatively, I could propose to address this in the arrow-cpp. However I don't think it is feasible to prevent callers from supplying NULL pointers for the output buffers. This is because those pointers come from various standard and custom allocators used in arrow-cpp's test suite and library. Probably, the solution I could go for in this case would be to just modify the zstd backend to handle the exceptional case using something like:

diff --git a/cpp/src/arrow/util/compression_zstd.cc b/cpp/src/arrow/util/compression_zstd.cc
index 4064f29c..c5b5d1c0 100644
--- a/cpp/src/arrow/util/compression_zstd.cc
+++ b/cpp/src/arrow/util/compression_zstd.cc
@@ -206,8 +206,13 @@ Status ZSTDCodec::MakeDecompressor(std::shared_ptr<Decompressor>* out) {

 Status ZSTDCodec::Decompress(int64_t input_len, const uint8_t* input, int64_t output_len,
                              uint8_t* output_buffer) {
+  void *safe_output_buffer = static_cast<void*>(output_buffer);
+  int dummy {};
+  if ((output_len == 0) && (output_buffer == NULL)) {
+    safe_output_buffer = static_cast<void*>(&dummy);
+  }
   int64_t decompressed_size =
-      ZSTD_decompress(output_buffer, static_cast<size_t>(output_len), input,
+      ZSTD_decompress(safe_output_buffer, static_cast<size_t>(output_len), input,
                       static_cast<size_t>(input_len));
   if (decompressed_size != output_len) {
     return Status::IOError("Corrupt ZSTD compressed data.");

What do you think?

@Cyan4973
Copy link
Contributor

I believe your proposed solution for arrow looks good.
It's a great patch that is valuable to merge,
as both v1.3.6 and v1.3.7 disallow providing NULL as output buffer,
so if the library version is outside of arrow control, it's a situation that may happen on any user environment.

I'll discuss the merit to allow providing NULL,0 as output buffer,
and if there is agreement on such need, will proceed with ensuring it's properly supported and properly tested.

That being said, it will only be part of next version, v1.3.8, which is several weeks away.

For a quicker fix, one can use the dev branch directly or cherry-pick the patch.
Alternatively, your patch will definitely solve the situation for arrow, and keep it safe from any similar issue in the future.

@Cyan4973 Cyan4973 self-assigned this Oct 23, 2018
Cyan4973 added a commit that referenced this issue Oct 24, 2018
fix #1385
decompressing into NULL was an automatic error.
It is now allowed, as long as the content of the frame is empty.

Seems to simplify things for `arrow`.
Maybe some other projects rely on this behavior ?
@pitrou
Copy link
Contributor

pitrou commented Nov 6, 2018

For a bit more context, the failing piece of code was decompressing into a std::vector, but a 0-sized std::vector may return a null pointer from its data() method (this is allowed by the standard, unfortunately).

pitrou added a commit to apache/arrow that referenced this issue Nov 7, 2018
Upstream issue: facebook/zstd#1385

Author: Antoine Pitrou <antoine@python.org>

Closes #2909 from pitrou/ARROW-3707-zstd-null-pointer and squashes the following commits:

9fb0676 <Antoine Pitrou> Use cmake to build zstd
8a2488d <Antoine Pitrou> ARROW-3707:  Fix test regression with zstd 1.3.7
romainfrancois pushed a commit to romainfrancois/arrow that referenced this issue Nov 7, 2018
Upstream issue: facebook/zstd#1385

Author: Antoine Pitrou <antoine@python.org>

Closes apache#2909 from pitrou/ARROW-3707-zstd-null-pointer and squashes the following commits:

9fb0676 <Antoine Pitrou> Use cmake to build zstd
8a2488d <Antoine Pitrou> ARROW-3707:  Fix test regression with zstd 1.3.7
@Cyan4973 Cyan4973 mentioned this issue Dec 27, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants