latest version 0.9.0 does not allow compression of empty string #27

bmoscon · 2017-05-12T00:40:02Z

Previous versions of python-lz4 allowed compression of the empty string:

In [1]: lz4.compress('')
Out[1]: b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x88/\xc9\x0b'

the latest version fails:

In [2]: lz4.compress('')
lz4/__init__.py:18: DeprecationWarning: Call to deprecated function or method compress (use lz4.block.compress instead).
  def compress(*args, **kwargs):
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last) in <module>()
----> 1 lz4.compress('')

lz4/deprecated.py in new_func(*args, **kwargs)
     37             warnings.warn_explicit(msg, category=DeprecationWarning, filename=filename, lineno=lineno)
     38             warnings.simplefilter('default', DeprecationWarning)  # reset filter
---> 39             return cls_or_func(*args, **kwargs)
     40 
     41         return new_func

lz4/__init__.py in compress(*args, **kwargs)
     17 @deprecated('use lz4.block.compress instead')
     18 def compress(*args, **kwargs):
---> 19     return lz4.block.compress(*args, **kwargs)
     20 
     21 @deprecated('use lz4.block.decompress instead')

ValueError: Input source data size invalid: 0 bytes

Was there a reason for this change? We use lz4 for compression of data we write to a database. Some of the records might contain empty strings. If I were to upgrade to the latest version of lz4 I'd no longer be able to support datasets with empty strings, and I'd also no longer be able to decompress the compressed data that's already been written.

The text was updated successfully, but these errors were encountered:

jonathanunderwood · 2017-05-12T06:36:40Z

This wasn't an intentional change - we should support the previous behaviour. Will look into it when I get avbit of time. PRs always welcome too :)

jonathanunderwood · 2017-05-12T06:40:04Z

Looking at the code, I realize it was an intentional change, as I thought it would be a helpful check - I couldn't imagine a use case for compressing an empty string. I suppose you have provided one. I'll add an option to check for empty strings, which defaults to off, to retain the old behaviour.

bmoscon · 2017-05-12T10:12:05Z

@jonathanunderwood more than happy to supply a PR, I just wanted to understand the reason for the change, and learn if a PR would be accepted or not. Will try and produce one this weekend.

There are valid use cases for passing a zero length string to compress: #27

jonathanunderwood · 2017-05-13T12:11:36Z

Ok, I think I've fixed this, will cut a new release. Please re-open if you think there's still a problem.

bmoscon · 2017-05-13T12:50:22Z

looks good, thanks. I'll test it out once a new release is on pypi.

Previously we returned a string (bytes) object.

spectral54 · 2019-02-15T03:45:16Z

As far as I can tell, this lets you compress and decompress the empty string, but doesn't allow you to decompress empty strings compressed with <v0.9.0.

From bmoscon's first comment opening this issue, lz4.compress('') produced this on v0.8.2 (this actually looks corrupt, the stuff after the first four bytes is just random memory... yikes; in my testing it's always 20 bytes long (I assume 4 size bytes and 16 bytes of random memory)):

b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x88/\xc9\x0b'

However, the decompression isn't backwards compatible; on v0.8.2, this decompresses to the empty string:

>>> lz4.decompress(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x88/\xc9\x0b')
''

But on v0.9.1 (and newer, as far as I can tell):

>>> block.decompress(b'\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x88/\xc9\x0b')
ValueError (the actual error changes a bit, I believe).

Would it make sense to support the old (v0.8.2) "corrupt" compressed empty strings? As far as I know, this would mean checking that the size information is zero, and accepting any garbage after that.

jonathanunderwood · 2019-02-16T10:51:09Z

Would it make sense to support the old (v0.8.2) "corrupt" compressed empty strings? As far as I know, this would mean checking that the size information is zero, and accepting any garbage after that.

I'm in two minds over this. On the face of it, this seems like a reasonable idea. But, it would special case the zero length case. At the moment, we always error if there is junk in the passed in bytes, irrespective of uncompressed size. My inclination is to leave it as simple as possible.

bmoscon · 2019-02-16T18:07:26Z

Simple as possible always best!

bmoscon mentioned this issue May 12, 2017

new version of python lz4 breaks arctic man-group/arctic#336

Closed

jonathanunderwood added a commit that referenced this issue May 13, 2017

Remove check for zero size in block compress (#27)

1f49adf

There are valid use cases for passing a zero length string to compress: #27

jonathanunderwood added a commit that referenced this issue May 13, 2017

Allow dest size of 0 in block decompress (#27)

94c8473

jonathanunderwood added a commit that referenced this issue May 13, 2017

Add test for zero size input for block compress/decompress (#27)

96f2a89

jonathanunderwood mentioned this issue May 13, 2017

Allow zero length data to be passed to block compress #29

Merged

jonathanunderwood closed this as completed May 13, 2017

jonathanunderwood added a commit that referenced this issue May 16, 2017

Return bytearray instead from block (de)compress for Python3 (#27)

46fb991

Previously we returned a string (bytes) object.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

latest version 0.9.0 does not allow compression of empty string #27

latest version 0.9.0 does not allow compression of empty string #27

bmoscon commented May 12, 2017 •

edited

jonathanunderwood commented May 12, 2017

jonathanunderwood commented May 12, 2017

bmoscon commented May 12, 2017

jonathanunderwood commented May 13, 2017

bmoscon commented May 13, 2017

spectral54 commented Feb 15, 2019

jonathanunderwood commented Feb 16, 2019 •

edited

bmoscon commented Feb 16, 2019

latest version 0.9.0 does not allow compression of empty string #27

latest version 0.9.0 does not allow compression of empty string #27

Comments

bmoscon commented May 12, 2017 • edited

jonathanunderwood commented May 12, 2017

jonathanunderwood commented May 12, 2017

bmoscon commented May 12, 2017

jonathanunderwood commented May 13, 2017

bmoscon commented May 13, 2017

spectral54 commented Feb 15, 2019

jonathanunderwood commented Feb 16, 2019 • edited

bmoscon commented Feb 16, 2019

bmoscon commented May 12, 2017 •

edited

jonathanunderwood commented Feb 16, 2019 •

edited