Handling too large inputs gracefully #5

jseppanen · 2012-06-20T07:19:34Z

Currently, if the input is over 2 GB, lz4.compress may silently truncate the input, or if you get lucky, raise "SystemError: Negative size passed to PyString_FromStringAndSize". This pull request handles too large inputs and raises ValueError for too large inputs. Requires Python 2.5 or newer for Py_ssize_t support. Bumping version number to 0.4.1.

steeve · 2012-06-20T07:35:09Z

Nice catch !
I think we should check out what is LZ4's fault, and what is the binding's (aka my) fault...

Because I'm not in favor of patching lz4.c (which comes directly from Yann Collet's repo).

steeve · 2012-06-20T07:37:46Z

Although checking lz4.h, all the size are handled as int, so I guess LZ4 should work on that too...

Cyan4973 · 2012-06-20T13:42:15Z

Good point. I actually had one request to work on that one.

The solution was to move from 'int' to 'size_t'.
'size_t' is unsigned 32 bits on 32 bits sytems, and unsigned 64 bits on 64 bits ones.
So it fits the bill. And it works fine.

The problem is about the error codes.

They are currently provided as negative numbers.
Changing the interface from 'int' to 'size_t' requires existing user code to handle errors differently, and therefore requires them to rewrite some code.

As a consequence, and since having an input size >=2GB was considered uncommon, if not out of scope, i finally selected to remain with int, avoiding any change to existing user base.

Maybe that's a decision to review again.
In the meantime, there is indeed a size limitation when using 'int' types.

steeve · 2012-06-20T14:20:29Z

Why not force int64_t? After all, even on 32 bits systems, having stuff over 2gb could be possible (with PAE), although unlikely....

steeve · 2012-06-20T14:20:47Z

For the record, @Cyan4973 is Yann Collet :)

Cyan4973 · 2012-06-21T08:35:39Z

I'll make some test with your proposal.
The most important idea is to avoid existing user code to be modified.
So i'll test a bunch of user code, and see if it works.

jseppanen · 2012-06-21T10:15:16Z

I can live with the 2GB limit as such, but the problem is that I want to make sure my data doesn't get silently truncated if I at some point attempt to compress/decompress something that is too long. Also, because LZ4_compressBound may overflow (in lz4.c), I think it should be patched, in addition to the python wrappers.

@Cyan4973 In lz4.c, the LZ4_compressBound function may overflow internally, so I propose to fix that by detecting oversize inputs and flagging them as errors.

oakwhiz · 2012-06-21T20:00:28Z

If there is a 2GB limit, people will start writing their stuff to compress large files in blocks, and then not all implementations will necessarily be compatible.

steeve · 2013-10-21T12:07:32Z

Going on old issues. Is this one fixed?

I'm gonna update the release this week.

Cyan4973 · 2013-10-21T12:24:09Z

There is now an official streaming format specification
(http://fastcompression.blogspot.fr/2013/04/lz4-streaming-format-final.html)
which is meant to solve such issue.

It allows creation of compressed streams of any size, cut into independent (or chained) blocks. There is also an optional checksum mechanism defined, for error detection.

treeandbrick · 2016-05-05T16:19:22Z

Many years later...

$ pip list | grep lz4
lz4 (0.8.2)
$ python -c "import lz4; lz4.compress(b'a' * (2 ** 31))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
OverflowError: size does not fit in an int

Is this going to be fixed? The error is clear, but the limitation is not ideal.

jonathanunderwood · 2016-05-06T07:37:07Z

Upstream has moved. Please file an issue at the current upstream.

https://github.com/python-lz4/python-lz4
On 5 May 2016 17:19, "treeandbrick" notifications@github.com wrote:

$ pip list | grep lz4
lz4 (0.8.2)
$ python -c "import lz4; lz4.compress(b'a' * (2 ** 31))"
Traceback (most recent call last):
File "", line 1, in
OverflowError: size does not fit in an int

Is this going to be fixed?

—
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#5 (comment)

Bugfix: discard oversize inputs instead of truncating silently

7bce869

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling too large inputs gracefully #5

Handling too large inputs gracefully #5

jseppanen commented Jun 20, 2012

steeve commented Jun 20, 2012

steeve commented Jun 20, 2012

Cyan4973 commented Jun 20, 2012

steeve commented Jun 20, 2012

steeve commented Jun 20, 2012

Cyan4973 commented Jun 21, 2012

jseppanen commented Jun 21, 2012

oakwhiz commented Jun 21, 2012

steeve commented Oct 21, 2013

Cyan4973 commented Oct 21, 2013

treeandbrick commented May 5, 2016 •

edited

Loading

jonathanunderwood commented May 6, 2016

Handling too large inputs gracefully #5

Are you sure you want to change the base?

Handling too large inputs gracefully #5

Conversation

jseppanen commented Jun 20, 2012

steeve commented Jun 20, 2012

steeve commented Jun 20, 2012

Cyan4973 commented Jun 20, 2012

steeve commented Jun 20, 2012

steeve commented Jun 20, 2012

Cyan4973 commented Jun 21, 2012

jseppanen commented Jun 21, 2012

oakwhiz commented Jun 21, 2012

steeve commented Oct 21, 2013

Cyan4973 commented Oct 21, 2013

treeandbrick commented May 5, 2016 • edited Loading

jonathanunderwood commented May 6, 2016

treeandbrick commented May 5, 2016 •

edited

Loading