zlib decompression produces wrong output #6606
Comments
The diff appears to be two null bytes at the end of the file. I would guess we are not properly setting the length of the final chunk.
The issue here seems to be that the gz file has two extra \0 bytes at the end of the file (after the end of the deflated data), and we have logic that includes those bytes in the inflated output. A comment says this mimics CRuby behavior, and I found some code in CRuby that looks similar... but this is clearly not correct behavior.
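For anyone without the original data.gz, the situation described above can be approximated with synthetic input. This is only a sketch: the payload string is a made-up stand-in for the font data, and it has not been confirmed in this thread that synthetic input triggers the same code path as the real file.

```ruby
require 'zlib'

# Hypothetical stand-in for the font data; the real data.gz is not reproduced here.
original = ('font bytes ' * 1000).b

# A complete zlib stream followed by two NUL bytes that are not part of it.
padded = Zlib::Deflate.deflate(original) + "\0\0".b

# The class-level inflate is expected to return only the inflated content.
inflated = Zlib::Inflate.inflate(padded)

puts "expected #{original.bytesize} bytes, got #{inflated.bytesize} bytes"
puts(inflated == original ? 'trailing bytes ignored' : 'trailing bytes leaked into output')
```

If the output is two bytes longer than the original, the implementation is appending the trailing input to the result.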
@slonopotamus Where did this gz file come from? I am wondering if these extra \0 bytes are intentional or a bug somewhere else.
This data is a font, extracted from a mobi file that was generated by Amazon KindleGen. The font is stored in zlib-compressed form inside the mobi file. The mobi file also stores the expected uncompressed data length, which is how I noticed that the JRuby output doesn't match what is expected.
Anyway, my point here is that I couldn't find any other zlib implementation that would agree with JRuby behavior.
@slonopotamus There is definitely a bug, but I was curious why those bytes (which are not part of the compressed content) are there in the first place.
Not sure. I'm using my own code to extract these bytes from a mobi file. I actually hit this JRuby bug while testing my code on JRuby. But I think that I'm properly following the Mobi spec and these bytes are actually part of the zlib'ed data. There are alternative implementations of the same code in libmobi and in Calibre, one could possibly hook a couple of
To give you some more background: mobi contains a bunch of "resources", each inside a "record" whose size is known in advance. Depending on resource type, record data is interpreted differently. Images are stored as-is, video has
@slonopotamus Ok thank you, that is good background info. My reason for asking is that we have never had someone else report this, which means to me that nobody ever ran a gz file with extra bytes through our inflater. That makes this a rather unique case, and I wanted to understand why those bytes are there. Digging back through your code, it seems like this is originating from whatever library provides "palmdb". There could be some other bug in that library, or in the functions it calls to do the extraction of this gz data, but I did not dig that far down. If you feel so inclined to investigate that end of things, here is the last bit of the hexdump from both the gz and the resulting inflated content:
It is peculiar, at the very least.
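As an alternative to reading a hexdump, the number of input bytes that fall outside the zlib stream can be checked with the instance-level API. The sketch below uses synthetic data (the payload is hypothetical, not the real font) and relies on Zlib::ZStream#total_in, which reports how many input bytes the stream itself consumed.

```ruby
require 'zlib'

# Hypothetical payload; substitute File.binread('data.gz') for the real input.
payload = ('font bytes ' * 1000).b
stream  = Zlib::Deflate.deflate(payload)
padded  = stream + "\0\0".b

zi = Zlib::Inflate.new
zi.inflate(padded)

# total_in counts only the bytes that belong to the zlib stream,
# so anything left over lies after the end of the deflated data.
consumed = zi.total_in
trailing = padded.bytesize - consumed
zi.close

puts "zlib stream length: #{consumed}, trailing bytes: #{trailing}"
```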
"palmdb" is part of the same library I'm referring to :) I'll investigate whether those trailing zero bytes were supposed to be part of the zlib stream or had to be stripped off by some kind of lower-level logic that handles mobi records, though that doesn't cancel the current JRuby bug.
Indeed. Looking into a fix now.
Upd: I'm confirming that libmobi (completely independent MOBI reading library) passes the same amount of bytes (36488) as input to zlib and expects to receive back the same amount of bytes (60992) as my Ruby implementation when given the same MOBI file. If you would like to reproduce this with libmobi, you would need to:
And it will print:
36488 is exactly the size of So, I'm pretty confident that "palmdb" passes the right compressed data to the Ruby code that performs zlib decompression.
Ok, so this is likely not covered by the available Ruby zlib tests. I will add some along with the fix.
FWIW this does not appear to be a bug in jzlib. We actually add these extra bytes in JRuby code, for some reason.
The finish result will include data that came after the compressed section from the input. This is appropriate when calling finish to terminate a previous compressed segment and proceed to the next segment, but the class-level #inflate should only return the inflated content. Fixes jruby#6606
@slonopotamus Have a look at the new fix in #6612. It passes your case, existing zlib specs, and makes a couple specs pass that didn't before.
Confirming that
Environment Information
jruby -v: jruby 9.2.16.0 (2.5.7) 2021-03-03 f82228dc32 OpenJDK 64-Bit Server VM 25.275-b01 on 1.8.0_275-b01 +jit [linux-x86_64]
uname -a: Linux noblesse 5.4.80-gentoo-r1 #1 SMP Tue Dec 8 10:34:59 MSK 2020 x86_64 AMD Ryzen 7 3700X 8-Core Processor AuthenticAMD GNU/Linux
Way to reproduce
Run the following in the directory where you saved data.gz, with different Ruby implementations:
ruby -e "require 'zlib'; f = File.read('data.gz'); puts Zlib::Inflate.inflate(f).bytesize"
Expected Behavior
Uncompressed data size is 60992 bytes. This is what MRI says and what I actually expect.
$ zlib-flate -uncompress < data.gz > data && ls -l data
agrees that uncompressed data is 60992 bytes.
$ openssl zlib -d < data.gz > data && ls -l data
also thinks that uncompressed data is 60992 bytes.
$ python3 -c "import zlib; print(len(zlib.decompress(open('data.gz', 'rb').read())))"
also thinks that uncompressed data is 60992 bytes.
Java also agrees data is 60992 bytes:
Actual Behavior
Uncompressed data size is 60994 bytes on JRuby. I observe this on JRuby 9.2.13 and 9.2.16. I didn't test other JRuby versions.