
Speedup progressive scan optimization #15

Open
pengvado opened this issue Mar 7, 2014 · 2 comments

Comments

@pengvado
Contributor

pengvado commented Mar 7, 2014

It should be possible to double the speed of progressive scan optimization:
Currently each proposed scan is processed twice, first to collect Huffman statistics and then to construct its bitstream. But for the purpose of deciding which scans to keep, we only need to know their sizes, not their contents, and the size can be derived from the Huffman statistics alone.
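A rough sketch of the idea (not mozjpeg's actual code, and the array names are made up for illustration): given the symbol frequency counts gathered in the statistics pass and the code lengths of the Huffman table built from them, the coded size of a scan can be summed up without emitting any bytes. The sketch uses the baseline rule that an AC symbol's low nibble is the number of appended magnitude bits, and it glosses over progressive-mode EOB-run symbols.

```c
#include <stddef.h>

/* Estimate the entropy-coded size of one scan from Huffman statistics.
 * `freq` and `code_len` are hypothetical arrays indexed by symbol. */
static size_t
estimated_scan_bits(const long freq[256], const unsigned char code_len[256])
{
    size_t bits = 0;
    for (int sym = 0; sym < 256; sym++) {
        if (freq[sym] == 0)
            continue;
        /* Huffman code itself plus the literal magnitude bits appended
         * after it (low nibble of the symbol in the baseline layout). */
        bits += (size_t)freq[sym] * (code_len[sym] + (sym & 0x0F));
    }
    return bits; /* excludes padding, markers, and 0xFF byte stuffing */
}
```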

@frkay

frkay commented Mar 19, 2014

Unfortunately this is not quite true. In the entropy-coded data the byte 0xFF is not supposed to appear, since it marks the beginning of a JPEG marker (FFxx codes); when it appears by accident it has to be followed by a 0x00 byte (this forms a dummy marker that the decoder ignores). This allows a JPEG file to be parsed and all its markers extracted without actually performing the decompression. It also explains why the all-ones Huffman code is never used. A 0xFF can still appear when, for instance, a code like 0111 is followed by 111110 and the 11111111 in the middle happens to fall on a byte boundary, which means you have to construct the bitstream to detect these 0xFF bytes and pad each one with a 0x00.
Now there are perhaps creative ways to build the Huffman tree (not necessarily optimal) to reduce the number of 0xFF bytes and save a 0x00 here and there...
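A minimal sketch of the byte-stuffing rule described above (the `out`/`emit_byte` names are made up for illustration): whenever a completed byte of entropy-coded data happens to be 0xFF, the encoder must follow it with a stuffed 0x00 so decoders never mistake it for the start of a marker. This is why the exact size is only known once the bitstream is actually assembled.

```c
#include <stdio.h>

/* Write one byte of entropy-coded data, stuffing a 0x00 after any 0xFF. */
static void
emit_byte(FILE *out, unsigned char c)
{
    fputc(c, out);
    if (c == 0xFF)
        fputc(0x00, out); /* dummy marker byte; decoders skip it */
}
```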

@kornelski
Member

If you assume a typical overhead from 0xFF byte stuffing (say, 1%), then you could at least quickly eliminate scans whose estimated size is more than 1% larger than the best candidate's.

It might be worth checking whether ignoring 0xFF is good enough (assuming that overhead is generally constant or can be minimized).
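A rough sketch of that pruning idea (the names and the 1% margin are illustrative assumptions, not anything mozjpeg implements): compare the cheap statistics-based estimates, and only build the real bitstream for scans that could still beat the best candidate even if the best candidate pays the full stuffing overhead.

```c
#define STUFFING_MARGIN 0.01 /* assumed worst-case 0xFF stuffing overhead */

/* A scan can be pruned if its stuffing-free estimate already exceeds the
 * best candidate's estimate plus the assumed worst-case stuffing margin:
 * its real size can only be larger than its estimate, so it cannot win. */
static int
can_be_pruned(size_t estimate_bits, size_t best_estimate_bits)
{
    return (double)estimate_bits >
           (double)best_estimate_bits * (1.0 + STUFFING_MARGIN);
}
```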
