You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
0xFF bytes in Huffman data need special coding that adds another byte of overhead. If this overhead could be reliably minimized, then approximation of size if scan data described #15 would be more effective.
Here's an algorithm off the top of my head:
Find which code happens to overlap the 0xFF bytes most frequently
Swap the code with another code of the same length (if there isn't a code to swap with — maybe modify/regenerate the Huffman tree (e.g. flip all bits in all codes?))
goto 1 as long as result keeps getting smaller
And/Or:
For all symbols in the stream make histogram of pairs (code[symbol[n-1]], code[symbol[n]])
Find which pairs have 8+ consecutive 1s.
Starting from most frequent pair swap codes with other codes of the same length to break the streaks.
Likely it can be improved.
The text was updated successfully, but these errors were encountered:
I have to check how codes having the same Huffman code length are stored in the file, I seem to recall that it's by increasing value and nothing related to there actual frequency (a part from having the same Huffman code length which comes from the fact that they have similar frequencies).
i.e codes of length 5, 4 entries : 7 9 10 12
But if 12 appears a bit more frequently than 10 and unfortunately has a code ending with a lot of ones like 00111 it could be smart move to change the recording order to 7 9 12 10 and this time 12 would get a less "dangerous" 00110 code and 10 would get 00111.
I'm pretty sure the histogram of pairs will not work since there is often raw binary data in-between two Huffman codes.
0xFF bytes in Huffman data need special coding that adds another byte of overhead. If this overhead could be reliably minimized, then approximation of size if scan data described #15 would be more effective.
Here's an algorithm off the top of my head:
And/Or:
(code[symbol[n-1]], code[symbol[n]])
Likely it can be improved.
The text was updated successfully, but these errors were encountered: