Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pad the LZW code table if less than minimum size #14

Closed
wants to merge 1 commit into from

Conversation

rendaw
Copy link

@rendaw rendaw commented Aug 6, 2019

I think this fixes #11.

My understanding is that while the color table has a minumum size of 2, the LZW minimum code size of 2 indicates that the code table has 4 colors, so the code table needs to be padded with two empty colors before the cc and eoi codes. I believe decoders were failing because they expected cc and eoi codes to have different code indexes and were thus misinterpreting the whole stream (first code in the stream is always cc).

I think this is indicated on page 32 here https://www.w3.org/Graphics/GIF/spec-gif89a.txt although I'm bad with specs without lots of pictures and pretty diagrams... it says

A special Clear code is defined which resets all compression/decompression parameters and tables to a start-up state. The value of this code is 2**<code size>.

so for a code size of 2, the cc index would be 4 (0-1 would be colors, 2-3 ??).

gifencoder currently has 2 as the current minimum for the LZW minimum code size value, and I believe this is correct (i.e. the issue isn't that the minimum code size is being set too high). For reference, the definition says a bit above that that for 2 color images the minimum code size is 2:

Because of some algorithmic constraints however, black & white images which have one color bit must be indicated as having a code size of 2.
It's not indicated how "just black" or "just white" images are handled but I guess it's reasonable to assume that anything less than 5 colors has a code size of 2...

I couldn't find information on padding. My understanding is that since the code table is used by matching color table indexes, reusing color index 0 for the padding elements is fine since a lookup for color 0 will never get that far.

In any case I tried this and I could read the output file with a couple programs (not entirely sure they aren't all using the same library).

@rendaw
Copy link
Author

rendaw commented Aug 6, 2019

Build failure unrelated?

@dlubarov
Copy link
Collaborator

dlubarov commented Aug 7, 2019

Thank you for looking into this! I need to reacquaint myself with the spec and the code a bit, but will try to reply soon, maybe later today.

@dlubarov
Copy link
Collaborator

dlubarov commented Aug 7, 2019

Your explanation makes perfect sense; I must have missed that part of the spec.

I'm not sure if this would matter in practice, but it seems like to be 100% conformant, we should pad up to 2**code_size (code_size being the "minimum" size, before we add 1 for the special codes), rather than stopping at 4?

@rendaw
Copy link
Author

rendaw commented Aug 7, 2019

Right now numColors is the padded size from the color table, so it's guaranteed to be a power of two but its minimum value is 2 rather than 4.

The 4 is kind of a magical number, maybe adding a doc comment to the LzwEncoder constructor specifying that numColors must be a power of two? Or I could make it a constant with a comment on its derivation?

@rendaw
Copy link
Author

rendaw commented Aug 7, 2019

I have an alternate fix in #15

@dlubarov
Copy link
Collaborator

dlubarov commented Aug 7, 2019

Oh right -- forgot that numColors was already padded to a power of 2.

I think I prefer your other fix for now, and afterward I can refactor a bit to explicitly derive numColors from the code size (to make sure it's clear where the minimum comes from), and maybe name it something different as a hint that numColors isn't necessarily the same as the color table size.

@rendaw rendaw closed this Aug 7, 2019
dlubarov added a commit to dlubarov/gifencoder that referenced this pull request Aug 7, 2019
We were deriving the minimum code size from the number of colors in the LZW code table, which is sort of backwards. The spec defines the "clear" code's value (and implicitly, the number of colors in the LZW code table) based on the minimum code size.

I also made some tweaks to emphasize that the number of colors in the LZW code table may not match the color table size, as @rendaw pointed out in square#14. The former quantity is now just a local variable in `defaultCodeTable`, so it's more contained. The latter quantity I renamed to `colorTableSize`.
dlubarov added a commit that referenced this pull request Aug 7, 2019
We were deriving the minimum code size from the number of colors in the LZW code table, which is sort of backwards. The spec defines the "clear" code's value (and implicitly, the number of colors in the LZW code table) based on the minimum code size.

I also made some tweaks to emphasize that the number of colors in the LZW code table may not match the color table size, as @rendaw pointed out in #14. The former quantity is now just a local variable in `defaultCodeTable`, so it's more contained. The latter quantity I renamed to `colorTableSize`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1-2 color GIFs are unreadable
2 participants