Configurable output format #29

Open
lifthrasiir opened this issue Sep 25, 2021 · 0 comments

The current output format is fixed to the six-bit code, which is optimized for js13kGames submissions. There are numerous other possibilities though, and I'd like to reserve the -F|--output-format option in the CLI for this purpose. For the API, it would probably be a new method to be used instead of the current Packer.makeDecoder.

The default would remain the six-bit coding, with -F6 as the explicit option. The first digit denotes outBits, which is the number of bits per coded unit. Further characters may follow if there are multiple output formats with the same outBits or if the format requires manual intervention (which is why we would probably never have -F8).
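To make the proposed naming scheme concrete, here is a minimal sketch of how such an option value could be split into outBits and an optional suffix. Nothing in it exists in Roadroller: the function name `parseOutputFormat`, the returned field `variant`, and the example value `7x` are all assumptions made up for illustration.

```javascript
// Hypothetical parser for the proposed -F|--output-format value.
// Not part of Roadroller; the name and the return shape are assumptions.
function parseOutputFormat(value = "6") {
  // First character: a single digit giving outBits.
  // Everything after it: an optional suffix that selects a variant.
  const match = /^([1-9])(.*)$/.exec(value);
  if (!match) {
    throw new Error(`invalid output format: ${JSON.stringify(value)}`);
  }
  return {
    outBits: Number(match[1]), // bits per coded unit
    variant: match[2],         // "" when no extra characters follow
  };
}

console.log(parseOutputFormat("6"));  // the six-bit coding, no suffix
console.log(parseOutputFormat("7x")); // "7x" is a made-up value
```

Keeping the suffix as an opaque string leaves room for each future format to define its own meaning in its own issue.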

Each possible future format would go into its own issue. In this issue I'll document the current six-bit coding.

-F6: Six-bit coding

This is the default and the only option as of 2.1.0. The compressed data is encoded into a template literal, where a code unit k is encoded as the code point k or k + 0x40 so that no escape sequence is required. The latter is preferred since those code points mostly coincide with letters, but the former is desirable or even required when k is one of the following:

  • 0x1c (because 0x5c is \)
  • 0x28 (which is (, the first line has to include this character when -D is not in use)
  • 0x3d (which is =, the first line has to include this character when -D is in use)
  • 0x3f (because 0x7f is not printable)
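The mapping above can be sketched as follows. This is not Roadroller's actual code: the function names and the `dirty` parameter (standing for whether -D is in use) are hypothetical. The sketch also keeps 0x20 in its low form because 0x20 + 0x40 = 0x60 is the backtick that would end the template literal; the list above does not name that case, so treat it as my assumption.

```javascript
// Sketch of the -F6 mapping described above; not Roadroller's actual code.
// `dirty` stands for "-D is in use" (the parameter name is hypothetical).
function lowCodeUnits(dirty) {
  return new Set([
    0x1c,                // 0x5c would be a backslash
    0x3f,                // 0x7f is not printable
    dirty ? 0x3d : 0x28, // "=" with -D, "(" without it
    0x20,                // ASSUMPTION (not in the list above): 0x60 is "`"
  ]);
}

function encodeSixBit(units, dirty = false) {
  const low = lowCodeUnits(dirty);
  let out = "";
  for (const k of units) {
    if (!Number.isInteger(k) || k < 0 || k > 0x3f) {
      throw new RangeError(`not a six-bit code unit: ${k}`);
    }
    // Prefer k + 0x40 unless k is one of the exceptions.
    out += String.fromCharCode(low.has(k) ? k : k + 0x40);
  }
  return out;
}

function decodeSixBit(text) {
  // k and k + 0x40 share the same low six bits, so one mask decodes both.
  return Array.from(text, (ch) => ch.charCodeAt(0) & 0x3f);
}

const units = [0x00, 0x01, 0x1c, 0x28, 0x3f];
const text = encodeSixBit(units);
console.log(JSON.stringify(text));
console.log(decodeSixBit(text));
```

Because either form decodes with the same mask, the choice between k and k + 0x40 only matters to the encoder.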

This format is designed to be "compressed" again with the Huffman tree. This is ensured by making every code unit correspond to a (mostly consecutive) set of 64 characters and by minimizing the number of characters not in the set (we can't completely get rid of them because ` has to be somewhere in that line). The Huffman tree needs to encode characters not in the set, so the actual coding rate is slightly below 8 bit/char even after DEFLATE, but the optimal tree should result in at least 8 * (384/385) = 7.979 bit/char.
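As a rough check of the "mostly consecutive set of 64 characters" claim, the sketch below rebuilds the set of code points a payload can contain when -D is not in use and groups them into consecutive runs. It assumes the exceptions listed earlier plus one of my own (0x20 stays low, since 0x60 is the backtick), and it recomputes the 7.979 figure as 8 * (384/385). The variable names are illustrative only.

```javascript
// Code units kept in their low form (no -D). 0x20 is an ASSUMPTION of this
// sketch: 0x20 + 0x40 = 0x60 is the backtick that ends the template literal.
const low = new Set([0x1c, 0x20, 0x28, 0x3f]);

// Every code unit 0..63 maps to exactly one code point.
const alphabet = [];
for (let k = 0; k < 64; k++) {
  alphabet.push(low.has(k) ? k : k + 0x40);
}
alphabet.sort((a, b) => a - b);

// Group the sorted code points into runs of consecutive values.
const runs = [];
for (const cp of alphabet) {
  const last = runs[runs.length - 1];
  if (last && last[1] + 1 === cp) {
    last[1] = cp;
  } else {
    runs.push([cp, cp]);
  }
}

console.log(alphabet.length); // 64
console.log(runs.map(([a, b]) => `${a.toString(16)}-${b.toString(16)}`).join(" "));

// The coding-rate bound from the text.
const bound = 8 * (384 / 385);
console.log(bound.toFixed(3)); // 7.979
```

Most of the set falls in a few long runs between 0x3f and 0x7e, with only the low-form exceptions sitting apart.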
