New output format: Single-byte encoding (-F8l1 or -F8cyrl) #31

Open
lifthrasiir opened this issue Sep 25, 2021 · 0 comments

The normal JS string literal is limited in its information density (in a UTF-8 file, any character above U+007F takes at least two bytes), but if we have control over the character encoding we can overcome this limitation. For example, xem's int2binary2html used ISO/IEC 8859-1 (Latin-1, l1), which requires 88 bytes of code to decode a character i back into its byte value b:

b=i.charCodeAt(),b>>8?"€ ‚ƒ„…†‡ˆ‰Š‹Œ Ž  ‘’“”•–—˜™š›œ žŸ".indexOf(i)+128:b
// note that the string literal above is also encoded in Latin-1, so each character takes one byte.
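A readable version of the Latin-1 expression above, as a sketch. It relies on the fact that browsers treat the labels iso-8859-1, latin1 and l1 as windows-1252 (per the WHATWG Encoding Standard), so bytes 0x80 to 0x9F mostly decode to characters above U+00FF, and only those need a table lookup. The names CP1252_HIGH, latin1ByteOf and decodeCp1252Byte are hypothetical, and the table is written with escapes rather than raw bytes so the snippet works in any file encoding.

```javascript
// Hypothetical sketch: recover byte values from a string that the browser
// decoded as windows-1252 (what "ISO-8859-1" actually means in HTML).

// Characters windows-1252 assigns to bytes 0x80..0x9F, in byte order.
// Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D decode to the C1 controls
// U+0081 etc.; those stay below 256, so their slots are never searched.
const CP1252_HIGH =
  "\u20AC\u0081\u201A\u0192\u201E\u2026\u2020\u2021" +
  "\u02C6\u2030\u0160\u2039\u0152\u008D\u017D\u008F" +
  "\u0090\u2018\u2019\u201C\u201D\u2022\u2013\u2014" +
  "\u02DC\u2122\u0161\u203A\u0153\u009D\u017E\u0178";

// Readable form of: b=i.charCodeAt(),b>>8?"...".indexOf(i)+128:b
function latin1ByteOf(ch) {
  const b = ch.charCodeAt(0);
  // b >> 8 is nonzero exactly when the code point is >= 256, i.e. when the
  // byte was one of the remapped 0x80..0x9F values.
  return b >> 8 ? CP1252_HIGH.indexOf(ch) + 128 : b;
}

// Reference decoder used to check the round trip: byte -> character.
function decodeCp1252Byte(byte) {
  return byte >= 0x80 && byte <= 0x9f
    ? CP1252_HIGH[byte - 0x80]
    : String.fromCharCode(byte);
}

// Every byte survives decodeCp1252Byte + latin1ByteOf unchanged.
for (let byte = 0; byte < 256; byte++) {
  if (latin1ByteOf(decodeCp1252Byte(byte)) !== byte) {
    throw new Error(`round trip failed for byte ${byte}`);
  }
}
```

To turn a whole decoded literal back into bytes, map over its characters, e.g. Array.from(str, latin1ByteOf).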

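After some testing I concluded that ISO/IEC 8859-5 (Cyrillic) is a better choice. Its assignment is regular enough to decode in 44 bytes of code: bytes up to 0xA0 and the soft hyphen at 0xAD map to themselves, the remaining bytes from 0xA1 to 0xFF map to U+0401 through U+045F at a fixed offset of 864, and only № (0xF0, U+2116) and § (0xFD, U+00A7) need special handling: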

b=i.charCodeAt(),b>>8?b%3683-864:b-167?b:253
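A sketch of the 8859-5 expression expanded into readable form, checked against a reference decoder written out from the ISO-8859-5 layout (the WHATWG mapping, where 0x80 to 0x9F decode to the C1 controls). The function names are hypothetical. The modulus works because 8470 - 1104 = 7366 = 2 × 3683, so № (U+2116 = 8470) reduces to 1104, and 1104 - 864 = 240 = 0xF0, while every Cyrillic code point is below 3683 and passes through unchanged.

```javascript
// Readable form of the 44-byte ISO-8859-5 decoder from the issue:
//   b=i.charCodeAt(),b>>8?b%3683-864:b-167?b:253
function cyrillicByteOf(ch) {
  const b = ch.charCodeAt(0);
  if (b >> 8) {
    // Bytes 0xA1..0xFF, except 0xAD, 0xF0 and 0xFD, hold U+0401..U+045F,
    // a fixed offset of 864. The only other code point >= 256 is
    // № U+2116 (8470): 8470 % 3683 = 1104, and 1104 - 864 = 0xF0.
    // Codes below 3683 are unaffected by the modulo.
    return (b % 3683) - 864;
  }
  // Below 256 the code point equals the byte, except § U+00A7 (167),
  // which ISO-8859-5 places at 0xFD (253).
  return b - 167 ? b : 253;
}

// Hypothetical reference decoder (byte -> character) for testing.
function iso8859_5Char(byte) {
  if (byte <= 0xa0) return String.fromCharCode(byte); // ASCII, C1, NBSP
  if (byte === 0xad) return "\u00AD"; // soft hyphen
  if (byte === 0xf0) return "\u2116"; // № numero sign
  if (byte === 0xfd) return "\u00A7"; // § section sign
  return String.fromCharCode(byte + 864); // Cyrillic: U+0401..U+045F
}

// Every byte survives iso8859_5Char + cyrillicByteOf unchanged.
for (let byte = 0; byte < 256; byte++) {
  if (cyrillicByteOf(iso8859_5Char(byte)) !== byte) {
    throw new Error(`round trip failed for byte ${byte}`);
  }
}
```

The modulo folds the single outlier (№) back into the affine range, which is why no lookup table is needed here, unlike the Latin-1 version.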

Unlike 8859-1, 8859-5 has to be declared explicitly, but <meta charset=cyrillic> is only 23 bytes long, so the total of 44 + 23 = 67 bytes is still smaller than the 88 bytes needed for 8859-1.

A critical problem with this approach is that a charset declared in the server's HTTP Content-Type header takes precedence over the in-document <meta> declaration, so the technique depends entirely on the server configuration. The js13kGames server, for example, originally did not declare a character encoding but now does, which broke past entries that relied on this technique.
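Since the HTTP header wins, it may be worth checking what a host actually sends before relying on this trick. The sketch below is a hypothetical helper (declaredCharset is not from the issue) that extracts the charset parameter from a Content-Type header value; a null result means the server declares no encoding, so a <meta charset> in the document can still take effect (barring a byte order mark).

```javascript
// Hypothetical helper: returns the charset parameter of a Content-Type
// header value in lower case, or null if the header declares none.
function declaredCharset(contentType) {
  if (contentType == null) return null;
  // Parameters follow the media type, separated by semicolons.
  for (const param of contentType.split(";").slice(1)) {
    const eq = param.indexOf("=");
    if (eq < 0) continue;
    if (param.slice(0, eq).trim().toLowerCase() !== "charset") continue;
    // Strip optional quotes, e.g. charset="utf-8".
    const value = param.slice(eq + 1).trim().replace(/^"(.*)"$/, "$1");
    return value ? value.toLowerCase() : null;
  }
  return null;
}

console.log(declaredCharset("text/html"));                // null: <meta charset=cyrillic> applies
console.log(declaredCharset("text/html; charset=UTF-8")); // utf-8: the meta tag is ignored
```

In a browser or Node 18+, the header could be read with fetch(url).then((r) => declaredCharset(r.headers.get("content-type"))). This parser is deliberately simplified and does not handle quoted values that contain semicolons.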
