Closure minification adds \uxxxx escapes into output file, increasing code size #4158

Closed
juj opened this issue Mar 5, 2024 · 2 comments

juj commented Mar 5, 2024

In emscripten-core/emscripten#21426 we are discussing ways to improve on the Base64 encoding of binary WebAssembly modules embedded in .js code. We have observed that both gzip and brotli compress Base64 poorly.

One observation here is that UTF-8 is a well-specified encoding, so we can attempt to embed the bytes directly in a string literal as UTF-8 encoded code points.

However, attempting to do so runs into a problem with Closure minification.

Input: ab.zip

function binaryDecode(r) {
  for(var t=0, B=r.length, e=new Uint8Array(B); t<B; ++t) e[t]=r.charCodeAt(t)-1;
  return e;
}

// String with bytes 0x00 - 0xFF embedded in it, each stored as the code point one higher (U+0001 - U+0100).
var js = '��������	\n�\r������������������ !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~��������������������������������� ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀ';

var a = binaryDecode(js);
console.log(a.slice(0, 64));
console.log(a.slice(64, 128));
console.log(a.slice(128, 192));
console.log(a.slice(192, 256));

This code nicely prints out all bytes from 0x00 up to 0xFF.
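
For context, a string like `js` could be produced by an encoder that mirrors `binaryDecode`. The following is a minimal sketch, not code from the issue: `binaryEncode` is a hypothetical helper name, and the sketch assumes the plus-one offset implied by the decoder's `charCodeAt(t)-1`, so byte 0x00 maps to U+0001 and byte 0xFF maps to U+0100.

```javascript
// Hypothetical inverse of binaryDecode(): maps each byte b to the
// character with code b + 1, matching the decoder's "- 1".
function binaryEncode(bytes) {
  var s = '';
  for (var i = 0; i < bytes.length; ++i) s += String.fromCharCode(bytes[i] + 1);
  return s;
}

// Same decoder as in the issue, repeated so this sketch runs on its own.
function binaryDecode(r) {
  for (var t = 0, B = r.length, e = new Uint8Array(B); t < B; ++t) e[t] = r.charCodeAt(t) - 1;
  return e;
}

// Round trip over all 256 byte values.
var allBytes = new Uint8Array(256);
for (var b = 0; b < 256; ++b) allBytes[b] = b;
var encoded = binaryEncode(allBytes);
var decoded = binaryDecode(encoded);
console.log(encoded.length);            // 256
console.log(decoded[0], decoded[255]);  // 0 255
```

When such a string is pasted into source code as a literal, characters like newline, carriage return, the quote character, and backslash still need escaping, as the literal above does with `\n`, `\r`, `\'`, and `\\`.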

The input file is 689 bytes in size. However, running it through Closure Compiler with Advanced Optimizations produces a file that is 1225 bytes in size: ab_closured.zip
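
The growth is consistent with non-ASCII characters being rewritten as escapes. As a rough model (an assumption for illustration, not a description of Closure's actual printer), suppose every character outside ASCII becomes a six-character `\uXXXX` escape, while a UTF-8 file stores code points U+0080 to U+07FF in two bytes each. The helper names below are hypothetical.

```javascript
// Bytes needed to store the string as UTF-8.
function utf8Size(s) {
  return new TextEncoder().encode(s).length;
}

// Bytes needed if every non-ASCII UTF-16 code unit is written as a
// six-character \uXXXX escape (simplified model; an assumption).
function asciiEscapedSize(s) {
  var n = 0;
  for (var i = 0; i < s.length; ++i) n += s.charCodeAt(i) < 0x80 ? 1 : 6;
  return n;
}

// Code points U+0080 .. U+0100, the non-ASCII part of the test string.
var highChars = '';
for (var c = 0x80; c <= 0x100; ++c) highChars += String.fromCharCode(c);

console.log(highChars.length);             // 129 characters
console.log(utf8Size(highChars));          // 258 bytes (2 per character)
console.log(asciiEscapedSize(highChars));  // 774 bytes (6 per character)
```

Under this model the non-ASCII characters alone add 516 bytes, which is in the same range as the 536-byte difference reported above (1225 - 689). The remainder would depend on how control characters and the rest of the code are printed.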

Online Closure link

juj changed the title from "Closure minification adds \uxxxx espaces into output file, increasing code size" to "Closure minification adds \uxxxx escapes into output file, increasing code size" on Mar 5, 2024
@lauraharker
Contributor

The compiler defaults to outputting ASCII, but you can specify a different output charset via the --charset flag (https://github.com/google/closure-compiler/wiki/Flags-and-Options#miscellaneous).

Does --charset=UTF-8 work for you?
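
As a sketch of how this might look from a Node.js build script, the helper below only assembles the command-line arguments. The helper name `closureArgs` and the file names are hypothetical. `--charset` is the flag named above, and `--compilation_level`, `--js`, and `--js_output_file` are standard Closure Compiler flags.

```javascript
// Builds the argument list for a Closure Compiler run that writes
// its output as UTF-8 rather than the default ASCII.
function closureArgs(inputFile, outputFile) {
  return [
    '--compilation_level=ADVANCED',
    '--charset=UTF-8',
    '--js=' + inputFile,
    '--js_output_file=' + outputFile,
  ];
}

// Hypothetical file names, for illustration only.
var args = closureArgs('ab.js', 'ab_closured.js');
console.log(args.join(' '));
```

The resulting list can be handed to whichever launcher the build already uses to start the compiler.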

@juj
Author

juj commented Mar 6, 2024

Thanks, yeah, that works!
