Collaborator

@juj juj commented Oct 20, 2025

Implement a -s SINGLE_FILE_BINARY_ENCODE=1 option to embed the Wasm binary in binary-encoded form instead of base64 form in SINGLE_FILE mode. Continuation of #21478.

For a comparison of code size, see #21426 (comment).

juj added 2 commits October 20, 2025 21:28
…as binary-encoded form instead of base64 form in SINGLE_FILE mode. Continuation of emscripten-core#21478.
Collaborator

@sbc100 sbc100 left a comment

Very nice!

Do you have some rough number you could put in the PR description?

How about adding a codesize check that asserts the overall result is smaller?

Assuming no unforeseen issues, do you expect we could just make this setting always-enabled one day? (and remove the old base64 fallback?)

Collaborator Author

juj commented Oct 20, 2025

> Do you have some rough number you could put in the PR description?

Added.

> How about adding a codesize check that asserts the overall result is smaller?

Added.

> Assuming no unforeseen issues, do you expect we could just make this setting always-enabled one day?

Maybe one day. For now I think it's good to have the opt-out to give people the fallback if issues come up.

src/preamble.js Outdated
#elif SINGLE_FILE
return base64Decode('<<< WASM_BINARY_DATA >>>');
#elif AUDIO_WORKLET || !EXPORT_ES6
// For an Audio Worklet, we cannot use `new URL()`.
Collaborator

It's hard to see why the locateFile path is only used in EXPORT_ES6 mode... do you know why EXPORT_ES6 is here?

Collaborator Author

I don't know. The code flow is indeed a bit odd, which is why I cleaned up the ifdefs a bit here as a drive-by. Something to ponder further in a separate PR.

Collaborator

@sbc100 sbc100 left a comment

Maybe worth hinting in the docs that the plan is to remove this new setting if nobody uses it.

src/preamble.js Outdated
#endif

#if SINGLE_FILE && SINGLE_FILE_BINARY_ENCODE && !WASM2JS
#include "binaryDecode.js"
Collaborator

Do we not want to support this in MINIMAL_RUNTIME too? If so maybe this could go in runtime_common.js instead?

Collaborator Author

Thanks, we do. I lost this when resurrecting the code. Added it back, and updated the test to cover MINIMAL_RUNTIME as well.

Collaborator

sbc100 commented Oct 20, 2025

Are there any cases where SINGLE_FILE_BINARY_ENCODE could make the total payload larger?

o[i] = bin.charCodeAt(i) - 1;
}
return o;
}
Collaborator

Since this is only 5 lines, I wonder if it's worth creating a completely new file? Maybe just inline it into runtime_common.js?

Collaborator Author

check
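
For readers following this thread, here is a minimal sketch of the decode loop under review, paired with a hypothetical encoder so the round trip can be run. Only the `charCodeAt(i) - 1` step comes from the diff fragment above. The names `binaryEncodeSketch` and `binaryDecodeSketch` are illustrative assumptions, and the sketch ignores the escaping the real build step would need for characters that cannot appear raw in a JS string literal.

```javascript
// Hypothetical encoder (an assumption, not the PR's code): maps each byte b
// to the character with code b + 1, the inverse of the decode loop below.
function binaryEncodeSketch(bytes) {
  let s = '';
  for (const b of bytes) s += String.fromCharCode(b + 1);
  return s;
}

// Decoder mirroring the loop in the diff: subtract 1 from each char code.
function binaryDecodeSketch(bin) {
  const o = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; ++i) {
    o[i] = bin.charCodeAt(i) - 1;
  }
  return o;
}

// Round-trip the 8-byte Wasm header (magic number plus version 1).
const wasmMagic = Uint8Array.of(0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00);
const roundTripped = binaryDecodeSketch(binaryEncodeSketch(wasmMagic));
```

The +1 shift keeps a zero byte from turning into a NUL character in the string, which is presumably why the decoder subtracts 1.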

Collaborator Author

juj commented Oct 20, 2025

> Are there any cases where SINGLE_FILE_BINARY_ENCODE could make the total payload larger?

Base64 encodes every byte uniformly, with +33% overhead.

Binary encoding stores each compatible byte as a single byte (+0% overhead), but each incompatible byte (09h, 0Ch, 21h, 5Bh, 80h-FFh) is encoded as two bytes, i.e. with +100% overhead.

If the input bytes were uniformly distributed, this would come out to a +49.61% overhead, so a larger size than base64 in uncompressed form.

But the trick here is that

a) since all those +100% code points are byte-delimited, they compress efficiently with gzip and brotli compression, and
b) in practice those incompatible bytes are rarer (the MSB is more often unset than set), so encoding a .wasm code file ends up being a major win in uncompressed form as well.
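
The size rule described above can be sketched as a small estimator. This is an illustration only: it takes the incompatible byte set exactly as listed in this comment, and the helper names (`isIncompatible`, `estimateBinaryEncodedSize`, `base64Size`) are hypothetical, not part of Emscripten.

```javascript
// Byte values listed in the comment as "incompatible" (cost two bytes each).
const isIncompatible = (b) =>
  b === 0x09 || b === 0x0c || b === 0x21 || b === 0x5b || b >= 0x80;

// Estimated embedded size under the one-byte / two-byte rule.
function estimateBinaryEncodedSize(bytes) {
  let size = 0;
  for (const b of bytes) size += isIncompatible(b) ? 2 : 1;
  return size;
}

// Size of standard padded base64 output: 4 characters per 3 input bytes.
function base64Size(byteCount) {
  return 4 * Math.ceil(byteCount / 3);
}

// Wasm header followed by two "incompatible" bytes (0x80 and 0x21).
const sample = Uint8Array.of(
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, 0x80, 0x21);
console.log(estimateBinaryEncodedSize(sample), base64Size(sample.length));
```

Running the estimator over a real .wasm file would show how often the two-byte case occurs in practice, which is what point b) depends on.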


function findWasmBinary() {
return locateFile('a.out.wasm');
// For an Audio Worklet, we cannot use `new URL()`.
Collaborator

Maybe the new location of this comment is misplaced?

Collaborator Author

Hmm, the code size before relied on the comment being swallowed by virtue of being at the end of an #ifdef. Moved it back up there.

This is an automatic change generated by tools/maint/rebaseline_tests.py.

The following (1) test expectation files were updated by
running the tests with `--rebaseline`:

```
codesize/test_unoptimized_code_size.json: 180812 => 180812 [+0 bytes / +0.00%]

Average change: +0.00% (+0.00% - +0.00%)
```
@juj juj enabled auto-merge (squash) October 21, 2025 00:15
@juj juj merged commit 9110edd into emscripten-core:main Oct 21, 2025
34 checks passed

2 participants