
Re-evaluate JS-to-Wasm ASCII bypass of TextEncoder on Firefox Nightly #1776

Closed
hsivonen opened this issue Sep 20, 2019 · 7 comments

@hsivonen

Revision 57b1a57 added an ASCII bypass of TextEncoder for passing strings from JS to Wasm. From my perspective as a TextEncoder developer, the need to add such a special case is a performance bug on the TextEncoder side. Therefore, I made TextEncoder.encodeInto faster in Firefox Nightly especially for ASCII strings.
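For context, the pattern under discussion looks roughly like the following sketch (names and the plain `Uint8Array` stand-in for Wasm linear memory are illustrative, not wasm-bindgen's actual generated code): a tight `charCodeAt` loop copies bytes while the string is ASCII, then falls back to `encodeInto` for any non-ASCII tail.

```javascript
// Hypothetical sketch of an ASCII bypass for JS-to-Wasm string passing.
// `buf` stands in for a Uint8Array view over Wasm linear memory.
const cachedEncoder = new TextEncoder();

function passStringToWasm(str, buf) {
  let offset = 0;
  // Fast path: copy code units directly while they are ASCII (< 0x80).
  for (; offset < str.length; offset++) {
    const code = str.charCodeAt(offset);
    if (code > 0x7f) break;
    buf[offset] = code;
  }
  // Slow path: hand the non-ASCII remainder to TextEncoder.encodeInto.
  if (offset !== str.length) {
    const { written } = cachedEncoder.encodeInto(
      str.slice(offset),
      buf.subarray(offset)
    );
    return offset + written;
  }
  return offset; // number of bytes written
}
```

The question raised here is whether the manual loop still beats calling `encodeInto` unconditionally, now that the engine-side implementation has been optimized.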

Notably, for ropes deeper than one level, charCodeAt linearizes the string in Firefox but TextEncoder no longer does. If the string is a more than one-level concatenation of relatively long ASCII pieces, I expect TextEncoder.encodeInto to now outperform wasm-bindgen's manual ASCII path.

Could you, please, re-run the benchmarks that motivated the manual TextEncoder bypass in wasm-bindgen on Firefox Nightly, to see if the manual bypass can be removed, or if it should be limited to tiny strings (where JIT inlining of charCodeAt may still be a win)?

(FWIW, there's also a similar TextDecoder optimization in flight for Firefox.)
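The decoding direction that such a TextDecoder optimization would speed up looks roughly like this sketch (names and the plain `Uint8Array` stand-in for Wasm linear memory are illustrative): glue code receives a pointer and length into Wasm memory and decodes that slice into a JS string.

```javascript
// Hypothetical sketch of Wasm-to-JS string passing via TextDecoder.
// `memory` stands in for a Uint8Array view over Wasm linear memory.
const cachedDecoder = new TextDecoder("utf-8");

function getStringFromWasm(memory, ptr, len) {
  // Decode only the requested (ptr, len) slice of linear memory.
  return cachedDecoder.decode(memory.subarray(ptr, ptr + len));
}
```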

@hsivonen (Author)

CC @RReverser

@RReverser (Member)

Hmm, I might need to defer this to someone else as I'm working on something else now, but I think the difference was noticeable on V8 too, so a fix in one engine doesn't necessarily fix it everywhere. I might be wrong, and I'm happy for someone to show otherwise by re-running the benchmarks that were included in the original PR.

@alexcrichton (Contributor)

Using this patch on current master with today's Firefox Nightly, built from https://hg.mozilla.org/mozilla-central/rev/19704452bd548d0c36d601d609cfcfe2c3e0caa2, and using these benchmarks sourced from this directory, I'm getting:

| Benchmark | Current master | Master with patch |
| --- | --- | --- |
| Pass ascii_small to/from wasm-bindgen | 1,739,343/s ±1.33% | 1,028,842/s ±2.03% |
| Pass ascii_medium to/from wasm-bindgen | 1,553,687/s ±1.37% | 990,149/s ±2.37% |
| Pass ascii_number to/from wasm-bindgen | 1,461,455/s ±1.38% | 988,315/s ±2.01% |
| Pass ascii_date to/from wasm-bindgen | 1,182,276/s ±1.25% | 845,042/s ±2.17% |
| Pass ascii_url to/from wasm-bindgen | 1,179,832/s ±1.82% | 891,663/s ±1.95% |
| Pass ascii_link to/from wasm-bindgen | 1,202,142/s ±0.83% | 887,017/s ±1.59% |
| Pass unicode to/from wasm-bindgen | 406,366/s ±3.59% | 365,823/s ±2.78% |

@RReverser (Member)

@alexcrichton Nice! For what it's worth, my original benchmarks from the PR are still here: https://github.com/RReverser/wasm-bindgen-string-benches.

Either way, can you please check and show results for Chrome as well? As stated in the original message, Firefox was already getting just a 45% speed-up from my original patch, whereas Chrome was getting 80%; I wonder if it still benefits from these changes.

@hsivonen (Author)

> 1,739,343/s ±1.33% | 1,028,842/s ±2.03%

So these are something per second and, therefore, higher is better? I.e. the bypass is still faster than encodeInto? Even for the Unicode case? (Seems weird that the ASCII prefix in the Unicode case makes that big a difference.)

If so, the results are disappointing from my point of view. :-(

Since these strings are literals and not the result of JavaScript programs putting them together by concatenation, etc., they don't end up testing rope issues at all. Also, testing round-tripping can be noisy, since encodeInto doesn't produce garbage but decode does.
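To illustrate the rope concern: in SpiderMonkey, repeated concatenation of long-enough strings typically produces a rope (a tree of string pieces) rather than a flat buffer, and a benchmark input built that way would exercise the rope-handling paths described above. This is a hypothetical way to construct such inputs; whether the engine actually keeps the result as a rope is an engine-internal detail.

```javascript
// Build a string by repeated concatenation, which in SpiderMonkey tends to
// produce a multi-level rope rather than a flat (linear) string.
function buildRope(piece, levels) {
  let s = piece;
  for (let i = 0; i < levels; i++) {
    // Each concatenation creates a new node whose children are the operands,
    // deepening the tree: length follows L(n+1) = 2*L(n) + piece.length.
    s = s + piece + s;
  }
  return s;
}

// A deep rope of relatively long ASCII pieces, as described in the issue.
const ropey = buildRope("abcdefghijklmnopqrstuvwxyz0123456789", 8);
```

Feeding strings like `ropey` into the benchmarks, rather than literals, would test the case where charCodeAt linearizes the string but encodeInto no longer does.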

Do you happen to know what the usage patterns are like? Do strings that get passed to Wasm usually originate from DOM API return values without further processing (not ropes), or from arbitrary JS code (potentially ropes)?

@alexcrichton (Contributor)

@RReverser I think this issue is largely about Firefox, so it's probably not worth checking Chrome until we think we want to change it for Firefox. Or at least that's what I'm personally gonna do!

@hsivonen right yeah, these are operations/second, where higher numbers are better. And yeah, these are showing that the current code is still faster than unconditionally calling encodeInto.

As for usage patterns, I'm not really entirely sure myself. I think that usage is general enough though that there's not really a typical pattern, but rather there's a lot of different use cases for where strings come from and such.

@alexcrichton (Contributor)

I'm going to close this, since I think the numbers above show that this is still a beneficial optimization to have.
