Add support for TextEncoder#encodeInto #1279

Merged
alexcrichton merged 1 commit into rustwasm:master on Feb 26, 2019

Conversation

alexcrichton
Contributor

This commit adds support for the recently standardized [`TextEncoder#encodeInto`][standard]. This new function is a "bring your own buffer" style API that lets us avoid an intermediate allocation and copy by encoding strings directly into wasm's memory.

Currently we feature-detect whether `encodeInto` exists, as it is only implemented in recent browsers and not in all browsers. Additionally, this commit emits the binding using `encodeInto` by default, but this requires `realloc` functionality to be exposed by the wasm module. Measured locally, an empty binary that takes `&str` was previously 7.6k but is 8.7k after this commit, due to the extra code needed for `realloc`.

[standard]: https://encoding.spec.whatwg.org/#dom-textencoder-encodeinto

Closes #1172
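
For context, a minimal sketch of the two strategies being compared, assuming an instantiated module whose exports provide memory, __wbindgen_malloc and __wbindgen_realloc; this illustrates the idea, not the exact glue the PR generates:

// Sketch only; names and structure are simplified, not taken from this PR.
const cachedTextEncoder = new TextEncoder();

// Old path: encode into a temporary Uint8Array, then copy it into wasm memory.
function passStringToWasmViaCopy(exports, arg) {
  const buf = cachedTextEncoder.encode(arg);
  const ptr = exports.__wbindgen_malloc(buf.length);
  new Uint8Array(exports.memory.buffer, ptr, buf.length).set(buf);
  return [ptr, buf.length];
}

// Fast path: encode directly into wasm memory, growing the buffer with
// realloc while the string does not fit (hence the extra code size).
function passStringToWasmViaEncodeInto(exports, arg) {
  let size = arg.length;
  let ptr = exports.__wbindgen_malloc(size);
  for (;;) {
    const view = new Uint8Array(exports.memory.buffer, ptr, size);
    const { read, written } = cachedTextEncoder.encodeInto(arg, view);
    if (read === arg.length) return [ptr, written]; // everything fit
    // Note: the real glue resumes where encoding stopped instead of
    // re-encoding the whole string after each realloc.
    ptr = exports.__wbindgen_realloc(ptr, size, size * 2);
    size *= 2;
  }
}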

@alexcrichton
Contributor Author

@fitzgen I'm curious about your thoughts on the file size increase here. One possible option I can think of is that we could detect when wasm-bindgen, the crate, is compiled with -Os or -Oz and perhaps exclude this faster functionality in that case.

@fitzgen
Member

fitzgen commented Feb 20, 2019

@fitzgen I'm curious about your thoughts on the file size increase here. One possible option I can think of is that we could detect when wasm-bindgen, the crate, is compiled with -Os or -Oz and perhaps exclude this faster functionality in that case.

I think the ideal place we would like to end up at is something like

  • add -O1 and -Os flags to optimize the bindings for speed and size, respectively
    • decide on a default choice between these if none is provided
    • default the usage of TextEncoder#encodeInto based on whether this is present or not
  • have a flag to control TextEncoder#encodeInto behavior specifically, so its default can be overridden

how does that sound?

In terms of this PR, as opposed to an ideal future end state, I think just choosing a default and providing a flag to explicitly turn it on or off is good enough to land this.

self.global(&format!(
"
function passStringToWasm(arg) {{
{}
if (typeof cachedTextEncoder.encodeInto === 'function')
return passStringToWasmFast(arg);
Member


Instead of checking every time we pass a string, we should do this:

(function () {
    return typeof TextEncoder.prototype.encodeInto === "function"
        ? passStringToWasmFast
        : passStringToWasm;

    function passStringToWasmFast(arg) { ... }
    function passStringToWasm(arg) { ... }
}())

Member


The use_encode body is still checking for encodeInto, which I think is an oversight

Contributor Author


Oops yes

@alexcrichton
Contributor Author

Yeah I think ideally we'd have all the various knobs, but I'm also hoping that we can in the meantime pick reasonable defaults that avoid the need for knobs for as long as possible. Do you think it's worth doing any detection right now? (I'm sort of on the fence about this but would lean towards "no")

If we need to do detection, though, do you think we need to probe much beyond the compile settings for the crate?

@fitzgen
Member

fitzgen commented Feb 20, 2019

Do you think it's worth doing any detection right now? (I'm sort of on the fence about this but would lean towards "no")

Detection of whether -Os or -Oz were used instead of -O2 or something? No, I don't think that is worth attempting.

@alexcrichton
Contributor Author

Ok just to make sure I understand, you're ok merging this (with the change to only do the feature detect once) as-is? In the long term, though, we'd keep the option open to have configuration for tweaking this output.

@fitzgen
Member

fitzgen commented Feb 21, 2019

Ok just to make sure I understand, you're ok merging this (with the change to only do the feature detect once) as-is?

I think we should at least have a CLI flag like --use-encode-into=yes|no before landing and leave the larger configuration stuff for the future.

@alexcrichton
Contributor Author

Updated!


alexcrichton merged commit 8606124 into rustwasm:master on Feb 26, 2019
alexcrichton deleted the encode-into branch on February 26, 2019 at 18:29
break;
}}
ptr = wasm.__wbindgen_realloc(ptr, size, size * 2);
size *= 2;


If the allocator isn't a power-of-two allocator, this can overallocate. It is known that the maximum number of bytes that encodeInto can require is the JavaScript string length times 3. (It follows that if the Wasm program knows the string to be short-lived on the Wasm side, it makes sense to just allocate for the max-length case up front and not reallocate.)
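
As an illustration of the suggestion for short-lived strings, a sketch reusing the hypothetical exports/cachedTextEncoder names from the earlier sketch (not code from this PR):

// Allocate the worst case (3 bytes per UTF-16 code unit) up front, encode
// once, and skip the realloc loop entirely.
function passStringToWasmMaxUpfront(exports, arg) {
  const maxLen = arg.length * 3;
  const ptr = exports.__wbindgen_malloc(maxLen);
  const view = new Uint8Array(exports.memory.buffer, ptr, maxLen);
  const { read, written } = cachedTextEncoder.encodeInto(arg, view);
  // The buffer is always large enough, so the whole string is consumed:
  // read === arg.length, and `written` may be well below maxLen.
  return [ptr, written];
}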

Contributor Author


Hm, good point! Do you think it'd be worthwhile to do something like malloc(string.length), and if that fails, realloc(string.length * 3) followed by realloc(the_actual_length)? Would that be more appropriate than this loop?


If the allocations can be assumed to be short-lived, I suggest going with malloc(string.length * 3) without a loop.

If they are assumed to be long-lived, I suggest coming up with a magic number, magic, that characterizes how much smaller the initial allocation needs to be than the worst case for it to be worthwhile not to allocate for the worst case right away.

Then:

let min = roundUpToBucketSize(string.length);
let max = string.length * 3;
if (min + magic > max) {
  // min is close enough to max anyway
  let m = malloc(max);
  let { read, written } = encodeInto(...);
  assert(read == string.length);
} else {
  let m = malloc(min);
  let { read, written } = encodeInto(...);
  if (read < string.length) {
     let inputLeft = string.length - read;
     let outputNeeded = inputLeft * 3;
     m = realloc(m, written + outputNeeded);
     // ... encodeInto with input starting from read and output starting from written
  }
}

Contributor Author


Ok thanks for the info! I've opened up #1313 to further track this.
