Drop utf8ToBytes and asciiToBytes in favor of TextEncoder #60

Closed
coolaj86 opened this issue Jun 4, 2015 · 9 comments

Comments

@coolaj86

coolaj86 commented Jun 4, 2015

TextEncoder and TextDecoder are already supported in Chrome and Firefox (not sure about MSIE and Safari).

var buf = new TextEncoder('utf-16le').encode("I ½ ♥ 💩");
var str = new TextDecoder('utf-16le').decode(buf);

If you use them, you could drop a fair amount of code and provide your UTF-8, UTF-16, etc. conversions only as a separate polyfill module for when the native TextEncoder and TextDecoder don't exist.
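
A minimal sketch of that "native first, polyfill second" idea. The names `makeUtf8Encoder` and `fallback` are hypothetical and not part of this module, and the fallback here is only a placeholder. Note that the current Encoding Standard defines TextEncoder as UTF-8 only, so the sketch sticks to UTF-8:

```javascript
// Hypothetical helper: prefer the native encoder, otherwise use a supplied fallback.
function makeUtf8Encoder(fallback) {
  if (typeof TextEncoder !== 'undefined') {
    var enc = new TextEncoder(); // always UTF-8
    return function (str) {
      return enc.encode(str); // returns a Uint8Array
    };
  }
  return fallback;
}

// Placeholder fallback (assumption): a real setup would load a polyfill here.
var encode = makeUtf8Encoder(function (str) {
  throw new Error('TextEncoder is missing; load a polyfill');
});

var bytes = encode('½'); // U+00BD encodes to the two bytes 0xC2 0xBD
```

The feature check runs once, so callers only ever see a plain `encode(str)` function.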

I haven't actually tested your UTF-8 converter, but if you get the same values as you get in node, then it's probably the best that there is. The full TextEncoder Polyfill seems to have much more code than I would intuitively expect and Mozilla's UTF-8 converter is actually incorrect (well it provides a correct encoding, but it's not the encoding that node and the W3C utilities use).

Anyway, then you could drop a bit of code.

I'm not sure how this compares in terms of performance, but if you wanted the polyfill code to be even smaller you can actually do this:

function utf8ToBinaryString(str) {
  // percent-encode the string as UTF-8 bytes (e.g. "½" becomes "%C2%BD")
  var escstr = encodeURIComponent(str);
  // replace each URI escape sequence, such as %0A, with the single character whose code is that byte, such as 0x0A
  var binstr = escstr.replace(/%([0-9A-F]{2})/g, function (match, p1) {
    return String.fromCharCode(parseInt(p1, 16));
  });

  return binstr;
}

function utf8ToBuffer(str) {
  var binstr = utf8ToBinaryString(str);
  var buf = new Uint8Array(binstr.length);
  Array.prototype.forEach.call(binstr, function (ch, i) {
    buf[i] = ch.charCodeAt(0);
  });
  return buf;
}

function utf8ToBase64(str) {
  var binstr = utf8ToBinaryString(str);
  return btoa(binstr);
}

I don't know if that still works for utf-16, but I have a hunch that it would.
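
On the UTF-16 question: encodeURIComponent always percent-encodes UTF-8 bytes, so the trick above yields UTF-8 only. JavaScript strings are already sequences of UTF-16 code units, though, so a UTF-16LE conversion can read them directly. A minimal sketch, where `utf16leToBuffer` is a hypothetical name and not something this module provides:

```javascript
// Hypothetical helper: write each UTF-16 code unit as two bytes, low byte first.
function utf16leToBuffer(str) {
  var buf = new Uint8Array(str.length * 2);
  for (var i = 0; i < str.length; i++) {
    var unit = str.charCodeAt(i); // one 16-bit code unit (surrogates included)
    buf[2 * i] = unit & 0xff;     // low byte
    buf[2 * i + 1] = unit >> 8;   // high byte
  }
  return buf;
}

var heart = utf16leToBuffer('♥'); // U+2665 becomes the bytes 0x65 0x26
```

Characters outside the BMP, such as 💩, are stored in the string as a surrogate pair, so they come out as four bytes without any special handling.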

I know I could have just used your module here, but I've been enjoying the headache of learning the hacky, non-hacky, and best-possible ways of doing this stuff in the browser.

@jessetane
Collaborator

I left some notes here on why I made some of the decisions I did for utf8ToBytes: #51 (comment)

Let me know if anything there seems wrong or needs updating, happy to try and help fix. See also #45

@coolaj86
Author

coolaj86 commented Jun 4, 2015

I don't think anything is wrong with it, but it's no longer necessary. W3C's TextEncoder does it for you.

I'm saying you can move that to a polyfill and only load it for IE, Safari, and old Android.
https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder

@feross
Owner

feross commented Jun 6, 2015

This module has to work all the way back to IE6, so we're stuck with the code regardless. The real question is: what is the performance impact?

I opened an issue for this a while back (#45) and it turned out that TextEncoder was 5x slower in Chrome, probably because they haven't optimized the API yet. When it's faster, we should probably prefer it to our own solution :)
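
A rough way to re-check this on a given engine is a micro-benchmark along these lines. This is only a sketch: `timeEncode` and `escapeEncode` are hypothetical names, the pure-JS side uses the encodeURIComponent trick from earlier in this thread rather than this module's utf8ToBytes, and timings from a loop like this are noisy:

```javascript
// Hypothetical harness: milliseconds taken by `rounds` calls of fn(input).
function timeEncode(fn, input, rounds) {
  var start = Date.now();
  for (var i = 0; i < rounds; i++) {
    fn(input);
  }
  return Date.now() - start;
}

// Pure-JS UTF-8 encoder built on the encodeURIComponent trick (stand-in, see above).
function escapeEncode(str) {
  var bin = encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function (m, hex) {
    return String.fromCharCode(parseInt(hex, 16));
  });
  var out = new Uint8Array(bin.length);
  for (var i = 0; i < bin.length; i++) {
    out[i] = bin.charCodeAt(i);
  }
  return out;
}

var sample = new Array(1001).join('I ½ ♥ 💩 '); // the test phrase repeated 1000 times
var encoder = new TextEncoder();

var nativeMs = timeEncode(function (s) { return encoder.encode(s); }, sample, 200);
var escapeMs = timeEncode(escapeEncode, sample, 200);

console.log('TextEncoder: ' + nativeMs + ' ms, escape trick: ' + escapeMs + ' ms');
```

Swapping `escapeEncode` for the module's own converter would give the comparison that actually matters here.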

@jessetane
Collaborator

This module is also about compatibility with node (http://blog.nodejs.org/2014/06/16/openssl-and-breaking-utf-8-change/). If your app targets platforms that support the native apis, why not use TextEncoder with typed arrays directly?
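
For what it's worth, a small sketch of what "TextEncoder with typed arrays directly" can look like, assuming a platform where both globals exist. The variable names are just for illustration:

```javascript
var encoder = new TextEncoder();        // UTF-8
var decoder = new TextDecoder('utf-8');

// Encode two strings straight into Uint8Arrays.
var head = encoder.encode('I ½ ');
var tail = encoder.encode('♥ 💩');

// Concatenate with plain typed-array operations instead of Buffer.concat.
var joined = new Uint8Array(head.length + tail.length);
joined.set(head, 0);
joined.set(tail, head.length);

var text = decoder.decode(joined);      // back to a String
```

This gives up the node-specific encoding behavior mentioned above, which is the trade-off being discussed.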

@feross
Owner

feross commented Jun 6, 2015

Oh, interesting point @jessetane. I wonder how easy it would be to support these node quirks if we used TextEncoder...

@jessetane
Collaborator

I mean, how much do people like the Buffer api? This module is important because it provides a bridge from node/npm into the browser, but imo it's just a shim for that purpose.

I think it's a bit confusing to lump an api for transcoding unicode <--> binary in with the api for working with binary data. The api for working with binary data is available now via typed arrays, and the api for working with unicode has always been String (even though it still sucks: https://mathiasbynens.be/notes/javascript-unicode). The Text{En,De}coder apis provide a place to put the huge number of ways to convert between what are really two unrelated things.

Instead of optimizing our shim for node's Buffer, which was itself a shim for the non-existent typed array and Text{En,De}coder apis, perhaps we should just start using these newer and hopefully more well thought out apis directly. Thoughts?

@feross
Owner

feross commented Jun 6, 2015

Are these new APIs available in node or iojs yet? Buffer is nice because your modules can work in both environments without special case hacks.

Also, I personally quite like the Buffer API.

@jessetane
Collaborator

TextEncoder no, though there is the polyfill @coolaj86 mentioned. I figure it's better to shim node than the browser since you generally care less about code size there. I have almost completely weaned myself off Buffer - the DataView api is quite nice!
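
A short sketch of the DataView style, for anyone who hasn't used it. The offsets and values are arbitrary, and the Buffer methods named in the comments are only the rough counterparts:

```javascript
var view = new DataView(new ArrayBuffer(6));

view.setUint16(0, 0x1234);            // big-endian by default, like buf.writeUInt16BE
view.setUint32(2, 0xdeadbeef, true);  // true = little-endian, like buf.writeUInt32LE

var first = view.getUint8(0);            // high byte of 0x1234
var third = view.getUint8(2);            // low byte of 0xdeadbeef
var roundTrip = view.getUint32(2, true); // read back with the same endianness
```

Endianness is a per-call argument rather than part of the method name, which is most of the difference from Buffer's read/write helpers.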

@feross
Owner

feross commented Jun 7, 2015

Buffer is way faster than DataView, at least for now :)

@feross closed this as completed Jun 28, 2015
myfreeer added a commit to myfreeer/exceljs that referenced this issue Sep 12, 2020
Profiling in Chrome shows that `Buffer.toString()` uses an unexpectedly long CPU time. The native TextDecoder can be much faster in browsers that support it.
In browsers that do not support TextDecoder, such as Internet Explorer, we can fall back to `Buffer.toString()`.

References:
feross/buffer#268
feross/buffer#60
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
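
The fallback described in this commit message could look roughly like the sketch below. It is not the code from the referenced commit; `bytesToUtf8String` is a hypothetical name, and it assumes the input is a Uint8Array (a Buffer also qualifies) and that `Buffer` is available whenever TextDecoder is not:

```javascript
// Hypothetical helper: decode UTF-8 bytes, preferring the native TextDecoder.
function bytesToUtf8String(bytes) {
  if (typeof TextDecoder !== 'undefined') {
    return new TextDecoder('utf-8').decode(bytes);
  }
  // Fallback: wrap the same memory in a Buffer (no copy) and use toString.
  return Buffer.from(bytes.buffer, bytes.byteOffset, bytes.byteLength).toString('utf8');
}

var decoded = bytesToUtf8String(new Uint8Array([0xe2, 0x99, 0xa5])); // UTF-8 bytes of U+2665
```

One difference to keep in mind: by default TextDecoder drops a leading byte order mark, while `Buffer.toString('utf8')` keeps it as U+FEFF, so the two paths are not byte-for-byte interchangeable on such input.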
myfreeer added a commit to myfreeer/exceljs that referenced this issue Sep 12, 2020
Profiling in Chrome DevTools shows that `Buffer.toString()` and `Buffer.from(string)` use an unexpectedly long CPU time. The native TextDecoder and TextEncoder can be much faster in browsers that support them.
In browsers that do not support TextDecoder, such as Internet Explorer, `Buffer.toString()` and `Buffer.from(string)` are left unchanged.

References:
feross/buffer#268
feross/buffer#60
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder
myfreeer added a commit to myfreeer/exceljs that referenced this issue Oct 1, 2020
myfreeer added a commit to myfreeer/exceljs that referenced this issue Oct 3, 2020
Profiling in Chrome DevTools shows that `Buffer.toString()` and `Buffer.from(string)` use an unexpectedly long CPU time. The native TextDecoder and TextEncoder can be much faster in browsers that support them.
In browsers that do not support TextDecoder, such as Internet Explorer, this falls back to the original `Buffer.toString()` and `Buffer.from(string)`.
This implements almost the same thing as exceljs#1458 in a non-monkey-patching way, covering xlsx only.
Closes exceljs#1458

References:
feross/buffer#268
feross/buffer#60
https://developer.mozilla.org/en-US/docs/Web/API/TextDecoder
https://developer.mozilla.org/en-US/docs/Web/API/TextEncoder
myfreeer added a commit to myfreeer/exceljs that referenced this issue Oct 5, 2020