unicode support #42
Conversation
This seems good, but I'm not an expert on JavaScript. Our resident JS expert @alokmenghrajani is on vacation right now, but maybe I can find someone else to review it.
If it is accepted, I will take a stab at tackling #34 (as the necessary logic for my use case relies on proper UTF-8 handling of JWT payloads).
     * https://github.com/google/closure-library
     * @param str string
     * @return Uint8Array
     */
What do you think about calling encodeURIComponent() and walking the resulting string looking for '%' characters? I feel it's less error-prone to leverage code which already exists.
@steike points out that encodeURIComponent gives you some strictness (e.g. disallows \ud800 by itself) which is probably a good thing.
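The approach being suggested can be sketched roughly as follows. This is an illustrative helper under my own name, not code from the PR; the key point is that encodeURIComponent already knows UTF-8 and rejects lone surrogates.

```javascript
// Sketch: string -> UTF-8 bytes by walking the percent-encoded form
// produced by encodeURIComponent. Helper name is mine, not the PR's API.
function stringToUtf8Bytes(str) {
  // encodeURIComponent throws a URIError on lone surrogates like "\ud800",
  // which gives us strictness for free.
  const encoded = encodeURIComponent(str);
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      // "%XX" escape: the two hex digits are one UTF-8 byte
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2;
    } else {
      // unescaped ASCII character maps directly to its code point
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return new Uint8Array(bytes);
}
```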
I guess I simply figured that if Google bothered to implement this in their fundamental JavaScript library underpinning all their core web apps, they must have run into some problem with methods using browser functions. If it matters, my use case is large amounts of Japanese text, so I'm not dealing with "mostly ASCII with an occasional accented character", and one issue might be how much
Unless you can show that this version is significantly faster than calling encodeURIComponent, I would prefer to use encodeURIComponent/escape. Less code == less bugs. You might also want to look at https://developer.mozilla.org/en-US/docs/Web/API/WindowBase64/btoa#Unicode_strings for the base64 part.
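The MDN page referenced above describes the classic escape/encodeURIComponent trick for base64-encoding Unicode strings. A minimal sketch (helper names are mine; escape/unescape are deprecated but still universally available):

```javascript
// Unicode string -> base64 of its UTF-8 bytes, via the MDN technique:
// encodeURIComponent produces %XX escapes of the UTF-8 bytes, and unescape
// collapses each %XX back into a single "binary" character that btoa accepts.
function utf8ToB64(str) {
  return btoa(unescape(encodeURIComponent(str)));
}

// The inverse: base64 -> binary string -> %XX escapes -> Unicode string.
function b64ToUtf8(b64) {
  return decodeURIComponent(escape(atob(b64)));
}
```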
      out[c++] = String.fromCharCode(c1);
    } else if (c1 > 191 && c1 < 224) {
      c2 = bytes[pos++];
      out[c++] = String.fromCharCode((c1 & 31) << 6 | c2 & 63);
I think the 365 constant is incorrect. Looking at https://en.wikipedia.org/wiki/UTF-8, the values we care about are:
192, 224, 240, 248, etc. (The values above 247 don't exist in JavaScript.)
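The lead-byte boundaries listed above (192, 224, 240, 248) can be sketched as a small classifier. This is an illustrative helper of my own, not code from the PR, and it only checks ranges (it does not reject overlong encodings):

```javascript
// Classify a UTF-8 lead byte by sequence length, using the boundaries
// 128, 192, 224, 240, 248 from the UTF-8 spec.
function utf8SequenceLength(lead) {
  if (lead < 0x80) return 1; // 0xxxxxxx: ASCII, one byte
  if (lead < 0xC0) return 0; // 10xxxxxx: continuation byte, not a lead
  if (lead < 0xE0) return 2; // 110xxxxx (192..223): two-byte sequence
  if (lead < 0xF0) return 3; // 1110xxxx (224..239): three-byte sequence
  if (lead < 0xF8) return 4; // 11110xxx (240..247): four-byte sequence
  return 0;                  // 248 and above are invalid in UTF-8
}
```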
The 365 constant seems to have been introduced in this commit.
Yeah, that still doesn't explain why they picked this value. If c1 is a byte, it makes no sense.
The incoming parameter is documented as {Uint8Array|Array<number>}, so perhaps in the latter case c1 isn't guaranteed to be <256.
The Mozilla link you mentioned has a note saying "A better, more faithful and less expensive solution is to use typed arrays to do the conversion". I followed that "typed arrays" link, and the implementation in this PR is intended to be "solution #2" from there.
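As an aside on the typed-array route: modern runtimes now expose TextEncoder/TextDecoder, which perform the same string-to-UTF-8-typed-array conversion natively. These APIs postdate this discussion, so treat this as context rather than a drop-in suggestion for the PR:

```javascript
// Native string <-> UTF-8 Uint8Array conversion in modern environments.
function utf8Encode(str) {
  return new TextEncoder().encode(str); // Uint8Array of UTF-8 bytes
}

function utf8Decode(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}
```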
This isn't a formal benchmark, but it seems calling atob gives the same result (if we call String.fromCharCode() first) and is faster for me (Chrome): https://jsfiddle.net/evcesa5h/. Would you be ok taking the simpler, much-less-code approach first and switching to what you have authored if we run into issues?
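The atob-based direction being discussed can be sketched like this (helper name is mine, not the PR's API): atob yields a "binary" string where each character's code is one byte, which charCodeAt then extracts.

```javascript
// Sketch: base64 -> byte array via atob. Each character of the decoded
// binary string has a code point in 0..255, i.e. exactly one byte.
function b64ToBytes(b64) {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) {
    bytes[i] = bin.charCodeAt(i);
  }
  return bytes;
}
```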
There are fundamentally four operations that need to exist: Unicode string to/from array of UTF-8 bytes, and array of arbitrary bytes to/from base64 string. For end-user convenience, it would be nice to have single entry points that can go from Unicode string to/from base64 string. Bottom line: all I really care about is reliably getting Unicode strings back exactly as they were, regardless of whether the producer and consumer are operating in different browser environments. I understand and sympathize with your concerns about "too much code", but I also think there is some benefit to minimizing exposure to browser compatibility issues (and realistically I don't have the resources to test a bunch of different browsers and operating systems).
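The four operations enumerated above, composed into the single string-to/from-base64 entry points, could look like this. All names are mine for illustration, not the PR's API, and the sketch leans on native TextEncoder/TextDecoder rather than the hand-rolled conversion:

```javascript
// The four primitive operations:
const strToUtf8 = (s) => new TextEncoder().encode(s);          // string -> UTF-8 bytes
const utf8ToStr = (b) => new TextDecoder("utf-8").decode(b);   // UTF-8 bytes -> string
const bytesToB64 = (b) => btoa(String.fromCharCode(...b));     // bytes -> base64
const base64ToBytes = (s) =>
  Uint8Array.from(atob(s), (c) => c.charCodeAt(0));            // base64 -> bytes

// The two convenience entry points, built by composition:
const encode = (s) => bytesToB64(strToUtf8(s));
const decode = (s) => utf8ToStr(base64ToBytes(s));
```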
This PR has bugs (e.g. \ud800 by itself, the 365 should probably be 248, it allows overlongs, etc.). I don't think it's worth putting more time into fixing these issues. I doubt escape/encodeURIComponent needs much cross-browser testing; it's been around for ages.
The JOSE spec mentions that the plaintext should be encoded as UTF-8 bytes. JavaScript by default uses UTF-16, so we need to convert when encrypting/decrypting/signing. This commit is based on the work by rapropos (see #42). I prefer to minimize the code size by re-using existing string manipulation functions.
see #46
Allow non-ASCII strings to be passed through encryption/decryption properly via UTF-8 encoding. Addresses #41.