src: do not ignore IDNA conversion error #11549

TimothyGu · 2017-02-25T08:31:49Z

Currently, the ICU-based IDNA conversion methods only report errors that are passed back through a UErrorCode. However, according to ICU's documentation for uidna_nameToASCII():

> If any processing step fails, then pInfo->errors will be non-zero and the result might not be an ASCII string. The domain name might be modified according to the types of errors. Labels with severe errors will be left in (or turned into) their Unicode form.
>
> The UErrorCode indicates an error only in exceptional cases, such as a U_MEMORY_ALLOCATION_ERROR.

In other words, when non-catastrophically invalid domains are passed in, ToASCII() and ToUnicode() (and the url.domainToASCII() and url.domainToUnicode() functions built on them) currently return garbled domain names instead of errors.

This PR makes the C++ binding methods also report the errors signalled through pInfo->errors, not just those in the UErrorCode, which fixes the problems described above.

Also included in this PR are additional tests for invalid situations as well as documentation clarifications for the user-facing url.domainToASCII() and url.domainToUnicode().
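With this change, url.domainToASCII() signals an invalid domain by returning an empty string instead of a garbled name. A caller that prefers an exception can wrap it. The sketch below assumes only that documented empty-string contract; `domainToASCIIOrThrow` is a hypothetical helper name, not a Node.js API.

```javascript
'use strict';
const url = require('url');

// Hypothetical helper (not part of Node.js): converts a domain to its ASCII
// (Punycode) form and throws instead of returning the empty-string sentinel.
function domainToASCIIOrThrow(domain) {
  const ascii = url.domainToASCII(domain);
  // url.domainToASCII() returns '' for an invalid domain, so an empty result
  // for a non-empty input is treated as a conversion failure.
  if (ascii === '' && domain !== '') {
    throw new TypeError(`Cannot convert ${JSON.stringify(domain)} to ASCII`);
  }
  return ascii;
}

console.log(domainToASCIIOrThrow('español.com')); // prints xn--espaol-zwa.com
```

Throwing a TypeError mirrors how the WHATWG `URL` constructor reports unparsable input, but that choice is an assumption of this sketch.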

Before vs. after

Before:

> url.domainToASCII('\ufffd.com')
'�.com'
> url.domainToUnicode('xn---\x03.com')
'xn---\u0003.com'
> process.binding('icu').toASCII('\ufffd.com')
'�.com'
> process.binding('icu').toUnicode('xn---\x03.com')
'xn---\u0003.com'
> process.binding('icu').toUnicode('xn--- .com')
'xn--- .com'

After:

> url.domainToASCII('\ufffd.com')
''
> url.domainToUnicode('xn---\x03.com')
''
> process.binding('icu').toASCII('\ufffd.com')
Error: Cannot convert name to ASCII
    at repl:1:24
> process.binding('icu').toUnicode('xn---\x03.com')
Error: Cannot convert name to Unicode
    at repl:1:24
> process.binding('icu').toUnicode('xn--- .com')
Error: Cannot convert name to Unicode
    at repl:1:24
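To check the "after" contract of the public functions on a given Node.js build, a small table-driven script can help. It uses the valid and invalid examples from the Node.js `url` documentation rather than the inputs in the session above, whose exact results may vary across Node.js and ICU versions; `findMismatches` is a hypothetical name chosen for this sketch.

```javascript
'use strict';
const url = require('url');

// [function name, input, expected result]; an empty string marks an
// input that is expected to be rejected as an invalid domain.
const cases = [
  ['domainToASCII', 'español.com', 'xn--espaol-zwa.com'],
  ['domainToASCII', '中文.com', 'xn--fiq228c.com'],
  ['domainToASCII', 'xn--iñvalid.com', ''],
  ['domainToUnicode', 'xn--espaol-zwa.com', 'español.com'],
  ['domainToUnicode', 'xn--fiq228c.com', '中文.com'],
  ['domainToUnicode', 'xn--iñvalid.com', ''],
];

// Returns every case whose actual result differs from the expected one.
function findMismatches() {
  return cases
    .map(([fn, input, expected]) => ({ fn, input, expected, actual: url[fn](input) }))
    .filter(({ expected, actual }) => expected !== actual);
}

const mismatches = findMismatches();
if (mismatches.length > 0) {
  console.error('IDNA results differ from the documented examples:', mismatches);
  process.exitCode = 1;
}
```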

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
documentation is changed or added
commit message follows commit guidelines

Affected core subsystem(s)

TimothyGu · 2017-02-25T08:32:53Z

CI: https://ci.nodejs.org/job/node-test-pull-request/6585/

TimothyGu · 2017-02-25T09:53:34Z

Hopefully the issue with the legacy URL parser is fixed.

/cc @nodejs/intl @nodejs/url

New CI: https://ci.nodejs.org/job/node-test-pull-request/6586/

addaleax · 2017-02-25T16:09:59Z

doc/api/url.md

@@ -1007,7 +1008,8 @@ the new `URL` implementation but is not part of the WHATWG URL standard.
 * `domain` {String}
 * Returns: {String}

-Returns the Unicode serialization of the `domain`.
+Returns the Unicode serialization of the `domain`. If `domain` is an invalid


Btw, should this be deserialization, and mention that it is the inverse of domainToASCII?

It is serialization, since the domain is fully parsed and subsequently serialized from the parsed form. It just uses a different algorithm for serialization.
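The "serialization" reading can be seen by chaining the two functions: both parse the domain first, so the round trip returns the canonical Unicode form rather than echoing the input. A minimal sketch, assuming Node.js's `url` module; `canonicalUnicode` is a hypothetical helper name.

```javascript
'use strict';
const url = require('url');

// domainToASCII() parses the domain and serializes it as ASCII (Punycode);
// domainToUnicode() parses it and serializes it in Unicode form. Chaining
// them yields the canonical Unicode serialization, not necessarily the input.
function canonicalUnicode(domain) {
  return url.domainToUnicode(url.domainToASCII(domain));
}

console.log(canonicalUnicode('ESPAÑOL.com')); // prints español.com
```

The uppercase input comes back lowercased because IDNA mapping happens during parsing, which is why the output is a serialization of the parsed domain and not a plain inverse of the input string.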

addaleax · 2017-02-25T16:12:15Z

src/node_i18n.cc

@@ -489,8 +492,11 @@ static void ToUnicode(const FunctionCallbackInfo<Value>& args) {
  CHECK_GE(args.Length(), 1);
  CHECK(args[0]->IsString());
  Utf8Value val(env->isolate(), args[0]);
+  // optional arg
+  bool lenient = args[1].As<Boolean>()->Value();


Can you update the args.Length() check above to use 2? Also, you probably want to add a CHECK(args[1]->IsBoolean()); or do args[1]->BooleanValue() instead.

I didn't update the check for argument length, since (as the comment is trying to say) it is an optional argument, so that existing usage of toUnicode(str) would still work. V8 automatically returns an Undefined for out-of-range args[] dereference.

Wasn't aware of BooleanValue(). Will use that instead.

addaleax · 2017-02-25T16:12:22Z

src/node_i18n.cc

@@ -508,8 +514,11 @@ static void ToASCII(const FunctionCallbackInfo<Value>& args) {
  CHECK_GE(args.Length(), 1);
  CHECK(args[0]->IsString());
  Utf8Value val(env->isolate(), args[0]);
+  // optional arg
+  bool lenient = args[1].As<Boolean>()->Value();


addaleax · 2017-02-25T16:15:00Z

src/node_i18n.cc

  MaybeStackBuffer<char> buf;
-  int32_t len = ToASCII(&buf, *val, val.length());
+  int32_t len = ToASCII(&buf, *val, val.length(), lenient);

  if (len < 0) {
    return env->ThrowError("Cannot convert name to ASCII");


Is this error part of any non-experimental API? Could we change it to Cannot encode name to ASCII as Punycode?

Yes, for toASCII:

> url.parse(`http://${'é'.repeat(230)}.com/`)
Error: Cannot convert name to ASCII

addaleax · 2017-02-25T16:15:57Z

src/node_i18n.cc

  MaybeStackBuffer<char> buf;
-  int32_t len = ToUnicode(&buf, *val, val.length());
+  int32_t len = ToUnicode(&buf, *val, val.length(), lenient);

  if (len < 0) {
    return env->ThrowError("Cannot convert name to Unicode");


Is this error part of any non-experimental API? Could we change it to Cannot decode name as Punycode? (basically the same question I also posted below).

No; in fact the toUnicode JS function isn't used in the code base at all. Maybe we should just remove this method?

/cc @jasnell

If it's not used, it can be removed.

Remove which function specifically? The `i18n::ToUnicode` function is definitely used.

@jasnell, the exposed process.binding('icu').toUnicode() JS function.

addaleax · 2017-02-25T23:02:55Z

src/node_i18n.cc

@@ -493,7 +493,7 @@ static void ToUnicode(const FunctionCallbackInfo<Value>& args) {
  CHECK(args[0]->IsString());
  Utf8Value val(env->isolate(), args[0]);
  // optional arg
-  bool lenient = args[1].As<Boolean>()->Value();
+  bool lenient = args[1]->BooleanValue().FromJust();


Does this compile? It seems like the env->context() argument is missing.

@addaleax, you are right. Forgot to push fde77b3

bnoordhuis · 2017-02-27T12:42:35Z

src/node_i18n.cc

  MaybeStackBuffer<char> buf;
-  int32_t len = ToUnicode(&buf, *val, val.length());
+  int32_t len = ToUnicode(&buf, *val, val.length(), lenient);

  if (len < 0) {
    return env->ThrowError("Cannot convert name to Unicode");


If it's not used, it can be removed.

joyeecheung

This should also fix the missing errors when parsing percent-encoded disallowed characters in hosts (https://github.com/nodejs/node/blob/master/test/fixtures/url-tests.js#L4499), since we are no longer ignoring UIDNA_ERROR_DISALLOWED. You can turn those tests on in this PR if you like.
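The percent-encoded case works because the WHATWG host parser percent-decodes the host before running domain-to-ASCII, so a disallowed code point is rejected whether it is written literally or percent-encoded. The linked fixture line is not reproduced here; the inputs below are assumed examples of that case, and `parses` is a hypothetical helper.

```javascript
'use strict';

// Returns true if the WHATWG URL parser accepts `input`, false if it throws.
function parses(input) {
  try {
    new URL(input);
    return true;
  } catch {
    return false;
  }
}

// U+FDD0 is a noncharacter that the UTS #46 mapping table marks as disallowed.
console.log(parses('http://\ufdd0zyx.com'));     // false
// %ef%b7%90 is the UTF-8 percent-encoding of U+FDD0; it is decoded before
// the IDNA step, so this host is rejected as well.
console.log(parses('http://%ef%b7%90zyx.com'));  // false
// A percent-encoded allowed character is decoded, mapped, and accepted.
console.log(new URL('http://%41.com').hostname); // a.com
```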

jasnell · 2017-02-27T20:06:45Z

CI: https://ci.nodejs.org/job/node-test-pull-request/6605/

TimothyGu · 2017-02-27T20:41:26Z

@jasnell, did you see #11549 (comment)?

TimothyGu · 2017-02-28T01:31:22Z

Test re-enabled per @joyeecheung. Will land tomorrow.

CI: https://ci.nodejs.org/job/node-test-pull-request/6619/

TimothyGu · 2017-03-01T02:33:04Z

Landed in a520508...7ceea2a. toUnicode() isn't removed for now, since it provides equivalence to the punycode module, and though unused in the code base it is well-tested.

Old behavior can be restored using a special `lenient` mode, as used in the legacy URL parser.

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

PR-URL: #11549
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl>
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>

nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. i18n-api Issues and PRs related to the i18n implementation. labels Feb 25, 2017

TimothyGu added the whatwg-url Issues and PRs related to the WHATWG URL implementation. label Feb 25, 2017

TimothyGu force-pushed the idna branch from 50e7b4a to 0b3d177 Compare February 25, 2017 09:52

TimothyGu added the url Issues and PRs related to the legacy built-in url module. label Feb 25, 2017

TimothyGu changed the title from "src: bail on IDNA conversion error" to "src: do not ignore IDNA conversion error" Feb 25, 2017

addaleax reviewed Feb 25, 2017

addaleax approved these changes Feb 25, 2017

TimothyGu added dont-land-on-v4.x labels Feb 27, 2017

bnoordhuis approved these changes Feb 27, 2017

joyeecheung reviewed Feb 27, 2017

jasnell approved these changes Feb 27, 2017

TimothyGu added 3 commits February 27, 2017 16:40

src: do not ignore IDNA conversion error

a6fef08

Old behavior can be restored using a special `lenient` mode.

test: more comprehensive IDNA test cases

f4b0403

- Split the tests out to a separate file
- Add invalid cases
- Add tests for url.domainTo*()
- Re-enable previously broken WPT URL parsing tests

doc: document WHATWG IDNA methods' error handling

d9a8369

TimothyGu force-pushed the idna branch from fde77b3 to d9a8369 Compare February 28, 2017 01:30

joyeecheung approved these changes Feb 28, 2017

TimothyGu closed this Mar 1, 2017

TimothyGu deleted the idna branch March 1, 2017 02:33

TimothyGu added this to Done in WHATWG URL implementation Mar 1, 2017

TimothyGu mentioned this pull request Mar 6, 2017

URL parser inconsistency. #11707

Closed

evanlucas mentioned this pull request Mar 8, 2017

v7.7.2 proposal #11745

Merged

TimothyGu mentioned this pull request Mar 9, 2017

Perhaps do not apply ToASCII for ASCII-only input whatwg/url#267

Closed

TimothyGu mentioned this pull request Mar 19, 2017

url: track WHATWG URL issue #216: Host parser UTF-8 failure #11000

Closed

rmisev mentioned this pull request Apr 26, 2017

domainToASCII and domainToUnicode in intl nodejs/Intl#44

Closed
