From 70157b9cb7d703ee9c44ff56522c65829a599d67 Mon Sep 17 00:00:00 2001
From: Timothy Gu
Date: Tue, 11 May 2021 00:59:30 -0700
Subject: [PATCH] url: forbid certain confusable changes from being introduced by toASCII
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The legacy url.parse() function attempts to convert Unicode domains
(IDNs) into their ASCII/Punycode form through the use of the toASCII
function. However, toASCII can introduce or remove various characters
that at best invalidate the parsed URL, and at worst cause hostname
spoofing:

    url.parse('http://bad.c℀.good.com/').href === 'http://bad.ca/c.good.com/'
      (from [1])
    url.parse('http://\u00AD/bad.com').href === 'http:///bad.com/'

While changes to the legacy URL parser are discouraged in general, the
security implications here outweigh the desire for strict
compatibility. This is because this commit only changes behavior when
non-ASCII characters appear in the hostname, an unusual situation for
most use cases. Additionally, despite the availability of the WHATWG URL
API, url.parse remains widely deployed in the Node.js ecosystem, as
exemplified by the recent un-deprecation of the legacy API.

This change is similar in spirit to CPython 3.8's change [2] fixing
bpo-36216 [3] aka CVE-2019-9636, which also occurred despite potential
compatibility concerns.

[1]: https://hackerone.com/reports/678487
[2]: https://github.com/python/cpython/commit/16e6f7dee7f02bb81aa6b385b982dcdda5b99286
[3]: https://bugs.python.org/issue36216

PR-URL: https://github.com/nodejs/node/pull/38631
Reviewed-By: James M Snell
Reviewed-By: Rich Trott
Reviewed-By: Matteo Collina
Reviewed-By: Joyee Cheung
---
 doc/api/errors.md                             |  9 ++---
 doc/api/url.md                                |  5 +++
 lib/url.js                                    | 32 +++++++++++++++--
 test/parallel/test-url-parse-invalid-input.js | 34 +++++++++++++++++++
 4 files changed, 74 insertions(+), 6 deletions(-)

diff --git a/doc/api/errors.md b/doc/api/errors.md
index 3ff3c5c1797332..f199689f2dc432 100644
--- a/doc/api/errors.md
+++ b/doc/api/errors.md
@@ -1677,10 +1677,10 @@ An invalid URI was passed.
 
 ### `ERR_INVALID_URL`
 
-An invalid URL was passed to the [WHATWG][WHATWG URL API]
-[`URL` constructor][`new URL(input)`] to be parsed. The thrown error object
-typically has an additional property `'input'` that contains the URL that failed
-to parse.
+An invalid URL was passed to the [WHATWG][WHATWG URL API] [`URL`
+constructor][`new URL(input)`] or the legacy [`url.parse()`][] to be parsed.
+The thrown error object typically has an additional property `'input'` that
+contains the URL that failed to parse.
 
 ### `ERR_INVALID_URL_SCHEME`
 
@@ -2824,6 +2824,7 @@ The native call from `process.cpuUsage` could not be processed.
 [`stream.write()`]: stream.md#stream_writable_write_chunk_encoding_callback
 [`subprocess.kill()`]: child_process.md#child_process_subprocess_kill_signal
 [`subprocess.send()`]: child_process.md#child_process_subprocess_send_message_sendhandle_options_callback
+[`url.parse()`]: url.md#url_url_parse_urlstring_parsequerystring_slashesdenotehost
 [`util.getSystemErrorName(error.errno)`]: util.md#util_util_getsystemerrorname_err
 [`zlib`]: zlib.md
 [crypto digest algorithm]: crypto.md#crypto_crypto_gethashes
diff --git a/doc/api/url.md b/doc/api/url.md
index 2330755a6f1108..0aa42418993b3d 100644
--- a/doc/api/url.md
+++ b/doc/api/url.md
@@ -1232,6 +1232,11 @@ forward-slash characters (`/`) are required following the colon in the