url: enforce valid UTF-8 in WHATWG parser #11436

TimothyGu · 2017-02-17T05:17:59Z

This commit implements the Web IDL USVString conversion, which mandates all unpaired Unicode surrogates be turned into U+FFFD REPLACEMENT CHARACTER. It also disallows Symbols to be used as USVString, per spec.

Checklist

make -j4 test (UNIX), or vcbuild test (Windows) passes
tests and/or benchmarks are included
commit message follows commit guidelines

Affected core subsystem(s)

url

TimothyGu · 2017-02-17T05:20:56Z

CI: https://ci.nodejs.org/job/node-test-pull-request/6465/

TimothyGu · 2017-02-17T05:29:17Z

Some background on why I chose to implement it in C++. I first implemented a fairly optimized JS version, but found out that there is a significant disparity between the best-case performance (validsurr) and the worst-case performance (allinvalid). This, to me, is unacceptable. This C++ implementation is a bit slower for the best case, but a lot faster for the worst case.

JS implementation

function toUSVString(V) {
  if (typeof V === 'symbol')
    throw new TypeError();

  const S = String(V);
  const n = S.length;
  var U = '';
  var lastPos = 0;
  for (var i = 0; i < n; ++i) {
    const c = S.charCodeAt(i);

    if (c < 0xD800 || c > 0xDFFF) {
      continue;
    } else if (0xDC00 <= c && c <= 0xDFFF || i === n - 1) {
      if (lastPos < i)
        U += S.slice(lastPos, i);
      lastPos = i + 1;
      U += '\ufffd';
    } else {
      const d = S.charCodeAt(i + 1);
      if (0xDC00 <= d && d <= 0xDFFF) {
        ++i;
        continue;
      } else {
        if (lastPos < i)
          U += S.slice(lastPos, i);
        lastPos = i + 1;
        U += '\ufffd';
      }
    }
  }
  if (lastPos === 0)
    return V;
  if (lastPos < n)
    return U + S.slice(lastPos, n);
  return U;
}

Benchmark results

h/usvstring.js n=50000000 input="valid" type="js": 15,533,604.407488512
h/usvstring.js n=50000000 input="validsurr" type="js": 17,423,172.615309086
h/usvstring.js n=50000000 input="someinvalid" type="js": 3,411,650.9908266817
h/usvstring.js n=50000000 input="allinvalid" type="js": 2,352,265.169112214
h/usvstring.js n=50000000 input="valid" type="cpp": 16,506,065.368628675
h/usvstring.js n=50000000 input="validsurr" type="cpp": 15,627,668.561148917
h/usvstring.js n=50000000 input="someinvalid" type="cpp": 8,808,868.726046104
h/usvstring.js n=50000000 input="allinvalid" type="cpp": 8,750,259.202803183

targos · 2017-02-17T07:24:51Z

I think we already have somehow a valid implementation for toUSVString in the project. It is used when you call buffer.toString('utf8').

TimothyGu · 2017-02-17T08:28:42Z

@targos, it is a lot slower (10x slower for validsurr) because of string-to-Buffer conversion

targos · 2017-02-17T08:34:40Z

Of course, but can't we reuse it directly on the string, without the conversion?

TimothyGu · 2017-02-17T09:16:08Z

but can't we reuse it directly on the string, without the conversion?

No, unfortunately.

> Buffer.prototype.toString.call('a')
TypeError: this.utf8Slice is not a function
> Buffer.prototype.utf8Slice.call('a')
TypeError: argument should be a Buffer

That notwithstanding, the actual conversion is done under-the-hood at buffer creation-time in StringBytes::Write, which uses the same mechanism as the Utf8Value class, which would work as well, except it is less efficient than the implementation in this PR.

targos · 2017-02-17T10:03:09Z

I didn't mean use the JS function, rather the underlying V8 C++ implementation. I'm sorry for the confusion, I didn't see your comment in node_url.cc that explains why you are not doing that.

mscdex · 2017-02-17T11:44:58Z

src/util.cc

@@ -48,8 +48,7 @@ TwoByteValue::TwoByteValue(Isolate* isolate, Local<Value> value) {
  const size_t storage = string->Length() + 1;
  AllocateSufficientStorage(storage);

-  const int flags =
-      String::NO_NULL_TERMINATION | String::REPLACE_INVALID_UTF8;
+  const int flags = String::NO_NULL_TERMINATION;


I'm not sure this is right, since this would affect more than just the WHATWG URL implementation?

~~Indeed, this is not an acceptable change. You could turn the flags into an optional argument that defaults to what is now or make it a template trait.~~

EDIT: Objection withdrawn.

As I've explained in 424722af4541cd0eec1bf299aa8afcdff6284f52, this flag is actually a no-op. TwoByteValue::TwoByteValue uses v8::String::Write as opposed to v8::String::WriteUtf8. The flag is only respected by WriteUtf8, and hence misleading being applied here in TwoByteValue.

Oh, you're right, I missed that it's the UTF-16 version. Objection withdrawn.

bnoordhuis · 2017-02-17T11:52:44Z

src/node_url.cc

+      uint16_t c = value[i];
+      if (c < 0xD800 || c > 0xDFFF) {
+        continue;
+      } else if (0xDC00 <= c || i == n - 1) {


Avoid mixing normal and Yoda-style comparisons in the same block of code, it makes it harder to understand.

bnoordhuis · 2017-02-17T12:11:33Z

src/node_url.cc

+          value[i] = 0xFFFD;
+        }
+      }
+    }


You can probably get a bit of speedup if you split this loop in a simple linear scan for invalid characters, and a second one that fixes them up.

That optimizes for the common case that of valid UTF-8 and because you can do the scan in JS land with a regex (/[\uD800-\uDBFF]?[\uDC00-\uDFFF]/ if I read the logic correctly), you can avoid a call into C++ land.

Doing the scan in JS land penalizes the uncommon case because C++ lands needs to scan again. Whether the first scan should be in done in JS or C++ land depends on how common the uncommon case is. Not very, I expect.

@bnoordhuis, I'll benchmark this, but is a regex faster than character search these days? Also, the regex is more like /([^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF]([^\uDC00-\uDFFF]|$)/, though if there's a way to simplify that will probably help performance-wise as well.

Macros like this might help to make it a bit easier to read in the C++ land...

#define IS_HIGH_SURROGATE(c) (((c) & 0xFC00) == 0xD800) #define IS_LOW_SURROGATE(c) (((c) & 0xFC00) == 0xDC00)

I think the i++ part is a bit implicit(if I understand correctly this would make the second branch "unexpected" so "changed = true"). Testing for an unexpected low surrogate first, then for high surrogates checks the following character(or EOL) immediately instead of waiting until the next cycle could be easier to read.

@joyeecheung, I've taken up on your suggestion of using macros. However, after doing so, the branching looks like:

uint16_t d = buf[i + 1]; if (IS_SURROGATE_TRAIL(d)) { i++; } else { changed = true; buf[i] = 0xFFFD; }

I personally consider it a better style to always use the positive expression (without !) when an else clause is involved, though I'll see if it is necessary to perhaps add some likely/unlikely annotations if it helps performance.

What I had in mine is something like (sorry for the code dump):

for (size_t i = 0; i < n; ++i) { uint16_t c = value[i]; // paired low surrogate would have been consumed in the previous cycle // so this must be unpaired. if (IS_LOW_SURROGATE(c)) { changed = true; value[i] = UTF_REPLACEMENT; // or break if we optimize for the common case } else if (IS_HIGH_SURROGATE(c)) { ++i; // check the next character immediately if (i == n) { // ends with high surrogate, unpaired changed = true; value[i - 1] = UTF_REPLACEMENT; // or break } else { uint16_t d = value[i]; if (IS_LOW_SURROGATE(d)) { // paired continue; } else { // high surrogate without a following low surrogate, unpaired changed = true; value[i - 1] = UTF_REPLACEMENT; // or break } } } }

EDIT: d should be value[i]

Oh and..no need to replace the value[i] if value[i-1] is unpaired high surrogate

I'll benchmark this, but is a regex faster than character search these days?

I expect the answer is 'yes' for short strings. Calling into C++ land and copying out the string has quite a bit of overhead associated with it. It's amortized for long strings but it's probably quite substantial for short strings (and I assume most inputs will be relatively short.)

Also, the regex is more like /([^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF]([^\uDC00-\uDFFF]|$)/, though if there's a way to simplify that will probably help performance-wise as well.

A simple scan for /[\uD800-\uDFFF]/ followed by a call into C++ land when it tests true may already be good enough.

@joyeecheung, I don't think your code snippet would work for a "high high low" sequence, since in the process of checking for "high" i is advanced twice, so it would go directly to "low" for the next iteration.

@bnoordhuis, indeed a regex is a lot faster than iterating through the string in JS. The seemingly long regex doesn't actually have much adverse performance effect if the last ([^\uDC00-\uDFFF]|$) is turned into a negative lookahead. PTAL.

Oh yes, that is just the general idea typed in the comment box...though I think this can be solved by adding a nested branch for high surrogate and decrease the pointer?

(Just to be clear, I don't mean to block this, just think avoiding hex inside the code could make it easier to understand so threw my 2cents. It's kinda weird to do this kind of review inside comments :P)

bnoordhuis · 2017-02-17T12:13:00Z

src/util.cc

@@ -48,8 +48,7 @@ TwoByteValue::TwoByteValue(Isolate* isolate, Local<Value> value) {
  const size_t storage = string->Length() + 1;
  AllocateSufficientStorage(storage);

-  const int flags =
-      String::NO_NULL_TERMINATION | String::REPLACE_INVALID_UTF8;
+  const int flags = String::NO_NULL_TERMINATION;


~~Indeed, this is not an acceptable change. You could turn the flags into an optional argument that defaults to what is now or make it a template trait.~~

EDIT: Objection withdrawn.

bnoordhuis · 2017-02-17T12:13:45Z

test/common.js

@@ -598,8 +598,7 @@ exports.WPT = {
    try {
      fn();
    } catch (err) {
-      if (err instanceof Error)
-        err.message = `In ${desc}:\n  ${err.message}`;
+      console.error(`In ${desc}:`);


Left-over debug code?

No, it was intentional, since it seems like after the Error is constructed changing the message doesn't change stack. (See 56f6b8dbad36a46f07668b2b35574dc367e6d146.)

Hmm, yes, to patch the error message we would have to extend the Error class, something like:

class WPTError extends Error { constructor(desc, err) { super(`In ${desc}, ${err.name}: ${err.message}`); Error.captureStackTrace(this, WPTError); } } throw new WPTError(desc, err);

But then console.error() works too, though we would see two error traces this way?

hmm.. it appears to:

> var m = new Error('test') undefined > m.message = 'foo' 'foo' > m Error: foo at repl:1:9 at ContextifyScript.Script.runInThisContext (vm.js:23:33) at REPLServer.defaultEval (repl.js:340:29) at bound (domain.js:280:14) at REPLServer.runBound [as eval] (domain.js:293:12) at REPLServer.onLine (repl.js:537:10) at emitOne (events.js:101:20) at REPLServer.emit (events.js:189:7) at REPLServer.Interface._onLine (readline.js:238:10) at REPLServer.Interface._line (readline.js:582:8)

Oh, looks like it's the Error.captureStackTrace call in the AssertionError constructor "freezes" the stack. To recapture the stack, we need to:

diff --git a/test/common.js b/test/common.js index 5f7dc25..f74d5f0 100644 --- a/test/common.js +++ b/test/common.js @@ -598,8 +598,10 @@ exports.WPT = { try { fn(); } catch (err) { - if (err instanceof Error) + if (err instanceof Error) { err.message = `In ${desc}:\n ${err.message}`; + Error.captureStackTrace(err); + } throw err; } },

@joyeecheung, calling Error.captureStackTrace() there will only capture the stack of WPT.test, not the actual stack from err.

Hmm, yes, this probably deserves another PR to sort out. I am fine with a console.error work around.

jasnell · 2017-02-17T16:07:23Z

I'm wondering if a more efficient solution would be to make a USVStringValue alternative to UTF8Value (see https://github.com/nodejs/node/blob/master/src/node_url.cc#L1321). When the input is provided to the Parse() function within node_url.cc, it is currently interpreted as a UTF-8 string using UTF8Value. This ends up writing the string bytes out which means if we do the toUSVString() and then pass it off to Parse(), we'll be doing to write twice. By doing the conversion at that point, we can avoid the extra trip across the js/c++ boundary, we cover all of the setters, and there would be no need to export the toUSVString() function.

TimothyGu · 2017-02-17T16:56:09Z

I'm wondering if a more efficient solution would be to make a USVStringValue alternative to UTF8Value (see https://github.com/nodejs/node/blob/master/src/node_url.cc#L1321).

Certainly. I'll see what I can do for this.

there would be no need to export the toUSVString() function.

I think it'll still be needed for the URLSearchParams interface, which is implemented entirely in JS.

jasnell · 2017-02-17T17:04:46Z

I think it'll still be needed for the URLSearchParams interface

True. That said, I'm working on a C/C++ querystring parser implementation and evaluating the performance. It will likely make the most sense to keep that impl in JS land but we'll see what kind of numbers I can get.

TimothyGu · 2017-02-20T00:50:54Z

@jasnell, after investigating, I realized that a TwoByteValue-based class that USVString is not helpful, at least in context of this PR. The current C++ Parse() method used by the setters already uses Utf8Value, which does the same thing as ToUSVString(), so nothing to change there. On the other hand, certain setters operate on the stringified parameter in JS before calling the binding's Parse(), so an efficient ToUSVString() method is needed for those methods anyway.

While the double conversion cannot be gotten rid of in the setters, the domainTo*() JS methods are a different case. I've simply gotten rid of the JS wrappers to handle String conversion fully in C++.

TimothyGu · 2017-02-20T06:26:01Z

Tried to address comments as far as possible. Changes:

Use a regex as an initial pass in JS (@bnoordhuis)
Minimize usage of toUSVString in domainTo* (@jasnell)
Use macros (@joyeecheung)

PTAL.

CI: https://ci.nodejs.org/job/node-test-pull-request/6504/

joyeecheung · 2017-02-20T06:53:24Z

src/node_url.cc

+        value[i] = UNICODE_REPLACEMENT_CHARACTER;
+      } else {
+        uint16_t d = value[i + 1];
+        if (U16_IS_TRAIL(d)) {


Why are there two flavors of macros? If I understand correctly, U16_IS_SURROGATE_TRAIL means "we already know it is a surrogate, now we are testing if it is the trail surrogate", and U16_IS_TRAIL means "we are testing if it is the trail surrogate". If so, the first one can just be replaced by the second one?

It's probably micro-optimization, but U16_IS_SURROGATE_LEAD/TRAIL can be reduced to two instructions (on x86 at least, and + j(n)z), while U16_IS_LEAD/TRAIL each requires three (and + cmp + je). In most cases, the macros come for free anyway from ICU4C, which is why I took the liberty to use them.

joyeecheung · 2017-02-20T07:06:25Z

lib/internal/url.js

+    throw new TypeError('Cannot convert a Symbol value to a string');
+  const str = String(val);
+  // As of V8 5.5, str.search (and usvRe[@@search]) are slower than usvRe.exec.
+  const match = usvRe.exec(str);


Maybe just .test because we don't need the match to do anything further

Ooops, missed the match.index part below, never mind.

Review of an older version

TimothyGu · 2017-02-22T01:31:24Z

Rebased. New CI: https://ci.nodejs.org/job/node-test-pull-request/6537/

bnoordhuis · 2017-02-22T15:21:51Z

lib/internal/url.js

+function toUSVString(val) {
+  if (typeof val === 'symbol')
+    throw new TypeError('Cannot convert a Symbol value to a string');
+  const str = String(val);


I'd just val = '' + val if you want to coerce, that way you don't have to duplicate the 'is symbol?' check.

bnoordhuis · 2017-02-22T15:31:01Z

src/node_url.cc

+#define U16_IS_SURROGATE(c) (((c) & 0xF800) == 0xD800)
+#define U16_IS_SURROGATE_LEAD(c) (((c) & 0x400) == 0)
+#define U16_IS_SURROGATE_TRAIL(c) (((c) & 0x400) != 0)
+#endif


I'd turn these into inline functions. It adds an extra bit of type checking and avoids possible divergence between intl and non-intl builds.

bnoordhuis · 2017-02-22T15:39:53Z

src/node_url.cc

+    Local<String> str;
+    if (!args[0]->ToString(env->context()).ToLocal(&str))
+      return;
+    Utf8Value value(env->isolate(), str);


This seems like an unrelated change...

bnoordhuis · 2017-02-22T15:40:03Z

src/node_url.cc

+    Local<String> str;
+    if (!args[0]->ToString(env->context()).ToLocal(&str))
+      return;
+    Utf8Value value(env->isolate(), str);


...as does this.

TimothyGu · 2017-02-23T05:44:48Z

@bnoordhuis, comments addressed. PTAL.

CI: https://ci.nodejs.org/job/node-test-pull-request/6561/

TimothyGu · 2017-02-25T08:05:26Z

@bnoordhuis, ping?

joyeecheung · 2017-02-25T12:59:04Z

lib/internal/url.js

@@ -23,6 +23,17 @@ const IteratorPrototype = Object.getPrototypeOf(
  Object.getPrototypeOf([][Symbol.iterator]())
 );

+const usvRe =
+    /([^\uD800-\uDBFF]|^)[\uDC00-\uDFFF]|[\uD800-\uDBFF](?![\uDC00-\uDFFF])/;


Maybe a comment here explaining this regexp is testing if the string contains unpaired surrogates, or rename it to something self-explanatory..(for a name like usvRe I would expect it is testing all the characters are in the range of unicode scalar values)

+1. Changed it to unpairedSurrogateRe.

joyeecheung · 2017-02-25T13:29:22Z

src/node_url.cc

@@ -1351,6 +1368,41 @@ namespace url {
                            v8::NewStringType::kNormal).ToLocalChecked());
  }

+  static void ToUSVString(const FunctionCallbackInfo<Value>& args) {
+    Environment* env = Environment::GetCurrent(args);
+    CHECK_GE(args.Length(), 2);


CHECK_EQ?

All the existing functions use CHECK_GE, and I don't see a reason to be stricter than what this function actually uses.

Yeah there are functions in other files doing EQ...not sure if we have a convention or not, just think GE is implying there could be more args, which doesn't seem to be the case for this one, though I don't feel very strongly about this.

joyeecheung · 2017-02-25T13:31:42Z

src/node_url.cc

+    const size_t n = value.length();
+
+    const int64_t start = args[1]->IntegerValue(env->context()).FromJust();
+    CHECK_GE(start, 0);


Maybe another CHECK_LT(start, n)? Doesn't do any harm even if start is larger though.

I'm only checking start >= 0 because I'm converting start to a size_t, which is an unsigned type. In C++, signed-to-unsigned conversion, though a defined operation, acts oddly when the signed value is negative. On the other hand, n >= start is a more benign case, and I don't think requires the full effects of a runtime assertion (i.e. crashing).

Oh yep, not worth a full on abort ;)

joyeecheung · 2017-02-25T13:40:14Z

test/parallel/test-whatwg-url-properties.js

@@ -25,6 +25,9 @@ const expected = ['toString',
                  'username', 'password', 'host', 'hostname', 'port',
                  'pathname', 'search', 'searchParams', 'hash', 'toJSON'];

+const obj = { toString() { throw new Error('toString'); } };


I think these tests can be in test-whatwg-url-setters.js? This file is testing the property attributes are defined properly, the added tests are testing the setters are implemented properly.

test-whatwg-url-setters is a WPT-derived file, and therefore doesn't pertain to Node.js-specific things like error messages, for which these tests check.

I think you have already added non-WPT tests to that file? Most error message validation tests are placed this way(after WPT) though.

Right, I overlooked those other tests that have Node.js-specific ones following WPT. Should now be addressed in 786aead7de3898012512915a10f4685a13f478d0

bnoordhuis

LGTM with a style nit.

bnoordhuis · 2017-02-27T11:36:29Z

lib/internal/url.js

@@ -206,8 +217,10 @@ function onParseHashComplete(flags, protocol, username, password,

 class URL {
  constructor(input, base) {
+    // toUSVString unneeded


Can you punctuate these comments?

extremely minor nit: s/unneeded/is not needed. here and throughout

jasnell · 2017-02-27T22:02:57Z

lib/internal/url.js

@@ -206,8 +217,10 @@ function onParseHashComplete(flags, protocol, username, password,

 class URL {
  constructor(input, base) {
+    // toUSVString unneeded


extremely minor nit: s/unneeded/is not needed. here and throughout

Changing err.message after the construction of Error doesn't seem to change err.stack.

String::REPLACE_INVALID_UTF8 is only applied in V8's String::WriteUtf8() (i.e. Utf8Value).

This commit implements the Web IDL USVString conversion, which mandates all unpaired Unicode surrogates be turned into U+FFFD REPLACEMENT CHARACTER. It also disallows Symbols to be used as USVString per spec. Certain functions call into C++ methods in the binding that use the Utf8Value class to access string arguments. Utf8Value already does the normalization using V8's String::Write, so in those cases, instead of doing the full USVString normalization, only a symbol check is done (`'' + val`, which uses ES's ToString, versus `String()` which has special provisions for symbols).

TimothyGu · 2017-02-28T00:34:45Z

@bnoordhuis, @jasnell, comments addressed. Will land tomorrow if nothing comes up.

CI: https://ci.nodejs.org/job/node-test-commit/8160/

TimothyGu · 2017-03-01T02:39:57Z

Landed in 7ceea2a...6123ed5.

Changing err.message after the construction of Error doesn't seem to change err.stack. PR-URL: #11436 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

String::REPLACE_INVALID_UTF8 is only applied in V8's String::WriteUtf8() (i.e. Utf8Value). PR-URL: #11436 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

This commit implements the Web IDL USVString conversion, which mandates all unpaired Unicode surrogates be turned into U+FFFD REPLACEMENT CHARACTER. It also disallows Symbols to be used as USVString per spec. Certain functions call into C++ methods in the binding that use the Utf8Value class to access string arguments. Utf8Value already does the normalization using V8's String::Write, so in those cases, instead of doing the full USVString normalization, only a symbol check is done (`'' + val`, which uses ES's ToString, versus `String()` which has special provisions for symbols). PR-URL: #11436 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

PR-URL: #11436 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

evanlucas · 2017-03-07T11:22:15Z

This is not landing cleanly on v7.x-staging. Want to submit a backport PR?

nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. dont-land-on-v4.x whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Feb 17, 2017

TimothyGu mentioned this pull request Feb 17, 2017

url: track status of a spec-compliant URLSearchParams #10821

Closed

11 tasks

mscdex reviewed Feb 17, 2017

View reviewed changes

bnoordhuis previously requested changes Feb 17, 2017

View reviewed changes

TimothyGu force-pushed the url-usvstring branch 2 times, most recently from 0098309 to 032f311 Compare February 20, 2017 06:23

joyeecheung reviewed Feb 20, 2017

View reviewed changes

TimothyGu force-pushed the url-usvstring branch from 032f311 to 69c0754 Compare February 22, 2017 01:30

bnoordhuis requested changes Feb 22, 2017

View reviewed changes

TimothyGu force-pushed the url-usvstring branch from 69c0754 to e48d3d3 Compare February 23, 2017 05:41

joyeecheung reviewed Feb 25, 2017

View reviewed changes

bnoordhuis approved these changes Feb 27, 2017

View reviewed changes

jasnell approved these changes Feb 27, 2017

View reviewed changes

TimothyGu added 4 commits February 27, 2017 16:27

test: fix WPT.test()'s error handling

db850d7

Changing err.message after the construction of Error doesn't seem to change err.stack.

src: remove misleading flag in TwoByteValue

2a2b4c7

String::REPLACE_INVALID_UTF8 is only applied in V8's String::WriteUtf8() (i.e. Utf8Value).

benchmark: add USVString conversion benchmark

19b37dc

TimothyGu force-pushed the url-usvstring branch from 786aead to 19b37dc Compare February 28, 2017 00:32

TimothyGu closed this Mar 1, 2017

TimothyGu deleted the url-usvstring branch March 1, 2017 02:40

TimothyGu added a commit that referenced this pull request Mar 1, 2017

benchmark: add USVString conversion benchmark

6123ed5

PR-URL: #11436 Reviewed-By: Ben Noordhuis <info@bnoordhuis.nl> Reviewed-By: James M Snell <jasnell@gmail.com>

evanlucas added the dont-land-on-v7.x label Mar 7, 2017

jasnell mentioned this pull request Apr 4, 2017

8.0.0 Release Proposal #12220

Closed

TimothyGu mentioned this pull request Apr 19, 2017

(v7.x backport) url: updates to the WHATWG URL parser #12507

Merged

4 tasks

TimothyGu added backport-requested-v7.x and removed dont-land-on-v7.x labels Apr 19, 2017

TimothyGu added backported-to-v7.x and removed backport-requested-v7.x labels Apr 27, 2017

url: enforce valid UTF-8 in WHATWG parser #11436

url: enforce valid UTF-8 in WHATWG parser #11436

Conversation

TimothyGu commented Feb 17, 2017

Checklist

Affected core subsystem(s)

TimothyGu commented Feb 17, 2017

TimothyGu commented Feb 17, 2017 • edited Loading

targos commented Feb 17, 2017

TimothyGu commented Feb 17, 2017

targos commented Feb 17, 2017

TimothyGu commented Feb 17, 2017

targos commented Feb 17, 2017

Choose a reason for hiding this comment

bnoordhuis Feb 17, 2017 • edited Loading

Choose a reason for hiding this comment

TimothyGu Feb 17, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joyeecheung Feb 18, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bnoordhuis Feb 19, 2017 • edited by TimothyGu Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bnoordhuis Feb 17, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

joyeecheung Feb 17, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jasnell commented Feb 17, 2017

TimothyGu commented Feb 17, 2017 • edited Loading

jasnell commented Feb 17, 2017

TimothyGu commented Feb 20, 2017

TimothyGu commented Feb 20, 2017

Choose a reason for hiding this comment

TimothyGu Feb 20, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TimothyGu commented Feb 22, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TimothyGu commented Feb 23, 2017

TimothyGu commented Feb 25, 2017

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TimothyGu Feb 25, 2017 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bnoordhuis left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

TimothyGu commented Feb 28, 2017

TimothyGu commented Mar 1, 2017

evanlucas commented Mar 7, 2017

TimothyGu commented Feb 17, 2017 •

edited

Loading

bnoordhuis Feb 17, 2017 •

edited

Loading

TimothyGu Feb 17, 2017 •

edited

Loading

joyeecheung Feb 18, 2017 •

edited

Loading

bnoordhuis Feb 19, 2017 •

edited by TimothyGu

Loading

bnoordhuis Feb 17, 2017 •

edited

Loading

joyeecheung Feb 17, 2017 •

edited

Loading

TimothyGu commented Feb 17, 2017 •

edited

Loading

TimothyGu Feb 20, 2017 •

edited

Loading

TimothyGu Feb 25, 2017 •

edited

Loading