Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring: avoid conflicts with Object.prototype and __proto__ #5722

Closed
wants to merge 1 commit into from
Closed

querystring: avoid conflicts with Object.prototype and __proto__ #5722

wants to merge 1 commit into from

Conversation

WebReflection
Copy link
Member

Currently, the literal returned object is swallowing all
methods and properties inherited from the Object.prototype.
This includes the __proto__ special case and this PR
aim is to fix all these inherited properties at once,
using the in operator before checking the array of
known keys.

This PR has been previously described and tested in #5650
and it has been made from scratch to avoid confusion.

The benchmark shows overall 4 out of 5 tests are 2% up to 22%
faster and one test, the worst case scenario with many pairs,
slower between 3% and 14% than before.

@Fishrock123 Fishrock123 added the querystring Issues and PRs related to the built-in querystring module. label Mar 15, 2016
@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

I finally had some time today to do some perf testing around this. I compared the changes in this PR against current master and both the current node benchmarks and benchmark.js-based benchmarks agree, there is a perf regression for each benchmark input anywhere from 1% to 20% (most are towards the higher end though).

On the other hand, if we take the existing code and merely add a check for if (key === '__proto__') { defineProperty(..); } else { obj[key] = value; ] } inside the key array check block, then the regression is much less signficant, at most 6%. I think having just this check should be safe since it's a special property, unlike the other inherited properties which can be overwritten without a problem.

A third option that had some success was instead creating obj with a null prototype and then only checking if (obj[key] === undefined) instead of checking a key array. This resulted in either no regression, or perf improvements on all but the last benchmark input (many unique keys), where there was ~16% regression. I am not quite sure why that is the case though, since if it was an issue about switching to dictionary mode, that should have already been happening with the existing implementation.

@WebReflection
Copy link
Member Author

Benchmarks they told me to do showed improvements, not regressions, for most common cases where there are not so many pairs. Only in the pairs case there's obviously some regression because of the double in plus keys.indexOf(key) check.

I'd like to see/know/try these benchmarks you are talking about but I'd also would like to raise a concern about targeting __proto__ and __proto__ only.

The bug this PR is trying to solve is any accessor that could be set for whatever obtrusive, polyfill, or valid reason, in the Object.prototype.

Targeting __proto__ only is a patch that doesn't guard, nor grant, any solution for the short and long term.

However, all tests I've seen about null where comparing {} VS Object.create(null) which is a global scope lookup for Object and a method access for create plus the invoke, instead of comparing {} with {__proto__: null} which is easy to optimize at runtime for V8 or any other engine and the (IMO) only legit usage of dunder __proto__.

Have you benchmarked that too?

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

I'm using the benchmarks from benchmarks/querystring/querystring-parse.js (I also verified with benchmark.js-based versions of the same exact benchmarks and inputs).

I'm not sure what you're saying about Object.create(null). I did not benchmark {__proto__: null}, but I would assume it would be similar to Object.create(null). Object.create(null) does have overhead compared to merely using {}, but that overhead is typically much more limited since you no longer have to do complicated property checks because there are no inherited properties and you can set __proto__ just fine with a normal obj[key]=value. What I couldn't understand was why setting a lot of keys (as in the manypairs benchmark case) on an object with a null prototype was causing such a regression compared to both the other benchmark cases as well as the existing code in master.

@WebReflection
Copy link
Member Author

Actually, this is the bench.js file I am trying

for (var i = 0, l = 1000, o = {}; i < l; o[parseInt(Math.random() * l)] = i++);
var keys = Object.keys(o);

var i = 0, r1 = 0;
console.time('the `in` operator');
while (i < l) {
  // String(i++) implicit
  // but to have a fair bench ...
  if (String(i++) in o) r1++;
}
console.timeEnd('the `in` operator');


var i = 0, r2 = 0;
console.time('the indexOf() operation');
while (i < l) {
  if (keys.indexOf(String(i++)) !== -1) r2++;
}
console.timeEnd('the indexOf() operation');

console.assert(r1 === r2, 'should have same result, instead: ' + r1 + ' & ' + r2);

And not a single run showed indexOf faster than a in operator.

The average is

$ node bench.js 
the `in` operator: 0.857ms
the indexOf() operation: 4.249ms

I'm really curious to see your benchmarks because things don't look right to me.
Thanks for clarifications.

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

@WebReflection The benchmarks are here. It's not always enough to test differences like that in isolation.

@WebReflection
Copy link
Member Author

Yeah, sorry for the overlap, I've received your notification after I've sent the comment but pelase double check my very simple bench that compares in VS indexOf and it's always faster.

Trying to answer your other concerns:

{__proto__: null} would not be sufficient according to what you said previously, right, since any accessor could be attached to Object.prototype?

That's the point of creating {__proto__:null} which is the literal expression for Object.create(null). Since like Object.create(null) a {__proto__:null} instanceof Object is also false, if you use var hash = {__proto__: null}; and then if (key in hash) .... pair ... else .... new you don't suffer problems inherited from Object.prototype, it's safe for __proto__ as well like an Object.create(null) would be for node > 0.10.

TL;DR ... every discussion I've seen (recently) about {} VS Object.create(null) is not considering {__proto__: null} as literal shortcut for Object.create(null)

Regardless the null object, my tests are all faster than current indexOf so I'm not sure why yours are worst. What are we doing differently?

@MylesBorins
Copy link
Member

@WebReflection
Copy link
Member Author

I'm sorry @mscdex but I think you are reading benchmarks upside down.

Both me and @evanlucas can confirm my PR is on average up to 22% faster than current master:

#5650 (comment)

#5650 (comment)

only the last benchmark is 3% up to 14% slower.

Would you mind reconsidering your opinion about this PR? Because I think you are reading the bench in the wrong way.

To sum it up: the first 6 benchmarks are always faster with this PR, only the last bench, which is the worst case, is slower for explained reasons ( the in check and the indexOf due pairs )

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

@WebReflection Typically the binary with the new changes is supposed to be first and the old binary is second on the command line (after the -r -g). Are you sure you didn't accidentally swap them?

I am pretty sure the benchmark results on my end are not the "wrong way" as I have been running node.js benchmarks for quite some time. I have even double checked that the binaries are correct by verifying the output of node -pe "require('querystring').parse.toString()".

@WebReflection
Copy link
Member Author

so you are saying that this comment is showing that this PR is slower, while I am saying all tests but last are negative because the PR is faster, right?

Then I don't understand why the only test that has many pairs and will involve both in operation and the indexOf one would be always faster with an extra operation.

That makes absolutely no logic sense to me, but of course I've no idea what V8 would do there to make a double check faster than a single one.

If you are sure 100% this PR is slower (it shouldn't, logically speaking) I can try using {__proto__: null} but that changes surrounding expectations of the resulting object and, usually, it breaks everything because most JS developers still do obj.hasOwnProperty ... assuming every object inherits from Object.prototype

If I've got the test completely wrong though, apologies for pushing twice this PR and thinking you read it wrong. (but I'm going to investigate with some V8 core person about this 'cause again, it doesn't feel right to me)

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

FWIW I just benchmarked {__proto__: null} and it is actually the worst yet, ~40-50% regression across all the benchmark inputs. I'm quite surprised there is that much of a difference between it and master (compared to Object.create(null) and master), so v8 must be performing some special optimization in the Object.create(null) case.

@MylesBorins
Copy link
Member

@WebReflection your current PR is failing the linter. Please update and force push

@WebReflection
Copy link
Member Author

Dho ..it's a diff from previous PR , weird it's failing but I'll force push later.

About {__Proto__: null} ... is that the case on a micro benchmark too? As in is always faster than indexOf but in this bench is not, I wouldn't be surprised same happens for literals null objects ...but then we need an exorcism rather than a V8 expert here :D

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

@WebReflection I'm only benchmarking with the querystring-parse.js benchmarks because that's ultimately what matters most. When using Object.create(null)/{__proto__:null}, I'm not using in or indexOf(), I'm merely checking if (obj[key] === undefined).

@mscdex mscdex added the semver-major PRs that contain breaking changes and should be released in the next major version. label Mar 15, 2016
@WebReflection
Copy link
Member Author

@mscdex if that's your test you should really go for the in operator which is demonstrated already it's faster. Accessing a property and comparing it with a possibly overwritten reference as undefined could be is not exactly the same as using key in obj

Currently, the literal returned object is swallowing all
methods and properties inherited from the Object.prototype.
This includes the `__proto__` special case and this PR
aim is to fix all these inherited properties at once,
using the `in` operator before checking the array of
known keys.
@WebReflection
Copy link
Member Author

@thealphanerd I've push forced after a commit --amend ... I hope it's OK now

@WebReflection
Copy link
Member Author

so @mscdex with this test:

var r1, r2, i, l = 1000;

i = 0;
console.time('Object.create(null)');
while (i++ < l) r1 = Object.create(null);
console.timeEnd('Object.create(null)');


i = 0;
console.time('{__proto__:null}');
while (i++ < l) r2 = {__proto__:null};
console.timeEnd('{__proto__:null}');


console.assert(Object.keys(r1).join(',') === Object.keys(r2).join(','), 'should have same keys');

I have these results:

[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.945ms
{__proto__:null}: 0.180ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.916ms
{__proto__:null}: 0.174ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.927ms
{__proto__:null}: 0.177ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.944ms
{__proto__:null}: 0.177ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.914ms
{__proto__:null}: 0.175ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.960ms
{__proto__:null}: 0.181ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.933ms
{__proto__:null}: 0.176ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.948ms
{__proto__:null}: 0.176ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.916ms
{__proto__:null}: 0.173ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.935ms
{__proto__:null}: 0.177ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.936ms
{__proto__:null}: 0.202ms
[webreflection@archibold test]$ node bench.js 
Object.create(null): 0.951ms
{__proto__:null}: 0.178ms

As unreliable as it is, it looks reasonable to me.

@mscdex
Copy link
Contributor

mscdex commented Mar 15, 2016

@WebReflection Like I said before, you cannot always rely on simple benchmarks in isolation like that. You need to test the changes (e.g. using the existing benchmarks) in core itself to get more realistic measurements. From my benchmarking (using the existing benchmarks) comparing the two, {__proto__:null} was consistently ~50% slower than current master.

Also, using in for the first check along with Object.create(null), I only see the multivaluemany benchmark consistently ~10% slower, the rest seem to be mostly unaffected.

@WebReflection
Copy link
Member Author

well, if isolation isn't good, what makes you think the artificial benchmark tells the truth?

At this point I am hoping for some V8 expert to tell me what's going on because I can isolate all fast paths but apparently putting them together is slower? Again, nonsense to my understanding on how V8 would optimize.

That being said the key === "__proto__" check is not a real solution to the bigger problem, it's a hack on a known property that won't scale and won't be robust enough, imo.

So what are you proposing to fix the bug #5642 in a broader way?

@jasnell
Copy link
Member

jasnell commented Apr 4, 2016

Alternative fix: #6044

@jasnell
Copy link
Member

jasnell commented Apr 18, 2016

Recommend closing this in favor of #6044

@jasnell
Copy link
Member

jasnell commented Apr 19, 2016

Closing now that #6055 has landed.

@jasnell jasnell closed this Apr 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
querystring Issues and PRs related to the built-in querystring module. semver-major PRs that contain breaking changes and should be released in the next major version.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants