Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring: fix empty pairs handling #11234

Closed

Conversation

@mscdex
Copy link
Contributor

commented Feb 8, 2017

This is a follow up to #10967, which is (currently) a semver-major.

This PR fixes outstanding issues stemming from the aforementioned PR and includes the tests from #11171. FWIW the original issue for all of this was that empty query string key/value pairs before the end of the string were not being ignored like an empty key/value at the end of the string would be.

I've also included some improvements to the parser in general, along with a couple of new benchmarks to highlight some of those improvements. Here are the results for benchmark/querystring/querystring-parse.js:

                                                                  improvement confidence      p.value
 querystring/querystring-parse.js n=1000000 type="altspaces"          84.35 %        *** 1.191823e-35
 querystring/querystring-parse.js n=1000000 type="encodefake"          2.96 %          * 4.610320e-02
 querystring/querystring-parse.js n=1000000 type="encodelast"          4.78 %         ** 7.673085e-03
 querystring/querystring-parse.js n=1000000 type="encodemany"          2.83 %            5.099655e-02
 querystring/querystring-parse.js n=1000000 type="manyblankpairs"    192.66 %        *** 4.461978e-48
 querystring/querystring-parse.js n=1000000 type="manypairs"          -0.14 %            9.307657e-01
 querystring/querystring-parse.js n=1000000 type="multicharsep"        1.65 %            2.795571e-01
 querystring/querystring-parse.js n=1000000 type="multivalue"          1.40 %            3.885362e-01
 querystring/querystring-parse.js n=1000000 type="multivaluemany"      1.82 %            4.116370e-01
 querystring/querystring-parse.js n=1000000 type="noencode"           -0.45 %            8.223732e-01

CI: https://ci.nodejs.org/job/node-test-pull-request/6279/

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)
  • querystring
mscdex added 2 commits Feb 8, 2017
querystring: fix empty pairs and optimize parse()
This commit fixes handling of empty pairs that occur before the end
of the query string so that they are also ignored.

Additionally, some optimizations have been made, including:

* Avoid unnecessary code execution where possible
* Use a lookup table when checking for hex characters
* Avoid forced decoding when '+' characters are encountered and we
are using the default decoder

Fixes: #10454

@mscdex mscdex force-pushed the mscdex:querystring-fix-empty-pairs-handling branch to 7b72507 Feb 8, 2017

encodeCheck = 1;
continue;
} else if (encodeCheck > 0) {
// eslint-disable-next-line no-extra-boolean-cast

This comment has been minimized.

Copy link
@mscdex

mscdex Feb 8, 2017

Author Contributor

If anyone is curious, this was added because there seems to be a positive difference when explicitly converting to a boolean in the conditional vs. implicitly converting the numeric value (or undefined) to a boolean. eslint doesn't know that isHexTable contains non-booleans, so I explicitly silence these "errors."

This comment has been minimized.

Copy link
@micnic

micnic Feb 10, 2017

Contributor

Isn't it better to use isHexTable[code] > 0 as a more readable option? The performance seems to be quite similar

This comment has been minimized.

Copy link
@mscdex

mscdex Feb 10, 2017

Author Contributor

That's debatable. I think the explicit conversion is easier to reason about than remembering how undefined values compare with numeric values.

This comment has been minimized.

Copy link
@micnic

micnic Feb 10, 2017

Contributor

I agree, in this case I'd propose isHexTable[code] === 1 as an alternative

This comment has been minimized.

Copy link
@seishun

seishun Feb 10, 2017

Member

If anyone is curious, this was added because there seems to be a positive difference when explicitly converting to a boolean in the conditional vs. implicitly converting the numeric value (or undefined) to a boolean.

Could you clarify what you mean by the "positive difference"?

This comment has been minimized.

Copy link
@mscdex

mscdex Feb 10, 2017

Author Contributor

Could you clarify what you mean by the "positive difference"?

An increase in performance?

This comment has been minimized.

Copy link
@seishun

seishun Feb 10, 2017

Member

That's interesting. AFAIK if (foo) has exactly the same behavior as if (!!foo), so I'm not sure why the performance would be different.

If there really is a difference, then I propose adding a comment explaining that.

// timed loop. Instead, just execute the function a "sufficient" number of
// times before the timed loop to ensure the function is optimized just once.
if (type === 'multicharsep') {
for (i = 0; i < n; i += 1)

This comment has been minimized.

Copy link
@joyeecheung

joyeecheung Feb 8, 2017

Member

Does this need to be n? Refs: v8/src/runtime-profiler.cc, this might change but 1e3 should probabaly be safe.

// Number of times a function has to be seen on the stack before it is
// compiled for baseline.
static const int kProfilerTicksBeforeBaseline = 1;
// Number of times a function has to be seen on the stack before it is
// optimized.
static const int kProfilerTicksBeforeOptimization = 2;
// If the function optimization was disabled due to high deoptimization count,
// but the function is hot and has been seen on the stack this number of times,
// then we try to reenable optimization for this function.
static const int kProfilerTicksBeforeReenablingOptimization = 250;
// If a function does not have enough type info (according to
// FLAG_type_info_threshold), but has seen a huge number of ticks,
// optimize it as it is.
static const int kTicksWhenNotEnoughTypeInfo = 100;
// We only have one byte to store the number of ticks.
STATIC_ASSERT(kProfilerTicksBeforeOptimization < 256);
STATIC_ASSERT(kProfilerTicksBeforeReenablingOptimization < 256);
STATIC_ASSERT(kTicksWhenNotEnoughTypeInfo < 256);

This comment has been minimized.

Copy link
@mscdex

mscdex Feb 8, 2017

Author Contributor

Originally I used n / 2, but I seemed to get better (less variable) results with n, especially considering the default value of n. Either way, n should be enough no matter what, so I kept it.

@joyeecheung joyeecheung referenced this pull request Feb 9, 2017
3 of 3 tasks complete
@stevenvachon

This comment has been minimized.

Copy link

commented Feb 9, 2017

Please merge asap.

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 9, 2017

@stevenvachon Other @nodejs/collaborators need to sign off on this first.

@micnic
micnic approved these changes Feb 10, 2017
@stevenvachon

This comment has been minimized.

Copy link

commented Feb 13, 2017

Hate to nag, but how about now? I'd really like to play with this in a nightly build.

jasnell added a commit that referenced this pull request Feb 13, 2017
querystring: fix empty pairs and optimize parse()
This commit fixes handling of empty pairs that occur before the end
of the query string so that they are also ignored.

Additionally, some optimizations have been made, including:

* Avoid unnecessary code execution where possible
* Use a lookup table when checking for hex characters
* Avoid forced decoding when '+' characters are encountered and we
are using the default decoder

Fixes: #10454
PR-URL: #11234
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Nicu Micleușanu <micnic90@gmail.com>
jasnell added a commit that referenced this pull request Feb 13, 2017
test: improve querystring.parse assertion messages
PR-URL: #11234
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Nicu Micleușanu <micnic90@gmail.com>
@jasnell

This comment has been minimized.

Copy link
Member

commented Feb 13, 2017

Landed in ff785fd...8bcc122

@jasnell jasnell closed this Feb 13, 2017

@mscdex mscdex deleted the mscdex:querystring-fix-empty-pairs-handling branch Feb 14, 2017

@TimothyGu TimothyGu referenced this pull request Feb 14, 2017
11 of 11 tasks complete
@joyeecheung joyeecheung referenced this pull request Feb 14, 2017
3 of 3 tasks complete
TimothyGu added a commit that referenced this pull request Feb 17, 2017
test: turn on WPT tests on empty param pairs
This specific bug was fixed by #11234. This commit turns on the tests
accordingly.

PR-URL: #11369
Ref: #11234
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
Reviewed-By: Colin Ihrig <cjihrig@gmail.com>
@stevenvachon

This comment has been minimized.

Copy link

commented Feb 28, 2017

This has a "dont-land-on-v7.x" label, yet it appears to be working in v7.6

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2017

@stevenvachon This didn't land in v7.x. Can you explain what you mean by "working?" For example, querystring.parse('a&&b') still returns { a: '', '': '', b: '' } in v7.6.0, however master returns { a: '', b: '' } (due to this PR being merged in master).

I should also point out that querystring.parse('a&b&') has always resulted in { a: '', b: '' }. This PR was for fixing the case when an empty pair does not occur at the end of the string.

@stevenvachon

This comment has been minimized.

Copy link

commented Feb 28, 2017

This issue was referenced in #11171 as the solution. My comment there, containing a test case that was failing on 7.5.x is now passing on 7.6

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2017

@stevenvachon With the test case you're referencing I get the same result on node v0.10.41, v4.0.0, v6.5.0, v7.5.0, v7.6.0, and master:

> querystring.parse('a=&a=value&a=')
{ a: [ '', 'value', '' ] }
@stevenvachon

This comment has been minimized.

Copy link

commented Feb 28, 2017

Shit.. I dunno what's going on then. My original issue is not reproducible with 7.5, which was available days before posting.

@mscdex

This comment has been minimized.

Copy link
Contributor Author

commented Feb 28, 2017

@stevenvachon Ah ok, so that was originally using WHATWG URL, which does its parsing in a specific way and its implementation has changed often since node v7.0.0. FWIW though I tried node v7.4.0, v7.5.0, and v7.6.0 and got the same expected result you showed in your original issue, namely:

> const url = new (require("url").URL)("http://www.domain.com:123/dir/file.html?var=&var=value&var=#hash");
undefined
> console.log( Array.from(url.searchParams) );
[ [ 'var', '' ], [ 'var', 'value' ], [ 'var', '' ] ]
undefined
@stevenvachon

This comment has been minimized.

Copy link

commented Feb 28, 2017

Thank you. Sorry for wasting space in here. I'll have to update a number of test suites as I'd built around this supposed issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
7 participants
You can’t perform that action at this time.