Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring, url: repeated '&' in a paramsString will be skipped #10967

Closed

Conversation

watilde
Copy link
Member

@watilde watilde commented Jan 23, 2017

Updates:

  • skip the parsing process if a pair of a key and a value is empty
  • add test cases for querystring and URLSearchParams

Fixes: #10454

/cc @nodejs/url

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

querystring, url, url-whatwg, test

@nodejs-github-bot nodejs-github-bot added querystring Issues and PRs related to the built-in querystring module. dont-land-on-v7.x labels Jan 23, 2017
@watilde watilde force-pushed the feature/fixes-querystring-parse branch from 7d22fc5 to 0701371 Compare January 23, 2017 21:13
@watilde watilde changed the title url: repeated '&' in a paramsString will be skipped querystring, url: repeated '&' in a paramsString will be skipped Jan 23, 2017
@TimothyGu
Copy link
Member

I believe the consensus was to freeze the querystring module, and put all new work onto URLSearchParams.

@jasnell, can you weigh in on this?

@mscdex
Copy link
Contributor

mscdex commented Jan 23, 2017

@TimothyGu I don't recall that. Perhaps you're thinking about the url module instead?

@TimothyGu
Copy link
Member

Regardless of any previous planning, I am not sure this PR (or any PRs that change the behavior of this module) is the way to go.

  1. A change in the behavior of this module is almost definitively semver-major. It may also cause ripple effects in the entire ecosystem, though I am unsure how much of an effect this might bring.
  2. querystring module has applications more than simply parsing and serializing URL query strings. The fact that sep, eq, and decodeURIComponent are modifiable means that other usage, including those that depend on the current behavior, is possible. I cannot find it right now, but I have even read a Node.js "tip" somewhere that encourages using this module for performant generic serialized format.
  3. Even if the goal to make querystring WHATWG-compliant is accepted, there are other issues whose fixes may or must be semver-major as well.
    1. Serialization of spaces as + rather than %20, left paren as ) rather than %27, etc.: url: Improve WHATWG URLSearchParams spec compliance #9484 (comment).
    2. This module returns an object/hashmap, which is fundamentally incompatible with the list format WHATWG URL Standard mandates. For example, qs.stringify(qs.parse('a=1&b=2&a=3')) returns 'a=1&a=3&b=2' instead of 'a=1&b=2&a=3'.
  4. Proposal to eventually move URLSearchParams parsing to native code.

@jasnell
Copy link
Member

jasnell commented Jan 24, 2017

I wouldn't describe it as consensus. In my opinion significant changes to the the existing url module should be avoided, particularly semver-major or semver-minor. If it's a bug fix then I'm good with it. I'd rather keep things pretty steady state but this change makes sense. It's also worth noting that the URLSearchParams implementation currently still makes use of the querystring module.

@joyeecheung
Copy link
Member

joyeecheung commented Jan 24, 2017

I think this looks more like a bugfix, albeit a breaking one...I doubt anyone would rely on the behavior of parsing && and getting an empty-ish object instead of treating it as a bug (although it's harmless for common usage AFAICT) ?

In fact you can't even reserialize to the original string with this bug. No error would be thrown if you only use the Node.js built-in module, but it could be a problem when it ends up in the hand of other modules in other languages.

> qs.parse('a=1&&b=1')
{ a: '1', '': '', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&=&b=1'

FWIW, in Python:

>>> parse_qs('a=1&&b=2')
{'a': ['1'], 'b': ['2']}

PHP:

php > $foo = array();
php > parse_str('a=1&&b=2', $foo);
php > var_dump($foo);
array(2) {
  ["a"]=>
  string(1) "1"
  ["b"]=>
  string(1) "2"
}

Ruby:

2.3.1 :003 > CGI.parse("a=1&&b=2")
 => {"a"=>["1"], "b"=>["2"]}

@joyeecheung
Copy link
Member

Regarding @TimothyGu 's concerns I think 1 might not be that significant seeing what other languages behaves. 2 would be more possible, but I can't think of a reason for relying on an empty key (if it's just for custom searialization), though some people could rely on an empty value, maybe changing || to && on L281 would be better?
As for 3&4, I think it doesn't hurt to fix the querystring module, whether URLSearchParams is gonna get reimplemented or not. It even makes more sense to land a fix in querystring if URLSearchParams is gonna be fixed separately.

@joyeecheung joyeecheung added url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. labels Jan 24, 2017
@watilde
Copy link
Member Author

watilde commented Jan 24, 2017

I've tested the above case @joyeecheung mentioned, and it seems to work fine in this branch

$ ./out/Release/node
> var qs = require('querystring')
undefined
> qs.parse('a=1&&b=1')
{ a: '1', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&b=1'

About changing || to &&, we have cases that the key or the value can be empty like the following. I tried updating it once, but some of the test cases threw some errors.

> qs.parse('=a&b')
{ '': 'a', b: '' }

else
obj[key] = [curValue, value];

if (key.length > 0 || value.length > 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just key.length > 0? Empty values are not that uncommon (e.g. qs.parse('b=&x=1'))

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just took a look at the spec and this bit of code, I think the current way is the best :D. The only one that needs to be ignored is &&, otherwise it doesn't match what the spec needs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for following up the code and the spec hardly!

@joyeecheung
Copy link
Member

@jasnell
Copy link
Member

jasnell commented Jan 24, 2017

I agree that it's a bugfix. Parsing && as meaning an empty key makes exceedingly little sense. I'm definitely +1 on getting this fixed.

Copy link
Member

@jasnell jasnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there

@@ -51,6 +51,8 @@ const qsTestCases = [
__defineGetter__: 'baz' }],
// See: https://github.com/joyent/node/issues/3058
['foo&bar=baz', 'foo=&bar=baz', { foo: '', bar: 'baz' }],
['&&foo=bar&&', 'foo=bar', { foo: 'bar' }],
['&&&&', '', {}],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be helpful to also test the following cases:

`a=b&c&d=e`
`a=b&c=&d=e`
`a=b&=c&d=e`
`a=b&=&c=d`

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right! Adding the cases that the key or the value is empty sounds useful for revealing the current behaviour. I've added them :)

@jasnell
Copy link
Member

jasnell commented Jan 24, 2017

New CI since the test was updated: https://ci.nodejs.org/job/node-test-pull-request/6032/

@watilde
Copy link
Member Author

watilde commented Jan 25, 2017

Some of the nice performance improvements in querystring were added! Resolved conflicts.

@watilde
Copy link
Member Author

watilde commented Jan 25, 2017

['a=b&c&d=e', 'a=b&c=&d=e', { a: 'b', c: '', d: 'e' }],
['a=b&c=&d=e', 'a=b&c=&d=e', { a: 'b', c: '', d: 'e' }],
['a=b&=c&d=e', 'a=b&=c&d=e', { a: 'b', '': 'c', d: 'e' }],
['a=b&=&c=d', 'a=b&c=d', { a: 'b', c: 'd' }],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be parsed as { '': '', a: 'b', c: 'd' }. Try new URLSearchParams('&=&').get('').

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow I didn't know that, thanks! Then I will try to support the case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've updated! The logic was changed to simple one :)

@watilde watilde force-pushed the feature/fixes-querystring-parse branch from 666b945 to f81201a Compare February 1, 2017 20:39
@watilde
Copy link
Member Author

watilde commented Feb 1, 2017

I made improvements in performance again, and now it seems to be better than my last update.

@mscdex
Copy link
Contributor

mscdex commented Feb 1, 2017

@watilde That benchmark output looks truncated (the benchmark/compare.js script by default performs more runs than that). Having the output piped to RScript benchmark/compare.R will yield more useful information as @joyeecheung pointed out.

@watilde
Copy link
Member Author

watilde commented Feb 2, 2017

@mscdex That's right indeed! Then I just tried it, and here is the result:

@joyeecheung
Copy link
Member

@joaocgreis
Copy link
Member

Aborted CI, only arm-fanned was pending because of nodejs/build#611

@jasnell
Copy link
Member

jasnell commented Feb 2, 2017

jasnell pushed a commit that referenced this pull request Feb 2, 2017
* update state machine in parse
  * repeated sep should be adjusted
  * `&=&=` should be `{ '': [ '', '' ] }`
* add test cases for querystring and URLSearchParams

Fixes: #10454
PR-URL: #10967
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
@jasnell
Copy link
Member

jasnell commented Feb 2, 2017

Landed in 4e259b2

@jasnell jasnell closed this Feb 2, 2017
@TimothyGu
Copy link
Member

Should this be marked as semver-major?

@mscdex
Copy link
Contributor

mscdex commented Feb 2, 2017

I was hoping to look at this this weekend before this landed...

@mscdex mscdex added the semver-major PRs that contain breaking changes and should be released in the next major version. label Feb 2, 2017
@TimothyGu TimothyGu moved this from Existing spec to Done in WHATWG URL implementation Feb 2, 2017
@jasnell
Copy link
Member

jasnell commented Feb 2, 2017

oy.. my apologies @mscdex... if you indicated that in the thread and I missed it, I do apologize.

@jasnell
Copy link
Member

jasnell commented Feb 2, 2017

if this ends up not being acceptable to you @mscdex, then we should be able to do a quick revert

@watilde watilde deleted the feature/fixes-querystring-parse branch February 2, 2017 21:21
@watilde watilde restored the feature/fixes-querystring-parse branch February 2, 2017 21:23
@watilde
Copy link
Member Author

watilde commented Feb 2, 2017

Isn't this patch update since it's just bug-fix?

@mscdex
Copy link
Contributor

mscdex commented Feb 2, 2017

@watilde It might be a bug fix for the whatwg url module, but it's arguably not for the non-whatwg url module since there is no real spec that it adheres to. If this change were made to whatwg url's own querystring implementation that would be different and semver-major could just be applied to the non-whatwg url changes.

@jasnell #10967 (comment) I probably won't get time until this weekend to review this, so I don't think a 'quick revert' will happen unless else everyone agrees.

@joyeecheung
Copy link
Member

I think we can wail until @mscdex 's review to decide then. Personally I think it's fine to accept this if it doesn't introduce any new behavior other than fixing #10454 because the old behavior looks buggy to me (see #10967 (comment)).

@mscdex
Copy link
Contributor

mscdex commented Feb 2, 2017

@joyeecheung Calling it a 'bug fix' is debatable. One could also argue that stringify() should not include include key/value pairs that are both empty. I'd personally rather be on the safe side and land it in v8.0.0.

@joyeecheung
Copy link
Member

@mscdex Sorry for being vague, I am fine with marking this semver-major.

krydos pushed a commit to krydos/node that referenced this pull request Feb 25, 2017
* update state machine in parse
  * repeated sep should be adjusted
  * `&=&=` should be `{ '': [ '', '' ] }`
* add test cases for querystring and URLSearchParams

Fixes: nodejs#10454
PR-URL: nodejs#10967
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
@jasnell jasnell mentioned this pull request Apr 4, 2017
@watilde watilde deleted the feature/fixes-querystring-parse branch April 11, 2017 18:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
querystring Issues and PRs related to the built-in querystring module. semver-major PRs that contain breaking changes and should be released in the next major version. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

8 participants