New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

querystring, url: repeated '&' in a paramsString will be skipped #10967

Closed
wants to merge 1 commit into
base: master
from

Conversation

Projects
8 participants
@watilde
Member

watilde commented Jan 23, 2017

Updates:

  • skip the parsing process if a pair of a key and a value is empty
  • add test cases for querystring and URLSearchParams

Fixes: #10454

/cc @nodejs/url

Checklist
  • make -j4 test (UNIX), or vcbuild test (Windows) passes
  • tests and/or benchmarks are included
  • commit message follows commit guidelines
Affected core subsystem(s)

querystring, url, url-whatwg, test

@watilde watilde changed the title from url: repeated '&' in a paramsString will be skipped to querystring, url: repeated '&' in a paramsString will be skipped Jan 23, 2017

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Jan 23, 2017

Member

I believe the consensus was to freeze the querystring module, and put all new work onto URLSearchParams.

@jasnell, can you weigh in on this?

Member

TimothyGu commented Jan 23, 2017

I believe the consensus was to freeze the querystring module, and put all new work onto URLSearchParams.

@jasnell, can you weigh in on this?

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Jan 23, 2017

Contributor

@TimothyGu I don't recall that. Perhaps you're thinking about the url module instead?

Contributor

mscdex commented Jan 23, 2017

@TimothyGu I don't recall that. Perhaps you're thinking about the url module instead?

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Jan 24, 2017

Member

Regardless of any previous planning, I am not sure this PR (or any PRs that change the behavior of this module) is the way to go.

  1. A change in the behavior of this module is almost definitively semver-major. It may also cause ripple effects in the entire ecosystem, though I am unsure how much of an effect this might bring.
  2. querystring module has applications more than simply parsing and serializing URL query strings. The fact that sep, eq, and decodeURIComponent are modifiable means that other usage, including those that depend on the current behavior, is possible. I cannot find it right now, but I have even read a Node.js "tip" somewhere that encourages using this module for performant generic serialized format.
  3. Even if the goal to make querystring WHATWG-compliant is accepted, there are other issues whose fixes may or must be semver-major as well.
    1. Serialization of spaces as + rather than %20, left paren as ) rather than %27, etc.: #9484 (comment).
    2. This module returns an object/hashmap, which is fundamentally incompatible with the list format WHATWG URL Standard mandates. For example, qs.stringify(qs.parse('a=1&b=2&a=3')) returns 'a=1&a=3&b=2' instead of 'a=1&b=2&a=3'.
  4. Proposal to eventually move URLSearchParams parsing to native code.
Member

TimothyGu commented Jan 24, 2017

Regardless of any previous planning, I am not sure this PR (or any PRs that change the behavior of this module) is the way to go.

  1. A change in the behavior of this module is almost definitively semver-major. It may also cause ripple effects in the entire ecosystem, though I am unsure how much of an effect this might bring.
  2. querystring module has applications more than simply parsing and serializing URL query strings. The fact that sep, eq, and decodeURIComponent are modifiable means that other usage, including those that depend on the current behavior, is possible. I cannot find it right now, but I have even read a Node.js "tip" somewhere that encourages using this module for performant generic serialized format.
  3. Even if the goal to make querystring WHATWG-compliant is accepted, there are other issues whose fixes may or must be semver-major as well.
    1. Serialization of spaces as + rather than %20, left paren as ) rather than %27, etc.: #9484 (comment).
    2. This module returns an object/hashmap, which is fundamentally incompatible with the list format WHATWG URL Standard mandates. For example, qs.stringify(qs.parse('a=1&b=2&a=3')) returns 'a=1&a=3&b=2' instead of 'a=1&b=2&a=3'.
  4. Proposal to eventually move URLSearchParams parsing to native code.
@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Jan 24, 2017

Member

I wouldn't describe it as consensus. In my opinion significant changes to the the existing url module should be avoided, particularly semver-major or semver-minor. If it's a bug fix then I'm good with it. I'd rather keep things pretty steady state but this change makes sense. It's also worth noting that the URLSearchParams implementation currently still makes use of the querystring module.

Member

jasnell commented Jan 24, 2017

I wouldn't describe it as consensus. In my opinion significant changes to the the existing url module should be avoided, particularly semver-major or semver-minor. If it's a bug fix then I'm good with it. I'd rather keep things pretty steady state but this change makes sense. It's also worth noting that the URLSearchParams implementation currently still makes use of the querystring module.

@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@joyeecheung

joyeecheung Jan 24, 2017

Member

I think this looks more like a bugfix, albeit a breaking one...I doubt anyone would rely on the behavior of parsing && and getting an empty-ish object instead of treating it as a bug (although it's harmless for common usage AFAICT) ?

In fact you can't even reserialize to the original string with this bug. No error would be thrown if you only use the Node.js built-in module, but it could be a problem when it ends up in the hand of other modules in other languages.

> qs.parse('a=1&&b=1')
{ a: '1', '': '', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&=&b=1'

FWIW, in Python:

>>> parse_qs('a=1&&b=2')
{'a': ['1'], 'b': ['2']}

PHP:

php > $foo = array();
php > parse_str('a=1&&b=2', $foo);
php > var_dump($foo);
array(2) {
  ["a"]=>
  string(1) "1"
  ["b"]=>
  string(1) "2"
}

Ruby:

2.3.1 :003 > CGI.parse("a=1&&b=2")
 => {"a"=>["1"], "b"=>["2"]}
Member

joyeecheung commented Jan 24, 2017

I think this looks more like a bugfix, albeit a breaking one...I doubt anyone would rely on the behavior of parsing && and getting an empty-ish object instead of treating it as a bug (although it's harmless for common usage AFAICT) ?

In fact you can't even reserialize to the original string with this bug. No error would be thrown if you only use the Node.js built-in module, but it could be a problem when it ends up in the hand of other modules in other languages.

> qs.parse('a=1&&b=1')
{ a: '1', '': '', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&=&b=1'

FWIW, in Python:

>>> parse_qs('a=1&&b=2')
{'a': ['1'], 'b': ['2']}

PHP:

php > $foo = array();
php > parse_str('a=1&&b=2', $foo);
php > var_dump($foo);
array(2) {
  ["a"]=>
  string(1) "1"
  ["b"]=>
  string(1) "2"
}

Ruby:

2.3.1 :003 > CGI.parse("a=1&&b=2")
 => {"a"=>["1"], "b"=>["2"]}
@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@joyeecheung

joyeecheung Jan 24, 2017

Member

Regarding @TimothyGu 's concerns I think 1 might not be that significant seeing what other languages behaves. 2 would be more possible, but I can't think of a reason for relying on an empty key (if it's just for custom searialization), though some people could rely on an empty value, maybe changing || to && on L281 would be better?
As for 3&4, I think it doesn't hurt to fix the querystring module, whether URLSearchParams is gonna get reimplemented or not. It even makes more sense to land a fix in querystring if URLSearchParams is gonna be fixed separately.

Member

joyeecheung commented Jan 24, 2017

Regarding @TimothyGu 's concerns I think 1 might not be that significant seeing what other languages behaves. 2 would be more possible, but I can't think of a reason for relying on an empty key (if it's just for custom searialization), though some people could rely on an empty value, maybe changing || to && on L281 would be better?
As for 3&4, I think it doesn't hurt to fix the querystring module, whether URLSearchParams is gonna get reimplemented or not. It even makes more sense to land a fix in querystring if URLSearchParams is gonna be fixed separately.

@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Jan 24, 2017

Member

I've tested the above case @joyeecheung mentioned, and it seems to work fine in this branch

$ ./out/Release/node
> var qs = require('querystring')
undefined
> qs.parse('a=1&&b=1')
{ a: '1', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&b=1'

About changing || to &&, we have cases that the key or the value can be empty like the following. I tried updating it once, but some of the test cases threw some errors.

> qs.parse('=a&b')
{ '': 'a', b: '' }
Member

watilde commented Jan 24, 2017

I've tested the above case @joyeecheung mentioned, and it seems to work fine in this branch

$ ./out/Release/node
> var qs = require('querystring')
undefined
> qs.parse('a=1&&b=1')
{ a: '1', b: '1' }
> qs.stringify(qs.parse('a=1&&b=1'))
'a=1&b=1'

About changing || to &&, we have cases that the key or the value can be empty like the following. I tried updating it once, but some of the test cases threw some errors.

> qs.parse('=a&b')
{ '': 'a', b: '' }
Show outdated Hide outdated lib/querystring.js
@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Jan 24, 2017

Member

I agree that it's a bugfix. Parsing && as meaning an empty key makes exceedingly little sense. I'm definitely +1 on getting this fixed.

Member

jasnell commented Jan 24, 2017

I agree that it's a bugfix. Parsing && as meaning an empty key makes exceedingly little sense. I'm definitely +1 on getting this fixed.

@jasnell

Almost there

Show outdated Hide outdated test/parallel/test-querystring.js
@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell
Member

jasnell commented Jan 24, 2017

New CI since the test was updated: https://ci.nodejs.org/job/node-test-pull-request/6032/

@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Jan 25, 2017

Member

Some of the nice performance improvements in querystring were added! Resolved conflicts.

Member

watilde commented Jan 25, 2017

Some of the nice performance improvements in querystring were added! Resolved conflicts.

@watilde

This comment has been minimized.

Show comment
Hide comment
Show outdated Hide outdated test/parallel/test-querystring.js
@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Jan 26, 2017

Member

Can you post some benchmark results?

Member

TimothyGu commented Jan 26, 2017

Can you post some benchmark results?

@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Jan 26, 2017

Member

@TimothyGu Sure thing! Here are some benchmark results of querystring-parse that I ran five times in a row:

It seems that no slowdown was seen even we've added the line. I think it's because the case we're caring can be just an edge case.

Member

watilde commented Jan 26, 2017

@TimothyGu Sure thing! Here are some benchmark results of querystring-parse that I ran five times in a row:

It seems that no slowdown was seen even we've added the line. I think it's because the case we're caring can be just an edge case.

querystring, url: repeated sep in a paramsString should be adjusted
+ update state machine in parse
  + repeated sep should be adjusted
  + `&=&=` should be `{ '': [ '', '' ] }`
+ add test cases for querystring and URLSearchParams

Fixes: #10454
@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Feb 1, 2017

Member

I made improvements in performance again, and now it seems to be better than my last update.

Member

watilde commented Feb 1, 2017

I made improvements in performance again, and now it seems to be better than my last update.

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Feb 1, 2017

Contributor

@watilde That benchmark output looks truncated (the benchmark/compare.js script by default performs more runs than that). Having the output piped to RScript benchmark/compare.R will yield more useful information as @joyeecheung pointed out.

Contributor

mscdex commented Feb 1, 2017

@watilde That benchmark output looks truncated (the benchmark/compare.js script by default performs more runs than that). Having the output piped to RScript benchmark/compare.R will yield more useful information as @joyeecheung pointed out.

@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Feb 2, 2017

Member

@mscdex That's right indeed! Then I just tried it, and here is the result:

Member

watilde commented Feb 2, 2017

@mscdex That's right indeed! Then I just tried it, and here is the result:

@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@joaocgreis

This comment has been minimized.

Show comment
Hide comment
@joaocgreis

joaocgreis Feb 2, 2017

Member

Aborted CI, only arm-fanned was pending because of nodejs/build#611

Member

joaocgreis commented Feb 2, 2017

Aborted CI, only arm-fanned was pending because of nodejs/build#611

@jasnell

This comment has been minimized.

Show comment
Hide comment
Member

jasnell commented Feb 2, 2017

jasnell added a commit that referenced this pull request Feb 2, 2017

querystring, url: handle repeated sep in search
* update state machine in parse
  * repeated sep should be adjusted
  * `&=&=` should be `{ '': [ '', '' ] }`
* add test cases for querystring and URLSearchParams

Fixes: #10454
PR-URL: #10967
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>
@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Feb 2, 2017

Member

Landed in 4e259b2

Member

jasnell commented Feb 2, 2017

Landed in 4e259b2

@jasnell jasnell closed this Feb 2, 2017

@TimothyGu

This comment has been minimized.

Show comment
Hide comment
@TimothyGu

TimothyGu Feb 2, 2017

Member

Should this be marked as semver-major?

Member

TimothyGu commented Feb 2, 2017

Should this be marked as semver-major?

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Feb 2, 2017

Contributor

I was hoping to look at this this weekend before this landed...

Contributor

mscdex commented Feb 2, 2017

I was hoping to look at this this weekend before this landed...

@mscdex mscdex added the semver-major label Feb 2, 2017

@TimothyGu TimothyGu moved this from Existing spec to Done in WHATWG URL implementation Feb 2, 2017

@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Feb 2, 2017

Member

oy.. my apologies @mscdex... if you indicated that in the thread and I missed it, I do apologize.

Member

jasnell commented Feb 2, 2017

oy.. my apologies @mscdex... if you indicated that in the thread and I missed it, I do apologize.

@jasnell

This comment has been minimized.

Show comment
Hide comment
@jasnell

jasnell Feb 2, 2017

Member

if this ends up not being acceptable to you @mscdex, then we should be able to do a quick revert

Member

jasnell commented Feb 2, 2017

if this ends up not being acceptable to you @mscdex, then we should be able to do a quick revert

@watilde watilde deleted the watilde:feature/fixes-querystring-parse branch Feb 2, 2017

@watilde watilde restored the watilde:feature/fixes-querystring-parse branch Feb 2, 2017

@watilde

This comment has been minimized.

Show comment
Hide comment
@watilde

watilde Feb 2, 2017

Member

Isn't this patch update since it's just bug-fix?

Member

watilde commented Feb 2, 2017

Isn't this patch update since it's just bug-fix?

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Feb 2, 2017

Contributor

@watilde It might be a bug fix for the whatwg url module, but it's arguably not for the non-whatwg url module since there is no real spec that it adheres to. If this change were made to whatwg url's own querystring implementation that would be different and semver-major could just be applied to the non-whatwg url changes.

@jasnell #10967 (comment) I probably won't get time until this weekend to review this, so I don't think a 'quick revert' will happen unless else everyone agrees.

Contributor

mscdex commented Feb 2, 2017

@watilde It might be a bug fix for the whatwg url module, but it's arguably not for the non-whatwg url module since there is no real spec that it adheres to. If this change were made to whatwg url's own querystring implementation that would be different and semver-major could just be applied to the non-whatwg url changes.

@jasnell #10967 (comment) I probably won't get time until this weekend to review this, so I don't think a 'quick revert' will happen unless else everyone agrees.

@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@joyeecheung

joyeecheung Feb 2, 2017

Member

I think we can wail until @mscdex 's review to decide then. Personally I think it's fine to accept this if it doesn't introduce any new behavior other than fixing #10454 because the old behavior looks buggy to me (see #10967 (comment)).

Member

joyeecheung commented Feb 2, 2017

I think we can wail until @mscdex 's review to decide then. Personally I think it's fine to accept this if it doesn't introduce any new behavior other than fixing #10454 because the old behavior looks buggy to me (see #10967 (comment)).

@mscdex

This comment has been minimized.

Show comment
Hide comment
@mscdex

mscdex Feb 2, 2017

Contributor

@joyeecheung Calling it a 'bug fix' is debatable. One could also argue that stringify() should not include include key/value pairs that are both empty. I'd personally rather be on the safe side and land it in v8.0.0.

Contributor

mscdex commented Feb 2, 2017

@joyeecheung Calling it a 'bug fix' is debatable. One could also argue that stringify() should not include include key/value pairs that are both empty. I'd personally rather be on the safe side and land it in v8.0.0.

@joyeecheung

This comment has been minimized.

Show comment
Hide comment
@joyeecheung

joyeecheung Feb 2, 2017

Member

@mscdex Sorry for being vague, I am fine with marking this semver-major.

Member

joyeecheung commented Feb 2, 2017

@mscdex Sorry for being vague, I am fine with marking this semver-major.

KryDos added a commit to KryDos/node that referenced this pull request Feb 25, 2017

querystring, url: handle repeated sep in search
* update state machine in parse
  * repeated sep should be adjusted
  * `&=&=` should be `{ '': [ '', '' ] }`
* add test cases for querystring and URLSearchParams

Fixes: nodejs#10454
PR-URL: nodejs#10967
Reviewed-By: James M Snell <jasnell@gmail.com>
Reviewed-By: Timothy Gu <timothygu99@gmail.com>

@jasnell jasnell referenced this pull request Apr 4, 2017

Closed

8.0.0 Release Proposal #12220

@watilde watilde deleted the watilde:feature/fixes-querystring-parse branch Apr 11, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment