Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unreserved characters are escaped when making a HTTP request #33037

Closed
szmarczak opened this issue Apr 24, 2020 · 21 comments
Closed

Unreserved characters are escaped when making a HTTP request #33037

szmarczak opened this issue Apr 24, 2020 · 21 comments
Labels
known limitation Issues that are identified as known limitations. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.

Comments

@szmarczak
Copy link
Member

  • Version: >=10.20.1
  • Platform: all
  • Subsystem: http

What steps will reproduce the bug?

const http = require('http');

const url = new URL('http://httpbin.org/anything?a=~');
url.search = url.searchParams.toString(); // trigger normalization

const request = http.get(url, response => {
    const chunks = [];

    response.on('data', chunk => {
        chunks.push(chunk);
    });

    response.once('end', () => {
        console.log(request._header);
        // Actual:
        // GET /anything?a=%7E
        //
        // Expected:
        // GET /anything?a=~
    });
})

How often does it reproduce? Is there a required condition?

Always.

What is the expected behavior?

GET /anything?a=~

What do you see instead?

GET /anything?a=%7E

Additional information

According to tools.ietf.org/html/rfc3986#section-2.3 tilde is unreserved and should not encoded by URI producers.

sindresorhus/got#1180 (comment)

@himself65 himself65 added url Issues and PRs related to the legacy built-in url module. confirmed-bug Issues with confirmed bugs. http Issues or PRs related to the http subsystem. and removed url Issues and PRs related to the legacy built-in url module. labels Apr 24, 2020
@bnoordhuis
Copy link
Member

The url.searchParams.toString() call is what changes ~ to %7E. I'm not 100% sure but it looks according to spec to me. Firefox behaves the same way, FWIW.

The spec says the application/x-www-form-urlencoded serializer should be applied:

The stringification behavior must return the serialization of the URLSearchParams object’s list.

@himself65 I see you tagged this with confirmed-bug. Can you elaborate?

@himself65 himself65 removed the confirmed-bug Issues with confirmed bugs. label Apr 24, 2020
@himself65
Copy link
Member

oh, sry for I didn't look up the spec🤦‍.

and just a little confused me because
image

@Hamper
Copy link

Hamper commented Apr 24, 2020

Isn't that bug in URLSearchParams? Because in spec https://tools.ietf.org/html/rfc3986#section-2.3 we see that ~ is unreserved character and

   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

Also in https://url.spec.whatwg.org/ we see that percent encoding must be used only for code points greater than U+007E (~) but not equal this symbol

@bnoordhuis
Copy link
Member

@Hamper https://url.spec.whatwg.org/#concept-urlencoded-byte-serializer is the mapping for individual bytes. ~ must be percent-encoded according to that.

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

The URL parser uses the WHATWG URL Standard as the normative reference and not rfc3986 and the two definitely differ with regards to some of the escaping rules.

That said, I do think this is a bug... not necessarily in our impl but in the URL Standard spec itself. Definitely warrants more investigation.

Specifically, look at section https://url.spec.whatwg.org/#url-parsing and search for "query state" ...

image

@jasnell jasnell reopened this Apr 24, 2020
@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

@bnoordhuis ... the section you reference deals specifically with application/x-www-form-urlencoded serialization, which is different from both URL and URLSearchParams. While the URL Standard does provide a specification for application/x-www-form-urlencoded parsing and serialization, it does not require or even recommend it's use for URL parsing and handling of URLSearchParams. They are separate things. The issue opened here is in regards to URL parsing and not application/x-www-form-urlencoded

@bnoordhuis
Copy link
Member

@jasnell I think you overlooked the line that says url.search = url.searchParams.toString() - it's the .toString() that encodes to application/x-www-form-urlencoded and that's per spec.

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

Ah, yep, there it is. As I said, I don't think this is a bug in our implementation but I do think it's a bug in the spec itself because given the rules defined elsewhere in the doc ~ is handled differently. Let's keep this issue open and get clarification from the URL Standard folks on what the correct behavior should be.

/cc @annevk ... when you get a moment, could you help us resolve this issue? :-)

@Hamper
Copy link

Hamper commented Apr 24, 2020

@bnoordhuis but if you do

const url = new URL('http://httpbin.org/anything?a=~');
url.searchParams.sort()
url.toString()

we can see same bug

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

Yep... there's definitely an inconsistency there...

C:\Users\jasne\Projects\tmp>node
Welcome to Node.js v14.0.0.
Type ".help" for more information.
> const u = new URL('http://example.com/?a=~')
undefined
> u.toString()
'http://example.com/?a=~'
> u.searchParams.sort()
undefined
> u.toString()
'http://example.com/?a=%7E'
>

@jasnell jasnell added url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation. and removed http Issues or PRs related to the http subsystem. labels Apr 24, 2020
@annevk
Copy link

annevk commented Apr 24, 2020

See whatwg/url#18. URLSearchParams is aligned with forms, .search with URLs. I personally think the discrepancy is okay, but I can see how it's confusing. More importantly, I'm not really sure how to resolve it without a lot of likely breaking changes.

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

Hmm... even with .search, however, mutations that occur via u.searchParams (such as sort) will cause .search to use the escaped version. I can definitely appreciate maintaining the difference between URLSearchParams and .search but I'm wondering if it would be at least possible to make it such that when URLSearchParams is used to mutate a URL, the URL rules are used rather than the form-encoding rules?

@annevk
Copy link

annevk commented Apr 24, 2020

I guess that's something we hadn't considered yet. I'm not sure what the complexity of that would be offhand, but if someone was willing to make that a bit more detailed (probably in a fresh issue on whatwg/url) I'd be happy to help find out if there's interest in aligning on something like that.

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

Issue opened. Will keep this issue open until that is resolved. Thank you for the clarifications @annevk :-)

@szmarczak
Copy link
Member Author

Does the WHATWG URL spec override the URL normalization in the HTTP spec?

@jasnell
Copy link
Member

jasnell commented Apr 24, 2020

@szmarczak ... that's a total grey area lol. All of the various HTTP specifications normatively reference the IETF URL specification in some way, and most HTTP server and middle box implementations follow suit. The WHATWG URL spec, however, is what browsers / useragents generally follow and the WHATWG spec has largely emerged as the de facto standard. There's always been a bit of contention on the areas where they differ.

@bnoordhuis
Copy link
Member

Keep will this issue open until that is resolved.

I think this issue should be closed until the WHATWG issue reaches a conclusion, if indeed it ever does. Right now this issue is completely inactionable.

@szmarczak
Copy link
Member Author

I think this issue should be closed

I strongly disagree. It's easy to forget the issue that way and we don't want that. A more appropriate thing to do would be rather to mark this issue as blocked.

@bnoordhuis
Copy link
Member

There are almost a thousand open issues. No one is going to look at this one once it drops off the first page or two, label or no label.

@jasnell jasnell added the known limitation Issues that are identified as known limitations. label Apr 28, 2020
@jasnell
Copy link
Member

jasnell commented Apr 28, 2020

Quick update on whatwg/url#478 ... I have a general proposed fix on the table (to require that URLQueryParam stringification follow URL rules when .searchParams is used to mutate the URL). The WHATWG folks are working that through their process to determine how other implementers may feel about the change. If it looks like the change will have a good chance of making it through, I will open a PR there with a draft spec change. Once that is accepted, we'll need a PR here to fix and close this issue.

Specifically, the proposed change is internal only, and would only impact using the .searchParams object associated with the URL to mutate the URL.. that is...

This would follow URL encoding rules:

u.searchParams.sort()

But this would not, however...

u.search = u.searchParams.toString()

Assuming the change proposal does not go through at the WHATWG, then this will need to be closed without changing behavior as our priority is on following the spec and we do not want to unilaterally diverge from that. It would, however, be good to add documentation to url.md.

@bnoordhuis

...Right now this issue is completely inactionable

Except that it is not. See above.

jasnell added a commit to jasnell/node that referenced this issue May 4, 2020
The WHATWG URL spec is not going to change this behavior so
let's document it

Signed-off-by: James M Snell <jasnell@gmail.com>

Fixes: nodejs#33037
@jasnell
Copy link
Member

jasnell commented May 4, 2020

Alrighty, well, it's looking like changes to the WHATWG spec aren't going to happen, which is fine. Added a PR that documents the discrepancy.

@jasnell jasnell closed this as completed May 6, 2020
jasnell added a commit that referenced this issue May 6, 2020
The WHATWG URL spec is not going to change this behavior so
let's document it

Signed-off-by: James M Snell <jasnell@gmail.com>

Fixes: #33037

PR-URL: #33236
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
codebytere pushed a commit that referenced this issue May 7, 2020
The WHATWG URL spec is not going to change this behavior so
let's document it

Signed-off-by: James M Snell <jasnell@gmail.com>

Fixes: #33037

PR-URL: #33236
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
codebytere pushed a commit that referenced this issue Jun 7, 2020
The WHATWG URL spec is not going to change this behavior so
let's document it

Signed-off-by: James M Snell <jasnell@gmail.com>

Fixes: #33037

PR-URL: #33236
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Rich Trott <rtrott@gmail.com>
Reviewed-By: Joyee Cheung <joyeec9h3@gmail.com>
Reviewed-By: Zeyu Yang <himself65@outlook.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
known limitation Issues that are identified as known limitations. url Issues and PRs related to the legacy built-in url module. whatwg-url Issues and PRs related to the WHATWG URL implementation.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants