Don't force query string normalization #1234
Comments
Unfortunately we cannot do anything about this :(
Sorry, I hadn't seen the previous issues, but I think that something can be done. #1180 and #1202 described that behavior as a bug. I don't think of it as a bug: Node.js decided to go with the WHATWG URL spec, and it follows it. What can be done is to give the user a choice about the standard (or about whether the normalization should take place), maybe with a flag in the options.
Another thing that can be done is to only reimplement the The last thing that came to my mind is to remove this code (core/index.ts:713)
That would be a little hack to the JavaScript URL parser. It seems that when a URL is parsed ( Actually the point is to have a little discussion about what has to be done: leaving things the way they are and blaming the Node.js specs, or giving users the tools to decide for themselves. All that said, I'm willing to help with the implementation of any solution.
Unfortunately the WHATWG URL is used in the Node.js |
Ok, but what about leaving the query string as it is when it's inside the URL as the
|
That's the behavior we want to avoid. The proper "fix" would be to pass an object instead of a URL instance to the |
So with
Because Even if it respects the HTTP/1.1 (RFC 2616) and URI (RFC 3986) RFCs? RFC 3986 section 2.2 RFC 2616 section 6.2.2 describes an HTTP URL as
The spec is buggy. If you want to get this fixed faster, feel free to send a temporary fix to Node.js. On Got side we just enforce normalization.
My previous example was a little misleading. But that's wrong. With this last example I think this is going to be a "bug report" instead of a "feature request".
According to this RFC it's still valid because the value is empty:
|
The RFC states that query components are often (not always) in the form of "key=value".
That's right. But |
The problem is that I can't find a single standard that states that the query string must be interpreted as key-value and be in the form of A lot of servers parse it like that (and also offer access to the raw string), but to me it does not seem to be mandatory. Even the WHATWG URL spec does not state that the query string must be
I found two replies on Stack Overflow and Stack Exchange that pretty much sum up what I've said about the "query component" in a URI. https://stackoverflow.com/questions/39266970/what-is-the-difference-between-url-parameters-and-query-strings#39294675
|
The spec is fine, but Got tries to enforce
You are trying to normalize things that should not be normalized (because there's no standard). |
It's not. Even one of their maintainers says it's broken: https://twitter.com/domenic/status/1257377082704900096
@sindresorhus One of the workarounds would be to consider the query in the input and the |
What I really meant with
was that that's not the problem. I know this "Feature request" started as "I've a problem with ~, damn WHATWG spec", but after this comment #1234 (comment), it became a "Bug report" on the incorrect handling of the query string. I wanted to discuss this issue exhaustively because, as you said, it would be a breaking change. In particular, I think it is going to break the merging of the URL params.
Any news about this issue? Should this issue be reopened?
`decodeURI(url)` before sending a request
May I ask how I think the only correct solution is to leave the URL intact (if given as a string). BTW, this issue isn't breaking some random small HTTP server, it's breaking compatibility with things like Akamai's "Adaptive Media Delivery" |
`decodeURI(url)` before sending a request
`decodeURIComponent(url)` before sending a request
Sorry, I meant |
My doubts remain the same as before. This time you have First example: Second example: |
Invalid URL. The
Will be fixed.
It's not invalid. Your statement would be true if the format applied to the query string was What about a custom format that doesn't use percent escapes? You can argue that
and not
|
The WHATWG URL standard (query string) specifies that only these characters may be unescaped: https://url.spec.whatwg.org/#url-code-points
|
`decodeURIComponent(url)` before sending a request
`decodeURIComponent(query)` before sending a request
I don't think that statement is true (they also give an example against it). Here's why:
Sorry, I missed it... (it's only written in the implementation of the parser) |
Wrong example. They're talking about query string, not search params.
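The distinction between the query string and the search params can be checked in Node.js directly. The sketch below uses a made-up URL (not one from this thread) and compares `url.search`, which comes out of the URL parser, with `url.searchParams.toString()`, which goes through the `application/x-www-form-urlencoded` serializer:

```javascript
// Illustrative URL; not taken from the thread.
const url = new URL('http://example.org/?a=b~c*d e');

// Parsing applies the query percent-encode set:
// "~" and "*" are kept, the space is escaped as %20.
const rawQuery = url.search;

// Serializing the search params applies the urlencoded rules:
// "~" is escaped, "*" is kept, and the space becomes "+".
const serializedParams = url.searchParams.toString();

console.log(rawQuery);         // ?a=b~c*d%20e
console.log(serializedParams); // a=b%7Ec*d+e
```

Reading `searchParams` does not change the URL; the query is only rewritten once the params are modified.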
No problem, I just figured that out too. Was wondering why
My bad, I phrased it wrong.
But there are inconsistencies in the normalization...
I'm sorry, I can't understand what you are referring to. Can you explain?
This occurs with basic auth as well:
const url = new URL('http://host');
url.username = '=user';
url.password = '=pass';
got(new URL(url));
// sends "%3Duser" and "%3Dpass" to `http.request`
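The encoding in the snippet above already happens in the `username` and `password` setters of `URL`, before Got sees the value. A minimal sketch, assuming the caller wants the original credentials back (for example to build an `Authorization` header by hand), is to percent-decode them explicitly. `basicAuthHeader` is a hypothetical helper, not a Got or Node.js API:

```javascript
const url = new URL('http://host');
url.username = '=user';
url.password = '=pass';

// The setters percent-encode "=", so the URL now carries the escaped values.
console.log(url.username, url.password); // %3Duser %3Dpass

// Hypothetical helper: decode the credentials before building a Basic auth header.
function basicAuthHeader(parsedUrl) {
  const user = decodeURIComponent(parsedUrl.username);
  const pass = decodeURIComponent(parsedUrl.password);
  return 'Basic ' + Buffer.from(`${user}:${pass}`).toString('base64');
}

const header = basicAuthHeader(url);
console.log(header);
```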
You can pass a perfectly valid |
@stevenvachon You need to compute the |
The URL Class of WHATWG that states that In any case if you are using If you're using |
`decodeURIComponent(query)` before sending a request
What problem are you trying to solve?
In a URL like `http://example.org/random?param=SOMETHING~SOMETHING`, the special character `~` is percent-encoded before the request, resulting in `http://example.org/random?param=SOMETHING%7ESOMETHING`, which is not supported (decoded) by some HTTP servers.
As described by RFC 3986 in section 2.3:
"
URI comparison implementations do not always perform normalization prior to comparison. For consistency, percent-encoded octets in the ranges of ALPHA (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E), underscore (%5F), or tilde (%7E) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
"
Also in RFC 3986, section 6.2.2.2
"
The percent-encoding mechanism is a frequent source of variance among otherwise identical URIs. In addition to the case normalization issue noted above, some URI producers percent-encode octets that do not require percent-encoding, resulting in URIs that are equivalent to their non-encoded counterparts. These URIs should be normalized by decoding any percent-encoded octet that corresponds to an unreserved character, as described in Section 2.3.
"
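The normalization described in these two RFC 3986 passages can be sketched as a small function that decodes only those percent-encoded octets that correspond to unreserved characters and leaves every other escape alone. `decodeUnreserved` is a hypothetical name used for illustration; it is not part of Got or Node.js:

```javascript
// Unreserved characters per RFC 3986 section 2.3: ALPHA, DIGIT, "-", ".", "_", "~".
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: decode %XX only when XX is an unreserved character.
function decodeUnreserved(uri) {
  return uri.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    return UNRESERVED.test(char) ? char : match;
  });
}

console.log(decodeUnreserved('http://example.org/random?param=SOMETHING%7ESOMETHING'));
// http://example.org/random?param=SOMETHING~SOMETHING

console.log(decodeUnreserved('a%3Db%2Fc'));
// a%3Db%2Fc (reserved characters stay encoded)
```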
The percent-encoding of `~` by Got happens because Node.js follows the "WHATWG URL API" (https://nodejs.org/api/url.html#url_the_whatwg_url_api), which omits `~` from the unreserved characters (https://url.spec.whatwg.org/#interface-urlsearchparams, the Note below the example, and https://url.spec.whatwg.org/#urlencoded-serializing) and, by the way, includes `*`.
Describe the feature
My proposal is to add a flag to the options to prevent the normalization by skipping the append and delete of "_GOT_INTERNAL_TRIGGER_NORMALIZATION"
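The effect of the append-and-delete step, and what skipping it would look like, can be reproduced with plain Node.js. This is only a sketch: `prepareUrl` and the `normalizeSearchParams` option are hypothetical names chosen for illustration, and Got does not expose such a flag.

```javascript
// Hypothetical helper and option name; not an existing Got API.
function prepareUrl(input, {normalizeSearchParams = true} = {}) {
  const url = new URL(input);
  if (normalizeSearchParams) {
    // Appending and deleting a dummy parameter forces the query to be
    // re-serialized with the urlencoded rules, which escape "~".
    url.searchParams.append('_GOT_INTERNAL_TRIGGER_NORMALIZATION', '');
    url.searchParams.delete('_GOT_INTERNAL_TRIGGER_NORMALIZATION');
  }
  return url;
}

const input = 'http://example.org/random?param=SOMETHING~SOMETHING';
const normalized = prepareUrl(input).href;
const untouched = prepareUrl(input, {normalizeSearchParams: false}).href;

console.log(normalized); // http://example.org/random?param=SOMETHING%7ESOMETHING
console.log(untouched);  // http://example.org/random?param=SOMETHING~SOMETHING
```

With the flag off, the URL is sent exactly as the parser produced it, which is the behavior the proposal asks for.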