-
-
Notifications
You must be signed in to change notification settings - Fork 9.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question mark eaten from request URL if no params specified #2912
Comments
This is not a bug in URL parsing: quite the opposite. Requests does its best to normalise URLs where it is safe to do so. This is necessary to ensure that various stages of URL building work appropriately, and that we don't accidentally end up with invalid headers (e.g. multiple question marks or none at all when parameters are present). A trailing question mark with no parameters is superfluous (and arguably somewhat invalid), so requests kindly strips it off for you. This is entirely intentional. For what it's worth, my opinion is that this API is poorly designed. It relies on the idea that no middle-box will rewrite that URL to remove the empty query part, which is a risk. Generally speaking, the better API would be to have a sub-resource or a proper query string (e.g. However, both browsers and curl will, if requested, send the empty query string. Having chatted to @bagder I don't think this is a deliberate decision on the part of those entities, so much as it just happens to fall out of the way they handle query strings. Given that we have a duty to be better than the Python standard library, that means I think we should come up with a way to make it possible, at least when using the PreparedRequest API. Sadly, I think the only way we can fix this is to change the library we use to parse URLs to one that can safely inform us of the difference between these two cases. @sigmavirus24, do you think your URL parsing library would do better here? I think this is a reasonable request, but we can't get to it before 3.0.0 because moving to a new URL parsing library will probably break a whole lot of stuff. |
I would appreciate clarification that would help me to understand the rules behind the URL parsers behaviour at large. Is that URL parsers interpret the following from RFC 1738, section 3.3, that the "?" (question mark) is not part of the query string?
Does also the grammar in RFC 3986, Appendix A., indicate the "?" is not part of the actual query string?
If so, for URL like
parsers rightly assume the query component as undefined, hence superfluous. Is my understanding correct? |
Your understanding is correct, at least as far as the standard library URL parser goes. That is not necessarily generally true of URL parsers, which is why I asked @sigmavirus24 if his URL parser has the same behaviour or not. In particular, it's not clear to me that an absent query part is semantically identical to an empty query part: Appendix A of RFC 3986 does appear to allow zero-length query strings. Again, I think this is strictly acceptable, but it's likely to be extremely fragile. |
@Lukasa I see. Yes, I'm concerned about the standard library URL parser, but also curious what is a common practice in other parsers. So, thanks for the clarifications so far. From your earlier response:
That is exactly what worries me, so I'm trying to understand what are the super fragile traits exactly and what are the ways the API may fails. |
Sure. So, my reason for thinking it's fragile is mostly because this is clearly a grey area of the URL specification. URLs and URL normalisation are fraught topics, and as a result it's unwise to build an API that assumes that two semantically equivalent URLs with different representations will be transmitted without potentially being translated between those representations. What's not clear to me is whether the zero-length query string is one of them. I think this is because it's simply in a grey area: some entities treat it as different to no query string at all, others do not. For that reason it's fragile, not wrong: while it may work in many cases, I think it's likely that it'd be a source of subtle bugs. |
@Lukasa I think we strip it but I can imagine a way to avoid that. |
@Lukasa Thanks very much. I sympathise with your critique. |
FYI, Out of curiosity, not to prove my point, I've probed a bit the landscape of HTTP client libraries. I've tried my target question-mark-powered API with popular clients for
Here is complete project https://github.com/mloskot/http-url-test The result is: 4 of those libraries behave like the Python Requests:
It makes a wild statistics of ~30% clients affected by a fragile question-mark-powered resource API. |
I received some pointers to further details in RFC 3986 (see Is question mark in URL part of query string?). In particular, 6.2.3. Scheme-Based Normalization which somewhat applies to this issue and the "question mark" handling. Quoting:
Where "examples above" refers to:
|
So I agree that we should be persisting the query part. The problem is that our supporting libraries don't make it clear whether there was no query part or just an empty one. |
So rfc3986 now supports this use case as of 0.3.1. |
FYI, here is an example of API similar to the one in my issue above: https://confluence.ucop.edu/display/Curation/ARK
|
@mloskot Sadly, lots of people make bad decisions. =( |
@mloskot constantly posting information here will not get this fixed any faster. You're merely spamming 781 people (- the people who have already unsubscribed). |
@Lukasa Yes. That example surprised me and I thought I will share it as an example of bad design..
You are misreading my intention - I'm not trying to press anything at all. |
Workaround: you could probably percent-encode the question mark. |
I'd just like some way to remove the url that requests is inserting into my url. I never asked requests to insert a '?' into my url... |
@internaught What code do you have that causes requests to insert a question-mark? |
@Lukasa I'm simply doing this: return requests.get(
'https://my-{}-api-url.com/api/0.1'.format(
random.randrange(1, 9)),
'/thing/my_thing/{}'.format(other_thing), headers=self.headers
) Thanks for the quick reply! |
@internaught You have a comma between the two parts of your URL. That means your second string is being passed to the |
@Lukasa thanks for the help, working great now. Simple mistake, thanks! |
I'll leave this here for the next guy bitten by this... I was able to work around the behavior of requests by putting a space after the question mark. |
Here is an URL with question mark included, but without any parameters:
That URL (w/ question mark) controls server response in such way that it allows to request certain metadata about an item, instead of representation of an item iteself:
The above works well.
Requests, however, eat-or-skip the question mark if the
params
argument is missing/None
/{}
, so my request is responded with the resource representation, instead of the metadata:That seem to indicate a bug in URL parsing.
The text was updated successfully, but these errors were encountered: