Question mark eaten from request URL if no params specified #2912

mloskot · 2015-12-02T00:30:13Z

Here is an URL with question mark included, but without any parameters:

url='http://server.com/api/item.json?'

That URL (w/ question mark) controls server response in such way that it allows to request certain metadata about an item, instead of representation of an item iteself:

import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url)
print (r.data)
{ "ItemMetadata" : { ... }}

The above works well.

Requests, however, eat-or-skip the question mark if the params argument is missing/None/{}, so my request is responded with the resource representation, instead of the metadata:

from requests import get
r = get(url, stream=True)
print(r.text)
{ "Item" : { ... }}

That seem to indicate a bug in URL parsing.

The text was updated successfully, but these errors were encountered:

Lukasa · 2015-12-02T08:36:58Z

This is not a bug in URL parsing: quite the opposite.

Requests does its best to normalise URLs where it is safe to do so. This is necessary to ensure that various stages of URL building work appropriately, and that we don't accidentally end up with invalid headers (e.g. multiple question marks or none at all when parameters are present). A trailing question mark with no parameters is superfluous (and arguably somewhat invalid), so requests kindly strips it off for you. This is entirely intentional.

For what it's worth, my opinion is that this API is poorly designed. It relies on the idea that no middle-box will rewrite that URL to remove the empty query part, which is a risk. Generally speaking, the better API would be to have a sub-resource or a proper query string (e.g. /api/item.json?metadata=true, or /api/item/metadata.json). This API design is super fragile, and I guarantee that it'll be prone to breaking in mysterious ways. I doubt we're the only framework that strips it. In fact, anything that relies on Python's standard urlsplit function will probably do exactly what we do and ignore the query part.

However, both browsers and curl will, if requested, send the empty query string. Having chatted to @bagder I don't think this is a deliberate decision on the part of those entities, so much as it just happens to fall out of the way they handle query strings.

Given that we have a duty to be better than the Python standard library, that means I think we should come up with a way to make it possible, at least when using the PreparedRequest API. Sadly, urlsplit makes this really hard for us, because it doesn't have a difference in the query portion of the URL for http://http2bin.org/get' andhttp://http2bin.org/get?`. That makes our lives really tricky because I'd rather not hand roll URL parsing if I can possibly avoid it.

I think the only way we can fix this is to change the library we use to parse URLs to one that can safely inform us of the difference between these two cases. @sigmavirus24, do you think your URL parsing library would do better here?

I think this is a reasonable request, but we can't get to it before 3.0.0 because moving to a new URL parsing library will probably break a whole lot of stuff.

mloskot · 2015-12-02T11:31:53Z

This is not a bug in URL parsing: quite the opposite.

I would appreciate clarification that would help me to understand the rules behind the URL parsers behaviour at large.

Is that URL parsers interpret the following from RFC 1738, section 3.3, that the "?" (question mark) is not part of the query string?

http://<host>:<port>/<path>?<searchpart>

Does also the grammar in RFC 3986, Appendix A., indicate the "?" is not part of the actual query string?

URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

If so, for URL like http://server.com/item.json?, while applying the suggested (de|re)composition algorithm, i.e.:

      if defined(query) then
         append "?" to result;
         append query to result;
      endif;

parsers rightly assume the query component as undefined, hence superfluous.

Is my understanding correct?

Lukasa · 2015-12-02T11:34:51Z

Your understanding is correct, at least as far as the standard library URL parser goes. That is not necessarily generally true of URL parsers, which is why I asked @sigmavirus24 if his URL parser has the same behaviour or not. In particular, it's not clear to me that an absent query part is semantically identical to an empty query part: Appendix A of RFC 3986 does appear to allow zero-length query strings.

Again, I think this is strictly acceptable, but it's likely to be extremely fragile.

mloskot · 2015-12-02T12:01:20Z

@Lukasa I see. Yes, I'm concerned about the standard library URL parser, but also curious what is a common practice in other parsers. So, thanks for the clarifications so far.

From your earlier response:

This API design is super fragile, and I guarantee that it'll be prone to breaking in mysterious ways.

That is exactly what worries me, so I'm trying to understand what are the super fragile traits exactly and what are the ways the API may fails.

Lukasa · 2015-12-02T12:26:22Z

Sure. So, my reason for thinking it's fragile is mostly because this is clearly a grey area of the URL specification. URLs and URL normalisation are fraught topics, and as a result it's unwise to build an API that assumes that two semantically equivalent URLs with different representations will be transmitted without potentially being translated between those representations.

What's not clear to me is whether the zero-length query string is one of them. I think this is because it's simply in a grey area: some entities treat it as different to no query string at all, others do not. For that reason it's fragile, not wrong: while it may work in many cases, I think it's likely that it'd be a source of subtle bugs.

sigmavirus24 · 2015-12-02T13:56:36Z

@Lukasa I think we strip it but I can imagine a way to avoid that.

mloskot · 2015-12-02T14:03:13Z

@Lukasa Thanks very much. I sympathise with your critique.

mloskot · 2015-12-03T23:28:57Z

FYI,

Out of curiosity, not to prove my point, I've probed a bit the landscape of HTTP client libraries.

I've tried my target question-mark-powered API with popular clients for

Python (http.client, requests, urllib, urllib3)
Node (request, shred, unirest, urllib)
C# (WebRequest, WebClient, HttpClient, EasyHttp, RestSharp)
Go (http/net)
D (std.net.curl - powered by libcurl)

Here is complete project https://github.com/mloskot/http-url-test

The result is: 4 of those libraries behave like the Python Requests:

Python: Requests
Node: shred
Go:
- net/http: Question mark eaten from URL if no query parameters specified golang/go#13488
- Question mark eaten from URL if no query parameters specified parnurzeal/gorequest#65

It makes a wild statistics of ~30% clients affected by a fragile question-mark-powered resource API.

mloskot · 2015-12-06T20:33:34Z

I received some pointers to further details in RFC 3986 (see Is question mark in URL part of query string?). In particular, 6.2.3. Scheme-Based Normalization which somewhat applies to this issue and the "question mark" handling. Quoting:

Normalization should not remove delimiters when their associated component is empty unless licensed to do so by the scheme specification. For example, the URI http://example.com/? cannot be assumed to be equivalent to any of the examples above.

Where "examples above" refers to:

http://example.com
http://example.com/
http://example.com:/
http://example.com:80/

Lukasa · 2015-12-06T20:41:00Z

So I agree that we should be persisting the query part. The problem is that our supporting libraries don't make it clear whether there was no query part or just an empty one.

sigmavirus24 · 2015-12-06T23:40:08Z

So rfc3986 now supports this use case as of 0.3.1.

mloskot · 2015-12-10T16:16:50Z

FYI, here is an example of API similar to the one in my issue above:

https://confluence.ucop.edu/display/Curation/ARK

a brief metadata record if you append a single question mark to the ARK
(...)
Adding '??' to the end should return a policy statement.

Lukasa · 2015-12-10T16:23:50Z

@mloskot Sadly, lots of people make bad decisions. =(

sigmavirus24 · 2015-12-10T16:25:26Z

@mloskot constantly posting information here will not get this fixed any faster. You're merely spamming 781 people (- the people who have already unsubscribed).

mloskot · 2015-12-10T16:45:59Z

@Lukasa Yes. That example surprised me and I thought I will share it as an example of bad design..

@sigmavirus24

constantly posting information here will not get this fixed any faster.

You are misreading my intention - I'm not trying to press anything at all.
I'm simply sharing some further details which I believe are relevant to this issue.
I'm sorry if I caused any annoyance.

kennethreitz · 2016-01-30T04:26:44Z

Workaround: you could probably percent-encode the question mark.

cabrinha · 2016-04-08T07:48:30Z

I'd just like some way to remove the url that requests is inserting into my url. I never asked requests to insert a '?' into my url...

Lukasa · 2016-04-08T07:49:23Z

@internaught What code do you have that causes requests to insert a question-mark?

cabrinha · 2016-04-08T07:55:54Z

@Lukasa I'm simply doing this:

return requests.get(
            'https://my-{}-api-url.com/api/0.1'.format(
                random.randrange(1, 9)),
            '/thing/my_thing/{}'.format(other_thing), headers=self.headers
        )

Thanks for the quick reply!

Lukasa · 2016-04-08T07:58:15Z

@internaught You have a comma between the two parts of your URL. That means your second string is being passed to the params parameter of the requests.get call, which means requests assumes it's a query string. Replace the comma with a + and you'll be fine.

cabrinha · 2016-04-08T07:59:50Z

@Lukasa thanks for the help, working great now. Simple mistake, thanks!

poleguy · 2017-10-20T19:01:14Z

I'll leave this here for the next guy bitten by this...
The mini-circuits ZTM boxes use a bare ? to query the state. https://www.minicircuits.com/softwaredownload/ztm_rcm.html

I was able to work around the behavior of requests by putting a space after the question mark.

Lukasa added Feature Request Planned Breaking API Change labels Dec 2, 2015

Lukasa added this to the 3.0.0 milestone Dec 2, 2015

Lukasa self-assigned this Dec 2, 2015

This was referenced Dec 4, 2015

net/http: Question mark eaten from URL if no query parameters specified golang/go#13488

Closed

Question mark eaten from URL if no query parameters specified parnurzeal/gorequest#65

Open

mloskot closed this as completed May 11, 2017

p0pr0ck5 mentioned this issue Jun 8, 2017

fix(router) ensure transparent proxying of URIs Kong/kong#2519

Merged

psf locked and limited conversation to collaborators Oct 21, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question mark eaten from request URL if no params specified #2912

Question mark eaten from request URL if no params specified #2912

mloskot commented Dec 2, 2015 •

edited

Loading

Lukasa commented Dec 2, 2015

mloskot commented Dec 2, 2015

Lukasa commented Dec 2, 2015

mloskot commented Dec 2, 2015

Lukasa commented Dec 2, 2015

sigmavirus24 commented Dec 2, 2015

mloskot commented Dec 2, 2015

mloskot commented Dec 3, 2015

mloskot commented Dec 6, 2015

Lukasa commented Dec 6, 2015

sigmavirus24 commented Dec 6, 2015

mloskot commented Dec 10, 2015

Lukasa commented Dec 10, 2015

sigmavirus24 commented Dec 10, 2015

mloskot commented Dec 10, 2015

kennethreitz commented Jan 30, 2016

cabrinha commented Apr 8, 2016

Lukasa commented Apr 8, 2016

cabrinha commented Apr 8, 2016

Lukasa commented Apr 8, 2016

cabrinha commented Apr 8, 2016

poleguy commented Oct 20, 2017

Question mark eaten from request URL if no params specified #2912

Question mark eaten from request URL if no params specified #2912

Comments

mloskot commented Dec 2, 2015 • edited Loading

Lukasa commented Dec 2, 2015

mloskot commented Dec 2, 2015

Lukasa commented Dec 2, 2015

mloskot commented Dec 2, 2015

Lukasa commented Dec 2, 2015

sigmavirus24 commented Dec 2, 2015

mloskot commented Dec 2, 2015

mloskot commented Dec 3, 2015

mloskot commented Dec 6, 2015

Lukasa commented Dec 6, 2015

sigmavirus24 commented Dec 6, 2015

mloskot commented Dec 10, 2015

Lukasa commented Dec 10, 2015

sigmavirus24 commented Dec 10, 2015

mloskot commented Dec 10, 2015

kennethreitz commented Jan 30, 2016

cabrinha commented Apr 8, 2016

Lukasa commented Apr 8, 2016

cabrinha commented Apr 8, 2016

Lukasa commented Apr 8, 2016

cabrinha commented Apr 8, 2016

poleguy commented Oct 20, 2017

mloskot commented Dec 2, 2015 •

edited

Loading