Scheme and host erroneously passed to HTTPConnection request method #8

Closed
mapio opened this Issue Nov 14, 2011 · 6 comments

Projects

None yet

4 participants

@mapio
mapio commented Nov 14, 2011

I think there is a problem in the use of httplib.HTTPConnection method request when called at
line 213 of urllib3/connectionpool.py where you pass it the full URL, containing the scheme and host, instead of just the path (and query part), as show in httplib usage examples.

This ends up in a wrong HTTP request performed to the server. To see it, you can for instance run

python -m SimpleHTTPServer

in a shell and then, in another one, run

python -c 'from urllib3 import PoolManager; http = PoolManager(); http.request( "GET", "http://localhost:8000/this/is/an/example" )'

and compare what the access log in the first shell reports as compared to what happens if you do

curl  "http://localhost:8000/this/is/an/example"

I can submit a patch, but I'm not an urllib3 expert so I will probably miss some other place where the same error occurs.

@shazow
Owner
shazow commented Nov 14, 2011

Hi there, thank you for the report!

This was a conscious decision, but perhaps not the correct one.

The goal was to reduce complexity and avoid inexplicit behaviour. That is, when you make a request to "http://localhost:8000/this/is/an/example", that's exactly the request that urllib3 should be making.

The current workaround to achieve what you want is:

from urllib3 import PoolManager
http = PoolManager()
conn = http.connection_from_url("http://localhost:8000")
response = conn.request("GET", "/this/is/an/example")

When we do PoolManager.request, it does the same thing behind the scenes except it doesn't strip away the host like we did here manually. I agree that there should be an option to strip away the host (perhaps even by default).

Should this option be specified in the PoolManager constructor? Such as PoolManager(strip_host=True).

But then when should the stripping occur? If it happens in urlopen, then should we backport the same functionality outside of PoolManager? (ie. into ConnectionPool objects.)

@n1m3
n1m3 commented Nov 14, 2011

The request should definitely be made with the path (and the query) only, because urllib3 is a HTTP/1.1 client.
RFC2616:

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the
absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.

@kennethreitz
Contributor

Excellent info. Thanks :)

@kennethreitz
Contributor

This isn't a bug in urllib3. It's doing exactly what it's told.

@shazow
Owner
shazow commented Jun 23, 2012

Btw, if anyone is in dire need, here's a handy basic recipe for doing "proper" url passing with redirection in urllib3:

import urlparse
import urllib3

http = urllib3.PoolManager()

def request(method, url, conn=None):
    if conn:
        # Request within the current host connection (used for redirect handling)
        if not url.startswith('/'):
            url = '/' + url
        r = conn.request(method, url, redirect=False, assert_same_host=False)
    else:
        p = urlparse.urlparse(url)
        conn = http.connection_from_host(p.hostname, p.port, p.scheme)
        r = conn.request(method, p.path, redirect=False, assert_same_host=False)

    is_redirect = r.get_redirect_location()
    if not is_redirect:
        return r
    print "Redirecting: %s" % is_redirect

    if '://' not in is_redirect:
        # Redirect to same host
        return request('GET', is_redirect, conn)

    return request('GET', is_redirect)
@shazow shazow added a commit that referenced this issue Jul 1, 2012
@shazow New request and redirect logic for PoolManager.
* Built-in redirect will switch method to 'GET' if status code is 303. (Issue #11)
* ``urllib3.PoolManager`` strips the scheme and host before sending the request uri. (Issue #8)
da66d83
@shazow shazow closed this Aug 4, 2012
@shazow
Owner
shazow commented Aug 4, 2012

Fixed in v1.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment