Scheme and host erroneously passed to HTTPConnection request method #8

mapio opened this Issue Nov 14, 2011 · 6 comments


None yet

4 participants

mapio commented Nov 14, 2011

I think there is a problem in the use of httplib.HTTPConnection method request when called at
line 213 of urllib3/ where you pass it the full URL, containing the scheme and host, instead of just the path (and query part), as show in httplib usage examples.

This ends up in a wrong HTTP request performed to the server. To see it, you can for instance run

python -m SimpleHTTPServer

in a shell and then, in another one, run

python -c 'from urllib3 import PoolManager; http = PoolManager(); http.request( "GET", "http://localhost:8000/this/is/an/example" )'

and compare what the access log in the first shell reports as compared to what happens if you do

curl  "http://localhost:8000/this/is/an/example"

I can submit a patch, but I'm not an urllib3 expert so I will probably miss some other place where the same error occurs.

shazow commented Nov 14, 2011

Hi there, thank you for the report!

This was a conscious decision, but perhaps not the correct one.

The goal was to reduce complexity and avoid inexplicit behaviour. That is, when you make a request to "http://localhost:8000/this/is/an/example", that's exactly the request that urllib3 should be making.

The current workaround to achieve what you want is:

from urllib3 import PoolManager
http = PoolManager()
conn = http.connection_from_url("http://localhost:8000")
response = conn.request("GET", "/this/is/an/example")

When we do PoolManager.request, it does the same thing behind the scenes except it doesn't strip away the host like we did here manually. I agree that there should be an option to strip away the host (perhaps even by default).

Should this option be specified in the PoolManager constructor? Such as PoolManager(strip_host=True).

But then when should the stripping occur? If it happens in urlopen, then should we backport the same functionality outside of PoolManager? (ie. into ConnectionPool objects.)

n1m3 commented Nov 14, 2011

The request should definitely be made with the path (and the query) only, because urllib3 is a HTTP/1.1 client.

To allow for transition to absoluteURIs in all requests in future versions of HTTP, all HTTP/1.1 servers MUST accept the
absoluteURI form in requests, even though HTTP/1.1 clients will only generate them in requests to proxies.


Excellent info. Thanks :)


This isn't a bug in urllib3. It's doing exactly what it's told.

shazow commented Jun 23, 2012

Btw, if anyone is in dire need, here's a handy basic recipe for doing "proper" url passing with redirection in urllib3:

import urlparse
import urllib3

http = urllib3.PoolManager()

def request(method, url, conn=None):
    if conn:
        # Request within the current host connection (used for redirect handling)
        if not url.startswith('/'):
            url = '/' + url
        r = conn.request(method, url, redirect=False, assert_same_host=False)
        p = urlparse.urlparse(url)
        conn = http.connection_from_host(p.hostname, p.port, p.scheme)
        r = conn.request(method, p.path, redirect=False, assert_same_host=False)

    is_redirect = r.get_redirect_location()
    if not is_redirect:
        return r
    print "Redirecting: %s" % is_redirect

    if '://' not in is_redirect:
        # Redirect to same host
        return request('GET', is_redirect, conn)

    return request('GET', is_redirect)
@shazow shazow added a commit that referenced this issue Jul 1, 2012
@shazow New request and redirect logic for PoolManager.
* Built-in redirect will switch method to 'GET' if status code is 303. (Issue #11)
* ``urllib3.PoolManager`` strips the scheme and host before sending the request uri. (Issue #8)
@shazow shazow closed this Aug 4, 2012
shazow commented Aug 4, 2012

Fixed in v1.5.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment