Handling of double slashes in path info for relative URLs #491

@arnaud-fontaine

Since commit 7486573, which introduced support for absolute URLs in HTTP requests, it is no longer possible to have a double slash at the beginning of PATH_INFO. The environ built by serving.py:WSGIRequestHandler.make_environ() is wrong when requesting 'http://localhost:5001//supplySupply' (same issue with the current werkzeug Git HEAD), and the request ends in a 404 Not Found error:

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'supplySupply',
 'HTTP_USER_AGENT': 'Wget/1.15',
 'PATH_INFO': '',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 45501,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.9.4',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24356e0>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f5a67e261e0>,
 'wsgi.input': <socket._fileobject object at 0x2432ed0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 12:25:47] "GET //supplySupply HTTP/1.1" 404 -

As you can see, HTTP_HOST has been wrongly set to 'supplySupply' and PATH_INFO to '' because urls.py:url_parse() treats '//supplySupply' as a netloc rather than as the path itself.

It seems that Python's urlparse() behaves the same way, although I'm not sure why: AFAIU from RFC 3986, a scheme should always be given... Anyhow, this issue is similar to one reported against urllib2.urlopen() some years ago:
http://bugs.python.org/issue2776
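
For reference, the standard-library behaviour mentioned above can be checked with a short snippet. This is only a sketch and assumes Python 3, where the parser lives in urllib.parse (in Python 2 it is the urlparse module); it does not exercise Werkzeug's own url_parse().

```python
from urllib.parse import urlsplit

# A target starting with '//' is read as a network-path reference:
# the text after '//' becomes the netloc and the path is left empty.
double = urlsplit('//supplySupply')
print(repr(double.netloc), repr(double.path))

# With a single leading slash the same text is treated as the path.
single = urlsplit('/supplySupply')
print(repr(single.netloc), repr(single.path))
```

This matches the environ shown earlier, where the host became 'supplySupply' and the path became empty.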

As a side note, most HTTP servers, such as Apache and Zope, simply ignore double slashes, so URLs such as 'http://localhost:80//path/info' and 'http://localhost:80/path/info' are treated as the same.
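
To illustrate what "ignoring double slashes" could mean in practice, here is a minimal sketch. The collapse_slashes() helper is hypothetical (it is not taken from Apache, Zope or Werkzeug) and simply replaces any run of slashes in a path with a single one.

```python
import re

def collapse_slashes(path):
    """Hypothetical helper: replace each run of '/' with a single '/'."""
    return re.sub(r'/{2,}', '/', path)

print(collapse_slashes('//path/info'))
print(collapse_slashes('/path/info'))
```

Whether a server should normalize like this or keep the path verbatim is a design choice; the sketch only shows the normalizing variant.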

FTR, here is the environ dict set by a previous version of werkzeug (0.8.3, before urlparse was used in WSGIRequestHandler.make_environ()):

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'localhost:5001',
 'HTTP_USER_AGENT': 'Wget/1.15 (linux-gnu)',
 'PATH_INFO': '//supplySupply',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 47148,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.8.3',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24dd578>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7fb58fb4b1e0>,
 'wsgi.input': <socket._fileobject object at 0x24dc2d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 13:37:45] "GET //supplySupply HTTP/1.1" 200 -
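
One possible way to get the 0.8.3 values back while still supporting absolute URLs would be to parse the netloc only when the request target does not start with a slash. The sketch below is an assumption about how that could look, not Werkzeug's actual code: split_request_target() and its host_header parameter are hypothetical names, it uses Python 3's urllib.parse, and it does not handle other request-target forms such as '*'.

```python
from urllib.parse import urlsplit

def split_request_target(target, host_header):
    """Hypothetical helper: return (host, path, query) for a request target."""
    if target.startswith('/'):
        # Origin-form: the target is only a path plus optional query,
        # so a leading '//' must not be interpreted as a netloc.
        path, _, query = target.partition('?')
        return host_header, path, query
    # Absolute-form (e.g. sent through a proxy): the URL carries the host.
    parts = urlsplit(target)
    return parts.netloc or host_header, parts.path, parts.query

print(split_request_target('//supplySupply', 'localhost:5001'))
print(split_request_target('http://localhost:5001//supplySupply', 'ignored'))
```

In both cases the path keeps its leading double slash and the host comes from the right place.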

Here is the sample application I used to reproduce this bug easily:

from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule
from werkzeug.serving import run_simple

# A single rule, so only '/supplySupply' should match.
url_map = Map([Rule('/supplySupply')])

def application(environ, start_response):
    urls = url_map.bind_to_environ(environ)
    try:
        endpoint, args = urls.match()
    except HTTPException as e:
        # Routing errors (e.g. NotFound) are themselves WSGI applications.
        return e(environ, start_response)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    body = 'Rule points to %r with arguments %r' % (endpoint, args)
    return [body.encode('utf-8')]

if __name__ == '__main__':
    run_simple('127.0.0.1', 5001, application, use_debugger=True, use_reloader=True)
