Handling of double slashes in path info for relative URLs #491

arnaud-fontaine · 2014-02-04T04:46:43Z

Since commit 7486573 and the introduction of absolute URLs support in HTTP requests, it is not possible anymore to have a double-slash at the beginning of PATH_INFO. Actually, the environ variables from serving.py:WSGIRequestHandler.make_environ() is completely wrong when requesting 'http://localhost:5001//supplySupply' (same issue with current werkzeug Git HEAD) and raise a 404 Not Found error:

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'supplySupply',
 'HTTP_USER_AGENT': 'Wget/1.15',
 'PATH_INFO': '',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 45501,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.9.4',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24356e0>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f5a67e261e0>,
 'wsgi.input': <socket._fileobject object at 0x2432ed0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 12:25:47] "GET //supplySupply HTTP/1.1" 404 -

As you can see, HTTP_HOST has been wrongly set to 'supplySupply' and PATH_INFO to '' because of urls.py:url_parse() which considers '//supplySupply' to be a netloc and not the path itself.

It seems that Python urlparse() behaves the same even though I'm not sure why because if scheme is not given, AFAIU from RFC 3986, scheme should always be given... Anyhow, this issue is similar to this one reported on urllib2.urlopen() some years ago:
http://bugs.python.org/issue2776

As a side note, Most HTTP server such as Apache and Zope just ignore double-slashes, so URLs such as 'http://localhost:80//path/info' and 'http://localhost:80/path/info' are the same.

FTR, here is the environ dict set in previous version of werkzeug (0.8.3, before using urlparse in WSGIRequestHandler.make_environ()):

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'localhost:5001',
 'HTTP_USER_AGENT': 'Wget/1.15 (linux-gnu)',
 'PATH_INFO': '//supplySupply',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 47148,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.8.3',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24dd578>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7fb58fb4b1e0>,
 'wsgi.input': <socket._fileobject object at 0x24dc2d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 13:37:45] "GET //supplySupply HTTP/1.1" 200 -

Here is the sample application I used to reproduce this bug easily:

from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule, NotFound, RequestRedirect

url_map = Map([Rule('/supplySupply')])

def application(environ, start_response):
    urls = url_map.bind_to_environ(environ)
    try:
        endpoint, args = urls.match()
    except HTTPException, e:
        return e(environ, start_response)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Rule points to %r with arguments %r' % (endpoint, args)]

from werkzeug.serving import run_simple
run_simple('127.0.0.1', 5001, application, use_debugger=True, use_reloader=True)

The text was updated successfully, but these errors were encountered:

… scheme is not set. This fixes pallets#491.

arnaud-fontaine · 2014-02-04T05:39:30Z

Here is a patch for url_parse(). I'm not sure which one is the best, either patching urlparse() like I did or modifying serving.py:WSGIRequestHandler.make_environ() similarly to what it used to be before commit 7486573... If you think the second way is better, then I can submit another patch.

untitaker · 2014-02-04T20:27:30Z

Patching url_parse seems more sensible to me, not only because i authored the commit you're referring to, but also because i think url_parse could get more URLs like that. I'm not sure if those are even valid though.

untitaker · 2014-02-04T20:29:28Z

But actually i can't reproduce this issue with either gunicorn + Apache or the builtin server.

arnaud-fontaine · 2014-02-05T09:25:45Z

Well, with the sample code I provided above, I can reproduce the problem straightaway (404 Not Found), so I'm not sure what you mean then... Initially, I stumbled upon this issue with some code using Flask relying in turn on Werkzeug.

I'm not sure either this kind of URL are valid as I may have misinterpreted RFC 3986 and/or I don't understand the rationale behind Python Standard Library url_parse() behavior... Anyone?

untitaker · 2014-02-05T09:43:19Z

Ok i admit i only tried out one of my Flask apps if they return a 404 if i add some double slashes into the path, but they handled it very well. I will try your code later.

untitaker · 2014-02-05T13:15:47Z

I can reproduce this bug only with http://127.0.0.1:5001//supplySupply, but not with http://127.0.0.1:5001///supplySupply or http://127.0.0.1:5001////supplySupply.

untitaker · 2014-02-05T13:31:46Z

Your patch also seems to break for schemeless URLs:

rv = urls.url_parse('//foobar.example.com/test')
self.assert_strict_equal(rv.scheme, '')
self.assert_strict_equal(rv.host, 'foobar.example.com')
self.assert_strict_equal(rv.path, '/test')

Actually i now have to revert my opinion about url_parse having to be patched. It's make_environ which provides a syntactically valid but semantically incorrect URL.

EDIT: http://stackoverflow.com/a/20524044 and this from the RFC:

 If a URI does not contain an authority component, then the path cannot begin
 with two slash characters ("//").

I still wonder how one would find out if //foobar.com/test contains an
authority component (schemeless URL), or if this is just a path (scheme- and
hostless). I am not sure if the HTTP
standard allows schemeless URLs:

URIs in HTTP can be represented in absolute form or relative to some known base
URI [11], depending upon the context of their use.

[11] only refers to the relevant RFC for relative URLs, and there's no
mentioning of the keyword scheme in the whole HTTP RFC. I think we should
start to disallow either schemeless or absolute URLs in HTTP requests.

arnaud-fontaine · 2014-02-07T07:57:20Z

Yes, the patch break schemeless URLs, but I'm not sure how this is supposed to work if no scheme has been provided or at least it does not make sense to me in Werkzeug context. As far as I understand RFC 3986, scheme should always be given... My only concern with my patch on url_parse() is that the behavior would be different from Python urlparse() and I'm not sure that's the way to go...

By the way, why there would be absolute URLs in HTTP requests to start with? Would you like me to submit another patch for make_environ() instead? (similar to what have been done there: http://bugs.python.org/issue2776).

Thanks for your review and quick responses!

untitaker · 2014-02-07T09:48:38Z

I don't see a reason why absolute URLs should be supported, but by the HTTP RFC it is valid and if we don't implement it Werkzeug would choke on some clients.

untitaker · 2014-08-22T21:37:50Z

There hasn't been a clear outcome on this issue. Is it still one?

kparal · 2015-05-13T13:29:51Z

Sorry, is this supposed to have been fixed? I assume it wasn't.

There are some reports around the internet mentioning this issue, e.g.:
semanticize/semanticizer#21

And with our own project, we've hit this as well. We have a Flask application deployed with werkzeug and access it from a different tool which supports defining a URL of the Flask app in a config file. If the user sets http://0.0.0.0:5000/api/v1.0, we then append something like /jobs and everything works fine. But if the user sets http://0.0.0.0:5000/api/v1.0/ (a trailing slash), the result is http://0.0.0.0:5000/api/v1.0//jobs and that doesn't work, per this bug, yielding 404 Not Found.

It's not that difficult to work around, but it would be nice if it worked out of the box. Thanks.

untitaker · 2015-05-14T17:27:21Z

It wasn't fixed, but just stale.

arnaud-fontaine added a commit to arnaud-fontaine/werkzeug that referenced this issue Feb 4, 2014

Do not consider PATH_INFO starting with a double-slash as a netloc if…

61baef0

… scheme is not set. This fixes pallets#491.

untitaker closed this as completed Aug 22, 2014

untitaker reopened this May 14, 2015

This comment has been minimized.

Sign in to view

davidism mentioned this issue Jun 5, 2017

Groups of slashes in path should be normalized and redirected #1132

Closed

davidism added the routing label Jan 20, 2019

robnagler mentioned this issue Nov 18, 2019

//foo/bar routes to bar, and "foo" is host radiasoft/sirepo#2062

Closed

davidism mentioned this issue Jan 7, 2020

dev server accepts two slashes at start of path #1696

Merged

davidism added server and removed routing labels Jan 7, 2020

davidism added this to the 1.0.0 milestone Jan 7, 2020

davidism closed this as completed in #1696 Jan 7, 2020

github-actions bot locked as resolved and limited conversation to collaborators Nov 13, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Handling of double slashes in path info for relative URLs #491

Handling of double slashes in path info for relative URLs #491

arnaud-fontaine commented Feb 4, 2014

arnaud-fontaine commented Feb 4, 2014

untitaker commented Feb 4, 2014

untitaker commented Feb 4, 2014

arnaud-fontaine commented Feb 5, 2014

untitaker commented Feb 5, 2014

untitaker commented Feb 5, 2014

untitaker commented Feb 5, 2014

arnaud-fontaine commented Feb 7, 2014

untitaker commented Feb 7, 2014

untitaker commented Aug 22, 2014

kparal commented May 13, 2015

untitaker commented May 14, 2015

This comment has been minimized.

Handling of double slashes in path info for relative URLs #491

Handling of double slashes in path info for relative URLs #491

Comments

arnaud-fontaine commented Feb 4, 2014

arnaud-fontaine commented Feb 4, 2014

untitaker commented Feb 4, 2014

untitaker commented Feb 4, 2014

arnaud-fontaine commented Feb 5, 2014

untitaker commented Feb 5, 2014

untitaker commented Feb 5, 2014

untitaker commented Feb 5, 2014

arnaud-fontaine commented Feb 7, 2014

untitaker commented Feb 7, 2014

untitaker commented Aug 22, 2014

kparal commented May 13, 2015

untitaker commented May 14, 2015

This comment has been minimized.