Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of double slashes in path info for relative URLs #491

Open
arnaud-fontaine opened this Issue Feb 4, 2014 · 13 comments

Comments

Projects
None yet
5 participants
@arnaud-fontaine
Copy link

arnaud-fontaine commented Feb 4, 2014

Since commit 7486573 and the introduction of absolute URLs support in HTTP requests, it is not possible anymore to have a double-slash at the beginning of PATH_INFO. Actually, the environ variables from serving.py:WSGIRequestHandler.make_environ() is completely wrong when requesting 'http://localhost:5001//supplySupply' (same issue with current werkzeug Git HEAD) and raise a 404 Not Found error:

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'supplySupply',
 'HTTP_USER_AGENT': 'Wget/1.15',
 'PATH_INFO': '',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 45501,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.9.4',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24356e0>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7f5a67e261e0>,
 'wsgi.input': <socket._fileobject object at 0x2432ed0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 12:25:47] "GET //supplySupply HTTP/1.1" 404 -

As you can see, HTTP_HOST has been wrongly set to 'supplySupply' and PATH_INFO to '' because of urls.py:url_parse() which considers '//supplySupply' to be a netloc and not the path itself.

It seems that Python urlparse() behaves the same even though I'm not sure why because if scheme is not given, AFAIU from RFC 3986, scheme should always be given... Anyhow, this issue is similar to this one reported on urllib2.urlopen() some years ago:
http://bugs.python.org/issue2776

As a side note, Most HTTP server such as Apache and Zope just ignore double-slashes, so URLs such as 'http://localhost:80//path/info' and 'http://localhost:80/path/info' are the same.

FTR, here is the environ dict set in previous version of werkzeug (0.8.3, before using urlparse in WSGIRequestHandler.make_environ()):

{'CONTENT_LENGTH': '',
 'CONTENT_TYPE': '',
 'HTTP_ACCEPT': '*/*',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_HOST': 'localhost:5001',
 'HTTP_USER_AGENT': 'Wget/1.15 (linux-gnu)',
 'PATH_INFO': '//supplySupply',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': 47148,
 'REQUEST_METHOD': 'GET',
 'SCRIPT_NAME': '',
 'SERVER_NAME': '127.0.0.1',
 'SERVER_PORT': '5001',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SOFTWARE': 'Werkzeug/0.8.3',
 'werkzeug.server.shutdown': <function shutdown_server at 0x24dd578>,
 'wsgi.errors': <open file '<stderr>', mode 'w' at 0x7fb58fb4b1e0>,
 'wsgi.input': <socket._fileobject object at 0x24dc2d0>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': False,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 0)}
127.0.0.1 - - [04/Feb/2014 13:37:45] "GET //supplySupply HTTP/1.1" 200 -

Here is the sample application I used to reproduce this bug easily:

from werkzeug.exceptions import HTTPException
from werkzeug.routing import Map, Rule, NotFound, RequestRedirect

url_map = Map([Rule('/supplySupply')])

def application(environ, start_response):
    urls = url_map.bind_to_environ(environ)
    try:
        endpoint, args = urls.match()
    except HTTPException, e:
        return e(environ, start_response)
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return ['Rule points to %r with arguments %r' % (endpoint, args)]

from werkzeug.serving import run_simple
run_simple('127.0.0.1', 5001, application, use_debugger=True, use_reloader=True)

arnaud-fontaine added a commit to arnaud-fontaine/werkzeug that referenced this issue Feb 4, 2014

@arnaud-fontaine

This comment has been minimized.

Copy link
Author

arnaud-fontaine commented Feb 4, 2014

Here is a patch for url_parse(). I'm not sure which one is the best, either patching urlparse() like I did or modifying serving.py:WSGIRequestHandler.make_environ() similarly to what it used to be before commit 7486573... If you think the second way is better, then I can submit another patch.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 4, 2014

Patching url_parse seems more sensible to me, not only because i authored the commit you're referring to, but also because i think url_parse could get more URLs like that. I'm not sure if those are even valid though.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 4, 2014

But actually i can't reproduce this issue with either gunicorn + Apache or the builtin server.

@arnaud-fontaine

This comment has been minimized.

Copy link
Author

arnaud-fontaine commented Feb 5, 2014

Well, with the sample code I provided above, I can reproduce the problem straightaway (404 Not Found), so I'm not sure what you mean then... Initially, I stumbled upon this issue with some code using Flask relying in turn on Werkzeug.

I'm not sure either this kind of URL are valid as I may have misinterpreted RFC 3986 and/or I don't understand the rationale behind Python Standard Library url_parse() behavior... Anyone?

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 5, 2014

Ok i admit i only tried out one of my Flask apps if they return a 404 if i add some double slashes into the path, but they handled it very well. I will try your code later.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 5, 2014

I can reproduce this bug only with http://127.0.0.1:5001//supplySupply, but not with http://127.0.0.1:5001///supplySupply or http://127.0.0.1:5001////supplySupply.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 5, 2014

Your patch also seems to break for schemeless URLs:

rv = urls.url_parse('//foobar.example.com/test')
self.assert_strict_equal(rv.scheme, '')
self.assert_strict_equal(rv.host, 'foobar.example.com')
self.assert_strict_equal(rv.path, '/test')

Actually i now have to revert my opinion about url_parse having to be patched. It's make_environ which provides a syntactically valid but semantically incorrect URL.

EDIT: http://stackoverflow.com/a/20524044 and this from the RFC:

 If a URI does not contain an authority component, then the path cannot begin
 with two slash characters ("//").

I still wonder how one would find out if //foobar.com/test contains an
authority component (schemeless URL), or if this is just a path (scheme- and
hostless). I am not sure if the HTTP
standard
allows schemeless URLs:

URIs in HTTP can be represented in absolute form or relative to some known base
URI [11], depending upon the context of their use.

[11] only refers to the relevant RFC for relative URLs, and there's no
mentioning of the keyword scheme in the whole HTTP RFC. I think we should
start to disallow either schemeless or absolute URLs in HTTP requests.

@arnaud-fontaine

This comment has been minimized.

Copy link
Author

arnaud-fontaine commented Feb 7, 2014

Yes, the patch break schemeless URLs, but I'm not sure how this is supposed to work if no scheme has been provided or at least it does not make sense to me in Werkzeug context. As far as I understand RFC 3986, scheme should always be given... My only concern with my patch on url_parse() is that the behavior would be different from Python urlparse() and I'm not sure that's the way to go...

By the way, why there would be absolute URLs in HTTP requests to start with? Would you like me to submit another patch for make_environ() instead? (similar to what have been done there: http://bugs.python.org/issue2776).

Thanks for your review and quick responses!

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Feb 7, 2014

I don't see a reason why absolute URLs should be supported, but by the HTTP RFC it is valid and if we don't implement it Werkzeug would choke on some clients.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented Aug 22, 2014

There hasn't been a clear outcome on this issue. Is it still one?

@untitaker untitaker closed this Aug 22, 2014

@kparal

This comment has been minimized.

Copy link

kparal commented May 13, 2015

Sorry, is this supposed to have been fixed? I assume it wasn't.

There are some reports around the internet mentioning this issue, e.g.:
semanticize/semanticizer#21

And with our own project, we've hit this as well. We have a Flask application deployed with werkzeug and access it from a different tool which supports defining a URL of the Flask app in a config file. If the user sets http://0.0.0.0:5000/api/v1.0, we then append something like /jobs and everything works fine. But if the user sets http://0.0.0.0:5000/api/v1.0/ (a trailing slash), the result is http://0.0.0.0:5000/api/v1.0//jobs and that doesn't work, per this bug, yielding 404 Not Found.

It's not that difficult to work around, but it would be nice if it worked out of the box. Thanks.

@untitaker

This comment has been minimized.

Copy link
Member

untitaker commented May 14, 2015

It wasn't fixed, but just stale.

@untitaker untitaker reopened this May 14, 2015

@lzecca78

This comment has been minimized.

Copy link

lzecca78 commented Mar 2, 2017

is there any news about this issue?
In a docker cluster such as Docker Swarm Mode, expose http-socket (along with the routing mesh feature of Swarm Mode) in every uwsgi.ini service, should be the best option ever if this issue there wasn't (basically i should do shotgun surgery in all the code of every microservice to find where there is a double slash, when every http server has the merge_slash function out-of-the-box).
This kind of issue forced me to choose for a nginx uwsgi proxy instead that is not designed to be reliable with the ephemeral docker containers lifecycle (for example nginx if doesn't resolve the name in the uwsgi_params, doesn't even start the service).
Please fix this issue because in 2017 with containers is something not optional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.