Handling of double slashes in path info for relative URLs #491
Comments
… scheme is not set. This fixes pallets#491.
Here is a patch for url_parse(). I'm not sure which approach is best: patching urlparse() like I did, or modifying serving.py:WSGIRequestHandler.make_environ() so that it behaves the way it did before commit 7486573... If you think the second way is better, I can submit another patch.
Patching url_parse seems more sensible to me, not only because I authored the commit you're referring to, but also because I think url_parse could receive more URLs like that. I'm not sure whether those are even valid, though.
But actually I can't reproduce this issue with either gunicorn + Apache or the builtin server.
Well, with the sample code I provided above, I can reproduce the problem straightaway (404 Not Found), so I'm not sure what you mean... Initially, I stumbled upon this issue with some code using Flask, which in turn relies on Werkzeug. I'm not sure either whether this kind of URL is valid, as I may have misinterpreted RFC 3986 and/or I don't understand the rationale behind the behavior of the Python Standard Library's urlparse()... Anyone?
OK, I admit I only checked whether one of my Flask apps returns a 404 when I add some double slashes to the path, and it handled them very well. I will try your code later.
I can reproduce this bug only with
Your patch also seems to break for schemeless URLs:
Actually, I now have to reverse my opinion that url_parse needs to be patched. It's make_environ that provides a syntactically valid but semantically incorrect URL. EDIT: http://stackoverflow.com/a/20524044 and this from the RFC:
I still wonder how one would find out if
Yes, the patch breaks schemeless URLs, but I'm not sure how this is supposed to work if no scheme has been provided, or at least it does not make sense to me in the Werkzeug context. As far as I understand RFC 3986, the scheme should always be given... My only concern with my patch to url_parse() is that its behavior would differ from Python's urlparse(), and I'm not sure that's the way to go... By the way, why would there be absolute URLs in HTTP requests to start with? Would you like me to submit another patch for make_environ() instead (similar to what was done there: http://bugs.python.org/issue2776)? Thanks for your review and quick responses!
I don't see a reason why absolute URLs should be supported, but they are valid according to the HTTP RFC, and if we don't implement them Werkzeug would choke on some clients.
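To illustrate the make_environ() direction discussed above, here is a minimal sketch of telling a plain path apart from an absolute URL before any URL parsing happens. It is not the Werkzeug implementation: the function name `parse_request_target` and its `default_host` parameter are hypothetical, and it assumes that any request target starting with "/" is a path, even when it starts with "//".

```python
from urllib.parse import urlsplit


def parse_request_target(target, default_host):
    """Return (host, path, query) for an HTTP request target.

    Hypothetical helper for illustration only, not part of Werkzeug.
    """
    if target.startswith("/"):
        # A target starting with "/" is treated as a path, so a leading
        # "//" is never mistaken for a netloc.
        path, _, query = target.partition("?")
        return default_host, path, query

    parts = urlsplit(target)
    if parts.scheme and parts.netloc:
        # Absolute URL: host and path both come from the URL itself.
        return parts.netloc, parts.path or "/", parts.query

    # Anything else (for example "*") is passed through unchanged.
    return default_host, target, ""


print(parse_request_target("//supplySupply", "localhost:5001"))
print(parse_request_target("http://localhost:5001//supplySupply", "ignored"))
```

With this approach both calls yield the path '//supplySupply' and the host 'localhost:5001', and url_parse() itself keeps matching the behavior of Python's urlparse().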
There hasn't been a clear outcome on this issue. Is it still one?
Sorry, is this supposed to have been fixed? I assume it wasn't. There are some reports around the internet mentioning this issue, e.g.: And with our own project, we've hit this as well. We have a Flask application deployed with werkzeug and access it from a different tool which supports defining a URL of the Flask app in a config file. If the user sets It's not that difficult to work around, but it would be nice if it worked out of the box. Thanks.
It wasn't fixed, but just stale.
Since commit 7486573 and the introduction of support for absolute URLs in HTTP requests, it is no longer possible to have a double slash at the beginning of PATH_INFO. In fact, the environ built by serving.py:WSGIRequestHandler.make_environ() is completely wrong when requesting 'http://localhost:5001//supplySupply' (same issue with current werkzeug Git HEAD), and the request ends in a 404 Not Found error:
As you can see, HTTP_HOST has been wrongly set to 'supplySupply' and PATH_INFO to '' because of urls.py:url_parse() which considers '//supplySupply' to be a netloc and not the path itself.
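The netloc-versus-path behavior can be checked with the standard library alone. The snippet below is only a sketch using Python 3's `urllib.parse.urlsplit` rather than Werkzeug's own url_parse(); the helper name `split_request_target` is made up for the example.

```python
from urllib.parse import urlsplit


def split_request_target(target):
    """Return (netloc, path) as the standard library parses `target`."""
    parts = urlsplit(target)
    return parts.netloc, parts.path


# A leading "//" is read as the start of a network location, not a path.
print(split_request_target("//supplySupply"))   # ('supplySupply', '')

# A single leading slash is read as a path.
print(split_request_target("/supplySupply"))    # ('', '/supplySupply')

# With a scheme and host present, the double slash stays in the path.
print(split_request_target("http://localhost:5001//supplySupply"))
# ('localhost:5001', '//supplySupply')
```

So the full URL parses as expected; the trouble starts only when the bare request path '//supplySupply' is parsed on its own.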
Python's urlparse() seems to behave the same way, though I'm not sure why: AFAIU from RFC 3986, the scheme should always be given... Anyhow, this issue is similar to one reported against urllib2.urlopen() some years ago:
http://bugs.python.org/issue2776
As a side note, most HTTP servers, such as Apache and Zope, just ignore double slashes, so URLs such as 'http://localhost:80//path/info' and 'http://localhost:80/path/info' are treated as the same.
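For applications that want the same leniency, one possible approach is a small WSGI middleware that collapses repeated slashes in PATH_INFO. This is only a sketch: the class name `CollapseSlashes` is hypothetical, and it helps only when the server hands the path over intact. It cannot repair an environ in which PATH_INFO has already been emptied as described above.

```python
import re


class CollapseSlashes:
    """WSGI middleware that collapses runs of '/' in PATH_INFO.

    Hypothetical helper for illustration only, not part of Werkzeug.
    """

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Replace any run of two or more slashes with a single slash.
        environ["PATH_INFO"] = re.sub(r"/{2,}", "/", path)
        return self.app(environ, start_response)
```

It would be applied by wrapping the WSGI callable, for example `app = CollapseSlashes(app)`.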
FTR, here is the environ dict set in the previous version of werkzeug (0.8.3, before urlparse was used in WSGIRequestHandler.make_environ()):
Here is the sample application I used to reproduce this bug easily: