New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handling of dots in the path #329
Comments
I'm wondering whether this isn't potentially a WebOb issue. We should check. |
A tool to scan for XXX comments I long forgot about might be useful. We sort of accidentally seem to have relied on webob's normalizing behavior, which makes the dots go away before they enter the framework, is that correct? But we seem to have discovered a flaw in it. I think this should be fixed on the WebOb side -- I just looked for any issues surrounding this but couldn't find any. We should create a test case for webob path handling and if it's still flawed in a recent release, create an issue there. We then either switch to a bugfix release or put guard code in our own codebase for the time being (until we can switch to that release). Maybe the webob people will declare it a non-flaw, in which case we should just implement it our own safety code. I think it's the job of the framework to normalize this way as clearly not doing so can lead to problems. |
I actually begin to think that we've relied on the client's normalizing behavior. Curl, wget as well as all browsers I can find collapse '..', unless they are quoted. There is nothing in WebOb, or indeed in CPython that does anything with the path. It either arrives with dots or not. Interestingly, there is a function in CPython that collapses the path, but it's not used to manipulate the path of the request: The curl developer also blogged about it: http://daniel.haxx.se/blog/2013/07/30/dotdot-removal-in-libcurl/ So I'm beginning to suspect that this is something we need to fix ourselves. What do you think? |
Let us fix it for ourselves, and create an issue in webob. Last time I tried to get something into WebOb (the forwarded header) it got a bit stalled. Where do we fix it for ourselves is the next question. I'd be inclined to do it very early on after the request is created, so that all tweens will get the normalized path too. |
Agreed. I think we can either change the path_info on the request after it has been initialised by WebOb. So right after this line. Or we can change the PATH_INFO in the environment before Webob processes it. I think in any case that path_info is what we're looking for, from glancing at Webob's source. Not sure which approach is better. |
I implemented it differently, adding it to This fixes the problem my original static files example had. |
As always, let me know what you think. If all is okay I'll merge this into the master. |
Fixing it in traject.parse_path means that people who use path_info directly still get to normalize this themselves, isn't it? What about tween code? It may be okay for us to do so -- there is a drawback to try to normalize stuff outside of webob, but that was my original motivation to do this earlier in the publisher. |
That's a good point, I haven't thought of that. I'll move it up the stack. |
Question: does this catch the whole issue that we started with, where '%2E' is sent? I figured I'd read normpath. Some things for concern may be:
|
I moved and refactored the code quite a bit. I got it a bit wrong with the git merge, but that's why this is in a separate branch ;). See https://github.com/morepath/morepath/compare/issue_329?expand=1 for a complete comparison.
Yes it does.
I added more tests that make sure never get a '.', but always a '/' as is seems to be the default.
Does it? I added tests in the updated version that say it doesn't ;)
Do you mean like this |
Thanks. Looks read to merge to me! |
Cool, I'll do that right away. |
When requesting a path in Morepath, for example
http://localhost/blog/../post
the resulting path inside Morepath is/post
(before the path is consumed), mostly because that's how WebOb seems to interpret the path.As discussed on the mailing-list, this behavior changes if we replace the
.
with%2E
.http://localhost/blog/%2E%2E/post
will then lead to the following path/blog/../post
(before the path is consumed).To make matters worse, the second example only works on certain browsers, as some browsers will normalize the
%2E%2E
away, before sending the request to the server.I was wondering if this behavior is really what we want. It may be useful in special cases to have these paths handled differently, but following the path of least surprises, I would expect the dots to be normalized away. Especially since most browsers do that anyway.
Searching the Morepath code for the handling of dots I also stumbled upon this, in test_traject.py:
# XXX removing /./ from paths and checking for ../
So it seems like there some thought was put into this.
Therefore I would love to discuss this:
How is Morepath meant to handle dots in the path in general?
And should that path handling include some kind of normalization?
Related:
https://groups.google.com/forum/#!topic/morepath/XOiwC0v_LXE
https://groups.google.com/forum/#!topic/morepath/ORArAUoQ6uI
http://stackoverflow.com/questions/3856693/a-url-resource-that-is-a-dot-2e
The text was updated successfully, but these errors were encountered: