Handling of dots in the path #329
When requesting a path in Morepath, for example
As discussed on the mailing-list, this behavior changes if we replace the
To make matters worse, the second example only works on certain browsers, as some browsers will normalize the
I was wondering if this behavior is really what we want. It may be useful in special cases to have these paths handled differently, but following the principle of least surprise, I would expect the dots to be normalized away, especially since most browsers do that anyway.
Searching the Morepath code for the handling of dots I also stumbled upon this, in test_traject.py:
So it seems that some thought was put into this.
Therefore I would love to discuss this:
How is Morepath meant to handle dots in the path in general?
A tool to scan for XXX comments I long forgot about might be useful.
We sort of accidentally seem to have relied on webob's normalizing behavior, which makes the dots go away before they enter the framework, is that correct?
But we seem to have discovered a flaw in it. I think this should be fixed on the WebOb side -- I just looked for any issues surrounding this but couldn't find any. We should create a test case for webob path handling and if it's still flawed in a recent release, create an issue there. We then either switch to a bugfix release or put guard code in our own codebase for the time being (until we can switch to that release).
Maybe the webob people will declare it a non-flaw, in which case we should just implement our own safety code. I think it's the job of the framework to normalize this way, as not doing so can clearly lead to problems.
I actually begin to think that we've relied on the client's normalizing behavior. Curl, wget as well as all browsers I can find collapse '..', unless they are quoted. There is nothing in WebOb, or indeed in CPython that does anything with the path. It either arrives with dots or not.
Interestingly, there is a function in CPython that collapses the path, but it's not used to manipulate the path of the request:
The curl developer also blogged about it: http://daniel.haxx.se/blog/2013/07/30/dotdot-removal-in-libcurl/
So I'm beginning to suspect that this is something we need to fix ourselves.
What do you think?
Let us fix it for ourselves, and create an issue in webob. Last time I tried to get something into WebOb (the forwarded header) it got a bit stalled.
Where do we fix it for ourselves is the next question. I'd be inclined to do it very early on after the request is created, so that all tweens will get the normalized path too.
I think we can either change the path_info on the request after it has been initialised by WebOb. So right after this line.
Or we can change the PATH_INFO in the environment before Webob processes it.
I think in any case that path_info is what we're looking for, from glancing at Webob's source.
Not sure which approach is better.
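Both options come down to rewriting the path before routing sees it, so the second one (changing PATH_INFO in the environment) can be sketched without WebOb at all. This is a minimal sketch, not the change that was eventually committed: `normalize_path` and `NormalizePathMiddleware` are hypothetical names, and `posixpath.normpath` is only a stand-in for whatever normalization is chosen.

```python
import posixpath


def normalize_path(path):
    """Collapse '.' and '..' segments in a URL path (hypothetical helper)."""
    # Anchor at the root so '..' cannot climb above it and so an empty
    # path becomes '/' rather than '.'.
    normalized = posixpath.normpath('/' + path.lstrip('/'))
    # normpath drops a trailing slash; put it back if the input had one.
    if path.endswith('/') and normalized != '/':
        normalized += '/'
    return normalized


class NormalizePathMiddleware(object):
    """WSGI wrapper that rewrites PATH_INFO before the app sees it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        environ['PATH_INFO'] = normalize_path(environ.get('PATH_INFO', ''))
        return self.app(environ, start_response)
```

The first option would apply the same helper to `request.path_info` once the request object exists. Either way, anything that runs afterwards, tweens included, would see the normalized path.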
added a commit
Jun 4, 2015
Fixing it in traject.parse_path means that people who use path_info directly still have to normalize this themselves, doesn't it? What about tween code? It may be okay for us to do so -- there is a drawback to trying to normalize stuff outside of webob, but that was my original motivation for doing this earlier in the publisher.
Question: does this catch the whole issue that we started with, where '%2E' is sent?
I figured I'd read normpath. Some things of concern may be:
I moved and refactored the code quite a bit. I got it a bit wrong with the git merge, but that's why this is in a separate branch ;). See https://github.com/morepath/morepath/compare/issue_329?expand=1 for a complete comparison.
Yes it does.
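One way to see why an encoded dot can end up treated like a literal one: if the path is percent-decoded before it is normalized, `%2E%2E` has already become `..` by the time the dots are collapsed. The illustration below uses only the standard library. That the path reaches the normalization step already decoded is an assumption here, and `collapse` is a hypothetical helper, not Morepath code.

```python
import posixpath
from urllib.parse import unquote


def collapse(raw_path):
    """Decode percent-escapes first, then collapse dot segments."""
    decoded = unquote(raw_path)
    return posixpath.normpath('/' + decoded.lstrip('/'))


print(collapse('/foo/%2E%2E/bar'))  # /bar
print(collapse('/foo/../bar'))      # /bar
```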
I added more tests that make sure we never get a '.', but always a '/', as it seems to be the default.
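For context on the '.' versus '/' point: `posixpath.normpath` from the standard library returns '.' for a path that collapses to nothing, keeps exactly two leading slashes, and drops trailing slashes. The snippet below shows those behaviors. Whether these are exactly the cases the new tests cover is an assumption.

```python
import posixpath

# Inputs chosen to show normpath's less obvious results.
samples = ['', 'foo/..', '//foo', '///foo', '/foo/', '/../foo']
results = {path: posixpath.normpath(path) for path in samples}

for path in samples:
    print(repr(path), '->', repr(results[path]))
# ''        -> '.'
# 'foo/..'  -> '.'
# '//foo'   -> '//foo'
# '///foo'  -> '/foo'
# '/foo/'   -> '/foo'
# '/../foo' -> '/foo'
```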
Does it? I added tests in the updated version that say it doesn't ;)
Do you mean like this