Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handling of dots in the path #329

Closed
href opened this issue May 18, 2015 · 13 comments
Closed

Handling of dots in the path #329

href opened this issue May 18, 2015 · 13 comments
Labels
Milestone

Comments

@href
Copy link
Member

href commented May 18, 2015

When requesting a path in Morepath, for example http://localhost/blog/../post the resulting path inside Morepath is /post (before the path is consumed), mostly because that's how WebOb seems to interpret the path.

As discussed on the mailing-list, this behavior changes if we replace the . with %2E. http://localhost/blog/%2E%2E/post will then lead to the following path /blog/../post (before the path is consumed).

To make matters worse, the second example only works on certain browsers, as some browsers will normalize the %2E%2E away, before sending the request to the server.

I was wondering if this behavior is really what we want. It may be useful in special cases to have these paths handled differently, but following the path of least surprises, I would expect the dots to be normalized away. Especially since most browsers do that anyway.

Searching the Morepath code for the handling of dots I also stumbled upon this, in test_traject.py:

# XXX removing /./ from paths and checking for ../

So it seems like there some thought was put into this.

Therefore I would love to discuss this:

How is Morepath meant to handle dots in the path in general?
And should that path handling include some kind of normalization?

Related:
https://groups.google.com/forum/#!topic/morepath/XOiwC0v_LXE
https://groups.google.com/forum/#!topic/morepath/ORArAUoQ6uI
http://stackoverflow.com/questions/3856693/a-url-resource-that-is-a-dot-2e

@faassen faassen added the bug label May 25, 2015
@faassen faassen added this to the 0.11 milestone May 25, 2015
@faassen
Copy link
Member

faassen commented May 25, 2015

I'm wondering whether this isn't potentially a WebOb issue. We should check.

@faassen
Copy link
Member

faassen commented May 25, 2015

A tool to scan for XXX comments I long forgot about might be useful.

We sort of accidentally seem to have relied on webob's normalizing behavior, which makes the dots go away before they enter the framework, is that correct?

But we seem to have discovered a flaw in it. I think this should be fixed on the WebOb side -- I just looked for any issues surrounding this but couldn't find any. We should create a test case for webob path handling and if it's still flawed in a recent release, create an issue there. We then either switch to a bugfix release or put guard code in our own codebase for the time being (until we can switch to that release).

Maybe the webob people will declare it a non-flaw, in which case we should just implement it our own safety code. I think it's the job of the framework to normalize this way as clearly not doing so can lead to problems.

@href
Copy link
Member Author

href commented May 26, 2015

We sort of accidentally seem to have relied on webob's normalizing behavior, which makes the dots go away before they enter the framework, is that correct?

I actually begin to think that we've relied on the client's normalizing behavior. Curl, wget as well as all browsers I can find collapse '..', unless they are quoted. There is nothing in WebOb, or indeed in CPython that does anything with the path. It either arrives with dots or not.

Interestingly, there is a function in CPython that collapses the path, but it's not used to manipulate the path of the request:
https://github.com/python/cpython/blob/aa8ea3a6be22c92e774df90c6a6ee697915ca8ec/Lib/http/server.py#L831

The curl developer also blogged about it: http://daniel.haxx.se/blog/2013/07/30/dotdot-removal-in-libcurl/

So I'm beginning to suspect that this is something we need to fix ourselves.
Though I guess there's no harm in discussing this with Webob.

What do you think?

@faassen
Copy link
Member

faassen commented May 26, 2015

Let us fix it for ourselves, and create an issue in webob. Last time I tried to get something into WebOb (the forwarded header) it got a bit stalled.

Where do we fix it for ourselves is the next question. I'd be inclined to do it very early on after the request is created, so that all tweens will get the normalized path too.

@href
Copy link
Member Author

href commented May 26, 2015

Agreed.

I think we can either change the path_info on the request after it has been initialised by WebOb. So right after this line.

Or we can change the PATH_INFO in the environment before Webob processes it.

I think in any case that path_info is what we're looking for, from glancing at Webob's source.

Not sure which approach is better.

href pushed a commit that referenced this issue Jun 4, 2015
@href
Copy link
Member Author

href commented Jun 4, 2015

I implemented it differently, adding it to traject.parse_path. It seems saner to do it there, than to muck around with the environment or the request's path_info.

This fixes the problem my original static files example had.

@href
Copy link
Member Author

href commented Jun 4, 2015

As always, let me know what you think. If all is okay I'll merge this into the master.

@faassen
Copy link
Member

faassen commented Jun 4, 2015

Fixing it in traject.parse_path means that people who use path_info directly still get to normalize this themselves, isn't it? What about tween code? It may be okay for us to do so -- there is a drawback to try to normalize stuff outside of webob, but that was my original motivation to do this earlier in the publisher.

@href
Copy link
Member Author

href commented Jun 4, 2015

That's a good point, I haven't thought of that. I'll move it up the stack.

@faassen
Copy link
Member

faassen commented Jun 4, 2015

Question: does this catch the whole issue that we started with, where '%2E' is sent?

I figured I'd read normpath. Some things for concern may be:

  • It converts the empty path into the '.'. Does this occur in Morepath situation? I see code that seems to handle this, but no test by itself at least here. I guess it trips up another test.
  • POSIX apparently allows and leaves alone '//'. What does this mean for us?
  • can we have a test where there are no initial slashes? I'd rather be sure...

@href
Copy link
Member Author

href commented Jun 4, 2015

I moved and refactored the code quite a bit. I got it a bit wrong with the git merge, but that's why this is in a separate branch ;). See https://github.com/morepath/morepath/compare/issue_329?expand=1 for a complete comparison.

Question: does this catch the whole issue that we started with, where '%2E' is sent?

Yes it does. %2E was never actually sent though, it was turned into dots, but some clients wouldn't normalize away these dots. Now we basically behave the way we expected the client to behave.

It converts the empty path into the '.'. Does this occur in Morepath situation? I see code that seems to handle this, but no test by itself at least here. I guess it trips up another test.

I added more tests that make sure never get a '.', but always a '/' as is seems to be the default.

POSIX apparently allows and leaves alone '//'. What does this mean for us?

Does it? I added tests in the updated version that say it doesn't ;)

can we have a test where there are no initial slashes? I'd rather be sure...

Do you mean like this ../site? That would exist.

@faassen
Copy link
Member

faassen commented Jun 4, 2015

Thanks. Looks read to merge to me!

@href
Copy link
Member Author

href commented Jun 4, 2015

Cool, I'll do that right away.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants