Use RequestURI when redirecting to canonical path. #1331

Open
wants to merge 14 commits into
from

Projects

None yet

4 participants

@ericdreeves
Collaborator

Caddy may trim a request's URL path when it starts with the path that's
associated with the virtual host. This change uses the path from the request's
RequestURI when performing a redirect.

Fix issue #1327.

@ericdreeves ericdreeves Use RequestURI when redirecting to canonical path.
Caddy may trim a request's URL path when it starts with the path that's
associated with the virtual host. This change uses the path from the request's
RequestURI when performing a redirect.

Fix issue #1327.
d7241c3
caddyhttp/staticfiles/fileserver.go
+ // Since the path in u may have been trimmed by Caddy, and Redirect
+ // requires the full path, parse the request's RequestURI and use the
+ // path.
+ redirurl, err := url.ParseRequestURI(r.RequestURI)
@abiosoft
abiosoft Jan 8, 2017 Collaborator

let's rename this to redirURL or anything else that won't give lint/vet error.

@ericdreeves
ericdreeves Jan 8, 2017 Collaborator

Thanks for taking a look, @abiosoft. I've made the change.

@tw4452852
Collaborator

Maybe https also is the case.

@ericdreeves ericdreeves Rename redirurl to redirURL.
3dde870
@ericdreeves
Collaborator

@tw4452852 Acknowledged. I'm investigating.

ericdreeves added some commits Jan 8, 2017
@ericdreeves ericdreeves Redirect to the full URL.
The scheme and host from the virtual host's site configuration is used
in order to redirect to the full URL.
024833b
@ericdreeves ericdreeves Merge branch 'master' of github.com:mholt/caddy into issue_1327
c0931ba
@ericdreeves ericdreeves Merge branch 'master' into issue_1327
b60e038
@mholt

Thanks for working on this, really appreciate your help!

caddyhttp/httpserver/server.go
+ if vhostURL, err := url.Parse(vhost.Addr.String()); err == nil {
+ r.URL.Scheme = vhostURL.Scheme
+ r.URL.Host = vhostURL.Host
+ }
@mholt
mholt Jan 14, 2017 Owner

Can you add a comment explaining why this is necessary?

@ericdreeves
ericdreeves Jan 15, 2017 Collaborator

Will do!

caddyhttp/staticfiles/fileserver.go
+ }
+ toURL := r.URL.ResolveReference(redirURL)
+
+ http.Redirect(w, r, toURL.String(), http.StatusMovedPermanently)
@mholt
mholt Jan 14, 2017 Owner

Hmm, it's unfortunate that all this is required. Sorry about that. Since you've now worked with this a bit, do you have any recommendations for how we can make it easier/simpler?

@ericdreeves
ericdreeves Jan 15, 2017 Collaborator

My concern with changing fields in r.URL is that it might cause issues for other middleware Handlers that rely on them. Otherwise, only the RequestURI seems to have the original path. Let me know if that's not the case.

The check for "/" on redirURL.Path is probably redundant and can be removed.

I'll make that change and add the requested comment. Thanks for taking a look!

ericdreeves added some commits Jan 15, 2017
@ericdreeves ericdreeves Add comment and remove redundant check.
a14eb97
@ericdreeves ericdreeves Merge branch 'master' into issue_1327
da2fa95
@mholt
Owner
mholt commented Jan 15, 2017

I think AppVeyor is having trouble, not a problem with this PR:

ERR Error running script (call to f_5b8e3b60e050400311bc8b6ef2cbba37bc0b9cb3): @user_script:1: @user_script: 1: -OOM command not allowed when used memory > 'maxmemory'.
@mholt
Owner
mholt commented Jan 17, 2017 edited

@ericdreeves Just had an idea. What if Caddy didn't chop off the path in the request URI to adjust for the site's address in the Caddyfile, but instead set a field on the SiteConfig and then middlewares that want to access the file system based on the request path have to take that into account...? (Maybe we provide a method to make this easy? It's just a path.Join though. So maybe not needed.) I ask because I'm dealing with this issue on something else locally right now too.

We wouldn't have to do all these kinda ugly hacks -- which I appreciate you for diving into this. :) What do you think?

@tw4452852
Collaborator
tw4452852 commented Jan 17, 2017 edited

I don't think this is a good choice, because all exist middles will be changed if they use the r.URL.
Actually I have another way to handle this: if any middleware wants to redirect, it just returns the special status code (maybe http.StatusTemporaryRedirect) and saves the destination in the response header. After all the middleware chains return, we handle the redirection actually if any. (here is the proper place).

@mholt
Owner
mholt commented Jan 17, 2017

@tw4452852 That's an interesting idea!

So, two options:

  1. We don't chop off the URI if the site address contains a path (like we do now) -- instead, middlewares that want to access static files on disk just have to take into account the site's path. Slightly error prone, if middlewares don't think about that. They'll have to store what the base path is then do a TrimPrefix or something.

  2. Middlewares that want to redirect just set the Location header and return a 3xx status code. This is really quite elegant. However, will this break anything like errors or log or gzip middlewares? How does this work with the redir middleware? I guess there's no response body, unless the redir middleware writes a body because of a "meta tag" redirect...

I like Tw's idea most, but I'll have to give it some thought and also somebody should experiment to make sure it won't break anything. ๐Ÿ˜„

@ericdreeves
Collaborator

If we go with the second option, how would it solve the original issue? If the middleware has no way of knowing that the URL path was changed, what would it write to the Location header?

@tw4452852
Collaborator

@ericdreeves , Location header will only be the relative path, it will be joint with site root finally to make up a right redirection URL.

@tw4452852
Collaborator
tw4452852 commented Jan 18, 2017 edited

@mholt I think every middleware now should work with it w/o changes. It just delegates(defers) the default redirection action. If any middleware will do redirection itself, it should return 0 and the default redirection action won't run.

@ericdreeves
Collaborator

@tw4452852 That's what I figured. Thanks for confirming.

@mholt Once you've thought it over, I'd be happy to make the change. I should have some time later in the week to do it.

@mholt
Owner
mholt commented Jan 21, 2017

If we do go with 2, and we continue to trim the path like we do now, it only helps in the cases where redirects are performed. Are there any other situations where the full Path would be needed to be known? (Asking out loud here, because this is what I'm currently thinking over.)

@ericdreeves
Collaborator

@mholt The {path} request placeholder.

For example, this Caddyfile:

example.com:80/foo {
	tls off
	log / stdout "URL: {scheme}://{host}/{path}"
}

and this request:

curl -v -L http://example.com/foo
* Hostname was NOT found in DNS cache
*   Trying 172.21.0.2...
* Connected to example.com (172.21.0.2) port 80 (#0)
> GET /foo HTTP/1.1
> User-Agent: curl/7.38.0
> Host: example.com
> Accept: */*

produce this log:

example.com_1  | URL: http://example.com//
@ericdreeves
Collaborator

Once I discovered that the {path} placeholder is also affected I became convinced that redirects are a symptom of the underlying issue. Option 2 is indeed elegant but if Caddy continues to provide a middleware (like redir) the ability to do its own redirects then it should somehow have access to the full path.

@mholt
Owner
mholt commented Jan 22, 2017

Okay. So maybe we need to rethink this.

Maybe middlewares which access files from disk need to access the path portion of the address in the Caddyfile and trim it. We could provide a helper method (maybe?) in the httpserver package to do it for the middlewares, they just have to call that method instead of accessing r.URL.Path directly.

How does that sound?

If it sounds good, then what about fastcgi and proxy middlewares? Trim or not trim?

@ericdreeves
Collaborator
ericdreeves commented Jan 24, 2017 edited

I like the sound of that, for two reasons. First, middlewares can decide which path is appropriate for the task they're performing. It seems clear that there's a need for both. Second, providing a helper method should provide a better alternative than obtaining the full path through unsupported means (i.e. RequestURI ๐Ÿ˜).

I'll need to dig through the httpserver package a bit more to have a better understanding of where such a method might exist.

@mholt
Owner
mholt commented Jan 24, 2017

Sounds good. Thanks for picking up my slack @ericdreeves! Looking forward to seeing where this goes.

@ericdreeves
Collaborator

@mholt Do you think this comment is still accurate? My observation is that RawPath is set only when url.Parse() does it.

@mholt
Owner
mholt commented Jan 28, 2017

@ericdreeves From the docs on net/url, it does seem that Parse() is how that gets set. So if we can't guarantee that we call Parse(), it probably is no longer accurate.

@ericdreeves
Collaborator

Thanks for taking a look.

RawPath may be another option, then. The EscapedPath() method in url package ignores RawPath if it is not a valid escaping of Path, and since httpserver's ServeHTTP cleans Path via sanitizePath() it can't be assumed that RawPath would be consistent with Path anyway.

Storing the sanitized but untrimmed path in RawPath is a better alternative than using RequestURI. Also, it may be simpler than giving middlewares a helper method.

If it sounds good, then what about fastcgi and proxy middlewares? Trim or not trim?

The proxy middleware already looks for RawPath when creating its upstream request. In my initial testing I observed that the trimmed path affects the upstream request in the same fashion-perhaps proxy middleware would benefit without changes to the middleware so long as the change in behavior is correct.

One drawback is that using RawPath in this way may not be consistent with its typical use. And while Caddy isn't using RawPath extensively now, it may complicate future use.

We could provide a helper method (maybe?) in the httpserver package to do it for the middlewares, they just have to call that method instead of accessing r.URL.Path directly.

I've spent some time researching the httpserver package to determine how this would work however it's not clear. Perhaps you could elaborate when you have some time? Appreciate the help!

@mholt
Owner
mholt commented Jan 29, 2017 edited

Yikes, this is complicated. I will take a look at this as soon as I can, but I'm going to be out of town and away from the computer for a week starting Monday. Don't let me slow you down. :) Reduce this to the simplest possible way to think about the problem: separate sanitizing, trimming, and using the URL and consider using the Context value on the request if necessary. Ideally, we'll change as little as possible. We should do what makes the most semantic sense for each field in the URL. We want to keep things as simple and obvious for the middlewares as possible, but I understand if it can't be perfectly obvious or the same for every possible situation.

Sorry I can't be more help right now. I'm drifting to sleep under some cold medicine and am not quite in the right state of mind to be doing this... ha. I believe in you contributors! ๐Ÿ™Œ (Thank you very much for taking such care, too!)

ericdreeves added some commits Jan 29, 2017
@ericdreeves ericdreeves Store the original URL path in request context.
By storing the original URL path as a value in the request context,
middlewares can access both it and the sanitized path. The default
default FileServer handler will use the original URL on redirects.
9f08b77
@ericdreeves ericdreeves Merge branch 'master' of github.com:mholt/caddy into issue_1327
c51f199
@ericdreeves
Collaborator

Understand completely! Your reply was very helpful, as it reminded me of Context-so, thank you.

I've updated the PR to store the original URL path as a value in the request's context. If we agree on the approach then I'll see about subsequent PRs to fix the {path} request placeholder and proxy middleware.

ericdreeves added some commits Feb 6, 2017
@ericdreeves ericdreeves Merge branch 'master' into issue_1327
492d3c3
@ericdreeves ericdreeves Merge branch 'master' into issue_1327
6b4e2e0
@mholt

Most of my critiques are seeking clarification on the changes, plus one that we might have to wait for related to another PR. ๐Ÿ‘ Thanks for working on this and being so patient!

caddy.go
+// net/http source.
+type contextKey struct {
+ name string
+}
@mholt
mholt Feb 16, 2017 Owner

I wonder if this belongs in the httpserver package. (My PR, #1430 creates a type called CtxKey so maybe we should consider merging these implementations.) I like the simplicity of it just being a string... any reason to use a struct?

@mholt
mholt Feb 17, 2017 Owner

@ericdreeves FYI, I have merged the branch that contains httpserver.CtxKey so you can now use that.

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

Will do!

@ericdreeves
ericdreeves Feb 19, 2017 Collaborator

As is, the staticfiles package cannot use CtxKey because importing httpserver would result in an import cycle. My rationale for putting my contextKey type in the caddy package was not only to avoid the import cycle but also to make the predefined keys available to the caddytls package where HTTPChallengeHandler is implemented.

If you don't like the idea of putting CtxKey definition in the caddy package then how about a new package within caddyhttp? Then, httpserver and other middlewares (staticfiles, fastcgi, basicauth, etc.) could each import the package and have access to the predefined keys declared therein.

I have some local commits that prove it out, but wanted to get others' take before pushing.

@mholt
mholt Feb 21, 2017 Owner

@ericdreeves Ah, I didn't think about that. Yes, we can probably put CtxKey in the caddy package. Good idea. I don't see any negative consequences of that.

+ // from the virtual host's site address.
+ if vhostURL, err := url.Parse(vhost.Addr.String()); err == nil {
+ r.URL.Scheme = vhostURL.Scheme
+ r.URL.Host = vhostURL.Host
@mholt
mholt Feb 16, 2017 Owner

Is this just for convenience in upstream middlewares?

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

The URL in request r has neither the scheme nor the host.

http.Redirect will set the Location header to a path (not absolute URI) when
the provided urlStr is missing both.

Since the virtual host's site address has both, and the virtual host
configuration was matched to r.URL.Path, r.URL is updated so that invoking
http.Redirect will:

  1. Set the Location header to an absolute URI
  2. Redirect to a URI with the correct scheme
@mholt
mholt Feb 18, 2017 Owner

That's cool, but I don't think a fully qualified URL is needed; the path seems to do okay. I guess setting them isn't bad either. We can try it.

caddyhttp/staticfiles/fileserver.go
+ if !strings.HasSuffix(r.URL.Path, "/") {
+ toURL, _ := url.Parse(r.URL.String())
+
+ path, ok := r.Context().Value(caddy.URLPathContextKey).(string)
@mholt
mholt Feb 16, 2017 Owner

Hmm, do we want to use the context's path value here?

A Caddyfile like:

example.com/subpath
root /www/mysite

And then a request to example.com/subpath/test.html should load /www/mysite/test.html -- is that what you would expect? I think that's how it's currently implemented, anyway.

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

If it's guaranteed that no middleware will modify r.URL.Path then I agree that we can instead use context's path.

@mholt
mholt Feb 18, 2017 Owner

Pretty sure that can't be guaranteed. Is the context storing the original URI or the modified one?

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

I'll verify the behavior with the Caddyfile that you provided above.

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

The context stores the original URI.

@ericdreeves
ericdreeves Feb 19, 2017 Collaborator

Your Caddyfile does produce the expected behavior.

+ }
+ toURL.Path = strings.TrimSuffix(toURL.Path, "/")
+
+ http.Redirect(w, r, toURL.String(), http.StatusMovedPermanently)
@mholt
mholt Feb 16, 2017 Owner

The diff shows that this used to be Redirect(...) not http.Redirect(...). Why the change?

@ericdreeves
ericdreeves Feb 18, 2017 Collaborator

staticfiles.Redirect does not convert to absolute URI.

While refreshing my memory regarding the reason for this change, I was reminded that the browse package uses staticfiles.Redirect for a similar reason: the path refers to a directory but doesn't have a trailing '/'. We should investigate whether the browse package relies on always redirecting to a relative path.

@ericdreeves
Collaborator

Thanks for being patient as well! I've provided some details that I hope address your questions. I have some behavior to verify, and a conflict to resolve. I should be able to address both over the weekend.

ericdreeves added some commits Feb 22, 2017
@ericdreeves ericdreeves Merge branch 'master' of github.com:mholt/caddy into issue_1327 bf2d7b3
@ericdreeves ericdreeves Replace contextKey type with CtxKey.
In addition to moving the CtxKey definition to the caddy package, this
change updates the CtxKey references in the httpserver, fastcgi, and
basicauth packages.
cb4f7f9
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment