New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

If rewrite expression not a REGEX, should caddy assume it is a file path? #2219

Closed
kjk opened this Issue Jul 9, 2018 · 11 comments

Comments

Projects
None yet
5 participants
@kjk
Copy link

kjk commented Jul 9, 2018

Using latest caddy on mac from homebrew:

$ caddy -version
Caddy 0.11.0 (unofficial)

It seems like at some point a change in Caddy broke parsing of Caddyfile (I upgraded Caddy but not Caddyfile).

But this bug is about some error messages from parsing Caddyfile not being helpful:

$ caddy -validate
2018/07/09 13:38:40 error parsing regexp: invalid nested repetition operator: `++`

The error message should be more verbose like: error parsing Caddyfile line ${lineNo} '${lineContent', error parsing regexp...

As it stands today, I had to guess that the error was from parsing Caddyfile and then had to guess which part of the file is incorrect.

Good error message are already happening for some parsing errors e.g. with a line like:

rerite "/article/11t/c-interfaces-and-implementations.html" "/blog/11t.html"

I get a good error message:

2018/07/09 13:49:55 Caddyfile:384 - Error during parsing: Unknown directive 'rerite'

As to parsing error itself, it looks like a bug.

The clue is info about ++. The only instance of ++ in the file is:

rewrite "/tag/c++" "/blog/archives-by-tag-cplusplus.html"

Removing this line fixes the issue.

The doc says that in rewrite from to, from is either an exact path or regular expression.

If parsing from as regular expression fails, it should be considered an exact path, not an error.

@mholt

This comment has been minimized.

Copy link
Owner

mholt commented Jul 9, 2018

If parsing from as regular expression fails, it should be considered an exact path, not an error.

Hmm, I disagree: I think errors should be explicit. @abiosoft does rewrite return parse errors from the dispenser when parsing regular expressions?

As for the missing line numbers, that's strange -- we used to have line numbers on parse errors... 🤔

@kjk

This comment has been minimized.

Copy link
Author

kjk commented Jul 9, 2018

Note that this makes codegen of Caddyfiles exponentially harder and leaves a trap that almost everyone will fall into (if they happen to have a path that also is an invalid regular expression).

In my case those lines were generated by a program. To make this generation robust I would have to add code "if my fixed path is also a regular expression, then instead of generating obvious rewrite rule, generate a more complicated rule that Caddy won't reject".

Caddy doesn't advertise or guarantee any particular syntax for regexpes so this code would only work against Caddy version I reverse-engineered to find out which regexp engine is used and pray it'll never change.

Reading the docs it's unclear to me how I would even write this in a way that Caddy would accept.

Reading the docs I would never anticipate this problem because current wording is incomplete and misleading:

from is the exact path to match or a regular expression to match

I interpret "or" as "both are supported and Caddy will pick one that works".

I provided a valid exact path. Caddy chose to interpret it as a regular expression and then failed because it's not parseable as a regular expression.

This is not "or".

It could be a warning logged but should not be a fatal error.

I get this is a corner case but it should at least be documented and an explicit example of providing only an exact path should be provided in the docs because it looks like the only work-around for this is a complicated rewrite with an if expression.

@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Jul 9, 2018

Should be easy to escape the string like this: "/tag/c\+\+"

I don't disagree with your points though.

@kjk

This comment has been minimized.

Copy link
Author

kjk commented Jul 9, 2018

Not to mention this is a backwards-incompatible change to a core functionality.

This seems to be caused by 9fe2ef4

@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Jul 9, 2018

https://stackoverflow.com/a/695467

+ is not a URL-safe char. Rather, it's a reserved one meaning a space in query arguments.

@kjk

This comment has been minimized.

Copy link
Author

kjk commented Jul 9, 2018

@francislavoie If I knew the regular syntax in question, I'm pretty sure it's an NP problem. Since I don't know it, it's an impossible problem.

And caddy is more than happy to accept '+' otherwise, for example this is valid:

rewrite "/articles/go-cookbook.html+" "/book/go-cookbook.html"
@francislavoie

This comment has been minimized.

Copy link
Collaborator

francislavoie commented Jul 9, 2018

Re: BC break, quoting from @abiosoft #2082 (comment)

it is not 100% backward compatible, from is now with regex support.

My argument is that the one-liner should be a sensible default and people should only open a config block for more control, but what is currently happening is no one is using the one liner and many people open a config block to only use regex.

Another suggestion, since you're autogenerating these, switch to this syntax for each rewrite:

rewrite "/tag/c++" {
    to "/blog/archives-by-tag-cplusplus.html"
}
@kjk

This comment has been minimized.

Copy link
Author

kjk commented Jul 9, 2018

Finally, let me appeal to your own self-interest.

Other people will hit this problem because what I did is both intuitive and correct based on my reading of rewrite docs.

Some of those people will come back here to file bugs or to support forum asking why this doesn't work.

It'll come back to you as never ending support burden.

You can keep explaining why it works this way or you can make a simple fix to conform to people's expectations.

@abiosoft

This comment has been minimized.

Copy link
Collaborator

abiosoft commented Jul 9, 2018

Not to mention this is a backwards-incompatible change to a core functionality.

@kjk This is intended as it serves wider audience. I believe the breaking change was communicated in the release. In fact, you are the first real user of the exact match I have come across, the opposite has always been the case, people wanting regex with claims that the exact match is too limiting.

If you consider that converting any path to exact match is by surrounding with ^$ and the fact that most users prefer regex, you'd realise the decision is made with users in consideration.

I agree that the documentation can be better. The following should work for your example. What makes your case a bit challenging is due to the + character.

rewrite "^/tag/c\+\+$" "/blog/archives-by-tag-cplusplus.html"

As mentioned, + is actually not a safe character for URL. You probably have your reasons for this but I think it is safer to use something else.

@kjk

This comment has been minimized.

Copy link
Author

kjk commented Jul 9, 2018

I assume that by "serving a wider audience" you mean changing allowing from clause to also accept a regex. I'm 100% for that. In fact, when I was writing my Caddy generation for 0.10 I wanted that feature, given past familiarity with nginx and apache.

What I don't see how it serves anyone that it's a fatal error when from is a valid exact path (and all strings are that) but not a valid regex.

The doc says "A or B". I gave a valid A. Caddy choses to interpret it as B but then rejects it because it doesn't validate as B.

While I appreciate explanations and work-around ideas, given current docs I could not have anticipated it and given current diagnostics I could not have figured what and why.

Involving 3 people to educate everyone who hits this problem and is persistent enough to investigate and file a bug is not scalable.

To fix that you can improve docs (explicitly mention this corner case, document a reliable way to specify an exact path rewrite rule), improve diagnostic (this error message should be something like "from clause ${s} in rewrite rule is not a valid regular expression. If you intended this to be an exact path, do ${y} instead").

Or you can change the code to be:

if !validRegexp(from) {
   // assume from is an exact path
}

I don't see what calculus would lead to choosing the first option.

@tobya tobya changed the title Better diagnostics for failure to parse Caddyfile If rewrite expression not a REGEX, should caddy assume it is a file path? Jul 11, 2018

@tobya tobya added the discussion label Sep 20, 2018

@tobya

This comment has been minimized.

Copy link
Collaborator

tobya commented Sep 20, 2018

I will close this now but please feel free to continue the discussion.

@tobya tobya closed this Sep 20, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment