Confusing error with {} #545

Closed
CAD97 opened this issue Dec 17, 2018 · 1 comment

@CAD97
Contributor

CAD97 commented Dec 17, 2018

I wrote the regex r"\\u{[^}]*}" which works (as \\u{[^}]*}) on regex101 under pcre, js, python, and go flavors. When parsing with this crate, it gives:


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
regex parse error:
    \\u{[^}]*}
        ^
error: decimal literal empty
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"decimal literal empty" was not helpful in figuring out the problem, which was that regex_syntax expects {} to be a repetition, and the other mentioned regex engines silently fell back to matching a literal { and } when it wasn't a repetition. The correct unproblematic regex escapes the braces: r"\\u\{[^}]*\}".

This could be considered a bug or a suboptimal error depending on how you think this regex should be processed. I'd be perfectly happy if the error were to say something along the lines of "expected bounded repetition" here, rather than the current vague "decimal literal empty". (I would understand the error if it were {}, but with something other than } after the {, it's confusing.)
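To make the suggested wording concrete, here is a minimal std-only sketch of a check that reports "expected bounded repetition" for an unescaped `{`. The names `check_braces` and `count_digits` are hypothetical and this is not how regex_syntax is implemented; it assumes that a valid repetition looks like `{n}`, `{n,}` or `{n,m}` and it only approximates character-class handling.

```rust
/// Hypothetical helper, not part of regex_syntax: verifies that every
/// unescaped `{` outside a character class starts a counted repetition
/// of the form `{n}`, `{n,}` or `{n,m}`.
fn check_braces(pattern: &str) -> Result<(), String> {
    let chars: Vec<char> = pattern.chars().collect();
    let mut i = 0;
    // Simplified class tracking: ignores edge cases such as `[]]`.
    let mut in_class = false;
    while i < chars.len() {
        match chars[i] {
            // An escaped character is always a literal, so skip it.
            '\\' => i += 1,
            '[' if !in_class => in_class = true,
            ']' if in_class => in_class = false,
            '{' if !in_class => {
                let start = i;
                i += 1;
                // At least one digit is required for the lower bound.
                let min_digits = count_digits(&chars, &mut i);
                if i < chars.len() && chars[i] == ',' {
                    i += 1;
                    // The upper bound is optional.
                    count_digits(&chars, &mut i);
                }
                if min_digits == 0 || i >= chars.len() || chars[i] != '}' {
                    return Err(format!(
                        "expected bounded repetition at offset {}; write \\{{ to match a literal brace",
                        start
                    ));
                }
            }
            _ => {}
        }
        i += 1;
    }
    Ok(())
}

/// Advances `i` past a run of ASCII digits and returns how many were seen.
fn count_digits(chars: &[char], i: &mut usize) -> usize {
    let begin = *i;
    while *i < chars.len() && chars[*i].is_ascii_digit() {
        *i += 1;
    }
    *i - begin
}
```

With this sketch, `check_braces(r"\\u{[^}]*}")` returns an error pointing at the brace, while the escaped form `check_braces(r"\\u\{[^}]*\}")` passes.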

@BurntSushi
Member

Yeah, the error message should definitely be improved here. The behavior does indeed match my intent. Specifically, I biased toward less implicitness in the syntax. That is, if something is a meta character and you want to use it as a literal, then it needs to be escaped. There are some exceptions to this, particularly in character classes, e.g., []] and [-a-z], due to their prevalence. The thinking here is that if reading a { requires a human to work out from context whether it is a meta character, and therefore whether it "needs" to be escaped, then the regex becomes harder to read.
