Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better Error Message When Parsing Greek Question Mark (and similar confusing characters) #25957

Closed
Havvy opened this issue Jun 2, 2015 · 6 comments · Fixed by #29837
Closed
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.

Comments

@Havvy
Copy link
Contributor

Havvy commented Jun 2, 2015

Current:

fn main() {
    let a = 5;
}
<anon>:2:14: 2:15 error: unknown start of token: \u{37e}
<anon>:2     let a = 5;
                      ^
playpen: application terminated with error code 101

Do you know what the error message is here? It's that I'm using the Greek Question Mark instead of a semicolon, causing a parse error. And while there is an error message, it's extremely opaque.

A better error message would actually print the token that is in error directly in the error message, e.g. as @niconii suggsted:

error: expected one of `.`, `;`, or an operator, found unicode character U+37E `;`

Using the human readable U+37E over the Rust encoding thing (\u{37E}) would definitely help.

In the specific case of the Greek Question Mark, it looks exactly like a semicolon, and symbols that look like other symbols are pretty common in Unicode, but perfectly enumerable. If we had a table of unicode point to actually used symbol, we could display an even better notice message:

error: expected one of `.`, `;`;, or an operator, found unicode character U+37E `;`

note: Unicode character U+37E (Greek Question Mark) looks like a semicolon, but is not.
@sfackler sfackler added the A-diagnostics Area: Messages for errors, warnings, and lints label Jun 2, 2015
@DanielKeep
Copy link
Contributor

Useful resource might be the list of Unicode "confusables": http://www.unicode.org/Public/security/revision-06/confusables.txt

Just going on printable ASCII punctuation, the list contains 486 possibly confusing glyphs. For reference, the command I used to work that out was:

$ egrep -v '^[ ]*#' confusables.txt | egrep -v '^[ ]+$|^$' | egrep '00(2[1-9A-F]|3[A-F]|40|5[BCDF]|7[BCD])' | egrep ';[^0-9A-F]00.. ;' | egrep -v '^00' > confusables-ascii-punc.txt

@Manishearth
Copy link
Member

Note that the error is in the lexer, not parser, so errors of the type "expected ... found" are right out. The parser only knows what to expect once everything has been converted into tokens.

The most we can do here is make it detect the confusable unicode characters.

@Manishearth Manishearth added the E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. label Nov 14, 2015
@Manishearth
Copy link
Member

Willing to mentor this.

You basically need to maintain an array of confusable characters and check against them where this error is emitted in src/libsyntax/parse/lexer/mod.rs

The error would finally look something like:

<anon>:2:14: 2:15 error: unknown start of token: \u{37e}
<anon>:2     let a = 5;
                      ^
<anon>:2:14: 2:15 note: Unicode character U+37E (Greek Question Mark) looks like a semicolon, but is not.
<anon>:2     let a = 5;
                      ^

@wafflespeanut
Copy link
Contributor

I'll take it!

@Manishearth
Copy link
Member

Let me lknow if you need help or clarification!

@pickfire
Copy link
Contributor

image

I think it may still not be good enough, maybe we should have a clearer suggestion to change it to semicolon? Changing the wording would be good enough probably.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants