Unhelpful Error Messages When Trying to Compile UTF16 Files #73979

Closed
JakobDegen opened this issue Jul 3, 2020 · 5 comments · Fixed by #81856
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-parser Area: The parsing of Rust source code to an AST. C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@JakobDegen
Contributor

JakobDegen commented Jul 3, 2020

Saving a Hello World program as UTF-16:

fn main(){
    println!("Hello World!");
}

and trying to compile it produces roughly one error per character in the file, each complaining of unknown start of token: \u{0}. Using some heuristics to detect that the file is saved as UTF-16 and printing a more helpful error message instead would be much friendlier to new users, who are the most likely to run into this issue.
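To illustrate where the \u{0} characters come from, here is a minimal sketch, assuming the file was saved as UTF-16LE without a BOM. The helper names `to_utf16le` and `nul_chars_when_read_as_utf8` are hypothetical and not part of rustc. For ASCII-only source, every code unit becomes the ASCII byte followed by a zero byte, so the result is still valid UTF-8 but with a NUL after every character.

```rust
/// Encode a string as UTF-16LE bytes without a BOM.
/// (Hypothetical helper for illustration only.)
fn to_utf16le(src: &str) -> Vec<u8> {
    src.encode_utf16()
        .flat_map(|unit| unit.to_le_bytes())
        .collect()
}

/// Count NUL characters if the bytes decode as UTF-8.
/// Returns None when the bytes are not valid UTF-8.
fn nul_chars_when_read_as_utf8(bytes: &[u8]) -> Option<usize> {
    let text = std::str::from_utf8(bytes).ok()?;
    Some(text.chars().filter(|&c| c == '\0').count())
}

fn main() {
    let src = "fn main(){\n    println!(\"Hello World!\");\n}\n";
    let bytes = to_utf16le(src);
    match nul_chars_when_read_as_utf8(&bytes) {
        Some(n) => println!("valid UTF-8 with {} NUL characters", n),
        None => println!("not valid UTF-8"),
    }
}
```

Because the bytes decode cleanly as UTF-8, the lexer sees one unexpected NUL per original character, which would explain the flood of errors.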

Meta

rustc --version --verbose:

rustc 1.46.0-nightly (50fc24d8a 2020-06-25)
binary: rustc
commit-hash: 50fc24d8a172a853b5dfe40702d6550e3b8562ba
commit-date: 2020-06-25
host: x86_64-unknown-linux-gnu
release: 1.46.0-nightly
LLVM version: 10.0
@JakobDegen JakobDegen added the C-bug Category: This is a bug. label Jul 3, 2020
@LeSeulArtichaut LeSeulArtichaut added A-diagnostics Area: Messages for errors, warnings, and lints A-parser Area: The parsing of Rust source code to an AST. C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. and removed C-bug Category: This is a bug. labels Jul 3, 2020
@Stupremee
Member

@rustbot claim

@rustbot rustbot self-assigned this Jul 3, 2020
@Stupremee
Member

There's still a question: should rustc throw an error and exit, or read the file as UTF-16?

@JakobDegen
Contributor Author

@Stupremee I do not think rustc should parse the file as UTF-16, as this would introduce an ambiguity. I opened this issue only to address the poor UX of the current error messages.

@Stupremee
Member

@rustbot release-assignment
This seems more complicated than expected, because I don't know of any reliable way to check whether the file is UTF-16.

@rustbot rustbot removed their assignment Jul 5, 2020
@tesuji
Contributor

tesuji commented Jul 6, 2020

Maybe add a help message when emitting the error "unknown start of token: \u{0}".

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Feb 27, 2021
Suggest character encoding is incorrect when encountering random null bytes

This adds a note whenever null bytes are seen unexpectedly at the start of a token, since those tend to come from UTF-16 encoded files without a [BOM](https://en.wikipedia.org/wiki/Byte_order_mark) (if a UTF-16 BOM appears, the file won't be valid UTF-8, but if there is no BOM, it can be both valid UTF-16 and valid but garbled UTF-8). This approach was suggested in rust-lang#73979 (comment).

Closes rust-lang#73979.
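The distinction drawn in the commit message can be sketched as a small byte-level check. This is an illustration only, not the code merged into rustc: the `EncodingHint` enum and `encoding_hint` function are hypothetical names, and the check is a heuristic that can misfire (for example, a UTF-32LE BOM also begins with FF FE, and a UTF-8 file may legitimately contain a stray NUL).

```rust
/// Hypothetical classification of a source file's bytes.
#[derive(Debug, PartialEq)]
enum EncodingHint {
    /// Starts with FF FE or FE FF; such a file is not valid UTF-8.
    Utf16Bom,
    /// No BOM, but NUL bytes are present: possibly BOM-less UTF-16.
    NulBytes,
    /// Nothing suspicious found.
    NoHint,
}

/// Heuristic sketch: look for a UTF-16 BOM first, then for NUL bytes.
fn encoding_hint(bytes: &[u8]) -> EncodingHint {
    if bytes.starts_with(&[0xFF, 0xFE]) || bytes.starts_with(&[0xFE, 0xFF]) {
        return EncodingHint::Utf16Bom;
    }
    if bytes.contains(&0) {
        return EncodingHint::NulBytes;
    }
    EncodingHint::NoHint
}

fn main() {
    // "fn" encoded as UTF-16LE without a BOM.
    let no_bom = [b'f', 0, b'n', 0];
    println!("{:?}", encoding_hint(&no_bom));
}
```

Only the `NulBytes` case corresponds to the situation in this issue, where the file decodes as UTF-8 and the lexer then trips over each NUL.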
@bors bors closed this as completed in be3d1eb Feb 28, 2021