ICE on "lexer accepted unterminated literal with trailing slash" #62913

dwrensha · 2019-07-23T22:31:09Z

I'm getting an internal compiler error on the following input (found by fuzz-rustc):

"\u\\"

error: incorrect unicode escape sequence
 --> main.rs:1:2
  |
1 | "\u\\"
  |  ^^^ incorrect unicode escape sequence
  |
  = help: format of unicode escape sequences is `\u{...}`

thread 'rustc' panicked at 'lexer accepted unterminated literal with trailing slash', src/libsyntax/parse/unescape_error_reporting.rs:194:13
stack backtrace:
   0: std::panicking::default_hook::{{closure}}
   1: std::panicking::default_hook
   2: rustc::util::common::panic_hook
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic
   5: syntax::parse::unescape_error_reporting::emit_unescape_error
   6: syntax::parse::unescape::unescape_str
   7: syntax::parse::lexer::StringReader::try_next_token
   8: syntax::parse::lexer::StringReader::next_token
   9: syntax::parse::lexer::tokentrees::TokenTreesReader::parse_all_token_trees
  10: syntax::parse::lexer::tokentrees::<impl syntax::parse::lexer::StringReader>::into_token_trees
  11: syntax::parse::maybe_file_to_stream
  12: syntax::parse::maybe_source_file_to_parser
  13: syntax::parse::source_file_to_parser
  14: syntax::parse::parse_crate_from_file
  15: rustc_interface::passes::parse::{{closure}}
  16: rustc::util::common::time
  17: rustc_interface::passes::parse
  18: rustc_interface::queries::Query<T>::compute
  19: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::parse
  20: rustc_interface::interface::run_compiler_in_existing_thread_pool
  21: std::thread::local::LocalKey<T>::with
  22: syntax::with_globals
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
query stack during panic:
end of query stack
error: aborting due to previous error


error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.38.0-nightly (e649e9034 2019-07-22) running on x86_64-apple-darwin

I'm seeing the error on nightly, beta, and stable.

@petrochenkov

Always emit trailing slash error Fix rust-lang#62913. r? @petrochenkov

Note that from_token_lit was looking for errors but never finding them!

- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913 was about an ICE due to an unterminated string literal, so the new version should be good enough.
- literals-are-validated-before-expansion.rs: this tests exactly the behaviour that has been changed. XXX: insert a new test covering more of that
- XXX: explain the tests that needed to be split
- XXX: tests/ui/parser/unicode-control-codepoints.rs: just reordered errors
- XXX: tests/rustdoc-ui/ignore-block-help.rs: relies on a parsing error occurring. The error present was an unescaping error, which is now delayed to after parsing. So the commit changes it to an "unterminated character literal" error which still occurs during parsing.

Currently string literals are unescaped twice.
- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`. This one just checks for errors.
- Again in `LitKind::from_token_lit`, which is mostly called when lowering AST to HIR, but also in a few other places during expansion. This one actually constructs the unescaped string. It also has error checking code, but the code handling the error cases is actually dead (and has several bugs) because the check during lexing catches all errors!

This commit removes the checking during lexing, and fixes up `LitKind::from_token_lit` so it properly does both checking and construction.

This is a language change: some programs now compile that previously did not. For example, it is now possible for macros to be passed "invalid" string literals like "\a\b\c". This is a continuation of a trend of delaying semantic error checking of literals to after expansion, e.g. rust-lang#102944 did this for some cases for numeric literals, and the detection of NUL chars in C string literals is already delayed in this way.

XXX: have Session::report_lit_errors?
XXX: have LitKind::from_token_lit so you don't need the .0?

Things to note:
- `LitError` has a new `EscapeError` variant.
- `LitKind::from_token_lit`'s return value changed, to produce multiple errors/warnings, and also to handle lexer warnings. This latter case is annoying but necessary to preserve existing warning behaviour.
- `report_lit_error` becomes `report_lit_errors`, in order to handle multiple errors in a single string literal.

Notes about test changes:
- `tests/rustdoc-ui/ignore-block-help.rs`: this relies on a parsing error occurring. The error present was an unescaping error, which is now delayed to after parsing. So the commit changes it to an "unterminated character literal" error which continues to occur during parsing.
- Several tests had unescaping errors combined with unterminated literal errors. The former are now delayed but the latter remain as lexing errors. So the unterminated literal part needed to be split into a separate test file, otherwise compilation would end before the other errors were reported.
- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913 was about an ICE due to an unterminated string literal, so the new version should be good enough.
- literals-are-validated-before-expansion.rs: this tests exactly the behaviour that has been changed, and so was removed. XXX: insert a new test covering more of that
- A couple of other tests produce the same errors, just in a different order.

Currently string literals are unescaped twice.
- Once during lexing in `cook_quoted`/`cook_c_string`/`cook_common`. This one just checks for errors.
- Again in `LitKind::from_token_lit`, which is called when lowering AST to HIR, and also in a few other places during expansion. This one actually constructs the unescaped string. It also has error checking code, but that part of the code is actually dead (and has several bugs) because the check during lexing catches all errors!

This commit removes the error-check-only unescaping during lexing, and fixes up `LitKind::from_token_lit` so it properly does both checking and construction.

This is a backwards-compatible language change: some programs now compile that previously did not. For example, it is now possible for macros to consume "invalid" string literals like "\a\b\c". This is a continuation of a trend of delaying semantic error checking of literals to after expansion:
- rust-lang#102944 did this for some cases for numeric literals.
- The detection of NUL chars in C string literals is already delayed in this way.

Notes about test changes:
- `ignore-block-help.rs`: this requires a parse error for the test to work. The error used was an unescaping error, which is now delayed to after parsing. So the commit changes it to an "unterminated character literal" error which still occurs during parsing.
- `tests/ui/lexer/error-stage.rs`: this shows the newly allowed cases, due to delayed literal unescaping.
- Several tests had unescaping errors combined with unterminated literal errors. The former are now delayed but the latter remain as lexing errors. So the unterminated literal part needed to be split into a separate test file, otherwise compilation would end before the other errors were reported.
- issue-62913.rs: The structure and output changed a bit. Issue rust-lang#62913 was about an ICE due to an unterminated string literal, so the new version should be good enough.
- literals-are-validated-before-expansion.rs: this tests exactly the behaviour that has been changed, and so was removed.
- A couple of other tests produce the same errors, just in a different order.
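
The idea of doing "both checking and construction" in one pass, and of reporting several errors for a single literal, can be pictured with a toy unescaper. This is only a sketch under stated assumptions, not rustc's code: `EscapeError` and `unescape` here are hypothetical names, and only a handful of simple escapes are handled (no `\x`, no `\u{...}`).

```rust
// Hypothetical error type for this sketch; not rustc's `EscapeError`.
#[derive(Debug, PartialEq)]
enum EscapeError {
    /// A backslash at the very end of the literal contents.
    LoneSlash,
    /// A backslash followed by a character that is not a known escape.
    UnknownEscape(char),
}

/// Unescape `src` (the literal contents, without the quotes) in one pass.
/// Returns the best-effort unescaped string together with every error
/// found, each tagged with the byte offset of the offending backslash.
fn unescape(src: &str) -> (String, Vec<(usize, EscapeError)>) {
    let mut out = String::new();
    let mut errors = Vec::new();
    let mut chars = src.char_indices();
    while let Some((pos, c)) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        // `c` is a backslash: look at the character that follows it.
        match chars.next() {
            None => errors.push((pos, EscapeError::LoneSlash)),
            Some((_, 'n')) => out.push('\n'),
            Some((_, 't')) => out.push('\t'),
            Some((_, 'r')) => out.push('\r'),
            Some((_, '0')) => out.push('\0'),
            Some((_, '\\')) => out.push('\\'),
            Some((_, '"')) => out.push('"'),
            Some((_, '\'')) => out.push('\''),
            // Record the error and keep going, so later errors are found too.
            Some((_, other)) => errors.push((pos, EscapeError::UnknownEscape(other))),
        }
    }
    (out, errors)
}

fn main() {
    // A raw string keeps the backslashes, so this is the text `ok\n\a\b\c`.
    let (text, errors) = unescape(r"ok\n\a\b\c");
    println!("unescaped: {:?}", text);
    for (pos, err) in &errors {
        println!("error at byte {}: {:?}", pos, err);
    }
}
```

Because the function does not stop at the first bad escape, a caller can report all of them at once, which is the kind of behaviour the commit message describes for literals that contain more than one error.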

estebank self-assigned this Jul 23, 2019

estebank mentioned this issue Jul 23, 2019

Always emit trailing slash error #62917

Merged

Centril added a commit to Centril/rust that referenced this issue Jul 24, 2019

Rollup merge of rust-lang#62917 - estebank:trailing-slash, r=matklad

c44e29b

Always emit trailing slash error Fix rust-lang#62913. r? @petrochenkov
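
Judging only from the panic message in the report and the title of the merged PR, the shape of the change can be sketched as follows: the error reporter stops treating a trailing backslash as unreachable and reports it like any other escape error. This is a hedged illustration, not the actual patch. `EscapeError`, `message_or_panic`, and `message` are hypothetical names, and the message wording for the trailing-slash case is an assumption.

```rust
// Hypothetical stand-ins for illustration; not rustc's actual types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum EscapeError {
    /// A backslash with nothing after it.
    LoneSlash,
    /// `\u` not followed by `{...}`.
    IncorrectUnicodeEscape,
}

/// "Before": assumes the lexer never lets a lone slash through,
/// so reaching that case is treated as a compiler bug.
#[allow(dead_code)]
fn message_or_panic(err: EscapeError) -> &'static str {
    match err {
        EscapeError::LoneSlash => {
            panic!("lexer accepted unterminated literal with trailing slash")
        }
        EscapeError::IncorrectUnicodeEscape => "incorrect unicode escape sequence",
    }
}

/// "After": every variant maps to an ordinary diagnostic message.
fn message(err: EscapeError) -> &'static str {
    match err {
        // Assumed wording, chosen for this sketch.
        EscapeError::LoneSlash => "invalid trailing slash in literal",
        EscapeError::IncorrectUnicodeEscape => "incorrect unicode escape sequence",
    }
}

fn main() {
    for err in [EscapeError::IncorrectUnicodeEscape, EscapeError::LoneSlash] {
        println!("error: {}", message(err));
    }
}
```

With the second form, malformed input found by a fuzzer produces a normal error instead of an internal compiler error, even if an earlier stage lets an unexpected case through.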

bors closed this as completed in #62917 Jul 24, 2019
