Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ICE when combining "\r", unicode, and unused parameter in format string #70381

Closed
Reconcyl opened this issue Mar 25, 2020 · 12 comments · Fixed by #75972
Closed

ICE when combining "\r", unicode, and unused parameter in format string #70381

Reconcyl opened this issue Mar 25, 2020 · 12 comments · Fixed by #75972
Labels
A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-help-wanted Call for participation: Help is requested to fix this issue. glacier ICE tracked in rust-lang/glacier. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Reconcyl
Copy link

Reconcyl commented Mar 25, 2020

The following code causes an ICE:

fn main() {
    print!("\r¡{}");
}

As far as I can tell:

  • Any number of characters can be inserted between \r and ¡ and the error will still happen.
  • ¡ can be replaced with any unicode character and the error will still happen.
  • No escape sequences other than \r cause the error.

The error is reproducible on the stable or nightly compiler at https://play.rust-lang.org.

rustc --version --verbose outputs the following on my machine:

rustc 1.40.0 (73528e339 2019-12-16)
binary: rustc
commit-hash: 73528e339aae0f17a15ffa49a8ac608f50c6cf14
commit-date: 2019-12-16
host: x86_64-apple-darwin
release: 1.40.0
LLVM version: 9.0

Error output

thread 'rustc' panicked at 'assertion failed: bpos.to_u32() >= mbc.pos.to_u32() + mbc.bytes as u32', src/libsyntax/source_map.rs:875:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.

error: internal compiler error: unexpected panic

note: the compiler unexpectedly panicked. this is a bug.

note: we would appreciate a bug report: https://github.com/rust-lang/rust/blob/master/CONTRIBUTING.md#bug-reports

note: rustc 1.40.0 (73528e339 2019-12-16) running on x86_64-apple-darwin
Backtrace

stack backtrace:
   0: <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt
   1: core::fmt::write
   2: std::io::Write::write_fmt
   3: std::panicking::default_hook::{{closure}}
   4: std::panicking::default_hook
   5: rustc_driver::report_ice
   6: std::panicking::rust_panic_with_hook
   7: std::panicking::begin_panic
   8: syntax::source_map::SourceMap::bytepos_to_file_charpos
   9: syntax::source_map::SourceMap::lookup_char_pos
  10: syntax::source_map::SourceMap::span_to_filename
  11: <syntax::source_map::SourceMap as rustc_errors::SourceMapper>::call_span_if_macro
  12: rustc_errors::emitter::Emitter::fix_multispan_in_std_macros
  13: rustc_errors::emitter::Emitter::fix_multispans_in_std_macros
  14: <rustc_errors::emitter::EmitterWriter as rustc_errors::emitter::Emitter>::emit_diagnostic
  15: rustc_errors::HandlerInner::emit_diagnostic
  16: rustc_errors::diagnostic_builder::DiagnosticBuilder::emit
  17: syntax_ext::format::expand_preparsed_format_args
  18: syntax_ext::format::expand_format_args_impl
  19: <F as syntax_expand::base::TTMacroExpander>::expand
  20: syntax_expand::expand::MacroExpander::fully_expand_fragment
  21: syntax_expand::expand::MacroExpander::expand_crate
  22: rustc_interface::passes::configure_and_expand_inner::{{closure}}
  23: rustc_interface::passes::configure_and_expand_inner
  24: rustc_interface::passes::configure_and_expand::{{closure}}
  25: rustc_data_structures::box_region::PinnedGenerator<I,A,R>::new
  26: rustc_interface::queries::Query<T>::compute
  27: rustc_interface::queries::<impl rustc_interface::interface::Compiler>::expansion
  28: rustc_interface::interface::run_compiler_in_existing_thread_pool
  29: std::thread::local::LocalKey<T>::with
  30: scoped_tls::ScopedKey<T>::set
  31: syntax::with_globals
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

@Reconcyl Reconcyl added C-bug Category: This is a bug. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Mar 25, 2020
@jonas-schievink jonas-schievink added A-parser Area: The parsing of Rust source code to an AST. I-nominated labels Mar 25, 2020
@Centril Centril added the regression-from-stable-to-stable Performance or correctness regression from one stable version to another. label Mar 25, 2020
@Centril
Copy link
Contributor

Centril commented Mar 25, 2020

Godbolt says this regressed in 1.29.0.

@spastorino spastorino added P-low Low priority and removed I-nominated labels Mar 25, 2020
@spastorino
Copy link
Member

pre-triage: this doesn't seem to be very important but would be of course nice to fix. Tagging it as P-low.

@Centril Centril added E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-help-wanted Call for participation: Help is requested to fix this issue. labels Mar 25, 2020
@moonheart08
Copy link
Contributor

@rustbot claim
Time to try an easy issue

@rustbot rustbot self-assigned this Mar 25, 2020
@rust-lang-glacier-bot rust-lang-glacier-bot added the glacier ICE tracked in rust-lang/glacier. label Mar 27, 2020
@kfitch
Copy link

kfitch commented Apr 4, 2020

I tried a few things and I will document what I found, so hopefully this can jumpstart some ideas:

I removed the assert that was initiating the panic, and I see:

error: 1 positional argument in format string, but no arguments were given
 --> ../test.rs:2:15
  |
2 |     print!("\r¡{}");
  |               ^^

Notice the OBOE on the positioning of the ^^

So, I tried different escape sequences, and got mildly confusing results

error: 1 positional argument in format string, but no arguments were given
 --> ../test.rs:2:24
  |
2 |     print!("\u{1234}¡{}");
  |                        ^^

Here we have an off by two error in the other direction

Now, I tried the unicode escape that should (assuming I didn't misremember something) be equivalent to \r:

 --> ../test.rs:2:22
  |
2 |     print!("\u{000d}¡{}");
  |                      ^^

And that seems to work just fine.

So, I suspect we somehow interpret escape sequences as the character they represent, but not other times. If this causes us to be offcut in the middle of a UTF-8 multibyte sequence, then we ICE, but there are cases where the ^^ will end up in the wrong spot even if we don't ICE.

EDIT: a few more observations

Turns out the OBOE can be seen with just old fashioned ASCII (no multi-byte utf-8 characters needed):

--> <source>:3:16
  |
3 |     println!("\r{}");
  |                ^^
 --> <source>:3:25
  |
3 |     println!("\u{1234}{}");
  |                         ^^

In addition, springboarding off what @Centril found, going from 1.28.0 to.1.29.0 the ^^ in the error messages got a lot more specific, e.g. in 1.28.0 we see:

--> <source>:3:5
  |
3 |     println!("\r{}");
  |     ^^^^^^^^^^^^^^^^^

@kfitch
Copy link

kfitch commented Apr 6, 2020

After a bunch of debug prints, and counting bytes, I am now convinced that the code in the function that is panicing, is not the problem. The inputs to that function are faulty. In particular the BytePos is OBO, potentially putting it in the middle of a UTF-8 sequence. I chased down the source of the bad data, and I think it is from find_skips in src/librustc_builtin_macros/format.rs.
The code appears to be computing the difference between how many bytes it takes in the source to represent an escape sequence vs how many the interpreted value takes. There are a number of match statements that seem to leave \r out. Adding them seems to fix the original problem.
In addition the code for dealing with \u doesn't seem to take into account the varying number bytes the UTF-8 value will take. This seems to line up with what I was seeing.
Also, I suspect the \x escape might need some tweaking for code points >= 0x80

I forked and branched with what I have done so far: https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice

@moonheart08
Copy link
Contributor

@rustbot unclaim
I've been too distracted, and it seems @kfitch has done most of the bug chasing

@amadeusine
Copy link

@rustbot claim
I'd like to take this up if this is still open.

@rustbot rustbot assigned rustbot and unassigned rustbot Apr 8, 2020
@Dylan-DPC-zz
Copy link

It is open, go ahead :)

@kfitch
Copy link

kfitch commented Apr 8, 2020

@amadeusine , FYI what is in my branch:
https://github.com/kfitch/rust/tree/issue-70381-escape-sequence-ice
seems to solve the \r issue just fine, but does not address the \u{} issue at all. My quick dirty attempts at that failed. You are welcome to leverage off of my stuff if you are taking this over. This has just been a fun distraction when I have time, but I can't reliably dedicate time to it.

Also, I have not addressed unit tests at all yet. Also, I am beginning to suspect there may be a larger (yet subtle) underlying confusion somewhere else in the code about bytes vs characters. The find_skips function I just updated has comments talking about characters, but data derived from what it generates is later fed into bytepos_to_file_charpos (in source_map.rs) where we are dealing with bytes. So, perhaps there is a simplification if we can always deal with either just one of bytes or chars (and avoid any conversions). Or, on the other hand there may be a lot of comments that could use clarification.

@agathazeren
Copy link

I have also found that this off by n error will occur with any unicode character whose display width is non-standardized and non one.

  |
2 |     print!("𒀿{}")
  |              ^^

I have not yet checked the source to try to fix the issue, but I have worked with unicode_width in my own code, and as it is a compiler dependency is probably the source of this problem. The largest issue with this is as far as I could find there is no standardization for the display width of these characters.

@Alexendoo
Copy link
Member

Hi, are you still working on this issue @amadeusine?

@Alexendoo
Copy link
Member

@rustbot release-assignment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-parser Area: The parsing of Rust source code to an AST. C-bug Category: This is a bug. E-easy Call for participation: Easy difficulty. Experience needed to fix: Not much. Good first issue. E-help-wanted Call for participation: Help is requested to fix this issue. glacier ICE tracked in rust-lang/glacier. I-ICE Issue: The compiler panicked, giving an Internal Compilation Error (ICE) ❄️ P-low Low priority regression-from-stable-to-stable Performance or correctness regression from one stable version to another. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.