Wrong part colored in diagnostic suggestions after tabs #87972

m-ou-se · 2021-08-12T14:50:45Z

trait T {
	const A: &u8;
}

(The second line starts with a tab, not with spaces!)

This results in:

It suggests to add a 'a after the &, but the green color is not applied to the 'a part, but instead to A: &, shifted a bit to the left.

cc @estebank

The text was updated successfully, but these errors were encountered:

m-ou-se · 2021-08-12T14:59:20Z

The problem is only with tabs, it seems. It works fine with multi-byte, multi-codepoint, and double-width characters.

estebank · 2021-08-12T15:09:57Z

It works ok when it is a single line, so the problem is likely in the following code not accounting for calls to replace_tabs:

rust/compiler/rustc_errors/src/emitter.rs

Lines 1679 to 1689 in 6bed1f0

    
           if is_multiline { 
        
               for SubstitutionHighlight { start, end } in parts { 
        
                   buffer.set_style_range( 
        
                       row_num, 
        
                       max_line_num_len + 3 + start, 
        
                       max_line_num_len + 3 + end, 
        
                       Style::Addition, 
        
                       true, 
        
                   ); 
        
               } 
        
           }

error[E0106]: missing lifetime specifier
 --> f1.rs:2:11
  |
2 |     const A: &u8;
  |              ^ expected named lifetime parameter
  |
help: consider using the `'static` lifetime
  |
2 |     const A: &'static u8;
  |               +++++++
help: consider introducing a named lifetime parameter
  |
1 ~ trait T<'a> {
2 ~     const A: &'a u8;
  |

error[E0106]: missing lifetime specifier
 --> f1.rs:4:20
  |
4 | trait K { const A: &u8; }
  |                    ^ expected named lifetime parameter
  |
help: consider using the `'static` lifetime
  |
4 | trait K { const A: &'static u8; }
  |                     +++++++
help: consider introducing a named lifetime parameter
  |
4 | trait K<'a> { const A: &'a u8; }
  |        ++++            ~~~

m-ou-se · 2021-08-12T16:08:14Z

Oh, multi-{byte,column,codepoint} characters do not work fine in a single line:

trait T {
    const BLÅHAJ /* 👩‍❤️‍💋‍👩 */: &u8;
}

m-ou-se · 2021-08-12T16:10:05Z

The second line with the ^ or ++++ not aligning might be unactionable, because it depends a lot on the terminal how wide things are:

But at least the green coloring of &'stati can be fixed.

estebank · 2021-08-12T17:53:33Z

The underline not aligning due to wide grapheme clusters is a known issue, and somewhat of an impossible problem to solve in a way that covers all terminals, but the coloring we can definitely deal with. That being said, we might be able to have an env flag or config file to set different strategies for people to chose (emoji support/vs one byte==one col).

m-ou-se · 2021-08-12T18:24:08Z

That being said, we might be able to have an env flag or config file to set different strategies for people to chose (emoji support/vs one byte==one col).

Another option could be to use the \e[6n escape sequence to ask the terminal for feedback on how many columns it used after printing something. But that's probably not worth the effort. (And won't work when we're not directly writing to a terminal.)

estebank · 2021-08-12T19:39:24Z

That would fix quite a few things, but it would also require us to use a completely different approach to printing to the screen than what we do today (we keep a buffer that we dump all at once, so we precalculate all the col/line positions ahead of time).

thomcc · 2021-08-13T07:02:52Z

it depends a lot on the terminal how wide things are

Yeah, it depends on all of: the terminal, the font, the algorithm it uses for character width... In general, the most important thing is not getting it "right" (for some definition of right), it's doing the same thing that the terminal does. Which is intractable without querying it with \e[6n, as you said.

(This is a problem I've thought about a great deal...)

That said, I think the tab-specific case could be fixed by replacing tabs with spaces. This might be worth doing given how common use of tabs can be for indentation (even though in Rust they are not as common).

The downside is that the output text won't be identical to what's in the source buffer, but hey, it didn't have escape sequences in it either 😉.

m-ou-se · 2021-08-13T07:54:36Z

could be fixed by replacing tabs with spaces

That already happens. (That's why the char offsets in the output buffer and the source code didn't match.)

thomcc · 2021-08-13T08:14:21Z

Oh. I had figured it was just computing the width of the tab wrong, my bad.

m-ou-se · 2021-08-13T09:41:23Z

Yeah, it depends on all of: the terminal, the font, the algorithm it uses for character width...

Note that most terminals don't use the font at all for this information:

xterm: https://github.com/ThomasDickey/xterm-snapshots/blob/f8a406455c18ec6f3de1cf2c642f7b7f0312d362/wcwidth.c#L167-L168

rxvt-unicode: http://cvs.schmorp.de/rxvt-unicode/README.FAQ?revision=1.74&view=markup#l1112

Gnome's vte: https://gitlab.gnome.org/GNOME/vte/-/blob/c17e6d12da00a94c3768be6671182a6a039ec0c0/src/vte.cc#L239-254

Konsole: https://github.com/KDE/konsole/blob/1c8db8a3fcd8db5b346915b6f8e7e6b94cb66e0c/src/characters/CharacterWidth.cpp#L130

They all try to follow the unicode standard, regardless of the font used, to make it possible for programs to predict column positions.

thomcc · 2021-08-13T16:48:00Z

The font is what determines how many glyphs ligatures (such as ZWJ emoji sequences) render as. So it's definitely relevant here for many terminals, even if they don't explicitly use it in their own computations.

(Also, the relevant unicode standard — UAX#11 — explicitly says they shouldn't use it, which is why I mentioned following what the terminal does is more important than being "right")

m-ou-se · 2021-08-13T23:52:13Z

The font is what determines how many glyphs ligatures (such as ZWJ emoji sequences) render as.

That has no influence on how many columns they take up in all of the terminals I linked. The amount of cells taken up is independent of the font in all of those implementations.

And most terminals ignore that information also for rendering and draw the individual codepoints of those sequences separately. But the point here is the amount of columns things take up, not how the visuals inside there are rendered.

thomcc · 2021-08-14T00:36:07Z

Sigh this argument is ridiculous and very tiring.

That has no influence on how many columns they take up in all of the terminals I linked. The amount of cells taken up is independent of the font in all of those implementations.

You'll have to believe me it matters for some terminals then, ones that render ZWJ emoji sequences as a single glyph. iTerm2 is one, if ligatures are enabled. I believe xterm.js (used in VSCode's integrated terminal) is another. (There are also terminals that treat ligature sequences as if they always take up the same width, I think either alacritty or kitty does this — it's been a very long time this was relevant to my work or I'd have this information closer to the front of my mind)

Anyway, it definitely is font dependent on a subset of terminals. I had believed that some of the ones you linked (in particular VTE) was among those, but I guess I was mistaken.

estebank · 2021-08-17T16:23:53Z

For the purposes of coloring, the fact that ligatures are enabled shouldn't affect how correct the output is unless those terminals are already buggy independently of what we provide because we are relying on ANSI codes (on non-Windows environments), which are inline. If those terminals don't handle ANSI codes in between two different characters separated by a color change that happen to have a ligature correctly, it is beyond the scope of rustc to sidestep the issue, beyond a reasonable effort (in the same way that we currently restrict ourselves to ASCII for our output, for example). For what is worth, the coloring is more likely to be correct (now, with the PR fixing this issue) than the underlines, because the underlines do have to account for how wide the output will be on the display.

You'll have to believe me it matters for some terminals then, ones that render ZWJ emoji sequences as a single glyph.

I don't think anyone is disputing that there are lots of variation in how the visual representation of text varies across terminals, but you're coming across as incredibly dismissive even in the face of being shown the source code for a representative sampling of terminals. Let's all try to be constructive here.

iTerm2 is one, if ligatures are enabled. I believe xterm.js (used in VSCode's integrated terminal) is another.

Funnily enough, those are the two terminals I regularly test in.

thomcc · 2021-08-17T16:48:23Z

but you're coming across as incredibly dismissive even in the face of being shown the source code for a representative sampling of terminals. Let's all try to be constructive here.

Sorry, that's not my intention at all. My experience is that the logic here is often spread between both a wcwidth-style function and the underlying font rendering/shaping code (which comes from harfbuzz, appkit, etc), and so the existence of one of these doesn't really show that the other doesn't exist (although it seems I was mistaken and in those cases it does not).

By ligatures, I mostly mean ZWJ sequences rather than something you'd meaningfully change color in the middle of (although see some of the sections here here). For clarity, what I'm referring to is mostly stuff like what's mentioned by https://github.com/unicode-rs/unicode-width in the note, — cases where the computed width values may not match the actual rendered column width.

For example, 👩‍🔬 will render as two columns on many terminals that properly handle fonts with ligatures (example from iTerm 2). If this is not supported by the font (or often by the OS/application unicode version), it will render as multiple glyphs. I remember this happening to someone I know recently who had a trans flag in their shell prompt, as there was a period of time when on some machines it would render with two emoji (as the ZWJ sequence wasn't recognized), and on others one.

I spent a lot of time trying to make this work well in terminal applications in the past, but ultimately just gave up and concluded that querying the position was the only option if you really needed it to work 100% of the time. That said, most of the cases I was working with, these cases were slightly more problematic than they are for Rust diagnostics — problematic characters were more frequent than they appear in Rust code, and getting it wrong would totally wreck the UI (possibly to the point where it was unusable).

I suspect for Rust diagnostic uses, this position querying might be more trouble than it's worth.

…ng, r=m-ou-se Account for tabs when highlighting multiline code suggestions Address `'\t'` case in rust-lang#87972. Before: ![Screen Shot 2021-08-12 at 8 52 27 AM](https://user-images.githubusercontent.com/1606434/129228214-e5cfd203-9aa8-41c7-acd9-ce255ef8a21e.png) After: ![Screen Shot 2021-08-12 at 8 52 15 AM](https://user-images.githubusercontent.com/1606434/129228236-57c951fc-c8cf-4901-989f-b9b5aa5eebca.png)

estebank · 2021-08-27T21:37:53Z

Closing as per #87976.

m-ou-se added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix`. labels Aug 12, 2021

jonas-schievink added the C-bug Category: This is a bug. label Aug 12, 2021

estebank self-assigned this Aug 12, 2021

estebank mentioned this issue Aug 12, 2021

Account for tabs when highlighting multiline code suggestions #87976

Merged

estebank closed this as completed Aug 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Wrong part colored in diagnostic suggestions after tabs #87972

Wrong part colored in diagnostic suggestions after tabs #87972

m-ou-se commented Aug 12, 2021

m-ou-se commented Aug 12, 2021

estebank commented Aug 12, 2021 •

edited

m-ou-se commented Aug 12, 2021

m-ou-se commented Aug 12, 2021

estebank commented Aug 12, 2021 •

edited

m-ou-se commented Aug 12, 2021 •

edited

estebank commented Aug 12, 2021

thomcc commented Aug 13, 2021

m-ou-se commented Aug 13, 2021

thomcc commented Aug 13, 2021

m-ou-se commented Aug 13, 2021

thomcc commented Aug 13, 2021 •

edited

m-ou-se commented Aug 13, 2021

thomcc commented Aug 14, 2021

estebank commented Aug 17, 2021 •

edited

thomcc commented Aug 17, 2021

estebank commented Aug 27, 2021

Wrong part colored in diagnostic suggestions after tabs #87972

Wrong part colored in diagnostic suggestions after tabs #87972

Comments

m-ou-se commented Aug 12, 2021

m-ou-se commented Aug 12, 2021

estebank commented Aug 12, 2021 • edited

m-ou-se commented Aug 12, 2021

m-ou-se commented Aug 12, 2021

estebank commented Aug 12, 2021 • edited

m-ou-se commented Aug 12, 2021 • edited

estebank commented Aug 12, 2021

thomcc commented Aug 13, 2021

m-ou-se commented Aug 13, 2021

thomcc commented Aug 13, 2021

m-ou-se commented Aug 13, 2021

thomcc commented Aug 13, 2021 • edited

m-ou-se commented Aug 13, 2021

thomcc commented Aug 14, 2021

estebank commented Aug 17, 2021 • edited

thomcc commented Aug 17, 2021

estebank commented Aug 27, 2021

estebank commented Aug 12, 2021 •

edited

estebank commented Aug 12, 2021 •

edited

m-ou-se commented Aug 12, 2021 •

edited

thomcc commented Aug 13, 2021 •

edited

estebank commented Aug 17, 2021 •

edited