Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Wrong part colored in diagnostic suggestions after tabs #87972

Closed
m-ou-se opened this issue Aug 12, 2021 · 17 comments
Closed

Wrong part colored in diagnostic suggestions after tabs #87972

m-ou-se opened this issue Aug 12, 2021 · 17 comments
Assignees
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix`. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@m-ou-se
Copy link
Member

m-ou-se commented Aug 12, 2021

trait T {
	const A: &u8;
}

(The second line starts with a tab, not with spaces!)

This results in:
Screenshot 2021-08-12 at 16 48 04

It suggests to add a 'a after the &, but the green color is not applied to the 'a part, but instead to A: &, shifted a bit to the left.

cc @estebank

@m-ou-se m-ou-se added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix`. labels Aug 12, 2021
@jonas-schievink jonas-schievink added the C-bug Category: This is a bug. label Aug 12, 2021
@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 12, 2021

image

The problem is only with tabs, it seems. It works fine with multi-byte, multi-codepoint, and double-width characters.

@estebank
Copy link
Contributor

estebank commented Aug 12, 2021

It works ok when it is a single line, so the problem is likely in the following code not accounting for calls to replace_tabs:

if is_multiline {
for SubstitutionHighlight { start, end } in parts {
buffer.set_style_range(
row_num,
max_line_num_len + 3 + start,
max_line_num_len + 3 + end,
Style::Addition,
true,
);
}
}

Screen Shot 2021-08-12 at 8 12 19 AM

error[E0106]: missing lifetime specifier
 --> f1.rs:2:11
  |
2 |     const A: &u8;
  |              ^ expected named lifetime parameter
  |
help: consider using the `'static` lifetime
  |
2 |     const A: &'static u8;
  |               +++++++
help: consider introducing a named lifetime parameter
  |
1 ~ trait T<'a> {
2 ~     const A: &'a u8;
  |

error[E0106]: missing lifetime specifier
 --> f1.rs:4:20
  |
4 | trait K { const A: &u8; }
  |                    ^ expected named lifetime parameter
  |
help: consider using the `'static` lifetime
  |
4 | trait K { const A: &'static u8; }
  |                     +++++++
help: consider introducing a named lifetime parameter
  |
4 | trait K<'a> { const A: &'a u8; }
  |        ++++            ~~~

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 12, 2021

Oh, multi-{byte,column,codepoint} characters do not work fine in a single line:

Screenshot 2021-08-12 at 18 06 22

trait T {
    const BLÅHAJ /* 👩‍❤️‍💋‍👩 */: &u8;
}

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 12, 2021

The second line with the ^ or ++++ not aligning might be unactionable, because it depends a lot on the terminal how wide things are:

image

But at least the green coloring of &'stati can be fixed.

@estebank
Copy link
Contributor

estebank commented Aug 12, 2021

The underline not aligning due to wide grapheme clusters is a known issue, and somewhat of an impossible problem to solve in a way that covers all terminals, but the coloring we can definitely deal with. That being said, we might be able to have an env flag or config file to set different strategies for people to chose (emoji support/vs one byte==one col).

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 12, 2021

That being said, we might be able to have an env flag or config file to set different strategies for people to chose (emoji support/vs one byte==one col).

Another option could be to use the \e[6n escape sequence to ask the terminal for feedback on how many columns it used after printing something. But that's probably not worth the effort. (And won't work when we're not directly writing to a terminal.)

@estebank
Copy link
Contributor

That would fix quite a few things, but it would also require us to use a completely different approach to printing to the screen than what we do today (we keep a buffer that we dump all at once, so we precalculate all the col/line positions ahead of time).

@thomcc
Copy link
Member

thomcc commented Aug 13, 2021

it depends a lot on the terminal how wide things are

Yeah, it depends on all of: the terminal, the font, the algorithm it uses for character width... In general, the most important thing is not getting it "right" (for some definition of right), it's doing the same thing that the terminal does. Which is intractable without querying it with \e[6n, as you said.

(This is a problem I've thought about a great deal...)

That said, I think the tab-specific case could be fixed by replacing tabs with spaces. This might be worth doing given how common use of tabs can be for indentation (even though in Rust they are not as common).

The downside is that the output text won't be identical to what's in the source buffer, but hey, it didn't have escape sequences in it either 😉.

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 13, 2021

could be fixed by replacing tabs with spaces

That already happens. (That's why the char offsets in the output buffer and the source code didn't match.)

@thomcc
Copy link
Member

thomcc commented Aug 13, 2021

Oh. I had figured it was just computing the width of the tab wrong, my bad.

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 13, 2021

Yeah, it depends on all of: the terminal, the font, the algorithm it uses for character width...

Note that most terminals don't use the font at all for this information:

xterm: https://github.com/ThomasDickey/xterm-snapshots/blob/f8a406455c18ec6f3de1cf2c642f7b7f0312d362/wcwidth.c#L167-L168

rxvt-unicode: http://cvs.schmorp.de/rxvt-unicode/README.FAQ?revision=1.74&view=markup#l1112

Gnome's vte: https://gitlab.gnome.org/GNOME/vte/-/blob/c17e6d12da00a94c3768be6671182a6a039ec0c0/src/vte.cc#L239-254

Konsole: https://github.com/KDE/konsole/blob/1c8db8a3fcd8db5b346915b6f8e7e6b94cb66e0c/src/characters/CharacterWidth.cpp#L130

They all try to follow the unicode standard, regardless of the font used, to make it possible for programs to predict column positions.

@thomcc
Copy link
Member

thomcc commented Aug 13, 2021

The font is what determines how many glyphs ligatures (such as ZWJ emoji sequences) render as. So it's definitely relevant here for many terminals, even if they don't explicitly use it in their own computations.

(Also, the relevant unicode standard — UAX#11 — explicitly says they shouldn't use it, which is why I mentioned following what the terminal does is more important than being "right")

@m-ou-se
Copy link
Member Author

m-ou-se commented Aug 13, 2021

The font is what determines how many glyphs ligatures (such as ZWJ emoji sequences) render as.

That has no influence on how many columns they take up in all of the terminals I linked. The amount of cells taken up is independent of the font in all of those implementations.

And most terminals ignore that information also for rendering and draw the individual codepoints of those sequences separately. But the point here is the amount of columns things take up, not how the visuals inside there are rendered.

@thomcc
Copy link
Member

thomcc commented Aug 14, 2021

Sigh this argument is ridiculous and very tiring.

That has no influence on how many columns they take up in all of the terminals I linked. The amount of cells taken up is independent of the font in all of those implementations.

You'll have to believe me it matters for some terminals then, ones that render ZWJ emoji sequences as a single glyph. iTerm2 is one, if ligatures are enabled. I believe xterm.js (used in VSCode's integrated terminal) is another. (There are also terminals that treat ligature sequences as if they always take up the same width, I think either alacritty or kitty does this — it's been a very long time this was relevant to my work or I'd have this information closer to the front of my mind)

Anyway, it definitely is font dependent on a subset of terminals. I had believed that some of the ones you linked (in particular VTE) was among those, but I guess I was mistaken.

@estebank
Copy link
Contributor

estebank commented Aug 17, 2021

For the purposes of coloring, the fact that ligatures are enabled shouldn't affect how correct the output is unless those terminals are already buggy independently of what we provide because we are relying on ANSI codes (on non-Windows environments), which are inline. If those terminals don't handle ANSI codes in between two different characters separated by a color change that happen to have a ligature correctly, it is beyond the scope of rustc to sidestep the issue, beyond a reasonable effort (in the same way that we currently restrict ourselves to ASCII for our output, for example). For what is worth, the coloring is more likely to be correct (now, with the PR fixing this issue) than the underlines, because the underlines do have to account for how wide the output will be on the display.

You'll have to believe me it matters for some terminals then, ones that render ZWJ emoji sequences as a single glyph.

I don't think anyone is disputing that there are lots of variation in how the visual representation of text varies across terminals, but you're coming across as incredibly dismissive even in the face of being shown the source code for a representative sampling of terminals. Let's all try to be constructive here.

iTerm2 is one, if ligatures are enabled. I believe xterm.js (used in VSCode's integrated terminal) is another.

Funnily enough, those are the two terminals I regularly test in.

@thomcc
Copy link
Member

thomcc commented Aug 17, 2021

but you're coming across as incredibly dismissive even in the face of being shown the source code for a representative sampling of terminals. Let's all try to be constructive here.

Sorry, that's not my intention at all. My experience is that the logic here is often spread between both a wcwidth-style function and the underlying font rendering/shaping code (which comes from harfbuzz, appkit, etc), and so the existence of one of these doesn't really show that the other doesn't exist (although it seems I was mistaken and in those cases it does not).

By ligatures, I mostly mean ZWJ sequences rather than something you'd meaningfully change color in the middle of (although see some of the sections here here). For clarity, what I'm referring to is mostly stuff like what's mentioned by https://github.com/unicode-rs/unicode-width in the note, — cases where the computed width values may not match the actual rendered column width.

For example, 👩‍🔬 will render as two columns on many terminals that properly handle fonts with ligatures (example from iTerm 2). If this is not supported by the font (or often by the OS/application unicode version), it will render as multiple glyphs. I remember this happening to someone I know recently who had a trans flag in their shell prompt, as there was a period of time when on some machines it would render with two emoji (as the ZWJ sequence wasn't recognized), and on others one.

I spent a lot of time trying to make this work well in terminal applications in the past, but ultimately just gave up and concluded that querying the position was the only option if you really needed it to work 100% of the time. That said, most of the cases I was working with, these cases were slightly more problematic than they are for Rust diagnostics — problematic characters were more frequent than they appear in Rust code, and getting it wrong would totally wreck the UI (possibly to the point where it was unusable).

I suspect for Rust diagnostic uses, this position querying might be more trouble than it's worth.

m-ou-se added a commit to m-ou-se/rust that referenced this issue Aug 23, 2021
…ng, r=m-ou-se

Account for tabs when highlighting multiline code suggestions

Address `'\t'` case in rust-lang#87972.

Before:

![Screen Shot 2021-08-12 at 8 52 27 AM](https://user-images.githubusercontent.com/1606434/129228214-e5cfd203-9aa8-41c7-acd9-ce255ef8a21e.png)

After:

![Screen Shot 2021-08-12 at 8 52 15 AM](https://user-images.githubusercontent.com/1606434/129228236-57c951fc-c8cf-4901-989f-b9b5aa5eebca.png)
@estebank
Copy link
Contributor

Closing as per #87976.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-diagnostics Area: Messages for errors, warnings, and lints A-suggestion-diagnostics Area: Suggestions generated by the compiler applied by `cargo fix`. C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

4 participants