Fix highlighting of byte escape sequences #15303

oxalica · 2023-07-17T14:54:30Z

Currently non-UTF8 escape sequences in byte strings and any escape sequences in byte literals are ignored.

lowr · 2023-07-18T09:37:28Z

crates/syntax/src/ast/token_ext.rs

+    // XXX: `Mode::CStr` is not supported by `unescape_literal` of ra-ap-rustc_lexer yet.
+    // Here we pretend it to be a byte string.
+    const MODE: Mode = Mode::ByteStr;


It seems intentional that unescape_literal() doesn't support C strings. There's rustc_lexer::unescape::unescape_c_string() for C strings instead. See this comment for why this is the case.

I'm starting to doubt whether IsString::escaped_char_ranges() is the right abstraction as unescape_literal() and unescape_c_string() take a closure of different signature 🤔 But it's out of scope for this PR to come up with something to replace it I guess.

Can you override <CString as IsString>::escaped_char_ranges() so that it uses unescape_c_string()? Since it's only used for highlighting where the actual unescaped bytes aren't relevant, we can discard CStrUnit and pass e.g. Ok(' ') to the callback for the time being. A comment would be nice too!

oxalica · 2023-07-18

I'm starting to doubt whether IsString::escaped_char_ranges() is the right abstraction as unescape_literal() and unescape_c_string() take a closure of different signature 🤔

I think it's "more correct" to use CStrUnit, a.k.a. Either<char, u8>, for all these functions. But the name CStrUnit is specific to C strings and not really suitable.

But it's out of scope for this PR to come up with something to replace it I guess.

I agree.

Can you override <CString as IsString>::escaped_char_ranges() so that it uses unescape_c_string()? Since it's only used for highlighting where the actual unescaped bytes aren't relevant, we can discard CStrUnit and pass e.g. Ok(' ') to the callback for the time being. A comment would be nice too!

I only found the format specifier parser, which makes use of the range information as well as the unescaped string content. But it takes ast::String, so it should not be affected by a placeholder ' '.

The change is pushed now.
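
For anyone following along, the adapter idea discussed above can be sketched roughly as below. This is a self-contained toy, not the code in this PR: `Unit`, `EscapeError`, and `toy_unescape_c_string` are hypothetical stand-ins for `CStrUnit` and `unescape_c_string()` (the toy only understands `\n`, `\\`, and `\xNN`), and the free function `escaped_char_ranges` stands in for the trait method. The only point it illustrates is that the unescaped unit is discarded and a placeholder `' '` is passed on, since highlighting needs just the range and whether the escape is valid.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the unit a C-string unescaper yields:
/// a Unicode scalar value or a raw byte (e.g. from `\xFF`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Unit {
    Char(char),
    Byte(u8),
}

/// Hypothetical error type; the real one carries more detail.
#[derive(Debug, Clone, Copy, PartialEq)]
struct EscapeError;

/// Toy unescaper (NOT the rustc_lexer implementation). It reports every
/// unit of `src` together with the byte range it occupies in the source.
fn toy_unescape_c_string(
    src: &str,
    callback: &mut dyn FnMut(Range<usize>, Result<Unit, EscapeError>),
) {
    let mut pos = 0;
    while pos < src.len() {
        let rest = &src[pos..];
        let mut chars = rest.chars();
        let first = chars.next().unwrap();
        let (len, res) = if first != '\\' {
            (first.len_utf8(), Ok(Unit::Char(first)))
        } else {
            match chars.next() {
                Some('n') => (2, Ok(Unit::Char('\n'))),
                Some('\\') => (2, Ok(Unit::Char('\\'))),
                // `\xNN`: exactly two ASCII hex digits must follow.
                Some('x') => match rest.get(2..4) {
                    Some(hex) if hex.bytes().all(|b| b.is_ascii_hexdigit()) => {
                        (4, Ok(Unit::Byte(u8::from_str_radix(hex, 16).unwrap())))
                    }
                    _ => (2, Err(EscapeError)),
                },
                // Unknown escape: report the backslash plus the next char.
                Some(other) => (1 + other.len_utf8(), Err(EscapeError)),
                // Lone trailing backslash.
                None => (1, Err(EscapeError)),
            }
        };
        callback(pos..pos + len, res);
        pos += len;
    }
}

/// Adapter with the `char`-based callback signature. The unit itself is
/// irrelevant for highlighting, so it is replaced by a placeholder `' '`.
fn escaped_char_ranges(
    src: &str,
    cb: &mut dyn FnMut(Range<usize>, Result<char, EscapeError>),
) {
    toy_unescape_c_string(src, &mut |range, res| {
        cb(range, res.map(|_unit| ' '));
    });
}
```

A caller that does need the unescaped content (like the format specifier parser mentioned above) would have to avoid this adapter, which is why the placeholder is only acceptable while C strings are used for highlighting alone.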

oxalica · 2023-07-18T11:02:58Z

I'm also thinking about highlighting erroneous escape sequences in a special color, instead of leaving them the same as literals. But there is only a similar UnresolvedReference currently. Not sure if that's a good idea since they should already be marked as errors by cargo check.

lowr · 2023-07-18T11:29:54Z

I think that's reasonable (I actually thought something like that would be cool while reviewing!). Do you want to implement it yourself, either in this PR or in a separate PR?

oxalica · 2023-07-19T07:18:51Z

I'm also thinking about highlighting erroneous escape sequences in a special color, instead of leaving them the same as literals. But there is only a similar UnresolvedReference currently. Not sure if that's a good idea since they should already be marked as errors by cargo check.

I think that's reasonable (I actually thought something like that would be cool while reviewing!). Do you want to implement it yourself, either in this PR or in a separate PR?

Implemented. I'm not sure how to provide a default color (like, red?) for different editors. I cannot find related color settings for unresolvedReference either.
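
To illustrate the idea (a minimal sketch, not the actual implementation): once each unit of a literal is reported with its range and a validity result, choosing the tag is a simple mapping. `HlTag` and `highlight_escapes` are hypothetical names used only here, and the `()` error type is a placeholder for whatever the unescaper really returns.

```rust
use std::ops::Range;

/// Hypothetical highlight tags; the second one mirrors the new
/// `invalidEscapeSequence` semantic token type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum HlTag {
    EscapeSequence,
    InvalidEscapeSequence,
}

/// Given the literal text and the (range, result) pairs reported for it,
/// keep only the units that are written as escapes (they start with a
/// backslash) and tag them according to whether unescaping succeeded.
fn highlight_escapes(
    text: &str,
    units: &[(Range<usize>, Result<char, ()>)],
) -> Vec<(Range<usize>, HlTag)> {
    units
        .iter()
        .filter(|(range, _)| {
            // `get` avoids a panic if a range is out of bounds.
            text.get(range.clone())
                .map_or(false, |piece| piece.starts_with('\\'))
        })
        .map(|(range, res)| {
            let tag = if res.is_ok() {
                HlTag::EscapeSequence
            } else {
                HlTag::InvalidEscapeSequence
            };
            (range.clone(), tag)
        })
        .collect()
}
```

Plain characters produce no highlight at all, so the string keeps its normal literal color everywhere except on the escapes themselves.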

lowr · 2023-07-21T14:02:14Z

Sorry for the delay. The implementation looks good! Do we also want to highlight invalid escape sequences in highlight_escape_char() and highlight_escape_byte()? Might be a little more complicated as it'd require implementing something similar to IsString::escaped_char_ranges() for ast::Char and ast::Byte.

As for the styling, it seems we have little control over it. Looks like we can define some fallback TextMate scopes, but seeing as we don't provide them for unresolvedReference, it shouldn't block this PR from landing, I suppose.

oxalica · 2023-07-21T16:11:16Z

Do we also want to highlight invalid escape sequences in highlight_escape_char() and highlight_escape_byte()?

I don't think it will help much since ' is not a reliable delimiter. If an invalid escape sequence occurs after ', the lexer should very likely already blow up. We cannot determine the boundary of a char literal if something went wrong, considering 'lifetime tokens.

Looks like we can define some fallback TextMate scopes, but seeing as we don't provide them for unresolvedReference, it shouldn't block this PR from landing, I suppose.

The link is broken. And I'm not familiar with TextMate, so I'd prefer to skip it for now.

lowr · 2023-07-22T15:07:26Z

I don't think it will help much since ' is not a reliable delimiter. If an invalid escape sequence occurs after ', the lexer should very likely already blow up. We cannot determine the boundary of a char literal if something went wrong, considering 'lifetime tokens.

Makes sense. Could you add a comment explaining that rationale? r=me with that, thanks!

The link is broken.

🤦‍♂️ My bad, fixed.

@bors delegate+

bors · 2023-07-22T15:07:29Z

✌️ @oxalica, you can now approve this pull request!

If @lowr told you to "r=me" after making some further change, please make that change, then do @bors r=@lowr

oxalica · 2023-07-22T20:25:35Z

@bors r=@lowr

bors · 2023-07-22T20:25:36Z

📌 Commit 51b35cc has been approved by lowr

It is now in the queue for this repository.

bors · 2023-07-22T20:25:42Z

⌛ Testing commit 51b35cc with merge 99718d0...

bors · 2023-07-22T20:40:05Z

☀️ Test successful - checks-actions
Approved by: lowr
Pushing 99718d0 to master...

Fix highlighting of byte escape sequences

de1f766

Currently non-UTF8 escape sequences in byte strings and any escape sequences in byte literals are ignored.

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 17, 2023

lowr reviewed Jul 18, 2023

Fix unescaping of C string literals

59a3e42

Introduce invalidEscapeSequence semantic token type

1f35e4d

Add comments for why skip highlighting for invalid char/byte literals

51b35cc

bors merged commit 99718d0 into rust-lang:master Jul 22, 2023
8 checks passed

oxalica deleted the fix/byte-escape-highlight branch July 28, 2023 11:56
