Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix highlighting of byte escape sequences #15303

Merged
merged 4 commits into from Jul 22, 2023

Conversation

oxalica
Copy link
Contributor

@oxalica oxalica commented Jul 17, 2023

Currently non-UTF8 escape sequences in byte strings and any escape sequences in byte literals are ignored.

Currently non-UTF8 escape sequences in byte strings and any escape
sequences in byte literals are ignored.
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 17, 2023
Comment on lines 288 to 290
// XXX: `Mode::CStr` is not supported by `unescape_literal` of ra-ap-rustc_lexer yet.
// Here we pretend it to be a byte string.
const MODE: Mode = Mode::ByteStr;
Copy link
Contributor

@lowr lowr Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems unescape_literal() not supporting c strings is intentional. There's rustc_lexer::unescape::unescape_c_string() for c strings instead. See this comment for why this is the case.

I'm starting to doubt whether IsString::escaped_char_ranges() is the right abstraction as unescape_literal() and unescape_c_string() take a closure of different signature 🤔 But it's out of scope for this PR to come up with something to replace it I guess.

Can you override <CString as IsString>::escaped_char_ranges() so that it uses unescape_c_string()? Since it's only used for highlighting where the actual unescaped bytes aren't relevant, we can discard CStrUnit and pass e.g. Ok(' ') to the callback for the time being. A comment would be nice too!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm starting to doubt whether IsString::escaped_char_ranges() is the right abstraction as unescape_literal() and unescape_c_string() take a closure of different signature thinking

I think it's "more correct" to use CStrUnit aka. Either<char, u8> for all these functions. But the name CStrUnit is exclusive and not really suitable.

But it's out of scope for this PR to come up with something to replace it I guess.

I agree.

Can you override <CString as IsString>::escaped_char_ranges() so that it uses unescape_c_string()? Since it's only used for highlighting where the actual unescaped bytes aren't relevant, we can discard CStrUnit and pass e.g. Ok(' ') to the callback for the time being. A comment would be nice too!

I only found the format specifier parser which makes use of the range information as well as unescaped string content. But it takes ast::String so should not be affected by a placeholder ' '.

The change is pushed now.

@oxalica
Copy link
Contributor Author

oxalica commented Jul 18, 2023

I'm also thinking about highlighting erroneous escape sequences as a special color, instead of leaving them the same as literals. But there is only a similar UnresolvedReference currently. Not sure if that's a good idea since they should already be marked by cargo-check as errors.

@lowr
Copy link
Contributor

lowr commented Jul 18, 2023

I think that's reasonable (I actually thought something like that would be cool while reviewing!). Do you want to implement it yourself, either in this PR or in a separate PR?

@oxalica
Copy link
Contributor Author

oxalica commented Jul 19, 2023

I'm also thinking about highlighting erroneous escape sequences as a special color, instead of leaving them the same as literals. But there is only a similar UnresolvedReference currently. Not sure if that's a good idea since they should already be marked by cargo-check as errors.

I think that's reasonable (I actually thought something like that would be cool while reviewing!). Do you want to implement it yourself, either in this PR or in a separate PR?

Implemented. I'm not sure how to provide a default color (like, red?) for different editors. I cannot find related color settings for unresolvedReference either.

@lowr
Copy link
Contributor

lowr commented Jul 21, 2023

Sorry for the delay. The implementation looks good! Do we also want to highlight invalid escape sequences in highlight_escape_char() and highlight_escape_byte()? Might be a little more complicated as it'd require implementing something similar to IsString::escaped_char_ranges() for ast::Char and ast::Byte.

As for the styling, it seems we have little control over it. Looks like we can define some fallback TextMate scopes, but seeing we don't provide it for unresolvedReference, it wouldn't block this PR to land I suppose.

@oxalica
Copy link
Contributor Author

oxalica commented Jul 21, 2023

Do we also want to highlight invalid escape sequences in highlight_escape_char() and highlight_escape_byte()?

I don't think it will help much since ' is not a reliable delimiter. If an invalid escape sequence occur after ', the lexer should very likely already blow up. We cannot determine the boundary of a char literal if something went wrong, considering 'lifetime tokens.

Looks like we can define some fallback TextMate scopes, but seeing we don't provide it for unresolvedReference, it wouldn't block this PR to land I suppose.

The link is broken. And I'm not familiar with TextMate thus prefer to skip it for now.

@lowr
Copy link
Contributor

lowr commented Jul 22, 2023

I don't think it will help much since ' is not a reliable delimiter. If an invalid escape sequence occur after ', the lexer should very likely already blow up. We cannot determine the boundary of a char literal if something went wrong, considering 'lifetime tokens.

Makes sense. Could you add a comment explaining that rationale? r=me with that, thanks!

The link is broken.

🤦‍♂️ My bad, fixed.

@bors delegate+

@bors
Copy link
Collaborator

bors commented Jul 22, 2023

✌️ @oxalica, you can now approve this pull request!

If @lowr told you to "r=me" after making some further change, please make that change, then do @bors r=@lowr

@oxalica
Copy link
Contributor Author

oxalica commented Jul 22, 2023

@bors r=@lowr

@bors
Copy link
Collaborator

bors commented Jul 22, 2023

📌 Commit 51b35cc has been approved by lowr

It is now in the queue for this repository.

@bors
Copy link
Collaborator

bors commented Jul 22, 2023

⌛ Testing commit 51b35cc with merge 99718d0...

@bors
Copy link
Collaborator

bors commented Jul 22, 2023

☀️ Test successful - checks-actions
Approved by: lowr
Pushing 99718d0 to master...

@bors bors merged commit 99718d0 into rust-lang:master Jul 22, 2023
8 checks passed
@oxalica oxalica deleted the fix/byte-escape-highlight branch July 28, 2023 11:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants