Stricter Contextual Unicode Highlighting: Consider Same Script #143720

hediet · 2022-02-23T12:59:08Z

This doesn't look like a good and secure implementation to me.
Consider the following code:

if (variable_א || variable_ב || vаriable_ג) {
    admin = true;
}

As far as I understand, nothing will be highlighted after the change. But paste the code into an editor from before the change and you'll see why it's bad.
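
For readers without an older build at hand, the following minimal sketch shows what the highlighting would have revealed. It assumes, as the post implies, that the second letter of the third identifier is Cyrillic а (U+0430) rather than Latin a. The helper name `nonAsciiChars` is made up for this illustration and is not part of VS Code.

```typescript
// Hypothetical helper: lists the code point of every non-ASCII character in a string.
function nonAsciiChars(text: string): string[] {
    const found: string[] = [];
    for (const ch of Array.from(text)) {
        const cp = ch.codePointAt(0)!;
        if (cp > 0x7f) {
            const hex = cp.toString(16).toUpperCase();
            // Left-pad to at least four hex digits, e.g. "430" becomes "0430".
            found.push("U+" + (hex.length < 4 ? "0000".slice(hex.length) + hex : hex));
        }
    }
    return found;
}

// Third identifier from the snippet, written with escapes so the odd letter is visible.
// Assumption: the "a" is Cyrillic U+0430; the last letter is Hebrew gimel U+05D2.
const suspicious = "v\u0430riable_\u05D2";
console.log(nonAsciiChars(suspicious).join(" ")); // U+0430 U+05D2
```

The Hebrew letter is expected in that identifier; the Cyrillic letter in the middle of an otherwise Latin word is what makes it a different variable from the one it appears to be.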

I think it should only skip the highlighting if the word contains only characters from a single language.

Also, there must be a way to disable this word-based highlighting exclusion.

Originally posted by @justanotheranonymoususer in #140960 (comment)

hediet · 2022-02-23T13:00:20Z

Thanks for your input!

As far as I understand, nothing will be highlighted after the change.

This is correct.

This doesn't look like a good and secure implementation to me.

To be secure, we recommend configuring "editor.unicodeHighlight.nonBasicASCII": true. We don't guarantee security with the other settings, just heuristic protection against common Unicode spoofing attacks.

Also, if you use non-basic-ASCII characters in identifiers, you make yourself much more vulnerable to such attacks.

I think it should only skip the highlighting if the word contains only characters from a single language.

Might be worth considering.

The rule would be:

Highlight characters that look like ASCII or that are invisible
... unless this character is in a word where at least one character cannot be confused with an ASCII character and all characters in that word have the same script

(the clause "and all characters in that word have the same script" is new)
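
A rough sketch of how that rule could be evaluated for a single word is given here. It is not the VS Code implementation. The confusable and invisible tables are tiny hypothetical samples, the function names are invented, and "cannot be confused with an ASCII character" is read as "non-ASCII and in neither table", which is an assumption.

```typescript
// Hypothetical sample tables; a real confusables list is far larger.
const LOOKS_LIKE_ASCII = new Set<string>(["\u0430", "\u0435", "\u043E", "\u03BF"]); // Cyrillic a, e, o; Greek omicron
const INVISIBLE = new Set<string>(["\u200B", "\u200C", "\u200D"]); // zero-width space, non-joiner, joiner

// Script tests are built from strings via the RegExp constructor with the "u" flag.
const NEUTRAL = new RegExp("[\\p{Script=Common}\\p{Script=Inherited}]", "u");
const SAMPLE_SCRIPTS = ["Latin", "Cyrillic", "Greek", "Hebrew", "Arabic"].map(
    (name) => ({ name, test: new RegExp("\\p{Script=" + name + "}", "u") })
);

// Returns a script name, or undefined for script-neutral characters such as "_" or digits.
function scriptOf(ch: string): string | undefined {
    if (NEUTRAL.test(ch)) return undefined;
    for (const s of SAMPLE_SCRIPTS) {
        if (s.test.test(ch)) return s.name;
    }
    return "Other"; // limitation: scripts outside the sample list are lumped together
}

function isCandidate(ch: string): boolean {
    return LOOKS_LIKE_ASCII.has(ch) || INVISIBLE.has(ch);
}

// Returns the characters of `word` that would be highlighted under the proposed rule.
function charsToHighlight(word: string): string[] {
    const chars = Array.from(word);
    // Existing condition: at least one character cannot be confused with ASCII.
    const hasUnambiguous = chars.some((ch) => ch.codePointAt(0)! > 0x7f && !isCandidate(ch));
    // New condition: every character with a script belongs to the same script.
    const scripts = new Set(chars.map(scriptOf).filter((s) => s !== undefined));
    const sameScript = scripts.size <= 1;
    return hasUnambiguous && sameScript ? [] : chars.filter(isCandidate);
}

console.log(charsToHighlight("v\u0430riable_\u05D2").length); // 1 (Latin, Cyrillic and Hebrew are mixed)
console.log(charsToHighlight("\u043F\u0440\u0438\u0432\u0435\u0442").length); // 0 (a purely Cyrillic word)
```

Without the same-script condition, the first word would be exempt because the Hebrew letter counts as unambiguous, which is exactly the case raised in the original post.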

justanotheranonymoususer · 2022-02-23T13:03:07Z

To be secure, we recommend configuring "editor.unicodeHighlight.nonBasicASCII": true.

Is that the default? I'd prefer VSCode to be secure by default.

Also, if you use non-basic-ASCII characters in identifiers, you make yourself much more vulnerable to such attacks.

The code might be authored by somebody else.

The rule would be:

Sounds good.

hediet · 2022-02-23T13:18:54Z

Is that the default? I'd prefer VSCode to be secure by default.

This is the default for untrusted workspaces. You should only open a workspace as trusted when you can rule out malicious intent.

The code might be authored by somebody else.

Then you should reject that PR.
In your example, λ and ג might actually be confusable.

hediet · 2022-02-24T13:00:48Z

Also see #143796.

Since this feature is all about ASCII/non-ASCII confusion, I tweaked the algorithm to only skip highlighting if all characters are non-ASCII.
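
The comment does not say how this check combines with the other conditions, so the following is only a sketch of the stated condition itself. The function name `maySkipHighlighting` is hypothetical.

```typescript
// Hypothetical check: highlighting may be skipped only for words without any ASCII character.
function maySkipHighlighting(word: string): boolean {
    const chars = Array.from(word);
    // An empty word has nothing to exempt, so it never qualifies.
    return chars.length > 0 && chars.every((ch) => ch.codePointAt(0)! > 0x7f);
}

console.log(maySkipHighlighting("variable_\u05D0")); // false: contains ASCII letters and "_"
console.log(maySkipHighlighting("\u05D0\u05D1\u05D2")); // true: three Hebrew letters, no ASCII
```

Under this condition, every identifier in the example from the original post contains ASCII characters, so none of them would qualify for the exemption.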

hediet changed the title ~~This doesn't look like a good and secure implementation to me.~~ Stricter Contextual Unicode Highlighting: Consider Same Script Feb 23, 2022

hediet mentioned this issue Feb 23, 2022

Idea: Don't highlight ambiguous/invisible characters when they are surrounded by non-latin-looking characters #140960

Closed

hediet added the unicode-highlight label Feb 23, 2022

hediet added this to the February 2022 milestone Feb 23, 2022

hediet self-assigned this Feb 23, 2022

hediet closed this as completed Feb 24, 2022

github-actions bot locked and limited conversation to collaborators Apr 10, 2022
