Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implements highlighting of unicode characters. #137508

Merged
merged 9 commits into from
Nov 22, 2021
Merged

Conversation

hediet
Copy link
Member

@hediet hediet commented Nov 19, 2021

This PR implements #136437

Some unicode characters are highlighted:

image

There is an hover to explain why:

image

"Adjust Settings" will open a quick pick:
image

New Settings

  • editor.unicodeHighlight.excludedCharacters: string[]
  • editor.unicodeHighlight.invisibleCharacters: boolean
  • editor.unicodeHighlight.nonBasicASCII: boolean
  • editor.unicodeHighlight.ambiguousCharacters: boolean
  • editor.unicodeHighlight.includeComments: boolean

For now, everything defaults to false or [], but it is planned to use the trust state to configure the defaults:

  • Untrusted:
    • editor.unicodeHighlight.excludedCharacters: []
    • editor.unicodeHighlight.invisibleCharacters: true
    • editor.unicodeHighlight.nonBasicASCII: true
    • editor.unicodeHighlight.ambiguousCharacters: true
    • editor.unicodeHighlight.includeComments: true
  • Trusted:
    • editor.unicodeHighlight.excludedCharacters: []
    • editor.unicodeHighlight.invisibleCharacters: true
    • editor.unicodeHighlight.nonBasicASCII: false
    • editor.unicodeHighlight.ambiguousCharacters: true
    • editor.unicodeHighlight.includeComments: false

New Theme Colors

  • editorUnicodeHighlight.border to control the border of the highlight box (default red)

Invisible characters

An unicode character is invisible if rendering it with a black font on a white canvas leaves the canvas white.

Ambiguous characters

A character is ambiguous if it is confusable with a basic ascii character and if it does not appear in vscode-loc for the currently selected locale.

Follow ups

  • Don't highlight invisible characters in emojis
  • Adjust the defaults
  • Evaluate if invisible characters that appear in vscode-loc should be excluded (warning: this includes zero width space etc.). Note, that for trusted workspace, it is not planned to highlight unicode characters in comments
  • Never highlight characters in markdown/text files

@hediet hediet added this to the November 2021 milestone Nov 19, 2021
@hediet hediet requested a review from alexdima November 19, 2021 13:07
@hediet hediet self-assigned this Nov 19, 2021
@hediet hediet linked an issue Nov 19, 2021 that may be closed by this pull request
…ManagementService to monaco editor contributions.
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Looks pretty good! I have finished looking at the code, I will spend some time looking at files and will try to write more feedback after that.

@@ -1032,3 +1033,76 @@ const enum CodePoint {
}

export const noBreakWhitespace = '\xa0';

export class AmbiguousCharacters {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a link to the script that was used to generate this such that we can regenerate them in the future.


export class InvisibleCharacters {
private static getRawData(): number[] {
return JSON.parse('[9,10,11,12,13,32,127,160,173,847,1564,4447,4448,6068,6069,6155,6156,6157,6158,7355,7356,8192,8193,8194,8195,8196,8197,8198,8199,8200,8201,8202,8203,8204,8205,8206,8207,8234,8235,8236,8237,8238,8239,8287,8288,8289,8290,8291,8292,8293,8294,8295,8296,8297,8298,8299,8300,8301,8302,8303,10240,12288,12644,65024,65025,65026,65027,65028,65029,65030,65031,65032,65033,65034,65035,65036,65037,65038,65039,65279,65440,65520,65521,65522,65523,65524,65525,65526,65527,65528,65532,78844,119155,119156,119157,119158,119159,119160,119161,119162,917504,917505,917506,917507,917508,917509,917510,917511,917512,917513,917514,917515,917516,917517,917518,917519,917520,917521,917522,917523,917524,917525,917526,917527,917528,917529,917530,917531,917532,917533,917534,917535,917536,917537,917538,917539,917540,917541,917542,917543,917544,917545,917546,917547,917548,917549,917550,917551,917552,917553,917554,917555,917556,917557,917558,917559,917560,917561,917562,917563,917564,917565,917566,917567,917568,917569,917570,917571,917572,917573,917574,917575,917576,917577,917578,917579,917580,917581,917582,917583,917584,917585,917586,917587,917588,917589,917590,917591,917592,917593,917594,917595,917596,917597,917598,917599,917600,917601,917602,917603,917604,917605,917606,917607,917608,917609,917610,917611,917612,917613,917614,917615,917616,917617,917618,917619,917620,917621,917622,917623,917624,917625,917626,917627,917628,917629,917630,917631,917760,917761,917762,917763,917764,917765,917766,917767,917768,917769,917770,917771,917772,917773,917774,917775,917776,917777,917778,917779,917780,917781,917782,917783,917784,917785,917786,917787,917788,917789,917790,917791,917792,917793,917794,917795,917796,917797,917798,917799,917800,917801,917802,917803,917804,917805,917806,917807,917808,917809,917810,917811,917812,917813,917814,917815,917816,917817,917818,917819,917820,917821,917822,917823,917824,917825,917826,917827,917828,917829,917830,917831,917832,917833,917834,917835,917836,917837,917838,917839,917840,917841,917842,917843,917844,917845,917846,917847,917848,917849,917850,917851,917852,917853,917854,917855,917856,917857,917858,917859,917860,917861,917862,917863,917864,917865,917866,917867,917868,917869,917870,917871,917872,917873,917874,917875,917876,917877,917878,917879,917880,917881,917882,917883,917884,917885,917886,917887,917888,917889,917890,917891,917892,917893,917894,917895,917896,917897,917898,917899,917900,917901,917902,917903,917904,917905,917906,917907,917908,917909,917910,917911,917912,917913,917914,917915,917916,917917,917918,917919,917920,917921,917922,917923,917924,917925,917926,917927,917928,917929,917930,917931,917932,917933,917934,917935,917936,917937,917938,917939,917940,917941,917942,917943,917944,917945,917946,917947,917948,917949,917950,917951,917952,917953,917954,917955,917956,917957,917958,917959,917960,917961,917962,917963,917964,917965,917966,917967,917968,917969,917970,917971,917972,917973,917974,917975,917976,917977,917978,917979,917980,917981,917982,917983,917984,917985,917986,917987,917988,917989,917990,917991,917992,917993,917994,917995,917996,917997,917998,917999]');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here please add a link.

src/vs/editor/common/config/editorOptions.ts Outdated Show resolved Hide resolved
src/vs/editor/common/config/editorOptions.ts Show resolved Hide resolved
src/vs/editor/common/config/editorOptions.ts Outdated Show resolved Hide resolved
@hediet hediet requested a review from alexdima November 22, 2021 16:51
Copy link
Member

@alexdima alexdima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@hediet hediet merged commit 3a33723 into main Nov 22, 2021
@hediet hediet deleted the hediet/unicode-highlighting branch November 22, 2021 20:56
@alexdima
Copy link
Member

We can add the scripts used to generate AmbiguousCharacters and InvisiblesCharacters to https://github.com/alexdima/unicode-utils such that we have a chance again in the future to update with newer Unicode versions.

@Keno
Copy link

Keno commented Nov 26, 2021

Is there a way to turn this off by default depending on the source language. In julia source files, unicode variable names are very common and they now all have ugly red boxes around them by default:

Screenshot from 2021-11-25 22-50-13

I understand that it's possible to turn this off manually, but this makes the first-use experience much worse for users opening julia source files.

@davidanthoff
Copy link
Contributor

If there is any way that we can avoid this shipping to the Julia user base on the stable release channel, I would greatly appreciate it. This is such a visible regression that would affect ten thousands of Julia users, that I’m very worried about the support request wave this would generate. We are a small team of volunteers that maintains the Julia extension, we are simply not equipped well to handle situations where something so fundamental regresses for a large fraction of our (large) user base. Is there some way that either we can disable this for Julia files before it ships (like Keno suggested), or could this be delayed from shipping until there is some other workable solution for languages like Julia that routinely use Unicode in variable names?

@hediet
Copy link
Member Author

hediet commented Nov 26, 2021

@Keno @davidanthoff let's discuss in #137924

guibber pushed a commit to guibber/vscode that referenced this pull request Nov 30, 2021
…hlighting

Implements highlighting of unicode characters.
@chrisinmtown
Copy link

What is the expected release for this new feature?

@hediet
Copy link
Member Author

hediet commented Dec 9, 2021

Yesterday.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 6, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Highlight confusable Unicode characters
5 participants