[Bug] \u3164 not detected as gremlin #213

Closed
timkrins opened this issue Nov 10, 2021 · 11 comments

Comments

@timkrins

Describe the bug
The Unicode character \u3164 (HANGUL FILLER) is not detected as a gremlin.
See https://certitude.consulting/blog/en/invisible-backdoor/ for a great article on this character (and my inspiration for this bug report).

To Reproduce
Steps to reproduce the behavior:

  1. View a file containing \u3164
  2. Observe that the gremlin is not marked

Example code (from article above)

const { timeout,ㅤ} = req.query;

Expected behavior
The \u3164 whitespace is detected as a gremlin.
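
To make the expected behavior concrete, here is a minimal sketch of a check that would flag the character in the example line above. The `GREMLINS` map and the `findGremlins` helper are hypothetical names invented for this illustration; they are not the extension's actual implementation or API.

```javascript
// Hypothetical lookup table (an assumption for this sketch, not the
// extension's real data): code point -> description.
const GREMLINS = new Map([
  [0x3164, "HANGUL FILLER"],
]);

// Scans one line of text and returns an entry for every listed character,
// including its UTF-16 index within the line.
function findGremlins(line) {
  const hits = [];
  let index = 0;
  for (const ch of line) {
    const codePoint = ch.codePointAt(0);
    if (GREMLINS.has(codePoint)) {
      hits.push({
        index,
        codePoint: "U+" + codePoint.toString(16).toUpperCase().padStart(4, "0"),
        description: GREMLINS.get(codePoint),
      });
    }
    index += ch.length; // astral characters occupy two UTF-16 units
  }
  return hits;
}

// The line from the report, with the filler written as an escape so it is visible.
const hits = findGremlins("const { timeout,\u3164} = req.query;");
console.log(hits);
```

Writing the character as `\u3164` inside the string keeps the sample readable, while the scanned text still contains the real invisible character.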

Screenshots
[Screenshot attached: 2021-11-10 at 10:29:00]

Operating system:

  • OS: macOS
  • Version 11.5.2

Visual Studio Code:

  • Version 1.61.2

Gremlins extension:

  • Version 0.26.0

@TheSench
Collaborator

Hey @timkrins, thanks for the suggestion. This looks like a reasonable addition. As a workaround for now, you can create a custom set of rules in VSCode and add this character yourself. The editor should automatically fill in all of the default rules for you when you go to edit the gremlins.characters setting.

@nhoizey
Owner

nhoizey commented Nov 10, 2021

I just read the article too and came here to create the issue, so thanks a lot @timkrins for creating it first! 🙏

Do you have time to provide the Pull Request for this addition?

@timkrins
Author

@nhoizey can do - what level should we mark it as?

@timkrins
Author

there are actually a huge number of Unicode 'confusables'...

just for white spaces there are:
0x1680 OGHAM SPACE MARK
0x2000 EN QUAD
0x2001 EM QUAD
0x2002 EN SPACE
0x2003 EM SPACE
0x2004 THREE-PER-EM SPACE
0x2005 FOUR-PER-EM SPACE
0x2006 SIX-PER-EM SPACE
0x2007 FIGURE SPACE
0x2008 PUNCTUATION SPACE
0x2009 THIN SPACE
0x200A HAIR SPACE

I wonder if there would be a way of flagging any Unicode confusable.

@TheSench
Collaborator

@timkrins I don't know of a definitive way to classify certain unicode characters as "confusables" automatically. For this group though, you could at least configure a range to capture most of these. @sheldonhull recently put up PR #185 to add instructions on doing so to the README.
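
As a rough illustration of the range idea, the sketch below matches the white-space characters listed earlier in the thread with a single range plus two individual code points. The `"2000-200A"` string syntax and the `parseRange` and `makeMatcher` helpers are assumptions made for this example; check the extension's README for the configuration syntax it actually accepts.

```javascript
// Hypothetical range syntax for illustration only: either a single hex
// code point ("1680") or an inclusive hex range ("2000-200A").
function parseRange(spec) {
  const [start, end = start] = spec.split("-").map((hex) => parseInt(hex, 16));
  return { start, end };
}

// Builds a predicate that reports whether the first code point of a string
// falls inside any of the given ranges.
function makeMatcher(specs) {
  const ranges = specs.map(parseRange);
  return (ch) => {
    const codePoint = ch.codePointAt(0);
    return ranges.some(({ start, end }) => codePoint >= start && codePoint <= end);
  };
}

// OGHAM SPACE MARK, EN QUAD through HAIR SPACE, and HANGUL FILLER.
const isSuspiciousSpace = makeMatcher(["1680", "2000-200A", "3164"]);

console.log(isSuspiciousSpace("\u2003")); // EM SPACE, inside the range
console.log(isSuspiciousSpace(" "));      // ordinary U+0020 space, outside it
```

One range entry covers eleven of the twelve characters in the list, which is why a range is more practical here than enumerating each code point.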

@timkrins
Author

timkrins commented Nov 11, 2021

@TheSench there is a list of them here: https://www.unicode.org/Public/security/14.0.0/confusables.txt

License for Unicode data files is here: https://www.unicode.org/license.txt
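
For anyone curious how that file could be consumed, here is a minimal sketch of a parser for its general layout, which is semicolon-separated fields of hex code points followed by a `#` comment. The `parseConfusables` function is a hypothetical helper, and the sample line is illustrative text written in the file's layout rather than a verbatim excerpt.

```javascript
// Parses text laid out as "source ; target ; type # comment" and returns
// a Map from each source string to its target string.
function parseConfusables(text) {
  // Converts a space-separated list of hex code points into a string.
  const decode = (hexList) =>
    String.fromCodePoint(...hexList.split(/\s+/).map((hex) => parseInt(hex, 16)));

  const map = new Map();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.split("#")[0].trim(); // drop comments and blank lines
    if (line === "") continue;
    const [source, target] = line.split(";").map((field) => field.trim());
    if (!source || !target) continue; // skip lines that do not fit the layout
    map.set(decode(source), decode(target));
  }
  return map;
}

// Illustrative input in the file's layout (an assumption, not copied from the file).
const sample = [
  "# comment lines start with a hash",
  "0441 ;\t0063 ;\tMA\t# CYRILLIC SMALL LETTER ES -> LATIN SMALL LETTER C",
].join("\n");

const confusables = parseConfusables(sample);
console.log(confusables.get("\u0441"));
```

A real integration would load the full file and decide which mappings are worth flagging, since highlighting every confusable would mark a great deal of legitimate non-Latin text.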

@timkrins
Author

timkrins commented Nov 11, 2021

I can see @alexdima has created an issue in microsoft/vscode to provide this type of functionality natively (with the task assigned to @hediet in the November iteration plan): microsoft/vscode#136437

@TheSench
Collaborator

> @TheSench there is a list of them here: https://www.unicode.org/Public/security/14.0.0/confusables.txt
>
> License for Unicode data files is here: https://www.unicode.org/license.txt

Thanks for the links, I'll take a look into those. I'd love to see this become a feature of VSCode itself, but until that comes, we'll see what can be done here.

@ZaLiTHkA

greetings.. I found my way to this issue after reading a post by Chris Coyier titled The Invisible JavaScript Backdoor, which in turn linked to a source article by Wolfgang Ettlinger with the same title.

I've already extended my local gremlins.characters setting with the following:

"3164": {
  "description": "'HANGUL FILLER'",
  "level": "error"
}

but not everybody will know about this "problem", so I feel this should be included in the extension's internal gremlin characters list..

is there any plan for this at the moment, or is it sitting waiting for more information and/or motivation?

@timkrins
Author

@ZaLiTHkA see activity in issue linked above about this unicode-flagging feature being available in vscode natively.

@timkrins
Author

Since the VS Code November 2021 release (version 1.63), Unicode highlighting is native!
See https://code.visualstudio.com/updates/v1_63#_unicode-highlighting in the changelog.
Thanks @nhoizey and gremlins, you were great.
