Unicode Variables #1654

Closed
NeilFraser opened this issue Feb 22, 2018 · 4 comments
@NeilFraser
Contributor

Current behaviour

Blockly variables have their Unicode characters encoded. Thus the variable '中国' is turned into '_E4_B8_AD_E5_9B_BD'. Such JavaScript is legal, but completely unreadable.
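For illustration, the encoding described above can be reproduced in a few lines. This is a minimal sketch: `asciiSafeName` is a hypothetical name, and the body is an assumption about how such an encoding can be produced (percent-encode the name as UTF-8, then replace every non-word character with an underscore), not a copy of Blockly's source.

```javascript
// Hypothetical helper for illustration only; not Blockly's actual code.
function asciiSafeName(name) {
  // encodeURI turns each non-ASCII character into %XX escapes of its UTF-8 bytes.
  const percentEncoded = encodeURI(name);
  // Replace anything outside [A-Za-z0-9_] (including '%') with '_'.
  return percentEncoded.replace(/[^\w]/g, '_');
}

console.log(asciiSafeName('中国'));  // _E4_B8_AD_E5_9B_BD
```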

Expected behaviour

It turns out that Unicode characters are allowed in JavaScript variable names.

var 中国 = 'China';
alert(中国);

Or:

var Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠ = 'Zalgo';
alert(Hͫ̆̒̐ͣ̊̄ͯ͗͏̵̗̻̰̠̬͝ͅE̴̷̬͎̱̘͇͍̾ͦ͊͒͊̓̓̐_̫̠̱̩̭̤͈̑̎̋ͮͩ̒͑̾͋͘Ç̳͕̯̭̱̲̣̠̜͋̍O̴̦̗̯̹̼ͭ̐ͨ̊̈͘͠M̶̝̠̭̭̤̻͓͑̓̊ͣͤ̎͟͠E̢̞̮̹͍̞̳̣ͣͪ͐̈T̡̯̳̭̜̠͕͌̈́̽̿ͤ̿̅̑Ḧ̱̱̺̰̳̹̘̰́̏ͪ̂̽͂̀͠);

Here's documentation regarding what is and isn't allowed:
https://mathiasbynens.be/notes/javascript-identifiers

@rachel-fenichel
Collaborator

rachel-fenichel commented Feb 22, 2018

@AnmAtAnm
Contributor

AnmAtAnm commented Feb 22, 2018

This doesn't appear to be occurring at a JavaScript-specific level. (/me sneaks away so it can be reassigned.)

@NeilFraser
Contributor Author

Yup, it's in Blockly.Names.prototype.safeName_ which states:

Given a proposed entity name, generate a name that conforms to the
[_A-Za-z][_A-Za-z0-9]* format that most languages consider legal for
variables.

A quick check of Dart, Lua and Python 2 shows that they do not support Unicode variables, though Python 3, JavaScript and PHP do.

There's a feasible way to implement this in JavaScript:

eval('"use strict"; var ' + x)

where x is the first letter, then the first two letters, then the first three letters of the name, and so on. If it blows up, then encode the offending letter. Also, '\n', '\r' and ';' need to be escaped up front to prevent injection attacks.
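The prefix-probing idea above could be sketched roughly as follows. Several things here are assumptions rather than anything stated in the thread: the helper names are hypothetical, `new Function` is swapped in for `eval` so the probe is only compiled and never run, the up-front character filter is a guess that may well be incomplete, and a trailing `_` is appended to each probe so that a prefix such as `in` or `for` is not rejected merely for being a reserved word.

```javascript
// Hypothetical helpers for illustration; none of this is Blockly code.

// Encode one character as its UTF-8 bytes in _XX form.
function encodeChar(ch) {
  const bytes = new TextEncoder().encode(ch);
  return Array.from(bytes, b =>
      '_' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

// Does `candidate` (the name built so far plus one new character) still
// parse as a strict-mode variable name?
function isLegalSoFar(candidate) {
  // Assumed filter: characters that could let the probe parse as something
  // other than a single name (whitespace, separators, quotes, brackets).
  if (/[\s;,=\/\\'"`(){}\[\]]/u.test(candidate)) return false;
  try {
    // Compiles the body without running it; throws SyntaxError if illegal.
    new Function('"use strict"; var ' + candidate + '_;');
    return true;
  } catch (e) {
    return false;
  }
}

// Grow the name one character at a time, encoding any character that
// makes the probe fail.
function probedSafeName(name) {
  let result = '';
  for (const ch of name) {
    result += isLegalSoFar(result + ch) ? ch : encodeChar(ch);
  }
  return result;
}

console.log(probedSafeName('中国'));  // 中国
console.log(probedSafeName('1abc'));  // _31abc
console.log(probedSafeName('a;b'));   // a_3Bb
```

Because of the trailing `_`, a finished name that is itself a reserved word (such as `in`) passes unchanged and would still need a separate reserved-word check. The sketch also will not work where a Content Security Policy forbids `eval`-like constructs.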

@cpcallen
Contributor

cpcallen commented Sep 13, 2022

There's a feasible way to implement this in JavaScript:

eval('"use strict"; var ' + x)

Feasible, yes, but not exactly sensible, as the list of characters that would need to be escaped first is longer than just [\n\r;] (possibly much longer, and certainly not obvious).

A better approach is probably to use a regexp, even though the required regexp is 8-10kB long and depends on which version of ECMAScript and Unicode one is targeting.
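As a hedged aside, engines that support ES2018 Unicode property escapes allow a much shorter pattern than a generated 8-10kB regexp, at the cost of tying the result to whatever Unicode version the engine ships. The sketch below assumes such an engine; the names `ID_START`, `ID_CONTINUE`, `encodeChar` and `regexSafeName` are hypothetical, and reserved words are not handled.

```javascript
// Hypothetical sketch; assumes an engine with \p{...} support (ES2018+).

// First character: '$', '_' or any code point with the ID_Start property.
const ID_START = /^[$_\p{ID_Start}]$/u;
// Later characters: '$', ZWNJ, ZWJ or any code point with ID_Continue
// (which already includes '_', digits and combining marks).
const ID_CONTINUE = /^[$\u200C\u200D\p{ID_Continue}]$/u;

// Encode one character as its UTF-8 bytes in _XX form.
function encodeChar(ch) {
  const bytes = new TextEncoder().encode(ch);
  return Array.from(bytes, b =>
      '_' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

// Keep characters that are legal at their position; encode the rest.
function regexSafeName(name) {
  let result = '';
  for (const ch of name) {
    const legal = (result === '' ? ID_START : ID_CONTINUE).test(ch);
    result += legal ? ch : encodeChar(ch);
  }
  return result;
}

console.log(regexSafeName('中国'));    // 中国
console.log(regexSafeName('my-var'));  // my_2Dvar
```

Iterating with `for...of` walks the string by code point, so characters outside the Basic Multilingual Plane are tested as a whole rather than as two surrogate halves.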

See: Valid JavaScript variable names in ES2015.
