Support astral symbols #4

Open
mathiasbynens opened this Issue Oct 17, 2013 · 0 comments


mathiasbynens commented Oct 17, 2013

> htmlentities.decode('&#x1D306;') // U+1D306 TETRAGRAM FOR CENTRE
'\uD306' // should be `\uD834\uDF06` i.e. `𝌆`

E.g. &copy; decodes just fine, but &#x1D306; doesn’t, because String.fromCharCode(0x1D306) doesn’t work for astral values (i.e. values > 0xFFFF). U+1D306 is an astral symbol. Details here: http://mathiasbynens.be/notes/javascript-encoding
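To make the failure concrete, here is a small sketch (not taken from the htmlentities source) of what String.fromCharCode does with an astral value: it keeps only the low 16 bits, so 0x1D306 silently becomes 0xD306. The variable names are only illustrative.

```javascript
// Illustration only: String.fromCharCode truncates its argument to 16 bits.
var codePoint = 0x1D306; // U+1D306 TETRAGRAM FOR CENTRE

var truncated = String.fromCharCode(codePoint);
console.log(truncated === '\uD306'); // true: the leading 0x1 was dropped
console.log(truncated.length);       // 1: a single, unrelated BMP character

// The correct JavaScript string is a surrogate pair, i.e. two code units.
var expected = '\uD834\uDF06';
console.log(expected.length);        // 2
console.log(truncated === expected); // false
```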

This can easily be fixed by using the Punycode module:

// Instead of…
String.fromCharCode(codePoint);
// …which only works for values from 0x0000 to 0xFFFF, use this:
punycode.ucs2.encode([ codePoint ]);
// …which works for all Unicode code points (i.e. values from 0x000000 to 0x10FFFF)

(Note: Punycode.js is bundled with Node.js v0.6.2+ but you could always add it to package.json anyway if you want to support older versions).

See he’s he.decode() for a working example that doesn’t rely on Punycode.js.
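For reference, a dependency-free version can be sketched roughly as follows. This is not he’s actual code; codePointToString is a hypothetical helper name, and the range check is an assumption about how invalid input should be handled. It splits astral code points into a high and a low surrogate by hand.

```javascript
// Hypothetical helper (not from he or Punycode.js): converts a single
// Unicode code point to a JavaScript string, using a surrogate pair
// for astral values (code points above 0xFFFF).
function codePointToString(codePoint) {
  if (
    typeof codePoint !== 'number' ||
    Math.floor(codePoint) !== codePoint || // rejects NaN and non-integers
    codePoint < 0 ||
    codePoint > 0x10FFFF
  ) {
    throw new RangeError('Invalid code point: ' + codePoint);
  }
  if (codePoint <= 0xFFFF) {
    // BMP value: a single code unit is enough.
    return String.fromCharCode(codePoint);
  }
  // Astral value: subtract 0x10000, then split the remaining 20 bits
  // into two 10-bit halves.
  var offset = codePoint - 0x10000;
  var highSurrogate = 0xD800 + (offset >> 10);
  var lowSurrogate = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(highSurrogate, lowSurrogate);
}

console.log(codePointToString(0x1D306) === '\uD834\uDF06'); // true
console.log(codePointToString(0xA9) === '\u00A9');          // true
```

In ES2015 and later environments, the built-in String.fromCodePoint(0x1D306) produces the same string, so a helper like this is only needed where that method is unavailable.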
