No. 73 (JavaScript-ECMAScript-Syntax): Converting between Unicode characters and codes #76

Open
wingmeng opened this issue Jul 29, 2019 · 0 comments

When it comes to converting between Unicode characters and their codes, the two methods that come to mind first are charCodeAt and fromCharCode.

  • String.prototype.charCodeAt()
  • String.fromCharCode()

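Note: charCodeAt is a prototype method of String (called on a string value), while fromCharCode is a static method of String (called on String itself), so they are used differently.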

Usage:

'A'.charCodeAt();  // 65
String.fromCharCode(65);  // "A"
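Both methods also scale to whole strings: charCodeAt takes an index (it defaults to 0 when omitted), and fromCharCode accepts any number of codes. A minimal round-trip sketch; toCharCodes is a hypothetical helper name, not a built-in:

```javascript
// Hypothetical helper: list every UTF-16 code unit of a string.
function toCharCodes(str) {
  const codes = [];
  for (let i = 0; i < str.length; i++) {
    // charCodeAt(index) returns one 16-bit code unit at that index
    codes.push(str.charCodeAt(i));
  }
  return codes;
}

toCharCodes('Hi!');  // [72, 105, 33]

// fromCharCode is variadic, so the array can be spread back into it.
String.fromCharCode(...toCharCodes('Hi!'));  // "Hi!"
```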

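Simple, right? One is charCode "at": the code of the character at a given position; the other is charCode "from": the character built from a given code. Easy to understand and remember. But things go wrong once we hit certain special characters: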

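'🚀'.charCodeAt();  // 55357 (only the first of two code units)
String.fromCharCode(55357);  // "\uD83D", a lone surrogate that renders as "�"

'𠆧'.charCodeAt();  // 55360 (only the first of two code units)
String.fromCharCode(55360);  // "\uD840", a lone surrogate that renders as "�"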

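As you can see, the code we got back cannot restore the original character. That is because 🚀 and 𠆧 both lie outside the Basic Multilingual Plane (their code points are above U+FFFF). JavaScript strings are encoded as UTF-16, so each of these characters is stored as a surrogate pair: two 16-bit code units, 4 bytes in total, which is why their length is 2 rather than 1. Since charCodeAt returns only one code unit, getting the complete encoding requires reading both: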
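The two code units are not arbitrary: UTF-16 derives them from the code point with a fixed formula. Subtract 0x10000, put the top 10 bits of the result into a high surrogate (0xD800 to 0xDBFF) and the bottom 10 bits into a low surrogate (0xDC00 to 0xDFFF). A sketch of that split, using a hypothetical helper named toSurrogatePair:

```javascript
// Hypothetical helper: split a code point above U+FFFF into its UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError('Expected a code point in U+10000..U+10FFFF');
  }
  const offset = codePoint - 0x10000;     // a 20-bit value
  const high = 0xd800 + (offset >> 10);   // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [high, low];
}

toSurrogatePair(0x1f680);  // [55357, 56960]  (🚀 is U+1F680)
toSurrogatePair(0x201a7);  // [55360, 56743]  (𠆧 is U+201A7)
```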

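'🚀'.charCodeAt(0);  // first code unit (high surrogate): 55357
'🚀'.charCodeAt(1);  // second code unit (low surrogate): 56960

'𠆧'.charCodeAt(0);  // first code unit (high surrogate): 55360
'𠆧'.charCodeAt(1);  // second code unit (low surrogate): 56743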

How do we turn such a pair of 4-byte charCode values back into a string? Pass both code units to fromCharCode:

String.fromCharCode(55357, 56960);  // "🚀"
String.fromCharCode(55360, 56743);  // "𠆧"
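Before ES6, recovering the real code point meant checking each code unit by hand: if it is a high surrogate followed by a low surrogate, combine the two; otherwise take it as-is. A sketch of that manual approach, using only charCodeAt; codePointsOf is a hypothetical helper name:

```javascript
// Hypothetical helper: decode a string into code points using only charCodeAt.
function codePointsOf(str) {
  const result = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Reverse of the UTF-16 split: 10 high bits, 10 low bits, plus 0x10000.
        result.push((unit - 0xd800) * 0x400 + (next - 0xdc00) + 0x10000);
        i++; // skip the low surrogate that was just consumed
        continue;
      }
    }
    result.push(unit); // BMP character or unpaired surrogate
  }
  return result;
}

codePointsOf('A🚀𠆧');  // [65, 128640, 131495]
```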

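This is inconvenient in practice: we have to detect by hand whether a character is stored as a surrogate pair and handle it accordingly. Fortunately, ES6 added two APIs, codePointAt and fromCodePoint, to solve exactly this problem. They are used in basically the same way as charCodeAt and fromCharCode (codePointAt is a prototype method, fromCodePoint is a static method), but they work with complete code points: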

'A'.codePointAt();  // 65
String.fromCodePoint(65);  // "A"

'🚀'.codePointAt();  // 128640
String.fromCodePoint(128640);  // "🚀"

'𠆧'.codePointAt();  // 131495
String.fromCodePoint(131495);  // "𠆧"
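One caveat: codePointAt still takes a UTF-16 index, so '🚀'.codePointAt(1) returns the low surrogate 56960 rather than a new character. To walk a string one character at a time, iterate it with for...of, whose ES6 string iterator steps by code point. A sketch of round-tripping a whole string this way:

```javascript
const text = 'A🚀𠆧';

// for...of yields one full character per step, even for surrogate pairs.
const codePoints = [];
for (const ch of text) {
  codePoints.push(ch.codePointAt(0));
}
codePoints;  // [65, 128640, 131495]

// fromCodePoint is variadic, like fromCharCode.
String.fromCodePoint(...codePoints) === text;  // true

// codePointAt indexes by UTF-16 code unit, not by character.
text.codePointAt(1);  // 128640 (🚀 starts at index 1)
text.codePointAt(2);  // 56960 (the low surrogate of 🚀)
```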
