Question about the bit operations in lept_encode_utf8 #56

Closed
Alex1990 opened this issue Oct 14, 2016 · 3 comments

Alex1990 commented Oct 14, 2016

Mr. Yip, please see the comments in the code:

static void lept_encode_utf8(lept_context* c, unsigned u) {
    if (u <= 0x7F) 
        PUTC(c, u & 0xFF);
    else if (u <= 0x7FF) {
        PUTC(c, 0xC0 | ((u >> 6) & 0xFF)); // Why is the mask here 0xFF? I would have expected 0x1F, i.e. `11111`
        PUTC(c, 0x80 | ( u       & 0x3F));
    }
    else if (u <= 0xFFFF) {
        PUTC(c, 0xE0 | ((u >> 12) & 0xFF)); // Why is the mask here 0xFF? I would have expected 0xF, i.e. `1111`
        PUTC(c, 0x80 | ((u >>  6) & 0x3F));
        PUTC(c, 0x80 | ( u        & 0x3F));
    }
    else {
        assert(u <= 0x10FFFF);
        PUTC(c, 0xF0 | ((u >> 18) & 0xFF)); // Why is the mask here 0xFF? I would have expected 0x7, i.e. `111`
        PUTC(c, 0x80 | ((u >> 12) & 0x3F));
        PUTC(c, 0x80 | ((u >>  6) & 0x3F));
        PUTC(c, 0x80 | ( u        & 0x3F));
    }
}

When I did the exercise, I replaced the three 0xFF masks above with the values I expected, and the tests still passed.

rqycpp commented Oct 15, 2016

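Nothing is wrong; these are just two ways of writing the same thing. Of the two, I find Mr. Yip's version a little simpler, since every branch uses the same mask and none needs special handling.
Take the code point U+FFFF as an example: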

  1. U+FFFF (a 16-bit code point) is 1111 111111 111111 in binary
  2. u>>12 = 1111 (binary)
  3. ((u>>12) & 0xFF) == ((u>>12) & 0xF)

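In other words, the bits you are worried about, the ones that should not take part in the & operation, are already 0 after the shift. Since 0 & 1 == 0, the wider mask has no effect on the result.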
00001111 & 11111111 == 00001111
1111 & 1111 == 1111

One more note: your first comment says 0x2F, but 0x2F == 101111; it is 0x1F that equals 11111.
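
The equivalence of the two masks can also be checked exhaustively. The following is a minimal sketch, not code from the tutorial: `encode_utf8` and `masks_agree` are hypothetical helpers that write into a plain buffer instead of using `PUTC`, and the `tight` flag is an assumption added only to switch between the 0xFF mask and the exact-width masks (0x1F, 0x0F, 0x07).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the tutorial): encodes code point u into
   out[] and returns the number of bytes written.
   tight != 0 -> lead byte uses the exact-width masks 0x1F / 0x0F / 0x07
   tight == 0 -> lead byte uses 0xFF, as in the code quoted in the question */
static size_t encode_utf8(unsigned long u, int tight, unsigned char out[4]) {
    if (u <= 0x7F) {
        out[0] = (unsigned char)(u & 0xFF);
        return 1;
    }
    if (u <= 0x7FF) {
        out[0] = (unsigned char)(0xC0 | ((u >> 6) & (tight ? 0x1F : 0xFF)));
        out[1] = (unsigned char)(0x80 | (u & 0x3F));
        return 2;
    }
    if (u <= 0xFFFF) {
        out[0] = (unsigned char)(0xE0 | ((u >> 12) & (tight ? 0x0F : 0xFF)));
        out[1] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (u & 0x3F));
        return 3;
    }
    assert(u <= 0x10FFFFUL);
    out[0] = (unsigned char)(0xF0 | ((u >> 18) & (tight ? 0x07 : 0xFF)));
    out[1] = (unsigned char)(0x80 | ((u >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (u & 0x3F));
    return 4;
}

/* Returns 1 if both mask choices produce identical bytes for every
   value in 0..0x10FFFF, and 0 at the first mismatch. */
static int masks_agree(void) {
    unsigned long u;
    for (u = 0; u <= 0x10FFFFUL; u++) {
        unsigned char wide[4], narrow[4];
        size_t nw = encode_utf8(u, 0, wide);
        size_t nn = encode_utf8(u, 1, narrow);
        if (nw != nn || memcmp(wide, narrow, nw) != 0)
            return 0;
    }
    return 1;
}
```

Calling `masks_agree()` from a test should return 1, because the range check in each branch already guarantees that the shifted value has no bits above the narrow mask.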

Owner

miloyip commented Oct 15, 2016

The first byte actually works without the & as well; it is only there because some compilers wrongly warn about a possible overflow.
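
As an illustration of the unmasked form, here is a small sketch. The `lead2`, `lead3` and `lead4` names are hypothetical, and the explicit cast to `unsigned char` is an assumption about one way to keep narrowing warnings quiet; the thread does not say which compilers warn or how the tutorial's `PUTC` macro converts its argument.

```c
#include <assert.h>

/* Hypothetical helpers: lead byte of a 2-, 3- or 4-byte UTF-8 sequence,
   computed without any mask. The range check on u is what guarantees
   that the shifted value fits in the free bits of the lead byte. */

static unsigned char lead2(unsigned long u) {
    assert(u >= 0x80 && u <= 0x7FF);        /* u >> 6 is at most 0x1F */
    return (unsigned char)(0xC0 | (u >> 6));
}

static unsigned char lead3(unsigned long u) {
    assert(u >= 0x800 && u <= 0xFFFF);      /* u >> 12 is at most 0x0F */
    return (unsigned char)(0xE0 | (u >> 12));
}

static unsigned char lead4(unsigned long u) {
    assert(u >= 0x10000UL && u <= 0x10FFFFUL); /* u >> 18 is at most 0x04 */
    return (unsigned char)(0xF0 | (u >> 18));
}
```

In each case the shifted value is small enough that OR-ing it with the prefix stays within 8 bits, so the mask changes nothing at run time.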

@Alex1990
Author

Thanks @rqy1994 @miloyip for the answers and the correction; I understand it a bit better now.
