
"surprising" Javascript UTF-8 cstring len behaviour #10911

Open
zevv opened this issue Mar 27, 2019 · 3 comments

Comments

@zevv
Contributor

zevv commented Mar 27, 2019

What are the semantics of len() on Unicode cstrings in JavaScript? Is the following behaviour expected?

```nim
let s1 = "♜♞♝♛♚♝♞♜"
const s2 = "♜♞♝♛♚♝♞♜"

echo "lit   string  ", "♜♞♝♛♚♝♞♜".len
echo "lit   cstring ", "♜♞♝♛♚♝♞♜".cstring.len
echo "let   string  ", s1.len
echo "let   cstring ", s1.cstring.len
echo "const string  ", s2.len
echo "const cstring ", s2.cstring.len
```

Native backend:

```
lit   string  24
lit   cstring 24
let   string  24
let   cstring 24
const string  24
const cstring 24
```

JavaScript backend:

```
lit   string  24
lit   cstring 24
let   string  24
let   cstring 8
const string  24
const cstring 24
```
@zevv zevv changed the title Javascript cstring len mismatch Javascript UTF-8 cstring len mismatch Mar 27, 2019
@zevv zevv changed the title Javascript UTF-8 cstring len mismatch "surprising" Javascript UTF-8 cstring len behaviour Mar 27, 2019
@Araq
Member

Araq commented Mar 27, 2019

cstring is mapped to JS strings and so len should return what JS's length returns.
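For this particular string, "what JS's length returns" and the native result differ as follows. This is a standalone JavaScript sketch, not Nim's generated code. JS `length` counts UTF-16 code units. The native backend's 24 is the UTF-8 byte count, and `TextEncoder` is used here only to obtain that byte count for comparison.

```javascript
// Each chess symbol (U+265A..U+265F) lies in the Basic Multilingual Plane,
// so it takes one UTF-16 code unit in a JS string but three bytes in UTF-8.
const s = "♜♞♝♛♚♝♞♜";

// What a cstring's len maps to on the JS backend: UTF-16 code units.
const jsLength = s.length;

// What the native backend reports: the number of UTF-8 bytes.
const utf8Bytes = new TextEncoder().encode(s).length;

console.log("JS length:       ", jsLength);
console.log("UTF-8 byte count:", utf8Bytes);
```

Characters outside the BMP (most emoji, for example) take two UTF-16 code units each. JS `length` is therefore not a code point count either.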

@zevv
Contributor Author

zevv commented Mar 27, 2019

Fair enough, but the behaviour is different for consts and literals than for variables, which might surprise some users, me included.

@krux02
Contributor

krux02 commented Mar 29, 2019

I can confirm this is a bug. Only the value in the `let cstring` case is correct. The generated JavaScript:

```javascript
var s1_245005 = makeNimstrLit("\xE2\x99\x9C\xE2\x99\x9E\xE2\x99\x9D\xE2\x99\x9B\xE2\x99\x9A\xE2\x99\x9D\xE2\x99\x9E\xE2\x99\x9C");
rawEcho(makeNimstrLit("lit   string  "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("lit   cstring "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("let   string  "), cstrToNimstr(((s1_245005 != null ? s1_245005.length : 0))+""));
rawEcho(makeNimstrLit("let   cstring "), cstrToNimstr(((toJSStr(s1_245005) != null ? toJSStr(s1_245005).length : 0))+""));
rawEcho(makeNimstrLit("const string  "), makeNimstrLit("24"));
rawEcho(makeNimstrLit("const cstring "), makeNimstrLit("24"));
```

In the other cases, Nim assumes it can calculate the length at compile time when in fact it can't. The folded value, 24, is the UTF-8 byte count, while the length of the JS string at runtime is 8. When I tried to fix this bug, I found that in semMagic the magic used to calculate the length of the cstring is mLengthArray, not mLengthStr as it should be, at least judging by the overloads in system.nim.
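The following rough JavaScript model of the generated code helps pin down the discrepancy. `nimStrFromByteLiteral` and `toJSStrSketch` are hypothetical stand-ins that only approximate `makeNimstrLit` and `toJSStr`. `utf16LengthOfUtf8` is a hypothetical helper. It shows what a compile-time fold would need to compute for the literal and `const` cases to agree with the `let` case: the UTF-16 code units of the decoded text, not the byte count.

```javascript
// Hypothetical stand-ins: these approximate, but are not, Nim's JS runtime helpers.

// Stand-in for makeNimstrLit: each char of the literal is one raw byte
// ("\xE2\x99\x9C" ...), and a Nim string is modeled as an array of those bytes.
function nimStrFromByteLiteral(lit) {
  const bytes = [];
  for (let i = 0; i < lit.length; i++) bytes.push(lit.charCodeAt(i));
  return bytes;
}

// Stand-in for toJSStr: decode the UTF-8 bytes into a JS (UTF-16) string.
function toJSStrSketch(bytes) {
  return new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
}

// Hypothetical helper: what a compile-time fold of cstring.len would have to
// compute to match the runtime, i.e. the UTF-16 length of the decoded text.
// Assumes the input is valid UTF-8.
function utf16LengthOfUtf8(bytes) {
  let units = 0;
  for (let i = 0; i < bytes.length; ) {
    const b = bytes[i];
    if (b < 0x80) { units += 1; i += 1; }       // 1-byte sequence (ASCII)
    else if (b < 0xE0) { units += 1; i += 2; }  // 2-byte sequence
    else if (b < 0xF0) { units += 1; i += 3; }  // 3-byte sequence (rest of BMP)
    else { units += 2; i += 4; }                // 4-byte sequence -> surrogate pair
  }
  return units;
}

const s1 = nimStrFromByteLiteral(
  "\xE2\x99\x9C\xE2\x99\x9E\xE2\x99\x9D\xE2\x99\x9B\xE2\x99\x9A\xE2\x99\x9D\xE2\x99\x9E\xE2\x99\x9C");

const stringLen = s1.length;                 // like "let string": byte count
const cstringLen = toJSStrSketch(s1).length; // like "let cstring": UTF-16 code units
const foldedLen = utf16LengthOfUtf8(s1);     // what a consistent fold would yield

console.log(stringLen, cstringLen, foldedLen);
```

Under these assumptions, the sketch gives 24 for the Nim `string` and 8 for both the runtime `cstring` length and the hypothetical fold. The literal and `const` rows above print 24 because their folded value is the byte count.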
