Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371. #4935

Open
FlydragonTy opened this issue Jan 4, 2022 · 3 comments
Labels
bug Undesired behaviour

Comments

@FlydragonTy

FlydragonTy commented Jan 4, 2022

JerryScript revision

Commit: a6ab5e9

Version: v3.0.0

Build platform

Ubuntu 18.04.5 LTS (Linux 4.19.128-microsoft-standard x86_64)

Ubuntu 18.04.5 LTS (Linux 5.4.0-44-generic x86_64)

Build steps
python ./tools/build.py --clean --debug --compile-flag=-fsanitize=address --compile-flag=-m32 --compile-flag=-g --strip=off --lto=off --logging=on --line-info=on --error-message=on --system-allocator=on --stack-limit=20
Test case

poc-as.txt

Execution steps & Output
$ ./jerryscript/build/bin/jerry poc.js

ICE: Assertion 'lit_is_valid_cesu8_string (string_p, string_size)' failed at jerryscript/jerry-core/ecma/base/ecma-helpers-string.c(ecma_new_ecma_string_from_utf8):371.
Error: ERR_FAILED_INTERNAL_ASSERTION
[1]    abort      jerry poc.js

Credits: Found by OWL337 team.

@rerobika rerobika added the bug Undesired behaviour label Jan 4, 2022
@ossy-szeged
Contributor

ossy-szeged commented Jan 10, 2022

@rerobika I think it is not a bug, but a feature. "𞸋" is encoded in UTF-8 as 0xF09EB88B, which is invalid in CESU-8. But of course we could raise a user-friendly error message instead of an assertion.
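
For reference, the byte values in question can be reproduced with a short sketch. This is an illustration in Python, not JerryScript's conversion code; `utf8_to_cesu8` is a hypothetical helper name. It assumes the usual CESU-8 rule: code points above U+FFFF are split into a UTF-16 surrogate pair, and each surrogate is then encoded as a 3-byte sequence.

```python
def utf8_to_cesu8(text: str) -> bytes:
    """Encode text as CESU-8 (hypothetical helper, for illustration only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 0x10000:
            # BMP code points use the same bytes in UTF-8 and CESU-8.
            out += ch.encode("utf-8", "surrogatepass")
        else:
            # Non-BMP: split into a surrogate pair, 3 bytes per surrogate.
            v = cp - 0x10000
            high = 0xD800 | (v >> 10)
            low = 0xDC00 | (v & 0x3FF)
            out += chr(high).encode("utf-8", "surrogatepass")
            out += chr(low).encode("utf-8", "surrogatepass")
    return bytes(out)


ch = "\U0001EE0B"  # the "𞸋" character
utf8 = ch.encode("utf-8")
cesu8 = utf8_to_cesu8(ch)
print(utf8.hex())   # f09eb88b
print(cesu8.hex())  # eda0bbedb88b
```

So the 4-byte UTF-8 form of this character corresponds to a 6-byte CESU-8 form.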

@dbatyai
Member

dbatyai commented Jan 10, 2022

The issue is not with the "𞸋" character; all non-BMP characters are converted to CESU-8 encoding during parsing.
The problem is that the first character is in the Basic Multilingual Plane and should be encoded using 3 bytes, but it is encoded using 4 bytes in the input. This breaks the conversion logic, which always expects the CESU-8 equivalent of a 4-byte sequence to be 6 bytes long.
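
A minimal sketch of that failure mode, in Python rather than JerryScript's C code. The helper names are hypothetical, and the overlong bytes below are a textbook example (U+20AC padded to 4 bytes), not the bytes from the attached PoC. A 4-byte sequence whose decoded value is below 0x10000 is an overlong encoding, and it cannot be turned into a surrogate pair.

```python
def decode_four_byte(seq: bytes) -> int:
    """Decode a 4-byte UTF-8 pattern without rejecting overlong forms."""
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a 4-byte UTF-8 lead byte")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    return (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )


def is_overlong_four_byte(seq: bytes) -> bool:
    """True if the 4-byte form encodes a BMP code point (below 0x10000)."""
    return decode_four_byte(seq) < 0x10000


overlong_euro = bytes([0xF0, 0x82, 0x82, 0xAC])  # U+20AC padded to 4 bytes
well_formed = "\U0001EE0B".encode("utf-8")       # genuine non-BMP character

print(hex(decode_four_byte(overlong_euro)))   # 0x20ac
print(is_overlong_four_byte(overlong_euro))   # True
print(is_overlong_four_byte(well_formed))     # False

# A surrogate split assumes cp >= 0x10000; here the offset goes negative.
print(decode_four_byte(overlong_euro) - 0x10000 < 0)  # True
```

A converter that checks for this case up front could report a syntax error for the malformed input instead of reaching the assertion.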

@ossy-szeged
Contributor

ossy-szeged commented Jan 11, 2022

More info: a simple /*𝔽*/ string fails with the same error if we build with tools/build.py --debug --function-to-string=on.
