Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new Intl.Segmenter().segment() causes SEGFAULT with --with-intl=small-icu and no runtime ICU data #51752

Open
septatrix opened this issue Feb 13, 2024 · 9 comments
Assignees
Labels
i18n-api Issues and PRs related to the i18n implementation. v8 engine Issues and PRs related to the V8 dependency.

Comments

@septatrix
Copy link

Version

v20.10.0

Platform

Linux acfbcf435a24 6.7.4-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Feb 5 22:21:14 UTC 2024 x86_64 GNU/Linux

Subsystem

Intl

What steps will reproduce the bug?

  1. Obtain a nodejs build compiled with --with-intl=small-icu.*
  2. Execute new Intl.Segmenter().segment() in the REPL.

*In my case that was the nodejs package from Fedora 39 without nodejs-full-i18n. This can be obtained as follows:

  1. Create a docker/podman container with Fedora 39: podman run --rm -it fedora:39
  2. dnf install -y --setopt=install_weak_deps=False nodejs

How often does it reproduce? Is there a required condition?

Always, as long as no runtime ICU data is present.

What is the expected behavior? Why is that the expected behavior?

NodeJS should fallback to a locale-unaware string separator, not provide Intl.Segmenter at all, or raise a JS exception.
The status quo offers no possibility to know if segmentation is safe and calling it results in a segmentation fault and therefore instant termination of the process.

What do you see instead?

A segmentation fault

Additional information

Backtrace
Core was generated by `node'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  v8::internal::JSSegments::Create () at ../../deps/v8/src/objects/js-segments.cc:33
33            segmenter->icu_break_iterator().raw()->clone();                                                                                                                           
[Current thread is 1 (Thread 0x7f0039bc7c40 (LWP 49))]
(gdb) bt
#0  v8::internal::JSSegments::Create () at ../../deps/v8/src/objects/js-segments.cc:33
#1  0x00007f003c910b24 in Builtin_Impl_SegmenterPrototypeSegment () at ../../deps/v8/src/builtins/builtins-intl.cc:1170
#2  v8::internal::Builtin_SegmenterPrototypeSegment () at ../../deps/v8/src/builtins/builtins-intl.cc:1160
#3  0x00007f003c755df6 in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit () from /lib64/libnode.so.115
#4  0x00007f003c6c7d1c in Builtins_InterpreterEntryTrampoline () from /lib64/libnode.so.115
#5  0x0000167f10a804e9 in ?? ()
#6  0x000003439a820159 in ?? ()
#7  0x0000000500000000 in ?? ()
#8  0x0000167f10a805b9 in ?? ()
#9  0x000027cb1169d0c9 in ?? ()
#10 0x000027cb1169d0c9 in ?? ()
#11 0x000003439a820159 in ?? ()
#12 0x0000167f10a804e9 in ?? ()
#13 0x0000004900000000 in ?? ()
#14 0x000000e7305257a1 in ?? ()
#15 0x0000000000000002 in ?? ()
#16 0x000000e730525f49 in ?? ()
#17 0x000003439a80ef49 in ?? ()
#18 0x00007ffefb915ea8 in ?? ()
#19 0x00007f003c6c60dc in Builtins_JSEntryTrampoline () from /lib64/libnode.so.115
#20 0x000003439a80eee1 in ?? ()
#21 0x000027cb1169c439 in ?? ()
#22 0x000000e730525f49 in ?? ()
#23 0x000000000000002c in ?? ()
#24 0x00007ffefb915f10 in ?? ()
#25 0x00007f003c6c5e03 in Builtins_JSEntry () from /lib64/libnode.so.115
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

On a Fedora 39 system one should be able to open the coredump with:
DEBUGINFOD_URLS=https://debuginfod.fedoraproject.org/ gdb /usr/bin/node-20 coredump

coredump.zip

@targos targos added v8 engine Issues and PRs related to the V8 dependency. i18n-api Issues and PRs related to the i18n implementation. labels Feb 14, 2024
@targos
Copy link
Member

targos commented Feb 14, 2024

Backtrace:

#0  0x0000fffff68cba28 in v8::internal::JSSegments::Create(v8::internal::Isolate*, v8::internal::Handle<v8::internal::JSSegmenter>, v8::internal::Handle<v8::internal::String>) () from /lib64/libnode.so.115
#1  0x0000fffff64bfee0 [PAC] in v8::internal::Builtin_SegmenterPrototypeSegment(int, unsigned long*, v8::internal::Isolate*) ()
   from /lib64/libnode.so.115
#2  0x0000fffff6305964 [PAC] in Builtins_CEntry_Return1_ArgvOnStack_BuiltinExit () from /lib64/libnode.so.115

I don't know if this is a V8 bug or if we don't initialize it correctly.

/cc @nodejs/i18n-api

@richardlau
Copy link
Member

FWIW we strip out ICU break iterator data with small-icu

"brkitr": "none",

"remove": [
"cnvalias.icu",
"postalCodeData.res",
"genderList.res",
"brkitr/root.res",
"unames.icu"
],

Also https://bugs.chromium.org/p/v8/issues/detail?id=3345 referenced

# TODO(srl295): reenable following pending
# https://code.google.com/p/v8/issues/detail?id=3345
# (saves some space)
'UCONFIG_NO_BREAK_ITERATION=0',

cc @srl295

@srl295
Copy link
Member

srl295 commented Feb 14, 2024

FWIW we strip out ICU break iterator data with small-icu

"brkitr": "none",

"remove": [
"cnvalias.icu",
"postalCodeData.res",
"genderList.res",
"brkitr/root.res",
"unames.icu"
],

Also https://bugs.chromium.org/p/v8/issues/detail?id=3345 referenced

# TODO(srl295): reenable following pending
# https://code.google.com/p/v8/issues/detail?id=3345
# (saves some space)
'UCONFIG_NO_BREAK_ITERATION=0',

cc @srl295

In 2014 there wasn't an Intl.Segmenter.. probably these should be included.

However, v8 shouldn't crash, that should be fixed in v8.

We do have some code (IIRC) that detects small-icu and monkeypatched Intl with a warning about small icu.

@septatrix
Copy link
Author

Apart from the PR for a regression test is someone also working towards fixing this? Will this be solved by adjusting the ICU configuration in NodeJS to include a locale-unaware fallback, must this be fixed in V8, or both? (In the case that actions are required upstream it would be great to link a ticket)

@srl295
Copy link
Member

srl295 commented Mar 10, 2024

Apart from the PR for a regression test is someone also working towards fixing this? Will this be solved by adjusting the ICU configuration in NodeJS to include a locale-unaware fallback, must this be fixed in V8, or both? (In the case that actions are required upstream it would be great to link a ticket)

sorry i did not take a look at this more yet

a couple of different concerns:

  1. fixing v8 to not crash- that needs to be filed with v8.
  2. fixing node to include some segmentation data even with small-icu: Not clear we should do this. small icu wont' be small if it includes all segmentation data.
  3. remove the "TODO reenable UCONFIG_NO_BREAK_ITERATION=1" - this comment should be removed.

@srl295 srl295 self-assigned this Mar 10, 2024
@septatrix
Copy link
Author

a couple of different concerns:

  1. fixing v8 to not crash- that needs to be filed with v8.

If you point me to where this can be done (monorail?) I could do this though I am not sure which information to provide them so it might make more sense for some of you to do this.

  1. fixing node to include some segmentation data even with small-icu: Not clear we should do this. small icu wont' be small if it includes all segmentation data.

I was thinking of a dummy implementation similar to the other i18n functionality (which to my knowledge uses either very simplistic/naive implementatios or ships only with the data required for the english locale. I.e. simply split grapheme s at utf8 boundaries, words at spaces/punctuation and sentences at a full stop, question/exclamation mark etc.
Alternatively the object should not exist on Intl which would at least allow detection but once browser support is ubiquitous, libraries would drop that which (or newer ones never check it) which will result in errors.

  1. remove the "TODO reenable UCONFIG_NO_BREAK_ITERATION=1" - this comment should be removed.

I have no clue about this

@srl295
Copy link
Member

srl295 commented Mar 10, 2024

Alternatively the object should not exist on Intl

we remove(d) it in some circumstances. Again i need to dig that out.

@septatrix
Copy link
Author

Are you able to provide more information on where this currently sits in the backlog and when this might get resolved?

@srl295
Copy link
Member

srl295 commented Aug 12, 2024

I have not been able to look further into this myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
i18n-api Issues and PRs related to the i18n implementation. v8 engine Issues and PRs related to the V8 dependency.
Projects
None yet
Development

No branches or pull requests

4 participants