Segfault on node 0.12.4, to do with unicode and buffers #25583
Description
I now have a core dump from a binary compiled with symbols, but I'm not sure what to do with it. I've managed to dump the data from the buffer being operated on to a file, but I cannot reproduce the crash from JavaScript code, and I'm not sure exactly how the data was passed along to reach this point. The backtrace looks like this:
#0 v8::base::OS::Abort () at ../deps/v8/src/base/platform/platform-posix.cc:278
#1 0x08bfd8b4 in V8_Fatal (file=0x8d58527 "../deps/v8/src/unicode.cc", line=320, format=0x8d58516 "CHECK(%s) failed") at ../deps/v8/src/base/logging.cc:87
#2 0x08a8d695 in unibrow::Utf8DecoderBase::WriteUtf16Slow (stream=0xb24203d "<data>"..., data=0x5b4d7c3e, data_length=1) at ../deps/v8/src/unicode.cc:320
#3 0x0879f02e in unibrow::Utf8Decoder<512u>::WriteUtf16 (this=0xa10ba40, data=0x5b4d3ff8, length=7714) at ../deps/v8/src/unicode-inl.h:197
#4 0x0878ed6d in v8::internal::Factory::NewStringFromUtf8 (this=0xa0fe080, string=..., pretenure=v8::internal::NOT_TENURED) at ../deps/v8/src/factory.cc:263
#5 0x0862c24c in v8::(anonymous namespace)::NewString (factory=0xa0fe080, type=v8::String::kNormalString, string=...) at ../deps/v8/src/api.cc:5333
#6 0x0863b37d in v8::(anonymous namespace)::NewString<char> (v8_isolate=0xa0fe080, location=0x8c4d06c "v8::String::NewFromUtf8()", env=0x8c4d058 "String::NewFromUtf8", data=0xb2401a0 "<data>"..., type=v8::String::kNormalString, length=7834) at ../deps/v8/src/api.cc:5378
#7 0x0862c37d in v8::String::NewFromUtf8 (isolate=0xa0fe080, data=0xb2401a0 "<data>"..., type=v8::String::kNormalString, length=7834) at ../deps/v8/src/api.cc:5397
#8 0x08b99bfd in node::StringBytes::Encode (isolate=0xa0fe080, buf=0xb2401a0 "<data>"..., buflen=7834, encoding=node::UTF8) at ../src/string_bytes.cc:727
#9 0x08b69f43 in node::Buffer::StringSlice<(node::encoding)1> (args=...) at ../src/node_buffer.cc:266
#10 0x08b685a5 in node::Buffer::Utf8Slice (args=...) at ../src/node_buffer.cc:281
#11 0x08649a59 in v8::internal::FunctionCallbackArguments::Call (this=0xbf7fe95c, f=0x8b68594 <node::Buffer::Utf8Slice(v8::FunctionCallbackInfo<v8::Value> const&)>) at ../deps/v8/src/arguments.cc:33
#12 0x0867f8a5 in v8::internal::HandleApiCallHelper<false> (args=..., isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1144
#13 0x0867a996 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1161
#14 0x0867a975 in v8::internal::Builtin_HandleApiCall (args_length=4, args_object=0xbf7fea34, isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1160
#15 0x5340a3f6 in ?? ()
#16 0x5347d7ea in ?? ()
#17 0x5340b93b in ?? ()
#18 0x5563ccc4 in ?? ()
#19 0x534332f7 in ?? ()
#20 0x53430b73 in ?? ()
#21 0x5f01ad25 in ?? ()
#22 0x53b36df8 in ?? ()
#23 0xb152490f in ?? ()
#24 0x5340b93b in ?? ()
#25 0x2b9bc4b4 in ?? ()
#26 0x2b97af56 in ?? ()
#27 0x53bf31e2 in ?? ()
#28 0x5340b93b in ?? ()
#29 0x53b3bb00 in ?? ()
#30 0x53447015 in ?? ()
#31 0x5342574a in ?? ()
#32 0x0877cdbf in v8::internal::Invoke (is_construct=false, function=..., receiver=..., argc=2, args=0xbf7feeec) at ../deps/v8/src/execution.cc:91
#33 0x0877d148 in v8::internal::Execution::Call (isolate=0xa0fe080, callable=..., receiver=..., argc=2, argv=0xbf7feeec, convert_receiver=true) at ../deps/v8/src/execution.cc:141
#34 0x086287ff in v8::Function::Call (this=0xa128928, recv=..., argc=2, argv=0xbf7feeec) at ../deps/v8/src/api.cc:4020
#35 0x08b4bae2 in node::AsyncWrap::MakeCallback (this=0xa13c8b8, cb=..., argc=2, argv=0xbf7feeec) at ../src/async-wrap.cc:136
#36 0x08b4d650 in node::AsyncWrap::MakeCallback (this=0xa13c8b8, symbol=..., argc=2, argv=0xbf7feeec) at ../src/async-wrap-inl.h:99
#37 0x08b8f596 in node::ZCtx::After (work_req=0xa13c92c, status=0) at ../src/node_zlib.cc:341
#38 0x08be72c6 in uv__queue_done (w=0xa13c958, err=0) at ../deps/uv/src/threadpool.c:257
#39 0x08be720a in uv__work_done (handle=0x917c3a0 <default_loop_struct+96>) at ../deps/uv/src/threadpool.c:236
#40 0x08be9106 in uv__async_event (loop=0x917c340 <default_loop_struct>, w=0x917c42c <default_loop_struct+236>, nevents=1) at ../deps/uv/src/unix/async.c:92
#41 0x08be9263 in uv__async_io (loop=0x917c340 <default_loop_struct>, w=0x917c430 <default_loop_struct+240>, events=1) at ../deps/uv/src/unix/async.c:132
#42 0x08bfa7d0 in uv__io_poll (loop=0x917c340 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:324
#43 0x08be9c87 in uv_run (loop=0x917c340 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:324
#44 0x08b61c41 in node::Start (argc=2, argv=0xa0fe008) at ../src/node.cc:3722
#45 0x08b88b58 in main (argc=2, argv=0xbf802664) at ../src/node_main.cc:65
(I've omitted the actual data snippets for readability.)
As best I can tell, the buffer being worked with is split on the first byte of a four-byte UTF-8 sequence that encodes a code point above 0xFFFF (U+1F495); the data is part of an HTTP response. The failing CHECK is here:
https://github.com/joyent/node/blob/v0.12.4/deps/v8/src/unicode.cc#L317-L321
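The kind of split described above can be pictured with a small sketch. It uses APIs from current Node.js releases, so it is not meant to run on 0.12.4 as written, and it does not reproduce the crash. The `abc` prefix and the exact split offset (right after the lead byte) are made up for illustration. `StringDecoder` from the `string_decoder` module is shown only as one way to decode chunked UTF-8 without cutting a sequence in half.

```javascript
'use strict';
const { StringDecoder } = require('string_decoder');

// U+1F495 encodes to four UTF-8 bytes.
const heart = Buffer.from('\u{1F495}', 'utf8');
console.log(heart.toString('hex')); // f09f9295

// Hypothetical payload: some ASCII text followed by the four-byte sequence.
const payload = Buffer.concat([Buffer.from('abc', 'utf8'), heart]);

// Assumed split point: the first chunk ends right after the lead byte 0xf0.
const first = payload.subarray(0, 4); // 'abc' + f0
const second = payload.subarray(4);   // 9f 92 95

// StringDecoder holds back the incomplete sequence until the rest arrives.
const decoder = new StringDecoder('utf8');
const out1 = decoder.write(first);
const out2 = decoder.write(second) + decoder.end();

console.log(JSON.stringify(out1));    // "abc"
console.log(out2 === '\u{1F495}');    // true
```

Decoding each chunk separately with `toString('utf8')` would instead hand the decoder a truncated sequence at the end of the first chunk, which is the situation the backtrace appears to show.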
The code linked above assumes that when a UTF-16 surrogate pair is produced, both halves are present, but that's not what's happening here. It seems to be assumed that such a case will never reach this code. I was unable to follow the calling code well enough to come up with a way to reproduce the problem from JavaScript:
https://github.com/joyent/node/blob/v0.12.4/deps/v8/src/factory.cc#L231-L265
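For reference, this is why U+1F495 involves a surrogate pair at all: any code point above 0xFFFF takes two UTF-16 code units. The sketch below is the standard UTF-16 arithmetic, not V8's code, and `toSurrogatePair` is a hypothetical helper name. It assumes a current Node.js release for the `\u{...}` escape and destructuring.

```javascript
'use strict';

// Split a code point above 0xFFFF into its UTF-16 lead and trail surrogates.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;       // 20-bit value
  const lead = 0xd800 + (offset >> 10);     // top 10 bits
  const trail = 0xdc00 + (offset & 0x3ff);  // bottom 10 bits
  return [lead, trail];
}

const [lead, trail] = toSurrogatePair(0x1f495);
console.log(lead.toString(16), trail.toString(16)); // d83d dc95

// A JavaScript string stores the character as those two code units.
console.log('\u{1F495}'.length); // 2
```

So a single four-byte UTF-8 sequence in the input becomes two 16-bit units in the output string, and a decoder has to account for both.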
Please let me know what, if anything, I can do to help nail this down further!