Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Segfault on node 0.12.4, to do with unicode and buffers #25583

Closed
myndzi opened this issue Jun 26, 2015 · 14 comments
Closed

Segfault on node 0.12.4, to do with unicode and buffers #25583

myndzi opened this issue Jun 26, 2015 · 14 comments

Comments

@myndzi
Copy link

myndzi commented Jun 26, 2015

I have a core dump now from a binary compiled with symbols, but I'm not sure what to do with it. I've managed to dump the data from the buffer being operated on to a file, but I cannot reproduce the crash in javascript code. I'm uncertain how exactly things got passed through and along. The backtrace looks like this:

#0  v8::base::OS::Abort () at ../deps/v8/src/base/platform/platform-posix.cc:278
#1  0x08bfd8b4 in V8_Fatal (file=0x8d58527 "../deps/v8/src/unicode.cc", line=320, format=0x8d58516 "CHECK(%s) failed") at ../deps/v8/src/base/logging.cc:87
#2  0x08a8d695 in unibrow::Utf8DecoderBase::WriteUtf16Slow (stream=0xb24203d "<data>"..., data=0x5b4d7c3e, data_length=1) at ../deps/v8/src/unicode.cc:320
#3  0x0879f02e in unibrow::Utf8Decoder<512u>::WriteUtf16 (this=0xa10ba40, data=0x5b4d3ff8, length=7714) at ../deps/v8/src/unicode-inl.h:197
#4  0x0878ed6d in v8::internal::Factory::NewStringFromUtf8 (this=0xa0fe080, string=..., pretenure=v8::internal::NOT_TENURED) at ../deps/v8/src/factory.cc:263
#5  0x0862c24c in v8::(anonymous namespace)::NewString (factory=0xa0fe080, type=v8::String::kNormalString, string=...) at ../deps/v8/src/api.cc:5333
#6  0x0863b37d in v8::(anonymous namespace)::NewString<char> (v8_isolate=0xa0fe080, location=0x8c4d06c "v8::String::NewFromUtf8()", env=0x8c4d058 "String::NewFromUtf8", data=0xb2401a0 "<data>"..., type=v8::String::kNormalString, length=7834) at ../deps/v8/src/api.cc:5378
#7  0x0862c37d in v8::String::NewFromUtf8 (isolate=0xa0fe080, data=0xb2401a0 "<data>"..., type=v8::String::kNormalString, length=7834) at ../deps/v8/src/api.cc:5397
#8  0x08b99bfd in node::StringBytes::Encode (isolate=0xa0fe080, buf=0xb2401a0 "<data>"..., buflen=7834, encoding=node::UTF8) at ../src/string_bytes.cc:727
#9  0x08b69f43 in node::Buffer::StringSlice<(node::encoding)1> (args=...) at ../src/node_buffer.cc:266
#10 0x08b685a5 in node::Buffer::Utf8Slice (args=...) at ../src/node_buffer.cc:281
#11 0x08649a59 in v8::internal::FunctionCallbackArguments::Call (this=0xbf7fe95c, f=0x8b68594 <node::Buffer::Utf8Slice(v8::FunctionCallbackInfo<v8::Value> const&)>) at ../deps/v8/src/arguments.cc:33
#12 0x0867f8a5 in v8::internal::HandleApiCallHelper<false> (args=..., isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1144
#13 0x0867a996 in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1161
#14 0x0867a975 in v8::internal::Builtin_HandleApiCall (args_length=4, args_object=0xbf7fea34, isolate=0xa0fe080) at ../deps/v8/src/builtins.cc:1160
#15 0x5340a3f6 in ?? ()
#16 0x5347d7ea in ?? ()
#17 0x5340b93b in ?? ()
#18 0x5563ccc4 in ?? ()
#19 0x534332f7 in ?? ()
#20 0x53430b73 in ?? ()
#21 0x5f01ad25 in ?? ()
#22 0x53b36df8 in ?? ()
#23 0xb152490f in ?? ()
#24 0x5340b93b in ?? ()
#25 0x2b9bc4b4 in ?? ()
#26 0x2b97af56 in ?? ()
#27 0x53bf31e2 in ?? ()
#28 0x5340b93b in ?? ()
#29 0x53b3bb00 in ?? ()
#30 0x53447015 in ?? ()
#31 0x5342574a in ?? ()
#32 0x0877cdbf in v8::internal::Invoke (is_construct=false, function=..., receiver=..., argc=2, args=0xbf7feeec) at ../deps/v8/src/execution.cc:91
#33 0x0877d148 in v8::internal::Execution::Call (isolate=0xa0fe080, callable=..., receiver=..., argc=2, argv=0xbf7feeec, convert_receiver=true) at ../deps/v8/src/execution.cc:141
#34 0x086287ff in v8::Function::Call (this=0xa128928, recv=..., argc=2, argv=0xbf7feeec) at ../deps/v8/src/api.cc:4020
#35 0x08b4bae2 in node::AsyncWrap::MakeCallback (this=0xa13c8b8, cb=..., argc=2, argv=0xbf7feeec) at ../src/async-wrap.cc:136
#36 0x08b4d650 in node::AsyncWrap::MakeCallback (this=0xa13c8b8, symbol=..., argc=2, argv=0xbf7feeec) at ../src/async-wrap-inl.h:99
#37 0x08b8f596 in node::ZCtx::After (work_req=0xa13c92c, status=0) at ../src/node_zlib.cc:341
#38 0x08be72c6 in uv__queue_done (w=0xa13c958, err=0) at ../deps/uv/src/threadpool.c:257
#39 0x08be720a in uv__work_done (handle=0x917c3a0 <default_loop_struct+96>) at ../deps/uv/src/threadpool.c:236
#40 0x08be9106 in uv__async_event (loop=0x917c340 <default_loop_struct>, w=0x917c42c <default_loop_struct+236>, nevents=1) at ../deps/uv/src/unix/async.c:92
#41 0x08be9263 in uv__async_io (loop=0x917c340 <default_loop_struct>, w=0x917c430 <default_loop_struct+240>, events=1) at ../deps/uv/src/unix/async.c:132
#42 0x08bfa7d0 in uv__io_poll (loop=0x917c340 <default_loop_struct>, timeout=0) at ../deps/uv/src/unix/linux-core.c:324
#43 0x08be9c87 in uv_run (loop=0x917c340 <default_loop_struct>, mode=UV_RUN_ONCE) at ../deps/uv/src/unix/core.c:324
#44 0x08b61c41 in node::Start (argc=2, argv=0xa0fe008) at ../src/node.cc:3722
#45 0x08b88b58 in main (argc=2, argv=0xbf802664) at ../src/node_main.cc:65

(I've omitted the actual data snippits for readability)

As best as I can tell, the buffer being worked with is split on the first byte of a four-byte utf-8 sequence representing a > 0xFFFF value (U+1F495); it's part of an HTTP response. The error is obvious:

https://github.com/joyent/node/blob/v0.12.4/deps/v8/src/unicode.cc#L317-L321

An assumption is made here, that if there's a UTF-16 surrogate pair, both parts exist, but that's not what's happening. It seems assumed that such a case won't reach this code. I was unable to follow the code here well enough to come up with a way to reproduce the problem in JS code:

https://github.com/joyent/node/blob/v0.12.4/deps/v8/src/factory.cc#L231-L265

Please let me know what if anything I can do to help nail this down further!

@myndzi
Copy link
Author

myndzi commented Jun 26, 2015

I've managed to construct a simple JS test case:

'use strict';

var pattern = new Buffer([0xF0, 0x9F, 0x92, 0x95]);
var data = new Buffer(8000);
for (var i = 0; i < 8000; i += 4) {
    pattern.copy(data, i, 0, 4);
}
data.slice(0, 7833).toString();

While I was working this out, the crash was unreliable: the same code would crash, or not crash. There was at one point a loop to repeatedly call a function that copied part of the buffer into a new buffer. The above code appears to crash 100% of the time for me each execution, but if it doesn't for you, you might try looping it and/or the buffer copy thing.

@aasierra
Copy link

@myndzi So I attempted to reproduce this with the following code and could not have it crash.

'use strict';
for (var j =0; j < 1000; j++) {
  var pattern = new Buffer([0xF0, 0x9F, 0x92, 0x95]);
  var data = new Buffer(8000);
  for (var i = 0; i < 8000; i += 4) {
      pattern.copy(data, i, 0, 4);
  }
  console.log(data.slice(0, 7833).toString())
}

But no luck. Is this what you recommended to reproduce? Also I simply ran this by using node file.js with node 10.31. What node version are you running?

@myndzi
Copy link
Author

myndzi commented Jun 28, 2015

Er, as it says in the title of the issue, this is for node 0.12. It probably crashes in 0.11.x and up -- I was on 0.11 when I first observed it and updated to latest to see if it had been fixed.

When I had it looping, it was only the slice / composition part, so something like this:

var pattern = new Buffer([0xF0, 0x9F, 0x92, 0x95]);
var data = new Buffer(8000);
for (var i = 0; i < 8000; i += 4) {
    pattern.copy(data, i, 0, 4);
}
for (var i = 0; i < 10e5; i++) {
    data.slice(0, 7833).toString();
}

...but I don't imagine on 0.10.31, if the other didn't crash, this will.

@aasierra
Copy link

@myndzi Well it is probably a Mac specific issue but I get Bus error 10 when using Node 0.12.5 as well as 0.12.4

@myndzi
Copy link
Author

myndzi commented Jul 1, 2015

http://stackoverflow.com/a/212585/2156140

It's surely the same cause, just a different effect on a different platform

@trevnorris
Copy link

Does it crash if you use toString('binary')?

@myndzi
Copy link
Author

myndzi commented Jul 1, 2015

No. The problem appears to specifically be in the utf8->utf16 conversion path. Binary doesn't have such a thing as a 'partially complete encoded character' at the end of the line

@trevnorris
Copy link

The first test case from @myndzi caused a segfault on my Linux box.

@myndzi
Copy link
Author

myndzi commented Jul 1, 2015

Does seem not to crash on 0.10.x for me as well, FWIW. (I think there's a possibility that the resultant string is malformed, however, if you were to combine two strings that split over such a character break)

@aasierra
Copy link

aasierra commented Jul 1, 2015

@myndzi Just a note that after looking at the most recent code sample you sent, the first slice always crashes for me with the node version built off master. The for loop is not even needed. I am having trouble finding the code path slice triggers though.

@myndzi
Copy link
Author

myndzi commented Jul 2, 2015

You'll notice the reproduction I arrived at doesn't include the loop; I only mentioned it because there was some random behavior in the versions leading up to it, where looping helped. The first code sample should be preferred.

Slice isn't necessary to the crash, it just creates the bad data. The crash is caused by the .toString() method; the string is converted to utf-16, but since the last character of the buffer is the first byte of a four-byte sequence, the converted utf-16 becomes the first half of a utf-16 surrogate pair. The WriteUtf16 loop is called from the inlined unicode helpers and copies the bulk of the data, and then the WriteUtf16Slow function is called to 'finish' the copy. But it gets called with only one byte left (len = 1), and that byte is the first half of a surrogate pair; in the source, this causes the code to assume that there are at least two bytes left -- though there's an assertion that causes the crash when this assumption is not met.

Some previous work decoding utf-8 appears to be necessary; I tried it with a buffer that ended in just the one character, but it crashed very intermittently. When I filled the buffer with plenty of copies so that some decoding went on before it got to the slow-copy part, it began to crash reliably, and that's how I arrived at the above code.

@myndzi
Copy link
Author

myndzi commented Jul 2, 2015

For what it's worth, even if it doesn't crash, your data will get messed up if it cuts in the middle of a UTF-8 sequence, if you're stringifying each chunk. This could happen because people don't know how to work with buffers and are just using setEncoding, or they're piping to some streaming module that does the same (htmlparser2 operates with strings). I wrote this: https://www.npmjs.com/package/utf8-align-stream which should ensure that a Node stream always outputs whole UTF-8 sequences when they're available by conditionally buffering the last <= 5 bytes. This will make the strings compose correctly as well as prevent this crash.

@jasnell
Copy link
Member

jasnell commented Jul 10, 2015

@trevnorris ... I assume this is resolved now?

@trevnorris
Copy link

Ah yes. Sorry.

@jasnell jasnell closed this as completed Jul 10, 2015
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants