Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Segfault on v10.7.0 #22358

Closed
tlbdk opened this issue Aug 16, 2018 · 23 comments
Closed

Segfault on v10.7.0 #22358

tlbdk opened this issue Aug 16, 2018 · 23 comments
Assignees
Labels
string_decoder Issues and PRs related to the string_decoder subsystem.

Comments

@tlbdk
Copy link
Contributor

tlbdk commented Aug 16, 2018

E PID 8 received SIGSEGV for address: 0x0
E /app/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2c0d)[0x7f4e099bac0d]
E /lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7f4e0bd958e0]
E node[0xe837ac]
E node(_ZN2v88internal7Factory20NewStringFromTwoByteENS0_6VectorIKtEENS0_13PretenureFlagE+0x19)[0xe83c09]
E node(_ZN2v86String14NewFromTwoByteEPNS_7IsolateEPKtNS_13NewStringTypeEi+0x9a)[0xac57ba]
E node(_ZN4node11StringBytes6EncodeEPN2v87IsolateEPKtmPNS1_5LocalINS1_5ValueEEE+0x29d)[0x95aaad]
E node[0x95c9ed]
E node[0xb44409]
E node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xb9)[0xb44f79]
E [0x1bb2750041bd]

  • Version:
    node -v
    v10.7.0
  • Platform:
    Linux d187343d97b6 4.9.93-linuxkit-aufs deps: update openssl to 1.0.1j #1 SMP Wed Jun 6 16:55:56 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 16, 2018

Same issue on v10.9.0:

E PID 8 received SIGSEGV for address: 0x0
E /app/node_modules/segfault-handler/build/Release/segfault-handler.node(+0x2c0d)[0x7f4dfe980c0d]
E /lib/x86_64-linux-gnu/libpthread.so.0(+0x128e0)[0x7f4e00d5b8e0]
E node[0xe9eb00]
E node(_ZN2v88internal7Factory20NewStringFromTwoByteENS0_6VectorIKtEENS0_13PretenureFlagE+0x19)[0xe9ef49]
E node(_ZN2v86String14NewFromTwoByteEPNS_7IsolateEPKtNS_13NewStringTypeEi+0x9a)[0xadf7ca]
E node(_ZN4node11StringBytes6EncodeEPN2v87IsolateEPKtmPNS1_5LocalINS1_5ValueEEE+0x29d)[0x96cebd]
E node[0x96edfd]
E node[0xb5996f]
E node(_ZN2v88internal21Builtin_HandleApiCallEiPPNS0_6ObjectEPNS0_7IsolateE+0xb9)[0xb5a4d9]
E [0x30fa8a1dc01d]

@addaleax
Copy link
Member

Can you share code that reproduces this? Do you use the official binaries downloaded from https://nodejs.org/, or from another source?

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 16, 2018

Sadly no, It's a rather large application that only crashes in the docker container running in staging and production environments, so the issues seem to be Linux specific or at least not happening on MacOSX where we are doing development.

We are using the official binaries you can see the docker images we use here: https://github.com/connectedcars/docker-node

Also, we are not seeing the issue on Node 8.x.

If you could give us some instructions on extracting more debug information, we can help with that?

@baslr
Copy link
Contributor

baslr commented Aug 16, 2018

@tlbdk oom? can you run node with gdb?

@addaleax addaleax added the string_decoder Issues and PRs related to the string_decoder subsystem. label Aug 16, 2018
@addaleax
Copy link
Member

The relevant symbol here is 0x96edfd which is inside node::(anonymous namespace)::DecodeData(v8::FunctionCallbackInfo<v8::Value> const&), which is (unfortunately) not provided by segfault-handler.

How consistently are you seeing these issues? Is there a chance you could run modified node binaries, or possibly even compile a debug build of Node?

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 17, 2018

@baslr Don't think it's an out of memory issue, we can run node with gdb, any instructions on how to do that?

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 17, 2018

@addaleax It seems pretty consistent, so it would be easy to run a modified node binary. If you have a prebuilt debug build of node that would be the easiest.

@addaleax
Copy link
Member

@tlbdk I think the ideal situation would be causing a crash with a debug build to generate a core dump, then provide that debug binary + the core dump together (warning: these files are going to be huge).

If you want to run gdb yourself, you’d start gdb path/to/debug/binary and then enter run <node.js arguments> inside the shell. There, once the crash occurs, you could run bt to get a much better stack trace, use frame N to select individual frames from the stack as listed by bt, and p (expression) to print the values of variables etc.

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 17, 2018

@addaleax ok, what entails a debug build, do you have any instructions, should I just enable all the debug flags?

 ./configure | grep debug
                 'debug_nghttp2': 'false',
                 'node_debug_lib': 'false',
                 'v8_optimized_debug': 0,

@addaleax
Copy link
Member

addaleax commented Aug 17, 2018

@tlbdk It’s ./configure --debug – what you’re looking at in that output seem to be feature flags, not compiler settings

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 24, 2018

@addaleax we finally managed to build a debug image to run the code and have captured a core dump, note that we sanitized it for passwords using:

perl -pi -e 's/password/xxxxxxxx/g' core.node.7.1535125297

Hope this is ok, else please provide another method for doing this.

Core dump: https://storage.googleapis.com/connectedcars-staging-public/node-debug/core.node.7.1535125297.gz
Debug node: https://storage.googleapis.com/connectedcars-staging-public/node-debug/node.debug.gz

@addaleax
Copy link
Member

@tlbdk That doesn’t quite look like a debug build – those are typically in the range of ~600 MB binaries, whereas this one is the size of a regular Node binary…

./configure --debug generates two binaries, the regular one (node aka out/Release/node) and a debug build (node_g aka out/Debug/node).

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 24, 2018

@addaleax We just did ./configure --debug && make install

I now added:

cp -f out/Debug/node /opt/node-v10.9.0-linux-x64/bin/node
ls -lh /usr/local/bin/node
-rwxr-xr-x 1 root root 575M Aug 24 18:22 /usr/local/bin/node

Do I need to do anything else?

I would be nice with a short howto guide on how to do this, would be happy to write one if you had a good location to put it?

The build tasks 2-3 hours and hitting the issue another 10-20 minutes so we might not have a new core dump before Monday.

@addaleax
Copy link
Member

@tlbdk That looks about right, yes. :)

Depending on how much you think a guide would be, BUILDING.md in this repo might be a good place?

@mathiastj
Copy link

@addaleax
Copy link
Member

@mathiastj Sorry, but I’m getting file format errors when trying to open either with gdb…

$ gdb node.debug core.node.8.1535359906
[...]
"/tmp/22358/node.debug": not in executable format: File format not recognized
"/tmp/22358/core.node.8.1535359906" is not a core dump: File format not recognized

Are you sure you the password sanitization didn’t do more to the files than it should have done?

(Feel free to run the above gdb command on the original core dump yourself, running backtrace on it should be pretty helpful too)

@mathiastj
Copy link

I get the same file format errors when trying to open the original core dump. I tried to extract another core dump but that too gives the same error.
I am able to open the core dumps in lldb though.

@mathiastj
Copy link

@addaleax Running gdb on the container that had created the core dump I was able to open the files. For reference the image is here: https://github.com/connectedcars/docker-node/blob/debug/debug/Dockerfile
Here is the output of backtrace:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
#1  0x00007f59240e92f1 in __GI_abort () at abort.c:79
#2  0x000055a20a26cd16 in node::Abort () at ../src/node.cc:1193
#3  0x000055a20a26ce04 in node::Assert (args=0x55a20ccc07e0 <node::(anonymous namespace)::MakeString(v8::Isolate*, char const*, unsigned long, node::encoding)::args>) at ../src/node.cc:1210
#4  0x000055a20a364545 in node::(anonymous namespace)::MakeString (isolate=0x55a20de0bbb0, data=0x55a20df972a1 "2", length=65534, encoding=node::UCS2) at ../src/string_decoder.cc:35
#5  0x000055a20a364d6b in node::StringDecoder::DecodeData (this=0x55a20de54480, isolate=0x55a20de0bbb0, data=0x55a20df972a1 "2", nread_ptr=0x7ffd26791c70) at ../src/string_decoder.cc:219
#6  0x000055a20a365134 in node::(anonymous namespace)::DecodeData (args=...) at ../src/string_decoder.cc:272
#7  0x000055a20a6d28e4 in v8::internal::FunctionCallbackArguments::Call (this=0x7ffd26791e10, handler=0x2a8d26b8a659) at ../deps/v8/src/api-arguments-inl.h:94
#8  0x000055a20a6d584e in v8::internal::(anonymous namespace)::HandleApiCallHelper<false> (isolate=0x55a20de0bbb0, function=..., new_target=..., fun_data=..., receiver=..., args=...) at ../deps/v8/src/builtins/builtins-api.cc:109
#9  0x000055a20a6d36eb in v8::internal::Builtin_Impl_HandleApiCall (args=..., isolate=0x55a20de0bbb0) at ../deps/v8/src/builtins/builtins-api.cc:139
#10 0x000055a20a6d3485 in v8::internal::Builtin_HandleApiCall (args_length=7, args_object=0x7ffd26792008, isolate=0x55a20de0bbb0) at ../deps/v8/src/builtins/builtins-api.cc:127

@tlbdk
Copy link
Contributor Author

tlbdk commented Aug 30, 2018

@addaleax Was this enough to help with the debugging or do we need to provide more info?

@addaleax
Copy link
Member

@tlbdk I’m pretty sure it’s enough, I’ll try to have a patch up today or tomorrow. Thanks for the help!

@addaleax addaleax self-assigned this Aug 30, 2018
addaleax added a commit to addaleax/node that referenced this issue Aug 31, 2018
This allows removing a number of special cases in other
parts of the code, at least one of which was incorrect
in expecting aligned input when that was not guaranteed.

Fixes: nodejs#22358
@addaleax
Copy link
Member

@tlbdk @mathiastj Would be great if you could try the patch in #22622?

addaleax added a commit that referenced this issue Sep 3, 2018
This allows removing a number of special cases in other
parts of the code, at least one of which was incorrect
in expecting aligned input when that was not guaranteed.

Fixes: #22358

PR-URL: #22622
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
@tlbdk
Copy link
Contributor Author

tlbdk commented Sep 4, 2018

I'm out traveling but should have time to do a build from master end of the week

targos pushed a commit that referenced this issue Sep 5, 2018
This allows removing a number of special cases in other
parts of the code, at least one of which was incorrect
in expecting aligned input when that was not guaranteed.

Fixes: #22358

PR-URL: #22622
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
targos pushed a commit that referenced this issue Sep 6, 2018
This allows removing a number of special cases in other
parts of the code, at least one of which was incorrect
in expecting aligned input when that was not guaranteed.

Fixes: #22358

PR-URL: #22622
Reviewed-By: Tobias Nießen <tniessen@tnie.de>
@mathiastj
Copy link

@addaleax

We have now tested it out and it works flawlessly.
Thanks a lot for all the help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
string_decoder Issues and PRs related to the string_decoder subsystem.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants