Node: Segmentation fault #12608

Closed

borisovg opened this issue Sep 17, 2017 · 8 comments

@borisovg

What version of gRPC and what language are you using?

1.6.0 for Node.js

What operating system (Linux, Windows, …) and version?

Debian Sid

What runtime / compiler are you using (e.g. python version or version of gcc)

gcc (Debian 7.2.0-4) 7.2.0

What did you do?

I am connecting to etcd using grpc.load() with etcd's proto file. While testing client behaviour when the etcd server is not accessible, my app segfaults after several connection errors ("Failed to connect before the deadline" from grpc.waitForClientReady()).
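For reference, the connection pattern described above boils down to something like this minimal sketch (the proto path and the etcdserverpb.KV names are placeholders, not necessarily what the app uses):

var grpc = require('grpc');

// Load the etcd service definitions at runtime; the path is a placeholder.
var proto = grpc.load('etcd.proto');
var client = new proto.etcdserverpb.KV('127.0.0.1:2379', grpc.credentials.createInsecure());

// Give the channel 5 seconds to become ready; with etcd unreachable this
// calls back with "Failed to connect before the deadline".
grpc.waitForClientReady(client, new Date(Date.now() + 5000), function (err) {
    if (err) {
        console.error('etcd not reachable:', err.message);
    }
});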

Backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `node app.js'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fef86ca5cd8 in main_arena () from /lib/x86_64-linux-gnu/libc.so.6
[Current thread is 1 (Thread 0x7fef87d69b80 (LWP 11173))]
(gdb) bt full
#0  0x00007fef86ca5cd8 in main_arena () from /lib/x86_64-linux-gnu/libc.so.6
No symbol table info available.
#1  0x00007fef765553b0 in grpc_timer_cancel (exec_ctx=<optimized out>, timer=0x7fef86ca5d08 <main_arena+520>) at ../src/core/lib/iomgr/timer_uv.c:84
No locals.
#2  0x00007fef76583d55 in uv_tc_on_connect (req=<optimized out>, status=-125) at ../src/core/lib/iomgr/tcp_client_uv.c:85
        connect = 0x7fef86ca5ca8 <main_arena+424>
        exec_ctx = {closure_list = {head = 0x0, tail = 0x0}, active_combiner = 0x0, last_combiner = 0x0, flags = 1, check_ready_to_finish_arg = 0x0, check_ready_to_finish = 0x0}
        error = 0x0
        done = <optimized out>
        closure = 0x2ccee40
#3  0x000000000132b914 in ?? ()
No symbol table info available.
#4  0x0000000001321350 in uv_run ()
No symbol table info available.
#5  0x00000000010a9c50 in node::Start(int, char**) ()
No symbol table info available.
#6  0x00007fef8692e2e1 in __libc_start_main (main=0x7b7f70 <main>, argc=2, argv=0x7ffe59d658a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffe59d65898)
    at ../csu/libc-start.c:291
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -3805309164132912269, 8099412, 140730405640352, 0, 0, 3804524265751032691, 3814509443662721907}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x2, 
              0x7b7f70 <main>}, data = {prev = 0x0, cleanup = 0x0, canceltype = 2}}}
        not_first_call = <optimized out>
#7  0x00000000007b967d in _start ()
No symbol table info available.

What did you expect to see?

Something other than a segfault.

What did you see instead?

A segfault.

@murgatroid99
Member

Can you share a script that reproduces this? Also, what version of Node are you using?

@borisovg
Author

Node is v6.11.3 - I'll have a go at making a small repro script tomorrow.

@borisovg
Author

A better backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `node app.js'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  grpc_closure_sched (exec_ctx=0x7fff961b9190, c=0x23ebc80, error=0x4) at ../src/core/lib/iomgr/closure.c:177
177     ../src/core/lib/iomgr/closure.c: No such file or directory.
[Current thread is 1 (Thread 0x7fa526584740 (LWP 4090))]
(gdb) bt full
#0  grpc_closure_sched (exec_ctx=0x7fff961b9190, c=0x23ebc80, error=0x4) at ../src/core/lib/iomgr/closure.c:177
No locals.
#1  0x00007fa51dcaa3b0 in grpc_timer_cancel (exec_ctx=<optimized out>, timer=0x23ec430) at ../src/core/lib/iomgr/timer_uv.c:84
No locals.
#2  0x00007fa51dcd8d55 in uv_tc_on_connect (req=<optimized out>, status=-125) at ../src/core/lib/iomgr/tcp_client_uv.c:85
        connect = 0x23ec3d0
        exec_ctx = {closure_list = {head = 0x0, tail = 0x0}, active_combiner = 0x0, last_combiner = 0x0, flags = 1, check_ready_to_finish_arg = 0x0, check_ready_to_finish = 0x0}
        error = 0x0
        done = <optimized out>
        closure = 0x0
#3  0x00007fa525f6435c in uv__stream_destroy (stream=stream@entry=0x2438880) at src/unix/stream.c:445
        __PRETTY_FUNCTION__ = "uv__stream_destroy"
#4  0x00007fa525f5aa36 in uv__finish_close (handle=0x2438880) at src/unix/core.c:256
No locals.
#5  uv__run_closing_handles (loop=0x7fa526173900 <default_loop_struct>) at src/unix/core.c:286
        p = 0x2438880
        q = 0x238b410
#6  uv_run (loop=0x7fa526173900 <default_loop_struct>, mode=mode@entry=UV_RUN_ONCE) at src/unix/core.c:356
        timeout = <optimized out>
        r = 1
#7  0x0000000000d720c0 in node::StartNodeInstance (arg=<synthetic pointer>) at ../src/node.cc:4707
        seal = {isolate_ = 0x234b470, prev_limit_ = 0x238b370, prev_sealed_level_ = 0}
        more = <optimized out>
        locker = {has_lock_ = true, top_level_ = true, isolate_ = 0x234b470}
        context = {val_ = 0x2389380}
        env = <optimized out>
        handle_scope = {isolate_ = 0x234b470, prev_next_ = 0x0, prev_limit_ = 0x0}
        exit_code = <optimized out>
        params = {entry_hook = 0x0, code_event_handler = 0x0, constraints = {max_semi_space_size_ = 0, max_old_space_size_ = 0, max_executable_size_ = 0, stack_limit_ = 0x0, code_range_size_ = 0}, 
          snapshot_blob = 0x0, counter_lookup_callback = 0x0, create_histogram_callback = 0x0, add_histogram_sample_callback = 0x0, array_buffer_allocator = 0x234b450}
        array_buffer_allocator = 0x234b450
        isolate = 0x234b470
        instance_data = <synthetic pointer>
        args = {0xf3b029 "../src/node.cc", 0xf3b066 "4659", 0xf3b06b "(node_isolate) == (nullptr)", 0xf3cc80 <node::StartNodeInstance(void*)::__PRETTY_FUNCTION__> "void node::StartNodeInstance(void*)"}
        args = {0xf3b029 "../src/node.cc", 0xf3b04a "4745", 0xf3b04f "(isolate) != (nullptr)", 0xf3cc80 <node::StartNodeInstance(void*)::__PRETTY_FUNCTION__> "void node::StartNodeInstance(void*)"}
#8  node::Start (argc=<optimized out>, argv=<optimized out>) at ../src/node.cc:4793
        instance_data = {node_instance_type_ = node::MAIN, exit_code_ = 1, event_loop_ = <optimized out>, argc_ = <optimized out>, argv_ = <optimized out>, exec_argc_ = <optimized out>, 
          exec_argv_ = <optimized out>, use_debug_agent_flag_ = <optimized out>}
        exec_argc = 0
        exec_argv = 0x2349c70
        exit_code = 1
#9  0x00007fa5221912e1 in __libc_start_main (main=0x6d9460 <main(int, char**)>, argc=2, argv=0x7fff961b95a8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff961b9598)
    at ../csu/libc-start.c:291
        result = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {0, -3599453058325645345, 7187328, 140735711778208, 0, 0, 3599222158807431135, 3550641350141310943}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 
              0x7fff961b95c0, 0x7fa5265b3170}, data = {prev = 0x0, cleanup = 0x0, canceltype = -1776577088}}}
        not_first_call = <optimized out>
#10 0x00000000006dabaa in _start ()
No symbol table info available.

@borisovg
Author

borisovg commented Sep 19, 2017

This script reproduces the crash for me most of the time (but not always), hence the shell loop.

Source of foo.js:

var grpc = require('grpc');
var proto = grpc.load('foo.proto');

var attempt = 0;

(function loop () {
    // Client pointed at a host:port that drops connections (see below).
    var client = new proto.foo.FooService('gir.me.uk:8888', grpc.credentials.createInsecure());

    // Wait up to 5 seconds for the channel to become ready.
    grpc.waitForClientReady(client, new Date(Date.now() + 5000), function (err) {
        if (err) {
            attempt += 1;

            // Retry once after 15 seconds, then give up.
            if (attempt < 2) {
                setTimeout(loop, 15000);
            }

            console.error('Connection timeout', err);
        }
    });
}());

Source of foo.proto:

syntax = "proto3";

package foo;

service FooService {
}

Run until it crashes:

while true; do node foo.js || break; done

@murgatroid99
Member

So, "gir.me.uk:8888" is simply an arbitrary dns name + port that is expected to never exist; is that right?

@borisovg
Author

It's a port on my website that I know will drop connections, as opposed to resetting them, which would trigger a different error in the client.
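For context, the drop-versus-reset distinction shows up even with plain Node sockets; a minimal sketch, assuming nothing is listening on local port 8888:

var net = require('net');

// A closed local port resets the connection immediately (ECONNREFUSED),
// whereas a host that silently drops packets leaves the connect attempt
// hanging until the caller's own deadline fires.
var socket = net.connect(8888, '127.0.0.1');

socket.on('error', function (err) {
    console.error(err.code);    // typically ECONNREFUSED for a reset
});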

vosst added a commit to vosst/grpc that referenced this issue Sep 27, 2017
The state is used both in the callback for the actual connect and in the
additional timeout that is set up for the operation. Both code paths
decrease the reference count, and if they happen to be queued at the same
time, memory is corrupted. Subsequent behavior is undefined and segfaults
can be observed as a result.

Fixes grpc#12608
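The race the commit message describes can be sketched in plain Node terms (illustrative only, not the actual C code in tcp_client_uv.c; the state and refcount names here are invented):

// A shared connect state holding a single reference, reduced to the bare
// refcounting pattern used by the connect path and its deadline timer.
function makeState() {
    return { refs: 1 };
}

function unref(state) {
    state.refs -= 1;
    if (state.refs === 0) {
        state.freed = true;                          // stands in for freeing the state
    } else if (state.refs < 0) {
        throw new Error('state released twice');     // the C analogue is a use-after-free/segfault
    }
}

var state = makeState();

// Path 1: the failed connect attempt's callback releases the state.
setImmediate(function onConnectDone() { unref(state); });

// Path 2: the connect deadline timer also fires and releases the same state.
setTimeout(function onConnectTimeout() { unref(state); }, 0);

With one reference and two queued releases, the second release fails, which mirrors the corruption seen in the backtraces above when both paths run.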
@vosst
Contributor

vosst commented Sep 28, 2017

@borisovg hey, would you be able to give my PR a spin and provide feedback on whether it fixes your crash?
It's probably a good idea to cross-check while we are waiting for @murgatroid99 to provide feedback.

@borisovg
Author

@vosst Looks like it does indeed fix it. :)

murgatroid99 pushed a commit to murgatroid99/grpc that referenced this issue Oct 3, 2017

murgatroid99 pushed a commit to murgatroid99/grpc that referenced this issue Oct 4, 2017
lock bot locked as resolved and limited conversation to collaborators on Oct 1, 2018