
Sporadic SIGSEGV in src/unix/stream.c #425

Closed
tstellanova opened this issue Jul 6, 2015 · 18 comments

@tstellanova

I can't provide a consistent repro case for this, but I've noticed that after some thousands of reads and writes to a TCP socket stream using libuv, I encounter a segmentation fault. Here's the backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc67fc700 (LWP 19000)]
uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:841
841       size_t len = buf->len;
(gdb) bt
#0  uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:841
#1  0x00007ffff79ae239 in uv__stream_io (loop=<optimized out>, w=0x7fffb0000b18, events=<optimized out>)
    at src/unix/stream.c:1242
#2  0x00007ffff79a604b in uv__run_pending (loop=0x852f28) at src/unix/core.c:710
#3  uv_run (loop=0x852f28, mode=UV_RUN_DEFAULT) at src/unix/core.c:316
#4  0x0000000000535a75 in ReliablePipe::run (this=0x852f20) at Pipe/ReliablePipe.cpp:231
#5  0x000000000051b99e in BasePipe::loopInit (this=0x852f20) at Pipe/BasePipe.cpp:34
#6  0x000000000051b831 in BasePipe::WorkerThread::run (this=0x855280) at Pipe/BasePipe.cpp:116
#7  0x0000000000561bae in Thread::startThread (data=0x855280) at Threads/Thread.cpp:8
#8  0x00007ffff7bc4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007ffff58d238d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

From what I can tell, the problem is in this line:
uv_buf_t* buf = &(req->bufs[req->write_index]);

It looks like bufs is already null:
(gdb) p req->bufs
$34 = (uv_buf_t *) 0x0

Looking through the code, it appears req->bufs is either freed or left in place depending on whether the write succeeds. Is it possible I'm unintentionally ignoring some error status and causing this?

Thanks for any suggestions.

@bnoordhuis
Member

What version of libuv is that? Going by the line numbers, it's not the most recent release.

Can you try a debug build and post the output of info locals at the crash site? Can you include the output of print *req as well? Thanks.

@tstellanova
Author

I believe we're running libuv 1.3 -- I'll investigate whether we can upgrade easily.

uv__write (stream=0x7fffb8000a90) at src/unix/stream.c:841
841 size_t len = buf->len;
(gdb) info locals
buf = 0x10
len = <optimized out>
iov = 0x7fffc4000ee0
q = 0x7fffc4000eb8
req = 0x7fffc4000e60
iovcnt = <optimized out>
n = <optimized out>
__PRETTY_FUNCTION__ = "uv__write"

(gdb) print *req
$1 = {data = 0x852f20, type = UV_WRITE, active_queue = {0x852f48, 0x852f48}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x51bb30 <BasePipe::after_standalone_write(uv_write_s*, int)>, send_handle = 0x0, handle = 0x7fffb8000a90,
  queue = {0x7fffb8000b60, 0x7fffb8000b60}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x7fffc4000d60 "\355\376a", len = 103}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

@bnoordhuis
Member

Libuv follows semver so an upgrade should be a drop-in replacement.

req->write_index == req->nbufs == 1 might be indicative of a bug. Is there anywhere in your application where you close one handle when another handle errors/eofs/etc.? I wonder if the queue manipulation in uv__run_pending() is unsound.

@tstellanova
Author

I updated to libuv 1.6.1 and saw the crash again. It looks like the same basic issue (bufs is NULL):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc77fe700 (LWP 29192)]
uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:853
853 size_t len = buf->len;
(gdb) bt
#0 uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:853
#1 0x00007ffff79af22b in uv_write2 (req=0x7fffb8002d40, stream=0x7fffb0000a90, bufs=0x7fffb8002e00, nbufs=1,
    send_handle=<optimized out>, cb=<optimized out>) at src/unix/stream.c:1388
#2 0x000000000053628a in ReliablePipe::SendBinaryMessage (this=0x852f20, msg=0x7fffb8002c60 "\220&", length=205)
at Pipe/ReliablePipe.cpp:298

(gdb) info locals
buf = 0x10
len = <optimized out>
iov = 0x7fffb8002dc0
q = 0x7fffb8002d98
req = 0x7fffb8002d40
iovcnt = <optimized out>
n = <optimized out>
__PRETTY_FUNCTION__ = "uv__write"

(gdb) p *req
$1 = {data = 0x852f20, type = UV_WRITE, active_queue = {0x852f48, 0x852f48}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x51bb30 <BasePipe::after_standalone_write(uv_write_s*, int)>, send_handle = 0x0, handle = 0x7fffb0000a90,
  queue = {0x7fffb0000b60, 0x7fffb0000b60}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x7fffb8002c60 "\220&", len = 205}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

Thanks for the handle close idea -- I'll investigate.

@saghul
Member

saghul commented Jul 7, 2015

Quick sanity check: there is no chance you're reusing uv_write_t structures, is there?

@tstellanova
Author

  • it doesn't appear we're reusing uv_write_t structures
  • it doesn't appear the stream has been disconnected by the other side before the crash (and none of the EOF / disconnect handlers are invoked)
  • the crash typically appears 30-45 minutes after opening the stream and sending and receiving about 10 small messages (<500 bytes) per second; however, it sometimes appears as soon as 30 seconds in
  • this code is running in a multithreaded environment. All direct access to the stream (read_start, write, etc.) is wrapped in a mutex -- however, uv_run is not, and we're using the default run mode

@saghul
Member

saghul commented Jul 7, 2015

Ah, that's not right. None of the APIs are thread-safe unless stated otherwise, so calling uv_write from outside the loop thread can lead to any kind of failure.


@tstellanova
Author

Can you clarify this?

If you have one thread driving uv_run (which blocks in the default run mode), how should one then call uv_write?


@saghul
Member

saghul commented Jul 7, 2015

As a result of some other operation happening on the loop thread: a connection was made, data was received, a timer fired, etc.

None of the loop or handle APIs are thread-safe, the only exception is uv_async_send, which is used to trigger a callback in the loop thread by another thread: http://docs.libuv.org/en/v1.x/async.html?highlight=uv_async_send#c.uv_async_send
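
For anyone hitting the same pattern, here is a minimal sketch (in C, against the libuv 1.x API) of handing writes from a worker thread to the loop thread with uv_async_send(). Only the libuv calls are real API; the pending_msg queue, the function names, and the globals are hypothetical illustration, not code from this issue, and error handling is omitted for brevity.

/* Sketch: worker threads only touch a mutex-protected queue and
 * uv_async_send(); the actual uv_write() happens on the loop thread
 * inside the async callback. Names and queue layout are illustrative. */
#include <stdlib.h>
#include <string.h>
#include <uv.h>

typedef struct pending_msg {
  uv_buf_t buf;
  struct pending_msg* next;
} pending_msg;

static uv_mutex_t queue_lock;
static pending_msg* queue_head;     /* guarded by queue_lock */
static uv_async_t write_async;      /* initialized on the loop thread */
static uv_stream_t* target_stream;  /* the TCP stream, set during setup */

static void on_write_done(uv_write_t* req, int status) {
  pending_msg* msg = req->data;
  (void) status;                    /* real code should check for errors */
  free(msg->buf.base);
  free(msg);
  free(req);                        /* one uv_write_t per write, never reused */
}

/* Runs on the loop thread whenever another thread calls uv_async_send(). */
static void drain_queue(uv_async_t* handle) {
  pending_msg* msg;
  (void) handle;
  uv_mutex_lock(&queue_lock);
  msg = queue_head;
  queue_head = NULL;
  uv_mutex_unlock(&queue_lock);

  while (msg != NULL) {
    pending_msg* next = msg->next;
    uv_write_t* req = malloc(sizeof(*req));
    req->data = msg;
    uv_write(req, target_stream, &msg->buf, 1, on_write_done);
    msg = next;
  }
}

/* Called once on the loop thread before uv_run(). */
int setup_async_writer(uv_loop_t* loop, uv_stream_t* stream) {
  target_stream = stream;
  uv_mutex_init(&queue_lock);
  return uv_async_init(loop, &write_async, drain_queue);
}

/* Callable from any thread: copy the data, queue it, wake the loop. */
void send_from_any_thread(const char* data, size_t len) {
  pending_msg* msg = malloc(sizeof(*msg));
  msg->buf = uv_buf_init(malloc(len), (unsigned int) len);
  memcpy(msg->buf.base, data, len);

  uv_mutex_lock(&queue_lock);
  msg->next = queue_head;           /* LIFO for brevity; a real queue would keep FIFO order */
  queue_head = msg;
  uv_mutex_unlock(&queue_lock);

  uv_async_send(&write_async);      /* the only thread-safe libuv call used here */
}

Note that uv_async_send() may coalesce several wake-ups into a single callback, which is why the callback drains the whole queue instead of assuming one message per wake-up.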

@tstellanova
Author

Thanks for the clarification.

@franknj
Copy link

franknj commented Sep 10, 2015

@saghul, as a developer coming from a *nix select/epoll background, I'd like a bit more clarification on the "single-threaded network I/O" nature of libuv.
Does it mean that if uv_run's loop is monitoring/polling a socket for anything readable, another thread CANNOT send/write to that socket? With Linux select or epoll, I believe that is allowed. We tried the same approach with libuv, and I'm not sure it is correct.
Our app gets a sporadic segmentation fault; the stack is below:
#0 0x00007f8734756818 in uv__write (stream=0x7f873401b558) at src/unix/stream.c:863
#1 0x00007f87347577bb in uv__stream_io (loop=0x7f8734969b20 <default_loop_struct>, w=0x7f873401b5e0, events=4) at src/unix/stream.c:1264
#2 0x00007f873474a365 in uv__run_pending (loop=0x7f8734969b20 <default_loop_struct>) at src/unix/core.c:733
#3 0x00007f8734749a49 in uv_run (loop=0x7f8734969b20 <default_loop_struct>, mode=UV_RUN_DEFAULT) at src/unix/core.c:333

And the req is:
(gdb) p req
$1 = (uv_write_t *) 0x162be30
(gdb) p *req
$2 = {data = 0x162ba20, type = UV_WRITE, active_queue = {0x1632990, 0x1624530}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x7f87349d5fb3 <UV::g_on_write_end(uv_write_t*, int)>, send_handle = 0x0, handle = 0x7f873401b558,
  queue = {0x16329d8, 0x1624578}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x162ba20 "5", len = 55}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

Thanks in advance,
Frank

@bnoordhuis
Member

@franknj You cannot call uv_write() or uv_read_start() from a different thread than the one that calls uv_run().

@franknj

franknj commented Sep 10, 2015

@bnoordhuis, thanks for the clarification. That is a big difference from a select/epoll-based approach.

@neevek
Contributor

neevek commented Jun 23, 2016

I ran into this same problem by blindly reusing the uv_write_t object.

@saghul I think it would be useful to mention in the documentation that uv_write_t should not be reused without careful memory management.

@saghul
Member

saghul commented Jun 23, 2016

@neevek mind submitting a documentation PR?

@neevek
Contributor

neevek commented Jun 24, 2016

@saghul I will do it later.
But I am still confused: I changed my code so that the uv_write_t object is reused only after uv_write_cb has been called. I assumed that once the callback fires, libuv is internally done with the uv_write_t object, but it seems not; the segfault still occurs. Why?

@saghul
Member

saghul commented Jun 24, 2016

After the callback fires it should be fine to reuse it; if not, there is a bug somewhere. A test case would definitely help.
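
To make that lifetime rule concrete, here is a small sketch (hypothetical names, not code from this thread) of a single uv_write_t that is reused only after its callback has fired. It assumes all calls happen on the loop thread and that the data buffer also stays valid until the callback runs.

#include <assert.h>
#include <uv.h>

typedef struct {
  uv_write_t req;   /* reused for every write on this stream */
  uv_buf_t buf;
  int in_flight;    /* 1 while libuv still owns req and buf */
} writer_t;

static void on_write(uv_write_t* req, int status) {
  writer_t* w = (writer_t*) req;   /* valid: req is the first member */
  (void) status;
  w->in_flight = 0;                /* only now may req/buf be touched again */
}

/* Must run on the loop thread; `data` must outlive the write callback. */
int write_message(writer_t* w, uv_stream_t* stream, char* data, unsigned int len) {
  assert(!w->in_flight);           /* reusing a live request is what corrupts req->bufs */
  w->buf = uv_buf_init(data, len);
  w->in_flight = 1;
  return uv_write(&w->req, stream, &w->buf, 1, on_write);
}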

@neevek
Contributor

neevek commented Jun 25, 2016

@saghul You are right, it was a bug in my code. And I just submitted a documentation PR.
