
Sporadic SIGSEGV in src/unix/stream.c #425

Closed
tstellanova opened this issue Jul 6, 2015 · 18 comments

@tstellanova

I can't provide a consistent repro case for this, but I've noticed that after some thousands of reads and writes to a TCP socket stream using libuv, I encounter a segmentation fault. Here's the backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc67fc700 (LWP 19000)]
uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:841
841       size_t len = buf->len;
(gdb) bt
#0  uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:841
#1  0x00007ffff79ae239 in uv__stream_io (loop=<optimized out>, w=0x7fffb0000b18, events=<optimized out>)
    at src/unix/stream.c:1242
#2  0x00007ffff79a604b in uv__run_pending (loop=0x852f28) at src/unix/core.c:710
#3  uv_run (loop=0x852f28, mode=UV_RUN_DEFAULT) at src/unix/core.c:316
#4  0x0000000000535a75 in ReliablePipe::run (this=0x852f20) at Pipe/ReliablePipe.cpp:231
#5  0x000000000051b99e in BasePipe::loopInit (this=0x852f20) at Pipe/BasePipe.cpp:34
#6  0x000000000051b831 in BasePipe::WorkerThread::run (this=0x855280) at Pipe/BasePipe.cpp:116
#7  0x0000000000561bae in Thread::startThread (data=0x855280) at Threads/Thread.cpp:8
#8  0x00007ffff7bc4e9a in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9  0x00007ffff58d238d in clone () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x0000000000000000 in ?? ()

From what I can tell, the problem is in this line:
uv_buf_t* buf = &(req->bufs[req->write_index]);

It looks like bufs is already null:
(gdb) p req->bufs
$34 = (uv_buf_t *) 0x0

Looking through the code, it appears req->bufs is either freed or left in place depending on whether the write succeeds. Is it possible I'm unintentionally ignoring some error status and causing this?

Thanks for any suggestions.

@bnoordhuis
Member

What version of libuv is that? Going by the line numbers, it's not the most recent release.

Can you try a debug build and post the output of info locals at the crash site? Can you include the output of print *req as well? Thanks.

@tstellanova
Author

I believe we're running libuv 1.3 -- I'll investigate whether we can upgrade easily.

uv__write (stream=0x7fffb8000a90) at src/unix/stream.c:841
841 size_t len = buf->len;
(gdb) info locals
buf = 0x10
len = <optimized out>
iov = 0x7fffc4000ee0
q = 0x7fffc4000eb8
req = 0x7fffc4000e60
iovcnt = <optimized out>
n = <optimized out>
__PRETTY_FUNCTION__ = "uv__write"

(gdb) print *req
$1 = {data = 0x852f20, type = UV_WRITE, active_queue = {0x852f48, 0x852f48}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x51bb30 <BasePipe::after_standalone_write(uv_write_s*, int)>, send_handle = 0x0, handle = 0x7fffb8000a90,
  queue = {0x7fffb8000b60, 0x7fffb8000b60}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x7fffc4000d60 "\355\376a", len = 103}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

@bnoordhuis
Member

Libuv follows semver so an upgrade should be a drop-in replacement.

req->write_index == req->nbufs == 1 might be indicative of a bug. Is there anywhere in your application where you close one handle when another handle errors/eofs/etc.? I wonder if the queue manipulation in uv__run_pending() is unsound.

@tstellanova
Author

I updated to libuv 1.6.1 and saw the crash again. It looks like the same basic issue (bufs is NULL):

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffc77fe700 (LWP 29192)]
uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:853
853 size_t len = buf->len;
(gdb) bt
#0 uv__write (stream=0x7fffb0000a90) at src/unix/stream.c:853
#1 0x00007ffff79af22b in uv_write2 (req=0x7fffb8002d40, stream=0x7fffb0000a90, bufs=0x7fffb8002e00, nbufs=1,
    send_handle=<optimized out>, cb=<optimized out>) at src/unix/stream.c:1388
#2 0x000000000053628a in ReliablePipe::SendBinaryMessage (this=0x852f20, msg=0x7fffb8002c60 "\220&", length=205)
at Pipe/ReliablePipe.cpp:298

(gdb) info locals
buf = 0x10
len = <optimized out>
iov = 0x7fffb8002dc0
q = 0x7fffb8002d98
req = 0x7fffb8002d40
iovcnt = <optimized out>
n = <optimized out>
__PRETTY_FUNCTION__ = "uv__write"

(gdb) p *req
$1 = {data = 0x852f20, type = UV_WRITE, active_queue = {0x852f48, 0x852f48}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x51bb30 <BasePipe::after_standalone_write(uv_write_s*, int)>, send_handle = 0x0, handle = 0x7fffb0000a90,
  queue = {0x7fffb0000b60, 0x7fffb0000b60}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x7fffb8002c60 "\220&", len = 205}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

Thanks for the handle close idea -- I'll investigate.

@saghul
Member

saghul commented Jul 7, 2015

Quick sanity check: there is no chance you're reusing uv_write_t structures, is there?

@tstellanova
Author

  • it doesn't appear we're reusing uv_write_t structures
  • it doesn't appear the stream has been disconnected by the other side before the crash (and none of the EOF / disconnect handlers are invoked)
  • the crash typically appears 30-45 minutes after opening the stream and sending and receiving about 10 small messages (<500 bytes) per second; however, it sometimes appears as soon as 30 seconds in
  • this code is running in a multithreaded environment. All direct access to the stream (read_start, write, etc.) is wrapped in a mutex -- however, uv_run is not, and we're using the default run mode

@saghul
Member

saghul commented Jul 7, 2015

Ah, that's not right. None of the APIs are thread-safe unless stated otherwise, so calling uv_write from outside the loop thread can lead to any kind of failure.


@tstellanova
Author

Can you clarify this?

If you have one thread driving uv_run (which blocks in the default run mode), how should one then call uv_write?


@saghul
Member

saghul commented Jul 7, 2015

As a result of some other operation happening on the loop thread: a connection was made, data was received, a timer fired, etc.

None of the loop or handle APIs are thread-safe, the only exception is uv_async_send, which is used to trigger a callback in the loop thread by another thread: http://docs.libuv.org/en/v1.x/async.html?highlight=uv_async_send#c.uv_async_send
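
For anyone hitting the same pattern, here is a minimal sketch (in C, against the libuv 1.x API) of handing writes from a worker thread to the loop thread with uv_async_send(). Only the libuv calls are real API; the pending_msg queue, the function names, and the globals are hypothetical illustration, not code from this issue, and error handling is omitted for brevity.

/* Sketch: worker threads only touch a mutex-protected queue and
 * uv_async_send(); the actual uv_write() happens on the loop thread
 * inside the async callback. Names and queue layout are illustrative. */
#include <stdlib.h>
#include <string.h>
#include <uv.h>

typedef struct pending_msg {
  uv_buf_t buf;
  struct pending_msg* next;
} pending_msg;

static uv_mutex_t queue_lock;
static pending_msg* queue_head;     /* guarded by queue_lock */
static uv_async_t write_async;      /* initialized on the loop thread */
static uv_stream_t* target_stream;  /* the TCP stream, set during setup */

static void on_write_done(uv_write_t* req, int status) {
  pending_msg* msg = req->data;
  (void) status;                    /* real code should check for errors */
  free(msg->buf.base);
  free(msg);
  free(req);                        /* one uv_write_t per write, never reused */
}

/* Runs on the loop thread whenever another thread calls uv_async_send(). */
static void drain_queue(uv_async_t* handle) {
  pending_msg* msg;
  (void) handle;
  uv_mutex_lock(&queue_lock);
  msg = queue_head;
  queue_head = NULL;
  uv_mutex_unlock(&queue_lock);

  while (msg != NULL) {
    pending_msg* next = msg->next;
    uv_write_t* req = malloc(sizeof(*req));
    req->data = msg;
    uv_write(req, target_stream, &msg->buf, 1, on_write_done);
    msg = next;
  }
}

/* Called once on the loop thread before uv_run(). */
int setup_async_writer(uv_loop_t* loop, uv_stream_t* stream) {
  target_stream = stream;
  uv_mutex_init(&queue_lock);
  return uv_async_init(loop, &write_async, drain_queue);
}

/* Callable from any thread: copy the data, queue it, wake the loop. */
void send_from_any_thread(const char* data, size_t len) {
  pending_msg* msg = malloc(sizeof(*msg));
  msg->buf = uv_buf_init(malloc(len), (unsigned int) len);
  memcpy(msg->buf.base, data, len);

  uv_mutex_lock(&queue_lock);
  msg->next = queue_head;           /* LIFO for brevity; a real queue would keep FIFO order */
  queue_head = msg;
  uv_mutex_unlock(&queue_lock);

  uv_async_send(&write_async);      /* the only thread-safe libuv call used here */
}

Note that uv_async_send() may coalesce several wake-ups into a single callback, which is why the callback drains the whole queue instead of assuming one message per wake-up.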

@tstellanova
Author

Thanks for the clarification.

@franknj
Copy link

franknj commented Sep 10, 2015

@saghul, as a developer coming from a *nix select/epoll background, I'd like a bit more clarification on the "single-threaded network I/O" nature of libuv.
Does it mean that if uv_run's loop is monitoring/polling a socket for anything readable, another thread CANNOT send/write to that socket? With Linux select or epoll, I believe that is allowed. We tried the same approach with libuv, and I'm not sure it is correct.
Our app gets a sporadic segmentation fault; the stack is below:
#0 0x00007f8734756818 in uv__write (stream=0x7f873401b558) at src/unix/stream.c:863
#1 0x00007f87347577bb in uv__stream_io (loop=0x7f8734969b20 <default_loop_struct>, w=0x7f873401b5e0, events=4) at src/unix/stream.c:1264
#2 0x00007f873474a365 in uv__run_pending (loop=0x7f8734969b20 <default_loop_struct>) at src/unix/core.c:733
#3 0x00007f8734749a49 in uv_run (loop=0x7f8734969b20 <default_loop_struct>, mode=UV_RUN_DEFAULT) at src/unix/core.c:333

And the req is:
(gdb) p req
$1 = (uv_write_t *) 0x162be30
(gdb) p *req
$2 = {data = 0x162ba20, type = UV_WRITE, active_queue = {0x1632990, 0x1624530}, reserved = {0x0, 0x0, 0x0, 0x0},
  cb = 0x7f87349d5fb3 <UV::g_on_write_end(uv_write_t*, int)>, send_handle = 0x0, handle = 0x7f873401b558,
  queue = {0x16329d8, 0x1624578}, write_index = 1, bufs = 0x0, nbufs = 1, error = 0,
  bufsml = {{base = 0x162ba20 "5", len = 55}, {base = 0x0, len = 0}, {base = 0x0, len = 0}, {base = 0x0, len = 0}}}

Thanks in advance,
Frank

@bnoordhuis
Member

@franknj You cannot call uv_write() or uv_read_start() from a different thread than the one that calls uv_run().

@franknj

franknj commented Sep 10, 2015

@bnoordhuis, thanks for the clarification. That is a big difference from a select/epoll-based approach.

@neevek
Contributor

neevek commented Jun 23, 2016

I ran into this same problem by blindly reusing the uv_write_t object.

@saghul I think it would be useful to mention in the documentation that uv_write_t should not be reused without careful memory management.

@saghul
Member

saghul commented Jun 23, 2016

@neevek mind submitting a documentation PR?

@neevek
Contributor

neevek commented Jun 24, 2016

@saghul I will do it later.
But I am still confused: I changed my code so that the uv_write_t object is reused only after uv_write_cb has been called. I assumed that once the callback fires, libuv is internally done with the uv_write_t object, but it seems not; the segfault still occurs. Why?

@saghul
Member

saghul commented Jun 24, 2016

After the callback fires it should be fine to reuse it; if not, there is a bug somewhere. A test case would definitely help.
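
To make that lifetime rule concrete, here is a small sketch (hypothetical names, not code from this thread) of a single uv_write_t that is reused only after its callback has fired. It assumes all calls happen on the loop thread and that the data buffer also stays valid until the callback runs.

#include <assert.h>
#include <uv.h>

typedef struct {
  uv_write_t req;   /* reused for every write on this stream */
  uv_buf_t buf;
  int in_flight;    /* 1 while libuv still owns req and buf */
} writer_t;

static void on_write(uv_write_t* req, int status) {
  writer_t* w = (writer_t*) req;   /* valid: req is the first member */
  (void) status;
  w->in_flight = 0;                /* only now may req/buf be touched again */
}

/* Must run on the loop thread; `data` must outlive the write callback. */
int write_message(writer_t* w, uv_stream_t* stream, char* data, unsigned int len) {
  assert(!w->in_flight);           /* reusing a live request is what corrupts req->bufs */
  w->buf = uv_buf_init(data, len);
  w->in_flight = 1;
  return uv_write(&w->req, stream, &w->buf, 1, on_write);
}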

@neevek
Contributor

neevek commented Jun 25, 2016

@saghul You are right, it was a bug in my code. And I just submitted a documentation PR.
