Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Sporadically failing test/simple/test-dgram-pingpong.js #2981

Closed
isaacs opened this issue Mar 21, 2012 · 7 comments
Closed

Sporadically failing test/simple/test-dgram-pingpong.js #2981

isaacs opened this issue Mar 21, 2012 · 7 comments
Assignees

Comments

@isaacs
Copy link

isaacs commented Mar 21, 2012

It looks like the sporadic failures in test/simple/test-dgram-pingpong.js are occurring because occasionally the server or client gets an empty buffer as the message, rather than the data that was sent.

isaacs/node@dgram-pingpong has a patch for the test to show this more clearly. It doesn't fail often, but when I run this:

i=0; while ./node test/simple/test-crypto-dh.js ; do let i++; echo $i; done

it'll usually fail within about 200 iterations.

@piscisaureus
Copy link

Does not reproduce on windows.
@bnoordhuis ?

@bnoordhuis
Copy link
Member

I think I've seen it happen on all three supported Unices (Linux, Darwin, SunOS). It certainly fails regularly on Linux.

It could be a bug in either the test itself or in the assumptions it makes - SOCK_DGRAM is not guaranteed to be reliable, even on loopback devices.

@isaacs
Copy link
Author

isaacs commented Mar 21, 2012

If we're ok with SOCK_DGRAM dying randomly, and either just abruptly stop sending any messages, or fail part of the way through and send an empty message, then perhaps we can change the test around:

  1. if an empty message shows up, or if 100ms goes by without any activity, assume that connection is through, and don't expect any more.
  2. Ensure that we get at least a 60% success rate, or some other arbitrary lower bound.

@bnoordhuis
Copy link
Member

I take back what I said earlier: dropped packets is one thing but empty packets shouldn't happen. I'll investigate it.

EDIT: I'm 90% sure that it's a bug in the slab allocator.

@ghost ghost assigned bnoordhuis Mar 22, 2012
@bnoordhuis
Copy link
Member

Update: I found the cause (or causes, really):

  1. The UDP slab allocator has bugs. That's not the root cause because the test failures don't stop if I switch to static buffers.
  2. The buffer gets GC'd somewhere between the call to MakeCallback() and the JS callback. Bingo!

I added some instrumentation to the Buffer class to check for premature collection. Lo and behold:

(gdb) bt
#0  0x00007f73b7cf3a75 in *__GI_raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f73b7cf75c0 in *__GI_abort () at abort.c:92
#2  0x00007f73b7cec941 in *__GI___assert_fail (assertion=0x99943d "pinned_ == false", file=<value optimized out>, line=198, function=0x999d00 "virtual node::Buffer::~Buffer()") at assert.c:81
#3  0x000000000058932c in node::Buffer::~Buffer() ()
#4  0x000000000073df14 in v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector) ()
#5  0x0000000000758f70 in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::internal::GCTracer*) ()
#6  0x0000000000759872 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollector, char const*, char const*) ()
#7  0x0000000000728ebb in v8::internal::Factory::NewStringFromUtf8(v8::internal::Vector<char const>, v8::internal::PretenureFlag) ()
#8  0x00000000006dc903 in v8::String::New(char const*, int) ()
#9  0x0000000000585d4b in node::MakeCallback(v8::Handle<v8::Object>, char const*, int, v8::Handle<v8::Value>*) ()
#10 0x00000000005b10ed in node::UDPWrap::OnRecv(uv_udp_s*, long, uv_buf_t, sockaddr*, unsigned int) ()
#11 0x00000000005d054a in uv__udp_recvmsg (loop=<value optimized out>, w=0x1bb4078, revents=<value optimized out>) at ../deps/uv/src/unix/udp.c:263
#12 0x00000000005d9be4 in ev_invoke_pending (loop=0xe39aa0) at ../deps/uv/src/unix/ev/ev.c:2149
#13 0x00000000005df66d in ev_run (loop=0xe39aa0, flags=2) at ../deps/uv/src/unix/ev/ev.c:2525
#14 0x00000000005ccca2 in uv_run (loop=<value optimized out>) at ../deps/uv/src/unix/core.c:207
#15 0x0000000000588a5b in node::Start(int, char**) ()
#16 0x00007f73b7cdec4d in __libc_start_main (main=<value optimized out>, argc=<value optimized out>, ubp_av=<value optimized out>, init=<value optimized out>, fini=<value optimized out>, rtld_fini=<value optimized out>, stack_end=0x7fff79df5a68) at libc-start.c:226
#17 0x000000000057f52d in _start ()

Note how the call to String::New() in stack frame 8 kicks off a GC sweep. This is actually a rather serious bug because it affects everything that uses MakeCallback(), which is pretty much all of Node.

@isaacs
Copy link
Author

isaacs commented Mar 24, 2012

@bnoordhuis Wow, nice find!

@bnoordhuis
Copy link
Member

Fixed in f86359c and 1ab95a5.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants