crash from use of bad file descriptor in epoll_ctl #70

Closed
ongardie opened this Issue Dec 4, 2014 · 1 comment

ongardie commented Dec 4, 2014

With 9e6859a and these two local changes:

  • Added usleep(10 * 1000 * 1000); to Examples/SmokeTest before the return 0.
  • Set ClientImpl::ExactlyOnceRPCHelper::keepAliveIntervalMs to 6 * 1000.

Steps to reproduce:

  • Launch a single server
  • Start modified SmokeTest
  • Kill the server
  • Wait for the client to send a keepalive (about 6s in the modified version)
~/logcabin:master$ bt ./build/Examples/SmokeTest   
warning: GDB: Failed to set controlling terminal: Operation not permitted
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
1417733182.449970 Event/File.cc:63 in setEvents() ERROR[17607:thread 3]: Modifying file -1 event with epoll_ctl failed: Bad file descriptor Exiting...

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff578d700 (LWP 17613)]
0x00007ffff63eb107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Thread 3 (Thread 0x7ffff578d700 (LWP 17613)):
#0  0x00007ffff63eb107 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff63ec4e8 in __GI_abort () at abort.c:89
#2  0x00000000004604cd in LogCabin::Event::File::setEvents (this=0x6a3808, fileEvents=1073741828) at build/Event/File.cc:62
#3  0x000000000045c927 in LogCabin::RPC::MessageSocket::sendMessage (this=0x6a3710, messageId=7, contents=...) at build/RPC/MessageSocket.cc:201
#4  0x000000000045881e in LogCabin::RPC::ClientSession::sendRequest (this=0x69cc40, request=...) at build/RPC/ClientSession.cc:257
#5  0x00000000004566d3 in LogCabin::RPC::ClientRPC::ClientRPC (this=0x7ffff578cb70, session=std::shared_ptr (expired, weak 0) 0x7ffff578cc30, service=1, serviceSpecificErrorVersion=1 '\001', opCode=5, request=...) at build/RPC/ClientRPC.cc:53
#6  0x00000000004182b0 in LogCabin::Client::LeaderRPC::call (this=0x69e450, opCode=LogCabin::Protocol::Client::READ_WRITE_TREE, request=..., response=...) at build/Client/LeaderRPC.cc:68
#7  0x000000000040f374 in LogCabin::Client::ClientImpl::keepAlive (this=0x69e2c0) at build/Client/ClientImpl.cc:480
#8  0x000000000040d0d2 in LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper::keepAliveThreadMain (this=0x69e2e0) at build/Client/ClientImpl.cc:135
#9  0x000000000041781d in std::_Mem_fn<void (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper::*)()>::operator() (this=0x69df58, __object=0x69e2e0) at /usr/include/c++/4.4/tr1_impl/functional:552
#10 0x0000000000417792 in std::_Bind<std::_Mem_fn<void (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper::*)()> (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper*)>::__call<, 0>(std::tuple<> const&, std::_Index_tuple<0>) (this=0x69df58, __args=empty std::tuple) at /usr/include/c++/4.4/tr1_impl/functional:1137
#11 0x000000000041772f in std::_Bind<std::_Mem_fn<void (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper::*)()> (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper*)>::operator()<>() (this=0x69df58) at /usr/include/c++/4.4/tr1_impl/functional:1191
#12 0x0000000000417670 in std::thread::_Impl<std::_Bind<std::_Mem_fn<void (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper::*)()> (LogCabin::Client::ClientImpl::ExactlyOnceRPCHelper*)> >::_M_run() (this=0x69df40) at /usr/include/c++/4.4/thread:114
#13 0x00007ffff6d2c970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x00007ffff7bc70a4 in start_thread (arg=0x7ffff578d700) at pthread_create.c:309
#15 0x00007ffff649bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 2 (Thread 0x7ffff5f8e700 (LWP 17611)):
#0  0x00007ffff649c2a3 in epoll_wait () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000000461114 in LogCabin::Event::Loop::runForever (this=0x69e500) at build/Event/Loop.cc:144
#2  0x000000000041a2dd in std::_Mem_fn<void (LogCabin::Event::Loop::*)()>::operator() (this=0x6a55e8, __object=0x69e500) at /usr/include/c++/4.4/tr1_impl/functional:552
#3  0x000000000041a252 in std::_Bind<std::_Mem_fn<void (LogCabin::Event::Loop::*)()> (LogCabin::Event::Loop*)>::__call<, 0>(std::tuple<> const&, std::_Index_tuple<0>) (this=0x6a55e8, __args=empty std::tuple) at /usr/include/c++/4.4/tr1_impl/functional:1137
#4  0x000000000041a1f5 in std::_Bind<std::_Mem_fn<void (LogCabin::Event::Loop::*)()> (LogCabin::Event::Loop*)>::operator()<>() (this=0x6a55e8) at /usr/include/c++/4.4/tr1_impl/functional:1191
#5  0x000000000041a136 in std::thread::_Impl<std::_Bind<std::_Mem_fn<void (LogCabin::Event::Loop::*)()> (LogCabin::Event::Loop*)> >::_M_run() (this=0x6a55d0) at /usr/include/c++/4.4/thread:114
#6  0x00007ffff6d2c970 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#7  0x00007ffff7bc70a4 in start_thread (arg=0x7ffff5f8e700) at pthread_create.c:309
#8  0x00007ffff649bccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Thread 1 (Thread 0x7ffff7fc9780 (LWP 17607)):
#0  0x00007ffff646d53d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff6495654 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2  0x0000000000407772 in main (argc=1, argv=0x7fffffffe458) at build/Examples/SmokeTest.cc:113
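
The failing call can be reproduced in isolation. The sketch below is not LogCabin code; `tryModifyEvents` is a hypothetical helper that asks a fresh epoll instance to modify the event mask of a given file descriptor and reports the resulting errno. The event mask in frame #2 above, fileEvents=1073741828, is 0x40000004, which on Linux is EPOLLOUT | EPOLLONESHOT, so the same mask is a natural one to pass in.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <cerrno>
#include <cstdint>

// Hypothetical helper, not part of LogCabin: creates a fresh epoll instance,
// tries to change the event mask registered for `fd`, and returns 0 on
// success or the errno value reported by epoll_ctl (or epoll_create1).
int tryModifyEvents(int fd, uint32_t events)
{
    int epollFd = epoll_create1(0);
    if (epollFd < 0)
        return errno;

    struct epoll_event ev = {};
    ev.events = events;
    ev.data.fd = fd;

    int result = 0;
    if (epoll_ctl(epollFd, EPOLL_CTL_MOD, fd, &ev) != 0)
        result = errno;  // save errno before close() can overwrite it

    close(epollFd);
    return result;
}
```

Calling `tryModifyEvents(-1, EPOLLOUT | EPOLLONESHOT)` yields EBADF, which matches the "Bad file descriptor" message logged by setEvents() for file -1.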

ongardie added the bug label Dec 4, 2014


ongardie commented Dec 4, 2014

When the server is killed, the MessageSocket closes the socket:

#0  LogCabin::RPC::MessageSocket::disconnect (this=0x69cc40) at build/RPC/MessageSocket.cc:207
#1  0x000000000045cbac in LogCabin::RPC::MessageSocket::readable (this=0x69cc40) at build/RPC/MessageSocket.cc:232
#2  0x000000000045bf7f in LogCabin::RPC::MessageSocket::ReceiveSocket::handleFileEvent (this=0x69cd18, events=1) at build/RPC/MessageSocket.cc:102
#3  0x00000000004612de in LogCabin::Event::Loop::runForever (this=0x69e500) at build/Event/Loop.cc:152

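But MessageSocket::sendMessage still tries to use it later: when the keepalive thread sends its next request, sendMessage calls Event::File::setEvents on a file descriptor that is now -1, epoll_ctl fails with "Bad file descriptor", and the process aborts (frames #2 and #3 of thread 3 in the first backtrace).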
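
One way to avoid this kind of crash is to check, under a lock, whether the descriptor is still open before touching epoll. The following is only a sketch of that idea under stated assumptions: `GuardedSocket`, `disconnect` and `requestWritable` are hypothetical names for illustration, not LogCabin's MessageSocket or its actual fix, and the sketch assumes the caller has already registered the descriptor with the epoll instance.

```cpp
#include <sys/epoll.h>
#include <unistd.h>
#include <mutex>

// Hypothetical stand-in for a socket wrapper shared between an event loop
// thread and sender threads. Not LogCabin's MessageSocket.
class GuardedSocket {
  public:
    // Assumes `fd` was already added to `epollFd` with EPOLL_CTL_ADD.
    GuardedSocket(int epollFd, int fd) : epollFd_(epollFd), fd_(fd) {}

    // Event loop thread: the peer went away, so close the descriptor once
    // and remember that it is gone.
    void disconnect() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (fd_ >= 0) {
            close(fd_);
            fd_ = -1;
        }
    }

    // Any thread: ask to be notified when the socket is writable.
    // Returns false, without calling epoll_ctl, if the socket was closed.
    bool requestWritable() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (fd_ < 0)
            return false;
        struct epoll_event ev = {};
        ev.events = EPOLLOUT | EPOLLONESHOT;
        ev.data.fd = fd_;
        return epoll_ctl(epollFd_, EPOLL_CTL_MOD, fd_, &ev) == 0;
    }

  private:
    std::mutex mutex_;  // serializes disconnect() against senders
    int epollFd_;
    int fd_;            // -1 once disconnected
};
```

With a guard like this, a sender that races with a disconnect gets a failure it can report to the caller (for example, by failing the RPC) instead of aborting the whole client process.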
