-
Notifications
You must be signed in to change notification settings - Fork 160
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Quick connections/reconnections lead to TSAN errors with gcc #77
Comments
This is the test we commented, it’s more like I said: you stop before connection is marked OPEN and the one after has enough time, then you have your tsan errors. Several stop before being open work, several stop after being connected work too; it seems the transition fwils |
You're right it's a good test to use to track this down.
|
Yeah definitely, I was a bit perplex by this var btw
Envoyé de mon iPhone
… Le 14 mai 2019 à 07:33, Benjamin Sergeant ***@***.***> a écrit :
You're right it's a good test to use to track this down.
There's _requestInitCancellation which we can use to interrupt the code that tries to reconnect.
We probably need more synchronization between the 2 threads, it's clearly a race condition between stop and connect.
WARNING: ThreadSanitizer: data race (pid=3630)
Write of size 8 at 0x7ffc53c08db8 by thread T5:
#0 void std::swap<ix::Socket*>(ix::Socket*&, ix::Socket*&) /usr/include/c++/5/bits/move.h:187 (ixwebsocket_unittest+0x0000005d9444)
#1 std::__shared_ptr<ix::Socket, (__gnu_cxx::_Lock_policy)2>::swap(std::__shared_ptr<ix::Socket, (__gnu_cxx::_Lock_policy)2>&) /usr/include/c++/5/bits/shared_ptr_base.h:1076 (ixwebsocket_unittest+0x0000005d90ce)
#2 std::__shared_ptr<ix::Socket, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<ix::Socket, (__gnu_cxx::_Lock_policy)2>&&) /usr/include/c++/5/bits/shared_ptr_base.h:1000 (ixwebsocket_unittest+0x0000005d8ef6)
#3 std::shared_ptr<ix::Socket>::operator=(std::shared_ptr<ix::Socket>&&) /usr/include/c++/5/bits/shared_ptr.h:294 (ixwebsocket_unittest+0x0000005d8b90)
#4 ix::WebSocketTransport::connectToUrl(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int) /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocketTransport.cpp:154 (ixwebsocket_unittest+0x0000005ee93e)
#5 ix::WebSocket::connect(int) /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:169 (ixwebsocket_unittest+0x0000005e36a1)
#6 ix::WebSocket::checkConnection(bool) /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:251 (ixwebsocket_unittest+0x0000005e3f1d)
#7 ix::WebSocket::run() /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:284 (ixwebsocket_unittest+0x0000005e45ad)
#8 void std::_Mem_fn_base<void (ix::WebSocket::*)(), true>::operator()<, void>(ix::WebSocket*) const /usr/include/c++/5/functional:600 (ixwebsocket_unittest+0x0000005e9794)
#9 void std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/5/functional:1531 (ixwebsocket_unittest+0x0000005e96ab)
#10 std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)>::operator()() /usr/include/c++/5/functional:1520 (ixwebsocket_unittest+0x0000005e94dc)
#11 std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)> >::_M_run() /usr/include/c++/5/thread:115 (ixwebsocket_unittest+0x0000005e9414)
#12 <null> <null> (libstdc++.so.6+0x0000000b8c7f)
Previous read of size 8 at 0x7ffc53c08db8 by main thread:
#0 std::__shared_ptr<ix::Socket, (__gnu_cxx::_Lock_policy)2>::operator->() const /usr/include/c++/5/bits/shared_ptr_base.h:1055 (ixwebsocket_unittest+0x0000005aa4c0)
#1 ix::WebSocketTransport::close(unsigned short, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned long, bool) /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocketTransport.cpp:1035 (ixwebsocket_unittest+0x0000005f320a)
#2 ix::WebSocket::close() /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:217 (ixwebsocket_unittest+0x0000005e3d6a)
#3 ix::WebSocket::stop() /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:147 (ixwebsocket_unittest+0x0000005e3502)
#4 stop /home/travis/build/machinezone/IXWebSocket/test/cmd_websocket_chat.cpp:95 (ixwebsocket_unittest+0x0000005c4b42)
#5 ____C_A_T_C_H____T_E_S_T____0 /home/travis/build/machinezone/IXWebSocket/test/cmd_websocket_chat.cpp:326 (ixwebsocket_unittest+0x0000005c6ab8)
#6 Catch::TestInvokerAsFunction::invoke() const /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:11868 (ixwebsocket_unittest+0x000000426050)
#7 Catch::TestCase::invoke() const /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:11769 (ixwebsocket_unittest+0x000000425462)
#8 Catch::RunContext::invokeActiveTestCase() /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:10622 (ixwebsocket_unittest+0x00000041e755)
#9 Catch::RunContext::runCurrentTest(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&) /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:10595 (ixwebsocket_unittest+0x00000041e3ba)
#10 Catch::RunContext::runTest(Catch::TestCase const&) /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:10365 (ixwebsocket_unittest+0x00000041c4da)
#11 runTests /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:10924 (ixwebsocket_unittest+0x000000420138)
#12 Catch::Session::runInternal() /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:11119 (ixwebsocket_unittest+0x000000421831)
#13 Catch::Session::run() /home/travis/build/machinezone/IXWebSocket/test/Catch2/single_include/catch.hpp:11076 (ixwebsocket_unittest+0x0000004214cd)
#14 int Catch::Session::run<char>(int, char const* const*) <null> (ixwebsocket_unittest+0x00000048ca3f)
#15 main /home/travis/build/machinezone/IXWebSocket/test/test_runner.cpp:25 (ixwebsocket_unittest+0x00000043c694)
As if synchronized via sleep:
#0 nanosleep <null> (libtsan.so.0+0x000000043c7a)
#1 void std::this_thread::sleep_for<double, std::ratio<1l, 1000l> >(std::chrono::duration<double, std::ratio<1l, 1000l> > const&) /usr/include/c++/5/thread:292 (ixwebsocket_unittest+0x000000562956)
#2 ix::WebSocket::checkConnection(bool) /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:247 (ixwebsocket_unittest+0x0000005e3eed)
#3 ix::WebSocket::run() /home/travis/build/machinezone/IXWebSocket/ixwebsocket/IXWebSocket.cpp:284 (ixwebsocket_unittest+0x0000005e45ad)
#4 void std::_Mem_fn_base<void (ix::WebSocket::*)(), true>::operator()<, void>(ix::WebSocket*) const /usr/include/c++/5/functional:600 (ixwebsocket_unittest+0x0000005e9794)
#5 void std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) /usr/include/c++/5/functional:1531 (ixwebsocket_unittest+0x0000005e96ab)
#6 std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)>::operator()() /usr/include/c++/5/functional:1520 (ixwebsocket_unittest+0x0000005e94dc)
#7 std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (ix::WebSocket::*)()> (ix::WebSocket*)> >::_M_run() /usr/include/c++/5/thread:115 (ixwebsocket_unittest+0x0000005e9414)
#8 <null> <null> (libstdc++.so.6+0x0000000b8c7f)
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I may have an idea of the problem and how to fix it, I will give it a try tomorrow |
If you can quickly describe your idea that would be great. I was thinking about having 2 different socket objects, but that sounds like a bandaid and a bad idea. |
The idea is to prevent the checkConnection() or at least the connect() inside it to set the _socket var before the close() is done (as the close is called on the other thread). |
The place of the mutex in the checkConnection has to be placed carefully because we need to be sure that we sent data on the background thread when closing on the old socket |
And if this is working, it may also solve the same tsan errors we had in the other test of close code and reason, when server is closing; if not the fix will looks like this one but adapted for no thread |
Got it. Thanks for the explanation. I might give this a try and let you know.
… On May 15, 2019, at 1:06 PM, Kumamon38 ***@***.***> wrote:
And if this is working, it may also solve the same tsan errors we had in the other test of close code and reason, when server is closing; if not the fix will looks like this one but adapted for no thread
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub <#77?email_source=notifications&email_token=AC2O6ULZOE2ENVJGIPNFW4TPVRUNZA5CNFSM4HMWK7PKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODVPZQBI#issuecomment-492804101>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC2O6UP6P5WFFOYZT65K2LTPVRUNZANCNFSM4HMWK7PA>.
|
Okay, great 👍 |
I’ll be happy if all the errors preventing the PR to be merged and so on are finally gone |
So I added a mutex (recursive_mutex actually), I've enabled the test that was causing trouble, and now the tsan errors are gone. I've put the mutex in WebSocketTransport, so that they are kindof fine grained. The sad news is that I'd like not to use a recursive mutex. Now the failure on travis seems to be on macOS. If you can try to make a PR and try different strategies. I think that some of the complications relate to the fact that we are more correct now when processing the close code. But I'm not too sure about that. Maybe that problem has always been there. |
I have a 100% reproducible hang on macOS now (in master ...). That should make the bug easy to repro. |
So the hang was happening because we are polling in the background thread, without a timeout, and the control thread which is closing is not exiting of its main loop before calling poll. (control thread is waiting on _thread.join()). Last commit fixes this and we're back to a reliable green build -> 671c9f8 Unfortunately the close and friends unittest are not working now ... I think they are relying on code which can potentially hang. I wonder how we should fix this, but now we have some reliable code that triggers those problems. |
No description provided.
The text was updated successfully, but these errors were encountered: