-
Notifications
You must be signed in to change notification settings - Fork 2.2k
Remove ThreadSanitizer warnings #4579
Conversation
libdevcore/Worker.cpp
Outdated
@@ -118,7 +118,8 @@ void Worker::terminate() | |||
std::unique_lock<Mutex> l(x_work); | |||
if (m_work) | |||
{ | |||
m_state.exchange(WorkerState::Killing); | |||
if (m_state.exchange(WorkerState::Killing) == WorkerState::Killing) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check also stopWorking() - it has similar pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a similar change there.
Codecov Report
@@ Coverage Diff @@
## develop #4579 +/- ##
===========================================
+ Coverage 53.38% 59.28% +5.9%
===========================================
Files 1611 1085 -526
Lines 69237 55458 -13779
Branches 7091 3617 -3474
===========================================
- Hits 36963 32880 -4083
+ Misses 31060 21486 -9574
+ Partials 1214 1092 -122
Continue to review full report at Codecov.
|
AppVeyor is stuck at "updateStats" test. I'll try it with ThreadSanitizer. |
I cannot reproduce the AppVeyor issue. |
libethashseal/EthashClient.cpp
Outdated
@@ -59,6 +59,11 @@ EthashClient::EthashClient( | |||
asEthashClient(*this); | |||
} | |||
|
|||
EthashClient::~EthashClient() | |||
{ | |||
terminate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it will be called twice - first in ~EthashClient
, then in ~Client
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, which is not a problem. I'll comment about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think not calling terminate
here shouldn't cause any problems, because no virtual functions are overriden in EthashClient
But if it fixes some TSAN warnings, then fine.
Also note that there's ClientTest
class without this call, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this change I see a warning:
WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=21629)
Write of size 8 at 0x7ba400000550 by main thread:
#0 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:82 (eth+0x697c0d)
#1 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (discriminator 2) (eth+0x7b4021)
#2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (eth+0x7b4049)
#3 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x83039f)
#4 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82e179)
#5 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82d596)
#6 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x514a19)
Previous read of size 8 at 0x7ba400000550 by thread T13:
#0 dev::Worker::workLoop() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:140 (eth+0x852597)
#1 operator() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:62 (eth+0x852afd)
#2 void std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x852989)
#3 std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x852939)
#4 std::thread::_State_impl<std::_Bind_simple<dev::Worker::startWorking()::$_0 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x8527fd)
#5 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)
Myaybe the destructor was forwarded or something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see, vptr is probaly being modified somewhere between ~EthashClient
& ~Client
body and worker thread can access it at this moment, that's the problem.
Let's leave terminate()
here. And add it to ClientTest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
libethereum/Client.h
Outdated
@@ -198,6 +198,9 @@ class Client: public ClientBase, protected Worker | |||
virtual Block block(h256 const& _block) const override; | |||
using ClientBase::block; | |||
|
|||
/// should be called after the constructor of the most derived class finishes. | |||
void afterConstructor() { startWorking(); }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would call this method startWorking()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I'll try that.
libethereum/EthereumHost.cpp
Outdated
@@ -395,6 +412,8 @@ EthereumHost::EthereumHost(BlockChain const& _ch, OverlayDB const& _db, Transact | |||
EthereumHost::~EthereumHost() | |||
{ | |||
terminate(); | |||
m_peerObserver->disableSync(); // m_peerObserver contains a reference to BlockchainSync, so that should be disabled first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead just do m_peerObserver.reset()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I see, probably not, because peers still have shared_ptr to observer.
But then - can we make m_sync
member of EthereumPeerObserver
to be shared_ptr? This should solve the problem
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
libethereum/ClientBase.h
Outdated
@@ -188,6 +188,8 @@ class ClientBase: public Interface | |||
std::unordered_map<h256, h256s> m_specialFilters = std::unordered_map<h256, std::vector<h256>>{{PendingChangedFilter, {}}, {ChainChangedFilter, {}}}; | |||
///< The dictionary of special filters and their additional data | |||
std::map<unsigned, ClientWatch> m_watches; ///< Each and every watch - these reference a filter. | |||
std::condition_variable m_signalled; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this move help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ClientBase::m_tq
uses this condition variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was the warning:
Write of size 8 at 0x7ba400004df8 by main thread:
#0 pthread_cond_destroy ??:? (eth+0x48d790) // cond destroy.
#1 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:85 (discriminator 3) (eth+0x695d80)
#2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:65 (discriminator 4) (eth+0x7b112d)
#3 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:63 (eth+0x7b11b9)
#4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x82e62f)
#5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82c3c9)
#6 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82b7e6)
#7 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)
Previous read of size 8 at 0x7ba400004df8 by thread T6:
#0 pthread_cond_broadcast ??:? (eth+0x4869fc)
#1 std::condition_variable::notify_all() ??:? (libstdc++.so.6+0xb5848)
#2 operator() /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:102 (eth+0x69bb21)
#3 std::_Function_handler<void (), dev::eth::Client::init(dev::p2p::Host*, boost::filesystem::path const&, dev::WithExisting, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>)::$_5>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1731 (discriminator 1) (eth+0x69b9f1)
#4 std::function<void ()>::operator()() const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:2127 (eth+0x63dc03)
#5 dev::eth::Signal<>::HandlerAux::fire() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:164 (eth+0x63c78d)
#6 dev::eth::Signal<>::operator()() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:193 (eth+0x63096e)
#7 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:171 (eth+0x74dc9a)
#8 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:99 (eth+0x74d734)
#9 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:396 (eth+0x74f733)
#10 operator() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:45 (eth+0x750181)
#11 void std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x7500b9)
#12 std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x750069)
#13 std::thread::_State_impl<std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x74ff2d)
#14 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)
As if synchronized via sleep:
#0 nanosleep ??:? (eth+0x488457)
#1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:323 (discriminator 1) (eth+0x51926a)
#2 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum2/libp2p/Host.cpp:153 (discriminator 1) (eth+0x8cdf31)
#3 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:90 (eth+0x82b7d5)
#4 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, I would actually prefer to move m_tq
into Client
(so that it's close to m_bq
& m_bc
and all transaction queue handling logic is in one class)
You'd need something like protected virtual TransactionQueue& tq() = 0;
in the ClientBase
to make it work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By @gumb0 's suggestion, I'll try to move all these into Client
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I remove this change, I don't see the error anymore. Maybe the other fixes already killed this warning. So I just remove this commit.
Took @gumb0's suggestions and still saw no ThreadSanitizer warnings. |
The problem persists after a restart. |
libweb3jsonrpc/IpcServerBase.h
Outdated
@@ -46,7 +47,7 @@ template <class S> class IpcServerBase: public jsonrpc::AbstractServerConnector | |||
void GenerateResponse(S _connection); | |||
|
|||
protected: | |||
bool m_running = false; | |||
std::atomic<bool> m_running{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's use then std::atomic::exchange()
in IpcServerBase::StartListening
& IpcServerBase::StopListening
instead of if+assignment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Will do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
libp2p/Host.cpp
Outdated
{ | ||
bool peerTypeIsRequired{}; | ||
{ | ||
RecursiveGuard l(x_sessions); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks weird... toConnect
is local list of shared_ptrs, do you try to protect Peer members with a mutex for sessions?
Maybe let's try to make Node::peerType
atomic, will it help?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that worked.
libp2p/Host.h
Outdated
@@ -318,6 +318,8 @@ class Host: public Worker | |||
/// Peers we try to connect regardless of p2p network. | |||
std::set<NodeID> m_requiredPeers; | |||
Mutex x_requiredPeers; | |||
/// returns true if a member of m_requiredPeers | |||
bool isRequiredPeer(NodeID const&); // cannot be const because of Mutex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the usual thing to do in this case is to make mutex mutable
and leave the method const
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@@ -164,7 +163,7 @@ class EthereumPeerObserver: public EthereumPeerObserverFace | |||
} | |||
|
|||
private: | |||
BlockChainSync& m_sync; | |||
shared_ptr<BlockChainSync> m_sync; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid we have the same problem with m_syncMutex
below. If EthereumHost
gets destroyed before EthereumPeerObserver
mutex is destroyed with it... Not sure yet what to do with it
m_tq
is destroyed with the Client
, also not sure if EthereumPeers
can outlive Client
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(for myself) try to remove m_syncMutex
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somehow I'm making other classes hold m_tq
as a weak pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And I was able to remove m_syncMutex
and still seeing no warnings from ThreadSanitizer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About m_tq
, I decided to use shared/weak pointers.
libp2p/Host.cpp
Outdated
@@ -155,6 +155,9 @@ void Host::stop() | |||
// stop worker thread | |||
if (isWorking()) | |||
stopWorking(); | |||
|
|||
// This destroyes EthereumHost objects. EthereumHost objects contain a const reference to Client's, so they need to disappear first. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What did you mean by this? I think EtherumHost
has references to member of Client
, but not to Client
itself?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the comment.
On a 8 core machine, I see more TSAN errors. |
4665473
to
cd0beb3
Compare
I'm again seeing some new TSAN warnings, but that looks like what #4610 is about. |
Again segfault in rndStateTest. |
ThreadSanitizer warned: WARNING: ThreadSanitizer: data race (pid=12735) Read of size 1 at 0x7b3000006b04 by main thread (mutexes: write M1110): #0 memcpy ??:? (eth+0x4977d7) #1 Peer /home/yh/src/cpp-ethereum/libdevcore/../libp2p/Peer.h:52 (eth+0x833846) #2 void __gnu_cxx::new_allocator<std::_List_node<dev::p2p::Peer> >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/ext/new_allocator.h:136 (eth+0x908718) #3 void std::allocator_traits<std::allocator<std::_List_node<dev::p2p::Peer> > >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(std::allocator<std::_List_node<dev::p2p::Peer> >&, dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/alloc_traits.h:475 (eth+0x908551) #4 std::_List_node<dev::p2p::Peer>* std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_create_node<dev::p2p::Peer const&>(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:575 (eth+0x908400) #5 void std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_insert<dev::p2p::Peer const&>(std::_List_iterator<dev::p2p::Peer>, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1801 (eth+0x90834e) #6 std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::push_back(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1118 (eth+0x8e40eb) Previous write of size 4 at 0x7b3000006b04 by thread T30: #0 dev::p2p::Session::addRating(int) /home/yh/src/cpp-ethereum/libp2p/Session.cpp:89 (eth+0x951d86) #1 dev::p2p::Capability::addRating(int) /home/yh/src/cpp-ethereum/libp2p/Capability.cpp:59 (eth+0x8c1122)
For avoiding this warning from ThreadSanitizer: ================== WARNING: ThreadSanitizer: data race (pid=72207) Write of size 8 at 0x7ffca08dd960 by thread T30: #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >, std::is_move_constructible<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>, std::is_move_assignable<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >::value, void>::type std::swap<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&, boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&) /usr/include/c++/7/bits/move.h:199 (eth+0x00000057b9f1) #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::reset(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*) /usr/include/c++/7/bits/unique_ptr.h:374 (eth+0x00000057b9f1) #2 dev::p2p::Host::run(boost::system::error_code const&) /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:665 (eth+0x00000057b9f1) #3 operator() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:720 (eth+0x00000057c426) #4 operator() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/bind_handler.hpp:47 (eth+0x00000057c426) #5 asio_handler_invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/handler_invoke_hook.hpp:69 (eth+0x00000057c426) #6 invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code>, dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/handler_invoke_helpers.hpp:37 (eth+0x00000057c426) #7 do_complete /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/wait_handler.hpp:70 (eth+0x00000057c426) #8 boost::asio::detail::task_io_service_operation::complete(boost::asio::detail::task_io_service&, boost::system::error_code const&, unsigned long) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/task_io_service_operation.hpp:38 (eth+0x00000058c145) #9 boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:372 (eth+0x00000058c145) #10 boost::asio::detail::task_io_service::run(boost::system::error_code&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:149 (eth+0x00000058c145) #11 boost::asio::io_service::run() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/impl/io_service.ipp:59 (eth+0x00000057168e) #12 dev::p2p::Host::doWork() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:774 (eth+0x00000057168e) #13 dev::Worker::workLoop() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:140 (eth+0x0000004a9f64) Previous read of size 8 at 0x7ffca08dd960 by main thread: #0 std::__uniq_ptr_impl<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::_M_ptr() const /usr/include/c++/7/bits/unique_ptr.h:147 (eth+0x0000005701cd) #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::get() const /usr/include/c++/7/bits/unique_ptr.h:337 (eth+0x0000005701cd) #2 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::operator bool() const /usr/include/c++/7/bits/unique_ptr.h:351 (eth+0x0000005701cd) #3 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:152 (eth+0x0000005701cd) #4 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:90 (eth+0x000000456f75) #5 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000c0ff2) Location is stack of main thread.
For removing the following warning from ThreadSanitizer ================== WARNING: ThreadSanitizer: data race (pid=68621) Write of size 8 at 0x7ffc9f6af3b8 by thread T30: #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/7/bits/shared_ptr_base.h:1034 (eth+0x0000005805bb) #1 std::shared_ptr<dev::p2p::NodeTable>::operator=(std::shared_ptr<dev::p2p::NodeTable> const&) /usr/include/c++/7/bits/shared_ptr.h:93 (eth+0x0000005805bb) #2 dev::p2p::Host::startedWorking() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:761 (eth+0x0000005805bb) #3 operator() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:61 (eth+0x0000004ab343) #4 __invoke_impl<void, dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x0000004ab6a7) #5 __invoke<dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x0000004ab6a7) #6 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x0000004ab6a7) #7 operator() /usr/include/c++/7/thread:243 (eth+0x0000004ab6a7) #8 _M_run /usr/include/c++/7/thread:186 (eth+0x0000004ab6a7) #9 <null> <null> (libstdc++.so.6+0x0000000bc0fe) Previous read of size 8 at 0x7ffc9f6af3b8 by main thread (mutexes: write M1111): #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator bool() const /usr/include/c++/7/bits/shared_ptr_base.h:1261 (eth+0x0000005716f8) #1 dev::p2p::Host::haveNetwork() const /home/yh/src/cpp-ethereum-vptr/libp2p/Host.h:224 (eth+0x0000005716f8) #2 dev::p2p::Host::start() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:129 (eth+0x0000005716f8) #3 dev::WebThreeDirect::startNetwork() /home/yh/src/cpp-ethereum-vptr/libdevcore/../libwebthree/WebThree.h:204 (eth+0x0000000bf0d9) #4 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1117 (eth+0x0000000bf0d9) atomic_load and atomic_store work on most tool chains but not on gcc 4.9 https://stackoverflow.com/questions/34205880/why-does-atomic-load-with-a-shared-ptr-not-compile-with-gcc-4-9
I tried to simplify a little the code accessing |
@gumb0 I'm on it. |
Yes, TSAN seems happy. For the record, this is the command I run
and then send a SIGINT. |
@gumb0 do you want to make more changes? |
Not sure yet, I will try to finish review today. |
libp2p/Host.cpp
Outdated
unique_lock<mutex> l(x_runTimer); | ||
// ignore if already stopped/stopping, at the same time, | ||
// signal run() to prepare for shutdown and reset m_timer | ||
if (!m_run.exchange(false)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why do you do this under x_runTimer
lock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I move this out above the block, I see
==================
WARNING: ThreadSanitizer: data race (pid=65004)
Write of size 8 at 0x7ba400004df8 by main thread:
#0 pthread_cond_destroy <null> (libtsan.so.0+0x000000029892)
#1 dev::eth::Client::~Client() /home/yh/src/cpp-ethereum-vptr/libethereum/Client.cpp:82 (eth+0x00000022094c)
#2 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:62 (eth+0x000000389795)
#3 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:65 (eth+0x0000003897e8)
#4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/include/c++/7/bits/unique_ptr.h:78 (eth+0x0000004551c3)
#5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/include/c++/7/bits/unique_ptr.h:376 (eth+0x0000004551c3)
#6 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:91 (eth+0x0000004551c3)
#7 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000bf8f2)
Previous read of size 8 at 0x7ba400004df8 by thread T28:
#0 pthread_cond_broadcast <null> (libtsan.so.0+0x00000002975f)
#1 std::condition_variable::notify_all() <null> (libstdc++.so.6+0x0000000b5df8)
#2 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (eth+0x0000001cf4f3)
#3 dev::eth::Signal<>::operator()() <null> (eth+0x0000001d6612)
#4 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:171 (eth+0x00000031b951)
#5 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:99 (eth+0x00000031bee0)
#6 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:396 (eth+0x00000031c49c)
#7 operator() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:45 (eth+0x00000031ccac)
#8 __invoke_impl<void, dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x00000031cd57)
#9 __invoke<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x00000031cd57)
#10 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x00000031cd57)
#11 operator() /usr/include/c++/7/thread:243 (eth+0x00000031cd57)
#12 _M_run /usr/include/c++/7/thread:186 (eth+0x00000031cd57)
#13 <null> <null> (libstdc++.so.6+0x0000000bc0fe)
As if synchronized via sleep:
#0 nanosleep <null> (libtsan.so.0+0x00000004c5a1)
#1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/include/c++/7/thread:376 (eth+0x0000000b723c)
#2 stopSealingAfterXBlocks(dev::eth::Client*, unsigned int, unsigned int&) /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:274 (eth+0x0000000b723c)
#3 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1247 (eth+0x0000000bf614)
I'm confused now because Host.cpp
doesn't appear in the stacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gumb0 forgetting the TSAN warning, I can somehow explain. If I reset m_run
without the lock, the worker termination can immediately happen, m_timerSemaphore
might get a notification, even before this thread gets the lock. And, after this thread obtains the lock, it waits on m_timerSemaphore
forever.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But no, while (m_timer)
will be false
in this case, so it won't try to wait on m_timerSemaphore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These stacks I think show that in Client
condition variable m_signalled
is being destroyed while still being used by the m_tqReady
handler.
Can you try moving m_signalled
definition above m_tqReady
in Client.h
to change destruction order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I would name m_timerSemaphore
something like m_timerReset
(it's not a semaphore, it's a condition variable anyway)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done an done. I'm not seeing TSAN warnings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, let's merge!
I'm somewhat worried to be honest, because it all got more complicated, but we need to proceed somehow, let's see what happens next
Turning invisible complications into visible complications... |
This PR removes various ThreadSanitizer warnings. Each commit deals with one or two warnings.
Fixes #4544.