Skip to content
This repository was archived by the owner on Oct 28, 2021. It is now read-only.

Conversation

pirapira
Copy link
Member

@pirapira pirapira commented Oct 5, 2017

This PR removes various ThreadSanitizer warnings. Each commit deals with one or two warnings.

Fixes #4544.

@@ -118,7 +118,8 @@ void Worker::terminate()
std::unique_lock<Mutex> l(x_work);
if (m_work)
{
m_state.exchange(WorkerState::Killing);
if (m_state.exchange(WorkerState::Killing) == WorkerState::Killing)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check also stopWorking() - it has similar pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a similar change there.

@codecov-io
Copy link

codecov-io commented Oct 6, 2017

Codecov Report

Merging #4579 into develop will increase coverage by 5.9%.
The diff coverage is 52.17%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4579      +/-   ##
===========================================
+ Coverage    53.38%   59.28%    +5.9%     
===========================================
  Files         1611     1085     -526     
  Lines        69237    55458   -13779     
  Branches      7091     3617    -3474     
===========================================
- Hits         36963    32880    -4083     
+ Misses       31060    21486    -9574     
+ Partials      1214     1092     -122
Impacted Files Coverage Δ
libethereum/EthereumHost.h 0% <ø> (ø) ⬆️
libethereum/EthereumPeer.h 77.77% <ø> (ø) ⬆️
libethereum/Client.h 66.66% <ø> (ø) ⬆️
libp2p/Session.cpp 73.19% <0%> (-1.34%) ⬇️
libp2p/Peer.cpp 55.88% <100%> (+4.15%) ⬆️
libp2p/Peer.h 88.88% <100%> (ø) ⬆️
libp2p/Host.h 78.57% <100%> (+8.57%) ⬆️
libethereum/EthereumPeer.cpp 40% <11.76%> (+0.86%) ⬆️
libethereum/EthereumHost.cpp 7.83% <15.38%> (-1.77%) ⬇️
libp2p/Host.cpp 72% <80.48%> (+1.19%) ⬆️
... and 946 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4100691...c34498b. Read the comment docs.

@pirapira
Copy link
Member Author

pirapira commented Oct 6, 2017

AppVeyor is stuck at "updateStats" test. I'll try it with ThreadSanitizer.

@pirapira
Copy link
Member Author

pirapira commented Oct 6, 2017

I cannot reproduce the AppVeyor issue.

@pirapira pirapira requested a review from gumb0 October 6, 2017 10:04
@@ -59,6 +59,11 @@ EthashClient::EthashClient(
asEthashClient(*this);
}

EthashClient::~EthashClient()
{
terminate();
Copy link
Member

@gumb0 gumb0 Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So it will be called twice - first in ~EthashClient, then in ~Client ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, which is not a problem. I'll comment about that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think not calling terminate here shouldn't cause any problems, because no virtual functions are overriden in EthashClient
But if it fixes some TSAN warnings, then fine.

Also note that there's ClientTest class without this call, too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this change I see a warning:

WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=21629)
  Write of size 8 at 0x7ba400000550 by main thread:
    #0 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:82 (eth+0x697c0d)
    #1 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (discriminator 2) (eth+0x7b4021)
    #2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (eth+0x7b4049)
    #3 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x83039f)
    #4 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82e179)
    #5 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82d596)
    #6 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x514a19)

  Previous read of size 8 at 0x7ba400000550 by thread T13:
    #0 dev::Worker::workLoop() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:140 (eth+0x852597)
    #1 operator() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:62 (eth+0x852afd)
    #2 void std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x852989)
    #3 std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x852939)
    #4 std::thread::_State_impl<std::_Bind_simple<dev::Worker::startWorking()::$_0 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x8527fd)
    #5 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)

Myaybe the destructor was forwarded or something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I see, vptr is probaly being modified somewhere between ~EthashClient & ~Client body and worker thread can access it at this moment, that's the problem.
Let's leave terminate() here. And add it to ClientTest

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -198,6 +198,9 @@ class Client: public ClientBase, protected Worker
virtual Block block(h256 const& _block) const override;
using ClientBase::block;

/// should be called after the constructor of the most derived class finishes.
void afterConstructor() { startWorking(); };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would call this method startWorking()

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I'll try that.

@@ -395,6 +412,8 @@ EthereumHost::EthereumHost(BlockChain const& _ch, OverlayDB const& _db, Transact
EthereumHost::~EthereumHost()
{
terminate();
m_peerObserver->disableSync(); // m_peerObserver contains a reference to BlockchainSync, so that should be disabled first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we instead just do m_peerObserver.reset() ?

Copy link
Member

@gumb0 gumb0 Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, probably not, because peers still have shared_ptr to observer.

But then - can we make m_sync member of EthereumPeerObserver to be shared_ptr? This should solve the problem

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -188,6 +188,8 @@ class ClientBase: public Interface
std::unordered_map<h256, h256s> m_specialFilters = std::unordered_map<h256, std::vector<h256>>{{PendingChangedFilter, {}}, {ChainChangedFilter, {}}};
///< The dictionary of special filters and their additional data
std::map<unsigned, ClientWatch> m_watches; ///< Each and every watch - these reference a filter.
std::condition_variable m_signalled;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this move help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ClientBase::m_tq uses this condition variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was the warning:

  Write of size 8 at 0x7ba400004df8 by main thread:
    #0 pthread_cond_destroy ??:? (eth+0x48d790) // cond destroy.
    #1 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:85 (discriminator 3) (eth+0x695d80)
    #2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:65 (discriminator 4) (eth+0x7b112d)
    #3 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:63 (eth+0x7b11b9)
    #4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x82e62f)
    #5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82c3c9)
    #6 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82b7e6)
    #7 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)

  Previous read of size 8 at 0x7ba400004df8 by thread T6:
    #0 pthread_cond_broadcast ??:? (eth+0x4869fc)
    #1 std::condition_variable::notify_all() ??:? (libstdc++.so.6+0xb5848)
    #2 operator() /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:102 (eth+0x69bb21)
    #3 std::_Function_handler<void (), dev::eth::Client::init(dev::p2p::Host*, boost::filesystem::path const&, dev::WithExisting, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>)::$_5>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1731 (discriminator 1) (eth+0x69b9f1)
    #4 std::function<void ()>::operator()() const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:2127 (eth+0x63dc03)
    #5 dev::eth::Signal<>::HandlerAux::fire() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:164 (eth+0x63c78d)
    #6 dev::eth::Signal<>::operator()() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:193 (eth+0x63096e)
    #7 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:171 (eth+0x74dc9a)
    #8 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:99 (eth+0x74d734)
    #9 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:396 (eth+0x74f733)
    #10 operator() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:45 (eth+0x750181)
    #11 void std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x7500b9)
    #12 std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x750069)
    #13 std::thread::_State_impl<std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x74ff2d)
    #14 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)

  As if synchronized via sleep:
    #0 nanosleep ??:? (eth+0x488457)
    #1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:323 (discriminator 1) (eth+0x51926a)
    #2 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum2/libp2p/Host.cpp:153 (discriminator 1) (eth+0x8cdf31)
    #3 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:90 (eth+0x82b7d5)
    #4 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)

Copy link
Member

@gumb0 gumb0 Oct 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, I would actually prefer to move m_tq into Client (so that it's close to m_bq & m_bc and all transaction queue handling logic is in one class)

You'd need something like protected virtual TransactionQueue& tq() = 0; in the ClientBase to make it work

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By @gumb0 's suggestion, I'll try to move all these into Client.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I remove this change, I don't see the error anymore. Maybe the other fixes already killed this warning. So I just remove this commit.

@pirapira
Copy link
Member Author

pirapira commented Oct 9, 2017

Took @gumb0's suggestions and still saw no ThreadSanitizer warnings.

@pirapira
Copy link
Member Author

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage

The problem persists after a restart.

@@ -46,7 +47,7 @@ template <class S> class IpcServerBase: public jsonrpc::AbstractServerConnector
void GenerateResponse(S _connection);

protected:
bool m_running = false;
std::atomic<bool> m_running{false};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's use then std::atomic::exchange() in IpcServerBase::StartListening & IpcServerBase::StopListening instead of if+assignment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Will do.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

libp2p/Host.cpp Outdated
{
bool peerTypeIsRequired{};
{
RecursiveGuard l(x_sessions);
Copy link
Member

@gumb0 gumb0 Oct 10, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks weird... toConnect is local list of shared_ptrs, do you try to protect Peer members with a mutex for sessions?

Maybe let's try to make Node::peerType atomic, will it help?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that worked.

libp2p/Host.h Outdated
@@ -318,6 +318,8 @@ class Host: public Worker
/// Peers we try to connect regardless of p2p network.
std::set<NodeID> m_requiredPeers;
Mutex x_requiredPeers;
/// returns true if a member of m_requiredPeers
bool isRequiredPeer(NodeID const&); // cannot be const because of Mutex
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the usual thing to do in this case is to make mutex mutable and leave the method const

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

pirapira added a commit that referenced this pull request Oct 10, 2017
pirapira added a commit that referenced this pull request Oct 10, 2017
pirapira added a commit that referenced this pull request Oct 10, 2017
@@ -164,7 +163,7 @@ class EthereumPeerObserver: public EthereumPeerObserverFace
}

private:
BlockChainSync& m_sync;
shared_ptr<BlockChainSync> m_sync;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid we have the same problem with m_syncMutex below. If EthereumHost gets destroyed before EthereumPeerObserver mutex is destroyed with it... Not sure yet what to do with it

m_tq is destroyed with the Client, also not sure if EthereumPeers can outlive Client

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(for myself) try to remove m_syncMutex.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somehow I'm making other classes hold m_tq as a weak pointer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I was able to remove m_syncMutex and still seeing no warnings from ThreadSanitizer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About m_tq, I decided to use shared/weak pointers.

libp2p/Host.cpp Outdated
@@ -155,6 +155,9 @@ void Host::stop()
// stop worker thread
if (isWorking())
stopWorking();

// This destroyes EthereumHost objects. EthereumHost objects contain a const reference to Client's, so they need to disappear first.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What did you mean by this? I think EtherumHost has references to member of Client, but not to Client itself?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the comment.

@pirapira
Copy link
Member Author

On a 8 core machine, I see more TSAN errors.

@pirapira pirapira changed the title Remove ThreadSanitizer warnings [wip] Remove ThreadSanitizer warnings Oct 12, 2017
@pirapira pirapira force-pushed the client-vptr branch 2 times, most recently from 4665473 to cd0beb3 Compare October 23, 2017 14:04
pirapira added a commit that referenced this pull request Oct 23, 2017
@pirapira
Copy link
Member Author

pirapira commented Oct 24, 2017

I'm again seeing some new TSAN warnings, but that looks like what #4610 is about.

@pirapira
Copy link
Member Author

Again segfault in rndStateTest.

ThreadSanitizer warned:

WARNING: ThreadSanitizer: data race (pid=12735)
  Read of size 1 at 0x7b3000006b04 by main thread (mutexes: write M1110):
    #0 memcpy ??:? (eth+0x4977d7)
    #1 Peer /home/yh/src/cpp-ethereum/libdevcore/../libp2p/Peer.h:52 (eth+0x833846)
    #2 void __gnu_cxx::new_allocator<std::_List_node<dev::p2p::Peer> >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/ext/new_allocator.h:136 (eth+0x908718)
    #3 void std::allocator_traits<std::allocator<std::_List_node<dev::p2p::Peer> > >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(std::allocator<std::_List_node<dev::p2p::Peer> >&, dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/alloc_traits.h:475 (eth+0x908551)
    #4 std::_List_node<dev::p2p::Peer>* std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_create_node<dev::p2p::Peer const&>(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:575 (eth+0x908400)
    #5 void std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_insert<dev::p2p::Peer const&>(std::_List_iterator<dev::p2p::Peer>, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1801 (eth+0x90834e)
    #6 std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::push_back(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1118
(eth+0x8e40eb)

  Previous write of size 4 at 0x7b3000006b04 by thread T30:
    #0 dev::p2p::Session::addRating(int)
/home/yh/src/cpp-ethereum/libp2p/Session.cpp:89 (eth+0x951d86)
    #1 dev::p2p::Capability::addRating(int)
/home/yh/src/cpp-ethereum/libp2p/Capability.cpp:59 (eth+0x8c1122)
For avoiding this warning from ThreadSanitizer:
==================
WARNING: ThreadSanitizer: data race (pid=72207)
  Write of size 8 at 0x7ffca08dd960 by thread T30:
    #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >, std::is_move_constructible<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>, std::is_move_assignable<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >::value, void>::type std::swap<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&, boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&) /usr/include/c++/7/bits/move.h:199 (eth+0x00000057b9f1)
    #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::reset(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*) /usr/include/c++/7/bits/unique_ptr.h:374 (eth+0x00000057b9f1)
    #2 dev::p2p::Host::run(boost::system::error_code const&) /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:665 (eth+0x00000057b9f1)
    #3 operator() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:720 (eth+0x00000057c426)
    #4 operator() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/bind_handler.hpp:47 (eth+0x00000057c426)
    #5 asio_handler_invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/handler_invoke_hook.hpp:69 (eth+0x00000057c426)
    #6 invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code>, dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/handler_invoke_helpers.hpp:37 (eth+0x00000057c426) #7 do_complete /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/wait_handler.hpp:70
(eth+0x00000057c426)
    #8 boost::asio::detail::task_io_service_operation::complete(boost::asio::detail::task_io_service&, boost::system::error_code const&, unsigned long) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/task_io_service_operation.hpp:38 (eth+0x00000058c145)
    #9 boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:372 (eth+0x00000058c145)
    #10 boost::asio::detail::task_io_service::run(boost::system::error_code&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:149 (eth+0x00000058c145)
    #11 boost::asio::io_service::run() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/impl/io_service.ipp:59 (eth+0x00000057168e)
    #12 dev::p2p::Host::doWork() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:774 (eth+0x00000057168e)
    #13 dev::Worker::workLoop() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:140 (eth+0x0000004a9f64)

  Previous read of size 8 at 0x7ffca08dd960 by main thread:
    #0 std::__uniq_ptr_impl<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::_M_ptr() const /usr/include/c++/7/bits/unique_ptr.h:147 (eth+0x0000005701cd)
    #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::get() const /usr/include/c++/7/bits/unique_ptr.h:337 (eth+0x0000005701cd)
    #2 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::operator bool() const /usr/include/c++/7/bits/unique_ptr.h:351 (eth+0x0000005701cd)
    #3 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:152 (eth+0x0000005701cd)
    #4 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:90 (eth+0x000000456f75)
    #5 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000c0ff2)

  Location is stack of main thread.
For removing the following warning from ThreadSanitizer
==================
WARNING: ThreadSanitizer: data race (pid=68621)
  Write of size 8 at 0x7ffc9f6af3b8 by thread T30:
    #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/7/bits/shared_ptr_base.h:1034 (eth+0x0000005805bb)
    #1 std::shared_ptr<dev::p2p::NodeTable>::operator=(std::shared_ptr<dev::p2p::NodeTable> const&) /usr/include/c++/7/bits/shared_ptr.h:93 (eth+0x0000005805bb)
    #2 dev::p2p::Host::startedWorking() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:761 (eth+0x0000005805bb)
    #3 operator() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:61 (eth+0x0000004ab343)
    #4 __invoke_impl<void, dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x0000004ab6a7)
    #5 __invoke<dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x0000004ab6a7)
    #6 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x0000004ab6a7)
    #7 operator() /usr/include/c++/7/thread:243 (eth+0x0000004ab6a7)
    #8 _M_run /usr/include/c++/7/thread:186 (eth+0x0000004ab6a7)
    #9 <null> <null> (libstdc++.so.6+0x0000000bc0fe)

  Previous read of size 8 at 0x7ffc9f6af3b8 by main thread (mutexes:
write M1111):
    #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator bool() const /usr/include/c++/7/bits/shared_ptr_base.h:1261 (eth+0x0000005716f8)
    #1 dev::p2p::Host::haveNetwork() const /home/yh/src/cpp-ethereum-vptr/libp2p/Host.h:224 (eth+0x0000005716f8)
    #2 dev::p2p::Host::start() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:129 (eth+0x0000005716f8)
    #3 dev::WebThreeDirect::startNetwork() /home/yh/src/cpp-ethereum-vptr/libdevcore/../libwebthree/WebThree.h:204 (eth+0x0000000bf0d9)
    #4 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1117 (eth+0x0000000bf0d9)

atomic_load and atomic_store work on most tool chains but not on gcc 4.9
https://stackoverflow.com/questions/34205880/why-does-atomic-load-with-a-shared-ptr-not-compile-with-gcc-4-9
@gumb0 gumb0 self-requested a review November 23, 2017 16:55
@gumb0
Copy link
Member

gumb0 commented Nov 24, 2017

I tried to simplify a little the code accessing m_nodeTable by extracting most of the operations with it to separate methods. @pirapira please verify that TSAN is still ok with it

@pirapira
Copy link
Member Author

@gumb0 I'm on it.

@pirapira
Copy link
Member Author

Yes, TSAN seems happy.

For the record, this is the command I run

eth/eth --ropsten --kill --pin --peerset "required:20c9ad97c081d63397d7b685a412227a40e23c8bdc6688c6f37e97cfbc22d2b4d1db1510d8f61e6a8866ad7f0e17c02b14182d37ea7c3c8b9c2683aeb6b733a1@52.169.14.227:30303 required:6ce05930c72abc632c58e2e4324f7c7ea478cec0ed4fa2528982cf34483094e9cbc9216e7aa349691242576d552a2a56aaeae426c5303ded677ce455ba1acd9d@13.84.180.240:30303" --listen 20000

and then send a SIGINT.

@pirapira
Copy link
Member Author

@gumb0 do you want to make more changes?

@gumb0
Copy link
Member

gumb0 commented Nov 27, 2017

Not sure yet, I will try to finish review today.

libp2p/Host.cpp Outdated
unique_lock<mutex> l(x_runTimer);
// ignore if already stopped/stopping, at the same time,
// signal run() to prepare for shutdown and reset m_timer
if (!m_run.exchange(false))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why do you do this under x_runTimer lock?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I move this out above the block, I see

==================
WARNING: ThreadSanitizer: data race (pid=65004)
  Write of size 8 at 0x7ba400004df8 by main thread:
    #0 pthread_cond_destroy <null> (libtsan.so.0+0x000000029892)
    #1 dev::eth::Client::~Client() /home/yh/src/cpp-ethereum-vptr/libethereum/Client.cpp:82 (eth+0x00000022094c)
    #2 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:62 (eth+0x000000389795)
    #3 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:65 (eth+0x0000003897e8)
    #4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/include/c++/7/bits/unique_ptr.h:78 (eth+0x0000004551c3)
    #5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/include/c++/7/bits/unique_ptr.h:376 (eth+0x0000004551c3)
    #6 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:91 (eth+0x0000004551c3)
    #7 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000bf8f2)

  Previous read of size 8 at 0x7ba400004df8 by thread T28:
    #0 pthread_cond_broadcast <null> (libtsan.so.0+0x00000002975f)
    #1 std::condition_variable::notify_all() <null> (libstdc++.so.6+0x0000000b5df8)
    #2 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (eth+0x0000001cf4f3)
    #3 dev::eth::Signal<>::operator()() <null> (eth+0x0000001d6612)
    #4 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:171 (eth+0x00000031b951)
    #5 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:99 (eth+0x00000031bee0)
    #6 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:396 (eth+0x00000031c49c)
    #7 operator() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:45 (eth+0x00000031ccac)
    #8 __invoke_impl<void, dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x00000031cd57)
    #9 __invoke<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x00000031cd57)
    #10 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x00000031cd57)
    #11 operator() /usr/include/c++/7/thread:243 (eth+0x00000031cd57)
    #12 _M_run /usr/include/c++/7/thread:186 (eth+0x00000031cd57)
    #13 <null> <null> (libstdc++.so.6+0x0000000bc0fe)

  As if synchronized via sleep:
    #0 nanosleep <null> (libtsan.so.0+0x00000004c5a1)
    #1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/include/c++/7/thread:376 (eth+0x0000000b723c)
    #2 stopSealingAfterXBlocks(dev::eth::Client*, unsigned int, unsigned int&) /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:274 (eth+0x0000000b723c)
    #3 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1247 (eth+0x0000000bf614)

I'm confused now because Host.cpp doesn't appear in the stacks.

Copy link
Member Author

@pirapira pirapira Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 forgetting the TSAN warning, I can somehow explain. If I reset m_run without the lock, the worker termination can immediately happen, m_timerSemaphore might get a notification, even before this thread gets the lock. And, after this thread obtains the lock, it waits on m_timerSemaphore forever.

Copy link
Member

@gumb0 gumb0 Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But no, while (m_timer) will be false in this case, so it won't try to wait on m_timerSemaphore

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These stacks I think show that in Client condition variable m_signalled is being destroyed while still being used by the m_tqReady handler.

Can you try moving m_signalled definition above m_tqReady in Client.h to change destruction order?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I would name m_timerSemaphore something like m_timerReset (it's not a semaphore, it's a condition variable anyway)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done an done. I'm not seeing TSAN warnings.

Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's merge! :shipit:

I'm somewhat worried to be honest, because it all got more complicated, but we need to proceed somehow, let's see what happens next

@pirapira
Copy link
Member Author

Turning invisible complications into visible complications...

@pirapira pirapira merged commit 40e1025 into develop Nov 29, 2017
@pirapira pirapira deleted the client-vptr branch November 29, 2017 10:17
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants