Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ThreadSanitizer warnings #4579

Merged
merged 10 commits into from Nov 29, 2017

Conversation

Projects
None yet
4 participants
@pirapira
Copy link
Member

commented Oct 5, 2017

This PR removes various ThreadSanitizer warnings. Each commit deals with one or two warnings.

Fixes #4544.

@@ -118,7 +118,8 @@ void Worker::terminate()
std::unique_lock<Mutex> l(x_work);
if (m_work)
{
m_state.exchange(WorkerState::Killing);
if (m_state.exchange(WorkerState::Killing) == WorkerState::Killing)

This comment has been minimized.

Copy link
@chfast

chfast Oct 6, 2017

Collaborator

Check also stopWorking() - it has similar pattern.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 6, 2017

Author Member

I added a similar change there.

@pirapira pirapira force-pushed the client-vptr branch from 0401d74 to 5698639 Oct 6, 2017

@codecov-io

This comment has been minimized.

Copy link

commented Oct 6, 2017

Codecov Report

Merging #4579 into develop will increase coverage by 5.9%.
The diff coverage is 52.17%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #4579      +/-   ##
===========================================
+ Coverage    53.38%   59.28%    +5.9%     
===========================================
  Files         1611     1085     -526     
  Lines        69237    55458   -13779     
  Branches      7091     3617    -3474     
===========================================
- Hits         36963    32880    -4083     
+ Misses       31060    21486    -9574     
+ Partials      1214     1092     -122
Impacted Files Coverage Δ
libethereum/EthereumHost.h 0% <ø> (ø) ⬆️
libethereum/EthereumPeer.h 77.77% <ø> (ø) ⬆️
libethereum/Client.h 66.66% <ø> (ø) ⬆️
libp2p/Session.cpp 73.19% <0%> (-1.34%) ⬇️
libp2p/Peer.cpp 55.88% <100%> (+4.15%) ⬆️
libp2p/Peer.h 88.88% <100%> (ø) ⬆️
libp2p/Host.h 78.57% <100%> (+8.57%) ⬆️
libethereum/EthereumPeer.cpp 40% <11.76%> (+0.86%) ⬆️
libethereum/EthereumHost.cpp 7.83% <15.38%> (-1.77%) ⬇️
libp2p/Host.cpp 72% <80.48%> (+1.19%) ⬆️
... and 946 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4100691...c34498b. Read the comment docs.

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2017

AppVeyor is stuck at "updateStats" test. I'll try it with ThreadSanitizer.

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 6, 2017

I cannot reproduce the AppVeyor issue.

@pirapira pirapira requested a review from gumb0 Oct 6, 2017

@@ -59,6 +59,11 @@ EthashClient::EthashClient(
asEthashClient(*this);
}

EthashClient::~EthashClient()
{
terminate();

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

So it will be called twice - first in ~EthashClient, then in ~Client ?

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

Yes, which is not a problem. I'll comment about that.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

Done.

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

I think not calling terminate here shouldn't cause any problems, because no virtual functions are overriden in EthashClient
But if it fixes some TSAN warnings, then fine.

Also note that there's ClientTest class without this call, too.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

Without this change I see a warning:

WARNING: ThreadSanitizer: data race on vptr (ctor/dtor vs virtual call) (pid=21629)
  Write of size 8 at 0x7ba400000550 by main thread:
    #0 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:82 (eth+0x697c0d)
    #1 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (discriminator 2) (eth+0x7b4021)
    #2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.h:37 (eth+0x7b4049)
    #3 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x83039f)
    #4 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82e179)
    #5 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82d596)
    #6 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x514a19)

  Previous read of size 8 at 0x7ba400000550 by thread T13:
    #0 dev::Worker::workLoop() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:140 (eth+0x852597)
    #1 operator() /home/yh/src/cpp-ethereum2/libdevcore/Worker.cpp:62 (eth+0x852afd)
    #2 void std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x852989)
    #3 std::_Bind_simple<dev::Worker::startWorking()::$_0 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x852939)
    #4 std::thread::_State_impl<std::_Bind_simple<dev::Worker::startWorking()::$_0 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x8527fd)
    #5 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)

Myaybe the destructor was forwarded or something.

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

Ok I see, vptr is probaly being modified somewhere between ~EthashClient & ~Client body and worker thread can access it at this moment, that's the problem.
Let's leave terminate() here. And add it to ClientTest

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

Done.

@@ -198,6 +198,9 @@ class Client: public ClientBase, protected Worker
virtual Block block(h256 const& _block) const override;
using ClientBase::block;

/// should be called after the constructor of the most derived class finishes.
void afterConstructor() { startWorking(); };

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

I would call this method startWorking()

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

OK, I'll try that.

@@ -395,6 +412,8 @@ EthereumHost::EthereumHost(BlockChain const& _ch, OverlayDB const& _db, Transact
EthereumHost::~EthereumHost()
{
terminate();
m_peerObserver->disableSync(); // m_peerObserver contains a reference to BlockchainSync, so that should be disabled first.

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

Can we instead just do m_peerObserver.reset() ?

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

Ah I see, probably not, because peers still have shared_ptr to observer.

But then - can we make m_sync member of EthereumPeerObserver to be shared_ptr? This should solve the problem

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

I'll try that.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

Done.

@@ -188,6 +188,8 @@ class ClientBase: public Interface
std::unordered_map<h256, h256s> m_specialFilters = std::unordered_map<h256, std::vector<h256>>{{PendingChangedFilter, {}}, {ChainChangedFilter, {}}};
///< The dictionary of special filters and their additional data
std::map<unsigned, ClientWatch> m_watches; ///< Each and every watch - these reference a filter.
std::condition_variable m_signalled;

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

How does this move help?

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

ClientBase::m_tq uses this condition variable.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

This was the warning:

  Write of size 8 at 0x7ba400004df8 by main thread:
    #0 pthread_cond_destroy ??:? (eth+0x48d790) // cond destroy.
    #1 ~Client /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:85 (discriminator 3) (eth+0x695d80)
    #2 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:65 (discriminator 4) (eth+0x7b112d)
    #3 ~EthashClient /home/yh/src/cpp-ethereum2/libethashseal/EthashClient.cpp:63 (eth+0x7b11b9)
    #4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:76 (discriminator 1) (eth+0x82e62f)
    #5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/bits/unique_ptr.h:347 (eth+0x82c3c9)
    #6 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:91 (eth+0x82b7e6)
    #7 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)

  Previous read of size 8 at 0x7ba400004df8 by thread T6:
    #0 pthread_cond_broadcast ??:? (eth+0x4869fc)
    #1 std::condition_variable::notify_all() ??:? (libstdc++.so.6+0xb5848)
    #2 operator() /home/yh/src/cpp-ethereum2/libethereum/Client.cpp:102 (eth+0x69bb21)
    #3 std::_Function_handler<void (), dev::eth::Client::init(dev::p2p::Host*, boost::filesystem::path const&, dev::WithExisting, boost::multiprecision::number<boost::multiprecision::backends::cpp_int_backend<256u, 256u, (boost::multiprecision::cpp_integer_type)0, (boost::multiprecision::cpp_int_check_type)0, void>, (boost::multiprecision::expression_template_option)0>)::$_5>::_M_invoke(std::_Any_data const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1731 (discriminator 1) (eth+0x69b9f1)
    #4 std::function<void ()>::operator()() const /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:2127 (eth+0x63dc03)
    #5 dev::eth::Signal<>::HandlerAux::fire() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:164 (eth+0x63c78d)
    #6 dev::eth::Signal<>::operator()() /home/yh/src/cpp-ethereum2/libdevcore/../libethcore/Common.h:193 (eth+0x63096e)
    #7 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:171 (eth+0x74dc9a)
    #8 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:99 (eth+0x74d734)
    #9 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:396 (eth+0x74f733)
    #10 operator() /home/yh/src/cpp-ethereum2/libethereum/TransactionQueue.cpp:45 (eth+0x750181)
    #11 void std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::_M_invoke<>(std::_Index_tuple<>) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1390 (discriminator 2) (eth+0x7500b9)
    #12 std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()>::operator()() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/functional:1380 (eth+0x750069)
    #13 std::thread::_State_impl<std::_Bind_simple<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::$_4 ()> >::_M_run() /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:197 (eth+0x74ff2d)
    #14 std::error_code::default_error_condition() const ??:? (libstdc++.so.6+0xbb83e)

  As if synchronized via sleep:
    #0 nanosleep ??:? (eth+0x488457)
    #1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/6.3.0/../../../../include/c++/6.3.0/thread:323 (discriminator 1) (eth+0x51926a)
    #2 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum2/libp2p/Host.cpp:153 (discriminator 1) (eth+0x8cdf31)
    #3 ~WebThreeDirect /home/yh/src/cpp-ethereum2/libwebthree/WebThree.cpp:90 (eth+0x82b7d5)
    #4 main /home/yh/src/cpp-ethereum2/eth/main.cpp:1262 (discriminator 15) (eth+0x512b49)

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 9, 2017

Member

I see, I would actually prefer to move m_tq into Client (so that it's close to m_bq & m_bc and all transaction queue handling logic is in one class)

You'd need something like protected virtual TransactionQueue& tq() = 0; in the ClientBase to make it work

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

By @gumb0 's suggestion, I'll try to move all these into Client.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 9, 2017

Author Member

When I remove this change, I don't see the error anymore. Maybe the other fixes already killed this warning. So I just remove this commit.

@pirapira pirapira force-pushed the client-vptr branch from 5698639 to f01dbac Oct 9, 2017

pirapira added a commit that referenced this pull request Oct 9, 2017

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 9, 2017

Took @gumb0's suggestions and still saw no ThreadSanitizer warnings.

@pirapira pirapira force-pushed the client-vptr branch from f01dbac to ae6f28a Oct 9, 2017

pirapira added a commit that referenced this pull request Oct 9, 2017

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 10, 2017

CMake Error: CMAKE_C_COMPILER not set, after EnableLanguage
CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage

The problem persists after a restart.

@@ -46,7 +47,7 @@ template <class S> class IpcServerBase: public jsonrpc::AbstractServerConnector
void GenerateResponse(S _connection);

protected:
bool m_running = false;
std::atomic<bool> m_running{false};

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 10, 2017

Member

Let's use then std::atomic::exchange() in IpcServerBase::StartListening & IpcServerBase::StopListening instead of if+assignment

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 10, 2017

Author Member

That makes sense. Will do.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 10, 2017

Author Member

Done.

{
bool peerTypeIsRequired{};
{
RecursiveGuard l(x_sessions);

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 10, 2017

Member

this looks weird... toConnect is local list of shared_ptrs, do you try to protect Peer members with a mutex for sessions?

Maybe let's try to make Node::peerType atomic, will it help?

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 10, 2017

Author Member

Yes, that worked.

@@ -318,6 +318,8 @@ class Host: public Worker
/// Peers we try to connect regardless of p2p network.
std::set<NodeID> m_requiredPeers;
Mutex x_requiredPeers;
/// returns true if a member of m_requiredPeers
bool isRequiredPeer(NodeID const&); // cannot be const because of Mutex

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 10, 2017

Member

the usual thing to do in this case is to make mutex mutable and leave the method const

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 10, 2017

Author Member

Done.

@pirapira pirapira force-pushed the client-vptr branch from ae6f28a to efb748d Oct 10, 2017

pirapira added a commit that referenced this pull request Oct 10, 2017

@pirapira pirapira force-pushed the client-vptr branch from efb748d to 58c714d Oct 10, 2017

pirapira added a commit that referenced this pull request Oct 10, 2017

pirapira added a commit that referenced this pull request Oct 10, 2017

@@ -164,7 +163,7 @@ class EthereumPeerObserver: public EthereumPeerObserverFace
}

private:
BlockChainSync& m_sync;
shared_ptr<BlockChainSync> m_sync;

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 11, 2017

Member

I'm afraid we have the same problem with m_syncMutex below. If EthereumHost gets destroyed before EthereumPeerObserver mutex is destroyed with it... Not sure yet what to do with it

m_tq is destroyed with the Client, also not sure if EthereumPeers can outlive Client

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 12, 2017

Author Member

(for myself) try to remove m_syncMutex.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 12, 2017

Author Member

Somehow I'm making other classes hold m_tq as a weak pointer.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 12, 2017

Author Member

And I was able to remove m_syncMutex and still seeing no warnings from ThreadSanitizer.

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 12, 2017

Author Member

About m_tq, I decided to use shared/weak pointers.

@@ -155,6 +155,9 @@ void Host::stop()
// stop worker thread
if (isWorking())
stopWorking();

// This destroyes EthereumHost objects. EthereumHost objects contain a const reference to Client's, so they need to disappear first.

This comment has been minimized.

Copy link
@gumb0

gumb0 Oct 11, 2017

Member

What did you mean by this? I think EtherumHost has references to member of Client, but not to Client itself?

This comment has been minimized.

Copy link
@pirapira

pirapira Oct 12, 2017

Author Member

Updated the comment.

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 12, 2017

On a 8 core machine, I see more TSAN errors.

@pirapira pirapira changed the title Remove ThreadSanitizer warnings [wip] Remove ThreadSanitizer warnings Oct 12, 2017

@pirapira pirapira force-pushed the client-vptr branch from 58c714d to b0326b3 Oct 12, 2017

@pirapira pirapira force-pushed the client-vptr branch from 4665473 to cd0beb3 Oct 23, 2017

pirapira added a commit that referenced this pull request Oct 23, 2017

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 24, 2017

I'm again seeing some new TSAN warnings, but that looks like what #4610 is about.

@pirapira pirapira force-pushed the client-vptr branch from 285f222 to b7dfc77 Oct 25, 2017

pirapira added a commit that referenced this pull request Oct 25, 2017

pirapira added a commit that referenced this pull request Oct 25, 2017

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Oct 27, 2017

Again segfault in rndStateTest.

pirapira added some commits Oct 9, 2017

Protect Peer::m_score and m_rating under atomic
ThreadSanitizer warned:

WARNING: ThreadSanitizer: data race (pid=12735)
  Read of size 1 at 0x7b3000006b04 by main thread (mutexes: write M1110):
    #0 memcpy ??:? (eth+0x4977d7)
    #1 Peer /home/yh/src/cpp-ethereum/libdevcore/../libp2p/Peer.h:52 (eth+0x833846)
    #2 void __gnu_cxx::new_allocator<std::_List_node<dev::p2p::Peer> >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/ext/new_allocator.h:136 (eth+0x908718)
    #3 void std::allocator_traits<std::allocator<std::_List_node<dev::p2p::Peer> > >::construct<dev::p2p::Peer, dev::p2p::Peer const&>(std::allocator<std::_List_node<dev::p2p::Peer> >&, dev::p2p::Peer*, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/alloc_traits.h:475 (eth+0x908551)
    #4 std::_List_node<dev::p2p::Peer>* std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_create_node<dev::p2p::Peer const&>(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:575 (eth+0x908400)
    #5 void std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::_M_insert<dev::p2p::Peer const&>(std::_List_iterator<dev::p2p::Peer>, dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1801 (eth+0x90834e)
    #6 std::__cxx11::list<dev::p2p::Peer, std::allocator<dev::p2p::Peer> >::push_back(dev::p2p::Peer const&) /usr/bin/../lib/gcc/x86_64-linux-gnu/7.2.0/../../../../include/c++/7.2.0/bits/stl_list.h:1118
(eth+0x8e40eb)

  Previous write of size 4 at 0x7b3000006b04 by thread T30:
    #0 dev::p2p::Session::addRating(int)
/home/yh/src/cpp-ethereum/libp2p/Session.cpp:89 (eth+0x951d86)
    #1 dev::p2p::Capability::addRating(int)
/home/yh/src/cpp-ethereum/libp2p/Capability.cpp:59 (eth+0x8c1122)
Protect Host::m_nodeTable with a mutex
For removing the following warning from ThreadSanitizer
==================
WARNING: ThreadSanitizer: data race (pid=68621)
  Write of size 8 at 0x7ffc9f6af3b8 by thread T30:
    #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator=(std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2> const&) /usr/include/c++/7/bits/shared_ptr_base.h:1034 (eth+0x0000005805bb)
    #1 std::shared_ptr<dev::p2p::NodeTable>::operator=(std::shared_ptr<dev::p2p::NodeTable> const&) /usr/include/c++/7/bits/shared_ptr.h:93 (eth+0x0000005805bb)
    #2 dev::p2p::Host::startedWorking() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:761 (eth+0x0000005805bb)
    #3 operator() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:61 (eth+0x0000004ab343)
    #4 __invoke_impl<void, dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x0000004ab6a7)
    #5 __invoke<dev::Worker::startWorking()::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x0000004ab6a7)
    #6 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x0000004ab6a7)
    #7 operator() /usr/include/c++/7/thread:243 (eth+0x0000004ab6a7)
    #8 _M_run /usr/include/c++/7/thread:186 (eth+0x0000004ab6a7)
    #9 <null> <null> (libstdc++.so.6+0x0000000bc0fe)

  Previous read of size 8 at 0x7ffc9f6af3b8 by main thread (mutexes:
write M1111):
    #0 std::__shared_ptr<dev::p2p::NodeTable, (__gnu_cxx::_Lock_policy)2>::operator bool() const /usr/include/c++/7/bits/shared_ptr_base.h:1261 (eth+0x0000005716f8)
    #1 dev::p2p::Host::haveNetwork() const /home/yh/src/cpp-ethereum-vptr/libp2p/Host.h:224 (eth+0x0000005716f8)
    #2 dev::p2p::Host::start() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:129 (eth+0x0000005716f8)
    #3 dev::WebThreeDirect::startNetwork() /home/yh/src/cpp-ethereum-vptr/libdevcore/../libwebthree/WebThree.h:204 (eth+0x0000000bf0d9)
    #4 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1117 (eth+0x0000000bf0d9)

atomic_load and atomic_store work on most tool chains but not on gcc 4.9
https://stackoverflow.com/questions/34205880/why-does-atomic-load-with-a-shared-ptr-not-compile-with-gcc-4-9
Protect m_timer under x_runTimer
For avoiding this warning from ThreadSanitizer:
==================
WARNING: ThreadSanitizer: data race (pid=72207)
  Write of size 8 at 0x7ffca08dd960 by thread T30:
    #0 std::enable_if<std::__and_<std::__not_<std::__is_tuple_like<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >, std::is_move_constructible<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>, std::is_move_assignable<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*> >::value, void>::type std::swap<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*>(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&, boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*&) /usr/include/c++/7/bits/move.h:199 (eth+0x00000057b9f1)
    #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::reset(boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >*) /usr/include/c++/7/bits/unique_ptr.h:374 (eth+0x00000057b9f1)
    #2 dev::p2p::Host::run(boost::system::error_code const&) /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:665 (eth+0x00000057b9f1)
    #3 operator() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:720 (eth+0x00000057c426)
    #4 operator() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/bind_handler.hpp:47 (eth+0x00000057c426)
    #5 asio_handler_invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/handler_invoke_hook.hpp:69 (eth+0x00000057c426)
    #6 invoke<boost::asio::detail::binder1<dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)>, boost::system::error_code>, dev::p2p::Host::run(const boost::system::error_code&)::<lambda(const boost::system::error_code&)> > /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/handler_invoke_helpers.hpp:37 (eth+0x00000057c426) #7 do_complete /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/wait_handler.hpp:70
(eth+0x00000057c426)
    #8 boost::asio::detail::task_io_service_operation::complete(boost::asio::detail::task_io_service&, boost::system::error_code const&, unsigned long) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/task_io_service_operation.hpp:38 (eth+0x00000058c145)
    #9 boost::asio::detail::task_io_service::do_run_one(boost::asio::detail::scoped_lock<boost::asio::detail::posix_mutex>&, boost::asio::detail::task_io_service_thread_info&, boost::system::error_code const&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:372 (eth+0x00000058c145)
    #10 boost::asio::detail::task_io_service::run(boost::system::error_code&) /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/detail/impl/task_io_service.ipp:149 (eth+0x00000058c145)
    #11 boost::asio::io_service::run() /home/yh/.hunter/_Base/b96750b/a5f7296/9d3cb74/Install/include/boost/asio/impl/io_service.ipp:59 (eth+0x00000057168e)
    #12 dev::p2p::Host::doWork() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:774 (eth+0x00000057168e)
    #13 dev::Worker::workLoop() /home/yh/src/cpp-ethereum-vptr/libdevcore/Worker.cpp:140 (eth+0x0000004a9f64)

  Previous read of size 8 at 0x7ffca08dd960 by main thread:
    #0 std::__uniq_ptr_impl<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::_M_ptr() const /usr/include/c++/7/bits/unique_ptr.h:147 (eth+0x0000005701cd)
    #1 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::get() const /usr/include/c++/7/bits/unique_ptr.h:337 (eth+0x0000005701cd)
    #2 std::unique_ptr<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > >, std::default_delete<boost::asio::basic_deadline_timer<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime>, boost::asio::deadline_timer_service<boost::posix_time::ptime, boost::asio::time_traits<boost::posix_time::ptime> > > > >::operator bool() const /usr/include/c++/7/bits/unique_ptr.h:351 (eth+0x0000005701cd)
    #3 dev::p2p::Host::stop() /home/yh/src/cpp-ethereum-vptr/libp2p/Host.cpp:152 (eth+0x0000005701cd)
    #4 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:90 (eth+0x000000456f75)
    #5 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000c0ff2)

  Location is stack of main thread.

@pirapira pirapira force-pushed the client-vptr branch from b7dfc77 to bc76378 Oct 29, 2017

@gumb0 gumb0 self-requested a review Nov 23, 2017

@gumb0

This comment has been minimized.

Copy link
Member

commented Nov 24, 2017

I tried to simplify a little the code accessing m_nodeTable by extracting most of the operations with it to separate methods. @pirapira please verify that TSAN is still ok with it

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Nov 24, 2017

@gumb0 I'm on it.

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Nov 24, 2017

Yes, TSAN seems happy.

For the record, this is the command I run

eth/eth --ropsten --kill --pin --peerset "required:20c9ad97c081d63397d7b685a412227a40e23c8bdc6688c6f37e97cfbc22d2b4d1db1510d8f61e6a8866ad7f0e17c02b14182d37ea7c3c8b9c2683aeb6b733a1@52.169.14.227:30303 required:6ce05930c72abc632c58e2e4324f7c7ea478cec0ed4fa2528982cf34483094e9cbc9216e7aa349691242576d552a2a56aaeae426c5303ded677ce455ba1acd9d@13.84.180.240:30303" --listen 20000

and then send a SIGINT.

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Nov 27, 2017

@gumb0 do you want to make more changes?

@gumb0

This comment has been minimized.

Copy link
Member

commented Nov 27, 2017

Not sure yet, I will try to finish review today.

unique_lock<mutex> l(x_runTimer);
// ignore if already stopped/stopping, at the same time,
// signal run() to prepare for shutdown and reset m_timer
if (!m_run.exchange(false))

This comment has been minimized.

Copy link
@gumb0

gumb0 Nov 27, 2017

Member

Not sure why do you do this under x_runTimer lock?

This comment has been minimized.

Copy link
@pirapira

pirapira Nov 27, 2017

Author Member

When I move this out above the block, I see

==================
WARNING: ThreadSanitizer: data race (pid=65004)
  Write of size 8 at 0x7ba400004df8 by main thread:
    #0 pthread_cond_destroy <null> (libtsan.so.0+0x000000029892)
    #1 dev::eth::Client::~Client() /home/yh/src/cpp-ethereum-vptr/libethereum/Client.cpp:82 (eth+0x00000022094c)
    #2 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:62 (eth+0x000000389795)
    #3 dev::eth::EthashClient::~EthashClient() /home/yh/src/cpp-ethereum-vptr/libethashseal/EthashClient.cpp:65 (eth+0x0000003897e8)
    #4 std::default_delete<dev::eth::Client>::operator()(dev::eth::Client*) const /usr/include/c++/7/bits/unique_ptr.h:78 (eth+0x0000004551c3)
    #5 std::unique_ptr<dev::eth::Client, std::default_delete<dev::eth::Client> >::reset(dev::eth::Client*) /usr/include/c++/7/bits/unique_ptr.h:376 (eth+0x0000004551c3)
    #6 dev::WebThreeDirect::~WebThreeDirect() /home/yh/src/cpp-ethereum-vptr/libwebthree/WebThree.cpp:91 (eth+0x0000004551c3)
    #7 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:937 (eth+0x0000000bf8f2)

  Previous read of size 8 at 0x7ba400004df8 by thread T28:
    #0 pthread_cond_broadcast <null> (libtsan.so.0+0x00000002975f)
    #1 std::condition_variable::notify_all() <null> (libstdc++.so.6+0x0000000b5df8)
    #2 std::function<void ()>::operator()() const /usr/include/c++/7/bits/std_function.h:706 (eth+0x0000001cf4f3)
    #3 dev::eth::Signal<>::operator()() <null> (eth+0x0000001d6612)
    #4 dev::eth::TransactionQueue::manageImport_WITH_LOCK(dev::FixedHash<32u> const&, dev::eth::Transaction const&) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:171 (eth+0x00000031b951)
    #5 dev::eth::TransactionQueue::import(dev::eth::Transaction const&, dev::eth::IfDropped) /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:99 (eth+0x00000031bee0)
    #6 dev::eth::TransactionQueue::verifierBody() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:396 (eth+0x00000031c49c)
    #7 operator() /home/yh/src/cpp-ethereum-vptr/libethereum/TransactionQueue.cpp:45 (eth+0x00000031ccac)
    #8 __invoke_impl<void, dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:60 (eth+0x00000031cd57)
    #9 __invoke<dev::eth::TransactionQueue::TransactionQueue(unsigned int, unsigned int)::<lambda()> > /usr/include/c++/7/bits/invoke.h:95 (eth+0x00000031cd57)
    #10 _M_invoke<0> /usr/include/c++/7/thread:234 (eth+0x00000031cd57)
    #11 operator() /usr/include/c++/7/thread:243 (eth+0x00000031cd57)
    #12 _M_run /usr/include/c++/7/thread:186 (eth+0x00000031cd57)
    #13 <null> <null> (libstdc++.so.6+0x0000000bc0fe)

  As if synchronized via sleep:
    #0 nanosleep <null> (libtsan.so.0+0x00000004c5a1)
    #1 void std::this_thread::sleep_for<long, std::ratio<1l, 1000l> >(std::chrono::duration<long, std::ratio<1l, 1000l> > const&) /usr/include/c++/7/thread:376 (eth+0x0000000b723c)
    #2 stopSealingAfterXBlocks(dev::eth::Client*, unsigned int, unsigned int&) /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:274 (eth+0x0000000b723c)
    #3 main /home/yh/src/cpp-ethereum-vptr/eth/main.cpp:1247 (eth+0x0000000bf614)

I'm confused now because Host.cpp doesn't appear in the stacks.

This comment has been minimized.

Copy link
@pirapira

pirapira Nov 27, 2017

Author Member

@gumb0 forgetting the TSAN warning, I can somehow explain. If I reset m_run without the lock, the worker termination can immediately happen, m_timerSemaphore might get a notification, even before this thread gets the lock. And, after this thread obtains the lock, it waits on m_timerSemaphore forever.

This comment has been minimized.

Copy link
@gumb0

gumb0 Nov 27, 2017

Member

But no, while (m_timer) will be false in this case, so it won't try to wait on m_timerSemaphore

This comment has been minimized.

Copy link
@gumb0

gumb0 Nov 27, 2017

Member

These stacks I think show that in Client condition variable m_signalled is being destroyed while still being used by the m_tqReady handler.

Can you try moving m_signalled definition above m_tqReady in Client.h to change destruction order?

This comment has been minimized.

Copy link
@gumb0

gumb0 Nov 27, 2017

Member

Also I would name m_timerSemaphore something like m_timerReset (it's not a semaphore, it's a condition variable anyway)

This comment has been minimized.

Copy link
@pirapira

pirapira Nov 27, 2017

Author Member

Done an done. I'm not seeing TSAN warnings.

@gumb0

gumb0 approved these changes Nov 27, 2017

Copy link
Member

left a comment

Ok, let's merge! :shipit:

I'm somewhat worried to be honest, because it all got more complicated, but we need to proceed somehow, let's see what happens next

@pirapira

This comment has been minimized.

Copy link
Member Author

commented Nov 27, 2017

Turning invisible complications into visible complications...

@pirapira pirapira merged commit 40e1025 into develop Nov 29, 2017

9 checks passed

ci/circleci: Linux-Clang5 Your tests passed on CircleCI!
Details
ci/circleci: Linux-GCC6-Debug Your tests passed on CircleCI!
Details
ci/circleci: macOS-XCode9 Your tests passed on CircleCI!
Details
codecov/patch 52.17% of diff hit (target 50%)
Details
codecov/project 59.28% (+5.9%) compared to 4100691
Details
codecov/project/app 55.25% (+6.13%) compared to 4100691
Details
codecov/project/tests 82.9% (+0.02%) compared to 4100691
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@pirapira pirapira deleted the client-vptr branch Nov 29, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.