Skip to content
This repository has been archived by the owner on Oct 28, 2021. It is now read-only.

Various network-related logging changes #5605

Merged
merged 20 commits into from May 24, 2019
Merged

Various network-related logging changes #5605

merged 20 commits into from May 24, 2019

Conversation

halfalicious
Copy link
Contributor

@halfalicious halfalicious commented May 17, 2019

  • Move log messages from global thread-safe loggers to class-instance-loggers (where thread safety isn't needed because logging is done in functions called on a single thread)
  • Add a log suffix to the RLPXHandshake, Session, and EthereumPeer log messages which is comprised of either the remote node ID and endpoint or just the remote node id (to remove the need of having to manually include this information in each log message). This suffix also removes the need for the LOG_SCOPED_CONTEXT macro so I've removed it as well
  • Manually log the peer ID where appropriate in the BlockChainSync, EthereumCapability, and WarpCapability classes (since these classes no longer automatically get peer client information in their log messages due to the removal of LOG_SCOPED_CONTEXT in Session)
  • Move RLPXHandshake log messages related to receiving specific messages from a remote node from before the read (which reads the message bytes from the socket) to after the read so identifying issues related to non-responsive nodes is more intuitive
  • Remove redundant error messages from RLPXHandshake

Here's a snippet of the new logs:

DEBUG 05-23 21:07:41 p2p  net    Starting peer session with Geth/v1.7.1-stable-05101641/linux-amd64/go1.9 (protocol: V5) ##0546e7dd… capabilities: (eth,63) port: 0
DEBUG 05-23 21:07:41 p2p  net    New session for capability eth; idOffset: 16 with ##0546e7dd…@34.237.25.200:30303
TRACE 05-23 21:07:41 p2p  net    Status to (##0546e7dd…@34.237.25.200:30303)
TRACE 05-23 21:07:41 p2p  p2pcap Ping to (##0546e7dd…@34.237.25.200:30303)
TRACE 05-23 21:07:41 p2p  net    Ping to (##0546e7dd…@34.237.25.200:30303)
DEBUG 05-23 21:07:41 p2p  net    Peer connection successfully established with ##0546e7dd…@34.237.25.200:30303
TRACE 05-23 21:07:41 p2p  net    Starting RLPX handshake with ##374fd3c0…@35.198.202.233:30303
TRACE 05-23 21:07:41 p2p  rlpx   egress auth to (##374fd3c0…@35.198.202.233:30303)
TRACE 05-23 21:07:41 p2p  net    Starting RLPX handshake with ##348197fd…@47.91.239.54:30303
TRACE 05-23 21:07:41 p2p  rlpx   egress auth to (##348197fd…@47.91.239.54:30303)
DEBUG 05-23 21:07:41 p2p  net    p2p.host.nodeTable.events.nodeEntryAdded ##c8f41b59…
DEBUG 05-23 21:07:41 p2p  net    p2p.host.peers.events.peerAdded ##c8f41b59… 188.166.216.1:30303
TRACE 05-23 21:07:41 p2p  net    Attempting connection to ##c8f41b59…@188.166.216.1:30303 from ##5c1c3e78…
DEBUG 05-23 21:07:41 p2p  net    p2p.host.nodeTable.events.nodeEntryAdded ##fd6f0c38…
DEBUG 05-23 21:07:41 p2p  net    p2p.host.peers.events.peerAdded ##fd6f0c38… 31.3.23.102:30303
TRACE 05-23 21:07:41 p2p  net    Attempting connection to ##fd6f0c38…@31.3.23.102:30303 from ##5c1c3e78…
DEBUG 05-23 21:07:41 p2p  net    p2p.host.nodeTable.events.nodeEntryAdded ##c8e5b4b6…
DEBUG 05-23 21:07:41 p2p  net    p2p.host.peers.events.peerAdded ##c8e5b4b6… 101.207.224.48:40115
TRACE 05-23 21:07:41 p2p  net    Attempting connection to ##c8e5b4b6…@101.207.224.48:40115 from ##5c1c3e78…
TRACE 05-23 21:07:41 p2p  net    Starting RLPX handshake with ##9d676310…@18.197.177.150:30303
TRACE 05-23 21:07:41 p2p  rlpx   egress auth to (##9d676310…@18.197.177.150:30303)
TRACE 05-23 21:07:41 p2p  net    Received Status (16) from (##0546e7dd…@34.237.25.200:30303)
DEBUG 05-23 21:07:41 p2p  ethcap Status (from ##0546e7dd…): 63 / 4514 / #5e31ddf9…, TD: 3478195 = #f2df48a6…
DEBUG 05-23 21:07:41 p2p  sync   Peer ##0546e7dd… not suitable for sync: Network identifier mismatch. Host network id: 1, peer network id: 4514
TRACE 05-23 21:07:41 p2p  net    Disconnecting (our reason: Subprotocol reason.) from (##0546e7dd…@34.237.25.200:30303)
TRACE 05-23 21:07:41 p2p  net    Disconnect to (##0546e7dd…@34.237.25.200:30303)

@codecov-io
Copy link

codecov-io commented May 17, 2019

Codecov Report

Merging #5605 into master will decrease coverage by <.01%.
The diff coverage is 28.64%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #5605      +/-   ##
==========================================
- Coverage   62.33%   62.33%   -0.01%     
==========================================
  Files         350      350              
  Lines       29428    29441      +13     
  Branches     3322     3321       -1     
==========================================
+ Hits        18344    18351       +7     
- Misses       9886     9891       +5     
- Partials     1198     1199       +1

@halfalicious
Copy link
Contributor Author

halfalicious commented May 17, 2019

@gumb0 : I've removed the LOG_SCOPED_CONTEXT from Session, which means we get no client or node ID information in BlockChainSync logs. I think it's a good idea to include node ID (and maybe endpoint) information in the BlockChainSync logs - I know that we can get these from other channels but I think we should try to make the logs for each channel as independent as possible from other channels so we have more freedom when it comes to enabling / disabling channels for debugging purposes.

I don't think we can automatically include this information via the new boost logger suffix attribute since BlockChainSync is called to sync data from different peers, and I don't think we want to go back to using LOG_SCOPED_CONTEXT in Session either since it can result in information for the wrong node being logged (e.g. Session::~Session() -> EthereumCapability::onDisconnect() -> EthereumPeerObserver::onPeerAborting() -> BlockChainSync::onPeerAborting() -> BlockChainSync::continueSync() where continueSync() calls syncPeer() for each of the connected peers).

The only other option I can think of is to manually include the node ID in each BlockChainSync log message but this is a bit cumbersome...can you think of a better approach?

[Edit] Whatever approach we end up using will also have to be used in EthereumCapability / WarpCapability and perhaps other classes in those code paths e.g. EthereumPeerObserver.

@gumb0
Copy link
Member

gumb0 commented May 17, 2019

No ideas how to make it more elegant, but still - almost all of the code of BlockChainSync is either handling eth messages or/and sending eth messages, so it's kind of tightly coupled with EthereumPeer and EthereumCapability and maybe it's enough to log peer ID only in the beginning of the handler in EthereumCapability class, or when we send a message in EthereumPeer class; and then log peer ID only in several special places in BlockChainSync like in a special case of syncPeer (special about it being that from a handler of one peer we access other peers)

@gumb0
Copy link
Member

gumb0 commented May 17, 2019

Also in these higher-level classes (BlockChainSync, EthereumCapability, EthereumPeer) I'd probably log only Node IDs, endpoints are kind of less relevant here, than in lower-level networking.

@halfalicious halfalicious force-pushed the network-logging branch 4 times, most recently from f7abdb1 to 7269bf9 Compare May 20, 2019 02:17
@halfalicious halfalicious requested review from gumb0 and chfast and removed request for gumb0 May 20, 2019 04:27
@halfalicious halfalicious changed the title [WIP] Network logging changes Various network-related logging changes May 20, 2019
Copy link
Collaborator

@chfast chfast left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks more/less ok.

@@ -90,8 +90,8 @@ class RLPXHandshake: public std::enable_shared_from_this<RLPXHandshake>
void readAckEIP8();

/// Closes connection and ends transitions.
void error();

void error(boost::system::error_code _ech = boost::system::error_code());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
void error(boost::system::error_code _ech = boost::system::error_code());
void error(boost::system::error_code _ech = {});

Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm now I'm thinking LOG_SCOPED_CONTEXT or similar thing might have been useful, at least when there're many logging statements in one function, and instead of repeating << peerID we could use LOG_SCOPED_CONTEXT once at the top of the function.

But maybe some messages would become less nice (the ones where now you output << peerID in the middle of the message), so I don't know

@@ -379,7 +380,10 @@ void BlockChainSync::requestBlocks(NodeID const& _peerID)
}
}
else
m_host.peer(_peerID).requestBlockHeaders(start, 1, 0, false);
{
LOG(m_loggerDetail) << "Requesting block headers from " << _peerID;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is logged already in EthereumPeer, do we need another message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah good point, will remove

@halfalicious
Copy link
Contributor Author

halfalicious commented May 21, 2019

Hm now I'm thinking LOG_SCOPED_CONTEXT or similar thing might have been useful, at least when there're many logging statements in one function, and instead of repeating << peerID we could use LOG_SCOPED_CONTEXT once at the top of the function.

But maybe some messages would become less nice (the ones where now you output << peerID in the middle of the message), so I don't know

@gumb0 : Yea, I thought of that but I'm not exactly sure how LOG_SCOPED_CONTEXT works so I wasn't sure if there were caveats to using it that I'm not aware of (e.g. perf overhead) which would make calling it in these functions inappropriate. Additionally, it can introduce complications e.g. logging the wrong peer ID if you call LOG_SCOPED_CONTEXT in a function which calls another function that initiates syncing with other peers. I also didn't want to spend too much time on these logging changes.

I can take a look at using it after I've wrapped up some of the higher priority work on my plate e.g. our peering vs Geth and syncing getting stuck. What do you think?

@halfalicious
Copy link
Contributor Author

halfalicious commented May 21, 2019

I'm working on debugging another issue and I noticed that BlockChainSync::syncPeer can be called from both the client thread and the network thread - I think this means that both it and any function it calls (EthereumPeer::requestBlockHeaders, BlockChainSync::requestBlocks, EthereumPeer::requestByHashes) should use the global thread-safe loggers. Note that BlockChainSync::syncPeer used instance loggers before my changes e.g.:

if (!peer.statusReceived())
{
LOG(m_loggerDetail) << "Can't sync with peer " << _peerID << " - Status not received yet.";
return;
}
if (peer.isConversing())
{
LOG(m_loggerDetail) << "Can't sync with peer " << _peerID << " - outstanding asks.";
return;
}
if (isSyncPaused())
{
LOG(m_loggerDetail) << "Can't sync with peer " << _peerID
<< " - sync state is paused. Block queue status: "
<< host().bq().status();
return;
}

@gumb0 : Should I convert these logging calls to use clog or define global thread-safe loggers for the "sync" channel (in Log.h)? I'm leaning towards the latter since clog() calls are a bit verbose, but not sure if it makes sense to define global loggers which will only be used in a single file? Or perhaps I could address this by defining them in BlockChainSync.cpp?

@gumb0
Copy link
Member

gumb0 commented May 21, 2019

I can take a look at using it after I've wrapped up some of the higher priority work on my plate e.g. our peering vs Geth and syncing getting stuck. What do you think?

Yeah it's fine to leave it like this for now.

@gumb0
Copy link
Member

gumb0 commented May 21, 2019

I'm working on debugging another issue and I noticed that BlockChainSync::syncPeer can be called from both the client thread and the network thread

Is it because of this single call?

m_bqRoomAvailable = host().bq().onRoomAvailable([this]()
{
RecursiveGuard l(x_sync);
m_state = SyncState::Blocks;
continueSync();
});

If so, can we use here m_host.capabilityHost()->postWork(...) to make the call to syncPeer happen from network thread?

Then we'll avoid the mess with loggers, and it might be a good idea for thread-safety in general.

@gumb0
Copy link
Member

gumb0 commented May 21, 2019

But if it's ever needed, it's also possible to create non-global but thread-safe loggers. For that you'd need another function like this

aleth/libdevcore/Log.h

Lines 124 to 129 in b094d0c

using Logger = boost::log::sources::severity_channel_logger<>;
inline Logger createLogger(int _severity, std::string const& _channel)
{
return Logger(
boost::log::keywords::severity = _severity, boost::log::keywords::channel = _channel);
}

but with severity_channel_logger_mt (multi-threaded)

@halfalicious
Copy link
Contributor Author

halfalicious commented May 21, 2019

I'm working on debugging another issue and I noticed that BlockChainSync::syncPeer can be called from both the client thread and the network thread

Is it because of this single call?

m_bqRoomAvailable = host().bq().onRoomAvailable([this]()
{
RecursiveGuard l(x_sync);
m_state = SyncState::Blocks;
continueSync();
});

If so, can we use here m_host.capabilityHost()->postWork(...) to make the call to syncPeer happen from network thread?

Then we'll avoid the mess with loggers, and it might be a good idea for thread-safety in general.

ost.capabilityHost()->postWork(...) to make the call to syncPeer happen from network thread?

@gumb0 : Yes, I think that's the only place where syncPeer is invoked on a non-network thread (syncPeer is called in peer data handlers i.e. onPeer* which from what I understand are always called on the network thread, or in continueSync and continueSync is called in onPeerBlockHeaders / onPeerBlockBodies and in the m_bqRoomAvailable handler)

Perhaps I should post the entire m_bqRoomAvailable handler to the network thread since that would also fix the "sync stuck" race condition? This would presumably also remove the need to take the x_sync mutex in the handler, though I think it would probably be a good idea to still acquire it in the case that we decide to call it from another thread in the future...what do you think?

@gumb0
Copy link
Member

gumb0 commented May 21, 2019

Yeah, good idea to move it all to network thread, though I think mutex will be still needed. I think other calls to syncPeer are under mutex, and I see at least BlockChainSync::status() is the function that might access from another thread some values modified inside syncPeer

@halfalicious
Copy link
Contributor Author

halfalicious commented May 22, 2019

ink mutex will be still needed. I think other calls to syncPeer are under mutex, and I see at least BlockChainSync::status() is the function that might access from another thread some values modified inside syncPeer

@gumb0 : Ah I didn't realize that BlockChainSync::status() might call it from another thread but you're right, it can be called from the Client thread - agreed then, keeping the mutex makes sense.

@halfalicious
Copy link
Contributor Author

halfalicious commented May 22, 2019

ink mutex will be still needed. I think other calls to syncPeer are under mutex, and I see at least BlockChainSync::status() is the function that might access from another thread some values modified inside syncPeer

@gumb0 : Ah I didn't realize that BlockChainSync::status() might call it from another thread but you're right, it can be called from the Client thread - agreed then, keeping the mutex makes sense.

@gumb0 I thought about this a little more and since my changes don't impact the thread safety of BlockChainSync logging (I don't convert any calls to thread-safe loggers to member loggers in functions that can be called from multiple threads) and posting the m_bqRoomAvailable handler to the network thread would affect the "sync stuck" race condition, I think I'd prefer to make these changes as a part of my "sync stuck" fix.

libethereum/EthereumPeer.h Outdated Show resolved Hide resolved
libp2p/Session.cpp Outdated Show resolved Hide resolved
halfalicious and others added 18 commits May 22, 2019 21:47
RLPxHandshake suffix is the remote node information. This enables us to make the RLPX handshake log messages more compact since we aren't manually including remote node information in each log statement.
Also add remote node information as log suffix to new Session member loggers. Note that member loggers aren't thread-safe, but Session functions are only called from the network thread so using member loggers should be ok (and is more performant).
We are no longer including the peer client information in each BlockChainSync log message (since it bloats the logs). However, this information can be useful for debugging purposes and the initial peering could have happened long in the past, so we promote the PeerSessionInfos log message from trace to debug verbosity so that this information is more available in logs for debugging purposes.
Logs are now written to new "warpcap" log channel and are logged via class instance loggers rather than via global thread-safe loggers. Instance loggers are more performant and we don't need thread-safety since logs are only written in functions executed on the network thread.

Also make some minor modifications to format of existing log messages.
Messages were logged via global thread-safe loggers - we don't need the performance overhead of thread-safe loggers since messages are logged in functions executed on the network thread. As a bonus, these messages are now also logged to the "ethcap" channel rather than the "net" channel, which makes sense since these messages are logged during execution of EthereumCapability functionality.
Logs were written to global thread-safe loggers, which aren't required (and which are less performant than their non-thread-safe counterparts) since the functions in which the logs are written are only executed on the network thread.

Also make some minor formatting modifications to some existing log messages.
Use {} initializers and remove unnecessary log message in BlockChainSync::requestBlocks
Move some log messages from error to warning verbosity, make boost log attribute configuration more compact, and use m_id.hex() rather than first writing the value to a stream.
@halfalicious
Copy link
Contributor Author

Rebased

Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several more comments. Sorry, but this PR is quite huge, I keep discovering new stuff in it.

libethereum/EthereumCapability.h Outdated Show resolved Hide resolved
libethereum/EthereumPeer.h Outdated Show resolved Hide resolved
@@ -455,12 +457,13 @@ bool WarpCapability::interpretCapabilityPacket(NodeID const& _peerID, unsigned _
}
catch (Exception const&)
{
cnetlog << "Warp Peer causing an Exception: "
<< boost::current_exception_diagnostic_information() << " " << _r;
LOG(m_loggerError) << "Warp Peer " << _peerID << " causing an exception: "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In EthreumCapability these are WARNING level now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nice catch

libethereum/WarpCapability.cpp Outdated Show resolved Hide resolved
libethereum/WarpCapability.cpp Outdated Show resolved Hide resolved
libp2p/Session.cpp Outdated Show resolved Hide resolved

if (!checkRead(tlen, ec, length))
return;
else if (!m_io->authAndDecryptFrame(bytesRef(m_data.data(), tlen)))
{
cnetlog << "frame decrypt failed";
LOG(m_netLoggerError) << "Frame decrypt failed";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't log every invalid packet to ERROR

@@ -380,7 +379,9 @@ void Session::doRead()
RLP r(frame.cropped(1));
bool ok = readPacket(hProtocolId, packetType, r);
if (!ok)
cnetlog << "Couldn't interpret packet. " << RLP(r);
LOG(m_netLoggerError)
<< "Couldn't interpret " << p2pPacketTypeToString(packetType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, too, I don't think it's for ERROR

@@ -406,7 +407,9 @@ bool Session::checkRead(std::size_t _expected, boost::system::error_code _ec, st
{
// with static m_data-sized buffer this shouldn't happen unless there's a regression
// sec recommends checking anyways (instead of assert)
cnetlog << "Error reading - TCP read buffer length differs from expected frame size.";
LOG(m_netLoggerError)
<< "Error reading - TCP read buffer length differs from expected frame size ("
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also seems too low-level for ERROR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gumb0 The comment above the log statement gives me the impression that this should never happen which is why I chose to log this as error (since if it does happen we want to know about it).

@@ -434,8 +437,8 @@ bool Session::canHandle(

void Session::disableCapability(std::string const& _capabilityName, std::string const& _problem)
{
cnetdetails << "DISABLE: Disabling capability '" << _capabilityName
<< "'. Reason: " << _problem;
LOG(m_netLoggerDetail) << "DISABLE: Disabling capability '" << _capabilityName
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be useful on DEBUG

@halfalicious
Copy link
Contributor Author

Several more comments. Sorry, but this PR is quite huge, I keep discovering new stuff in it.

No worries at all! I know these changes probably aren't very interesting so I really appreciate the in-depth review 😄

@halfalicious halfalicious force-pushed the network-logging branch 3 times, most recently from 4fbfa0c to 2dcbeb3 Compare May 24, 2019 04:22
Copy link
Member

@gumb0 gumb0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor comments, ok to merge when addressed

libp2p/RLPxHandshake.cpp Outdated Show resolved Hide resolved
libp2p/Session.cpp Show resolved Hide resolved
Changed public functions of Session to use thread-safe logger since they can potentially be called from any thread. Also demote some network error messages from error to debug and make minor modifications to some log messages.
@halfalicious halfalicious merged commit feb3c57 into master May 24, 2019
@halfalicious halfalicious deleted the network-logging branch May 24, 2019 17:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants