expose individual peer information tracked by overlay via the peers endpoint #3041

MonsieurNicolas · 2021-05-10T17:50:00Z

Right now core expose aggregate overlay metrics (exported via the metrics endpoint) and computes per peer metrics (not exposed as metrics).

We should export those as part of the peers endpoint (maybe with a new flag if it's too verbose).

This would allow to get a sense of certain things like number of bytes transferred etc on a per peer basis when debugging.

From a quick look at the code, it looks like many metrics are not computed on a per peer basis (because this information is not useful in the "network survey" context), but should be exposed as part of this work.
For example, getOverlayMetrics().mMessageDrop doesn't have a per peer equivalent.

On that note, we may want to use some template helper of sorts to update both metric types at once instead of manually keeping metrics them in sync with code like

    getOverlayMetrics().mByteRead.Mark(byteCount);
    mPeerMetrics.mByteRead += byteCount;

The text was updated successfully, but these errors were encountered:

hidenori-shinohara · 2022-04-05T16:00:37Z

As Nicolas mentioned above, we have two classes for metrics:

OverlayMetrics => aggregated over peers
PeerMetrics => Per peer metrics.

PeerMetrics is almost a subset of OverlayMetrics. The two major differences are:

PeerMetrics keeps track of mConnectedTime and OverlayMetrics doesn't. This makes sense and I think we should keep this as is.
PeerMetrics keeps track of message counts on top of byte counts for some metrics. Specifically, it's the following 4.

        uint64_t mUniqueFloodMessageRecv;
        uint64_t mDuplicateFloodMessageRecv;
        uint64_t mUniqueFetchMessageRecv;
        uint64_t mDuplicateFetchMessageRecv;

As far as I know, we are not going to calculate and expose all of these metrics for each peer.

Question 1: Which aggregated metrics in OverlayMetrics do we want to expose for each peer?
Question 2: What do we want to do for the 4 metrics that we only keep track of for each peer and don’t aggregate?
- We can either
  - Option 1: Keep them as is (only do per-peer)
  - Option 2: Keep track of them for each peer AND aggregate by adding them to OverlayMetrics
  - Option 3: Not track of them at all.
- Option 3 seems bad since we’ve added those metrics for a reason. I’m not sure about Option 1 or 2.

I think these will change the approach (e.g., if we add many metrics to PeerMetrics, maybe we should stop using PeerMetrics as there'd be a lot of duplicates?)

The following is the list of metrics in OverlayMetrics and whether we already keep track of them for each peer.

Aggregated metrics tracked in `PeerMetrics`

mMessageRead
mMessageWrite
mByteRead
mByteWrite
mUniqueFloodBytesRecv ← PeersMetrics also counts the number of messages
mDuplicateFloodBytesRecv← PeersMetrics also counts the number of messages
mUniqueFetchBytesRecv← PeersMetrics also counts the number of messages
mDuplicateFetchBytesRecv← PeersMetrics also counts the number of messages

Aggregated metrics not tracked in `PeerMetrics`

Meter
- mMessageDrop
- mAsyncRead
- mAsyncWrite
- mErrorRead
- mErrorWrite
- mTimeoutIdle
- mTimeoutStraggler
- mSendErrorMeter
- mSendHelloMeter
- mSendAuthMeter
- mSendDontHaveMeter
- mSendGetPeersMeter
- mSendPeersMeter
- mSendGetTxSetMeter
- mSendTxSetMeter
- mSendTransactionMeter
- mSendGetSCPQuorumSetMeter
- mSendSCPQuorumSetMeter
- mSendSCPMessageSetMeter
- mSendGetSCPStateMeter
- mSendSurveyRequestMeter
- mSendSurveyResponseMeter
- mSendSendMoreMeter
- mMessagesBroadcast
  - The number of messages broadcast. We broadcast a message to the peers that haven’t sent us the message.
  - I believe it’s possible to turn that into a peer-based metric. ⇒ The number of messages that we have broadcast for each peer.
  - The coding might be slightly more complex since we pick the peers in FloodGate.cpp.
  - We should be able to do so by getting the overlay manager from mApp.
Timer
- mConnectionLatencyTimer
- mRecvErrorTimer
- mRecvHelloTimer
- mRecvAuthTimer
- mRecvDontHaveTimer
- mRecvGetPeersTimer
- mRecvPeersTimer
- mRecvGetTxSetTimer
- mRecvTxSetTimer
- mRecvTransactionTimer
- mRecvGetSCPQuorumSetTimer
- mRecvSCPQuorumSetTimer
- mRecvSCPMessageTimer
- mRecvGetSCPStateTimer
- mRecvSendMoreTimer
- mRecvSCPPrepareTimer
- mRecvSCPConfirmTimer
- mRecvSCPNominateTimer
- mRecvSCPExternalizeTimer
- mRecvSurveyRequestTimer
- mRecvSurveyResponseTimer
- mMessageDelayInWriteQueueTimer
- mMessageDelayInAsyncWriteTimer
- mOutboundQueueDelaySCP
- mOutboundQueueDelayTxs

Non-aggregated metrics in `OverlayMetrics`

mItemFetcherNextPeer → Count the number of times we switched to a different peer.
mPendingPeersSize → It doesn’t really make sense to keep track of this for each peer.
mAuthenticatedPeersSize → It doesn’t make sense to do this for each peer.
mFlowControlPercent

@marta-lokhova What do you think?

marta-lokhova · 2022-04-05T20:20:33Z

What do we want to do for the 4 metrics that we only keep track of for each peer and don’t aggregate?

Not sure I understand, why do we need to change this? The issue is around exposing this information in the peers endpoint. We don't need to rework the metrics, we just need to identify what we need to track. We shouldn't remove any of the PeerMetrics, because they are used in the survey.

I think these will change the approach (e.g., if we add many metrics to PeerMetrics, maybe we should stop using PeerMetrics as there'd be a lot of duplicates?)

I'm not sure I understand what you mean. Why will there be duplicates?

Regarding what information we need to expose in the peers endpoint, I'd add:

async read/write metrics, as well as the write queue delay (see overlay.delay.async-write and overlay.delay.write-queue)
flow control metrics (see overlay.outbound-queue.*)
message drop rate
I don't think we need to expose metrics per individual message type, as it's really verbose and not that helpful. We just want to give a general idea about how the connection is doing.

hidenori-shinohara · 2022-04-05T20:29:01Z

I'll work on adding those!

why do we need to change this?

We don't. I was just wondering!

Why will there be duplicates?

The PeerMetrics struct and the OverlayMetrics struct have a lot of members with the exact same. If we continued to add more members of the same name, I figured that it might make sense to consolidate those two structs.

marta-lokhova · 2022-04-05T23:14:21Z

The PeerMetrics struct and the OverlayMetrics struct have a lot of members with the exact same. If we continued to add more members of the same name, I figured that it might make sense to consolidate those two structs.

I don't think we should consolidate those, as they track different things (per peer internal counters that are only available on demand vs aggregate metrics that are stored in the global registry)

MonsieurNicolas added the enhancement label May 10, 2021

MonsieurNicolas added this to To do in v19.0.0 via automation Mar 29, 2022

MonsieurNicolas assigned hidenori-shinohara Mar 29, 2022

MonsieurNicolas mentioned this issue Mar 29, 2022

transaction queues: pull mode #3384

Closed

5 tasks

hidenori-shinohara mentioned this issue Apr 6, 2022

Calculate and expose more individual peer info through peers endpoint #3393

Merged

6 tasks

MonsieurNicolas moved this from To do to In progress in v19.0.0 Apr 14, 2022

latobarita closed this as completed in #3393 Apr 16, 2022

v19.0.0 automation moved this from In progress to Done Apr 16, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

expose individual peer information tracked by overlay via the peers endpoint #3041

expose individual peer information tracked by overlay via the peers endpoint #3041

MonsieurNicolas commented May 10, 2021

hidenori-shinohara commented Apr 5, 2022 •

edited

marta-lokhova commented Apr 5, 2022

hidenori-shinohara commented Apr 5, 2022 •

edited

marta-lokhova commented Apr 5, 2022

expose individual peer information tracked by overlay via the peers endpoint #3041

expose individual peer information tracked by overlay via the peers endpoint #3041

Comments

MonsieurNicolas commented May 10, 2021

hidenori-shinohara commented Apr 5, 2022 • edited

Aggregated metrics tracked in PeerMetrics

Aggregated metrics not tracked in PeerMetrics

Non-aggregated metrics in OverlayMetrics

marta-lokhova commented Apr 5, 2022

hidenori-shinohara commented Apr 5, 2022 • edited

marta-lokhova commented Apr 5, 2022

hidenori-shinohara commented Apr 5, 2022 •

edited

Aggregated metrics tracked in `PeerMetrics`

Aggregated metrics not tracked in `PeerMetrics`

Non-aggregated metrics in `OverlayMetrics`

hidenori-shinohara commented Apr 5, 2022 •

edited