Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

expose individual peer information tracked by overlay via the peers endpoint #3041

Closed
MonsieurNicolas opened this issue May 10, 2021 · 4 comments · Fixed by #3393
Closed
Assignees
Projects

Comments

@MonsieurNicolas
Copy link
Contributor

Right now core expose aggregate overlay metrics (exported via the metrics endpoint) and computes per peer metrics (not exposed as metrics).

We should export those as part of the peers endpoint (maybe with a new flag if it's too verbose).

This would allow to get a sense of certain things like number of bytes transferred etc on a per peer basis when debugging.

From a quick look at the code, it looks like many metrics are not computed on a per peer basis (because this information is not useful in the "network survey" context), but should be exposed as part of this work.
For example, getOverlayMetrics().mMessageDrop doesn't have a per peer equivalent.

On that note, we may want to use some template helper of sorts to update both metric types at once instead of manually keeping metrics them in sync with code like

    getOverlayMetrics().mByteRead.Mark(byteCount);
    mPeerMetrics.mByteRead += byteCount;
@hidenori-shinohara
Copy link
Contributor

hidenori-shinohara commented Apr 5, 2022

As Nicolas mentioned above, we have two classes for metrics:

  • OverlayMetrics => aggregated over peers
  • PeerMetrics => Per peer metrics.

PeerMetrics is almost a subset of OverlayMetrics. The two major differences are:

  • PeerMetrics keeps track of mConnectedTime and OverlayMetrics doesn't. This makes sense and I think we should keep this as is.
  • PeerMetrics keeps track of message counts on top of byte counts for some metrics. Specifically, it's the following 4.
        uint64_t mUniqueFloodMessageRecv;
        uint64_t mDuplicateFloodMessageRecv;
        uint64_t mUniqueFetchMessageRecv;
        uint64_t mDuplicateFetchMessageRecv;

As far as I know, we are not going to calculate and expose all of these metrics for each peer.

  • Question 1: Which aggregated metrics in OverlayMetrics do we want to expose for each peer?
  • Question 2: What do we want to do for the 4 metrics that we only keep track of for each peer and don’t aggregate?
    • We can either
      • Option 1: Keep them as is (only do per-peer)
      • Option 2: Keep track of them for each peer AND aggregate by adding them to OverlayMetrics
      • Option 3: Not track of them at all.
    • Option 3 seems bad since we’ve added those metrics for a reason. I’m not sure about Option 1 or 2.

I think these will change the approach (e.g., if we add many metrics to PeerMetrics, maybe we should stop using PeerMetrics as there'd be a lot of duplicates?)

The following is the list of metrics in OverlayMetrics and whether we already keep track of them for each peer.

Aggregated metrics tracked in PeerMetrics

  • mMessageRead
  • mMessageWrite
  • mByteRead
  • mByteWrite
  • mUniqueFloodBytesRecvPeersMetrics also counts the number of messages
  • mDuplicateFloodBytesRecvPeersMetrics also counts the number of messages
  • mUniqueFetchBytesRecvPeersMetrics also counts the number of messages
  • mDuplicateFetchBytesRecvPeersMetrics also counts the number of messages

Aggregated metrics not tracked in PeerMetrics

  • Meter
    • mMessageDrop
    • mAsyncRead
    • mAsyncWrite
    • mErrorRead
    • mErrorWrite
    • mTimeoutIdle
    • mTimeoutStraggler
    • mSendErrorMeter
    • mSendHelloMeter
    • mSendAuthMeter
    • mSendDontHaveMeter
    • mSendGetPeersMeter
    • mSendPeersMeter
    • mSendGetTxSetMeter
    • mSendTxSetMeter
    • mSendTransactionMeter
    • mSendGetSCPQuorumSetMeter
    • mSendSCPQuorumSetMeter
    • mSendSCPMessageSetMeter
    • mSendGetSCPStateMeter
    • mSendSurveyRequestMeter
    • mSendSurveyResponseMeter
    • mSendSendMoreMeter
    • mMessagesBroadcast
      • The number of messages broadcast. We broadcast a message to the peers that haven’t sent us the message.
      • I believe it’s possible to turn that into a peer-based metric. ⇒ The number of messages that we have broadcast for each peer.
      • The coding might be slightly more complex since we pick the peers in FloodGate.cpp.
      • We should be able to do so by getting the overlay manager from mApp.
  • Timer
    • mConnectionLatencyTimer
    • mRecvErrorTimer
    • mRecvHelloTimer
    • mRecvAuthTimer
    • mRecvDontHaveTimer
    • mRecvGetPeersTimer
    • mRecvPeersTimer
    • mRecvGetTxSetTimer
    • mRecvTxSetTimer
    • mRecvTransactionTimer
    • mRecvGetSCPQuorumSetTimer
    • mRecvSCPQuorumSetTimer
    • mRecvSCPMessageTimer
    • mRecvGetSCPStateTimer
    • mRecvSendMoreTimer
    • mRecvSCPPrepareTimer
    • mRecvSCPConfirmTimer
    • mRecvSCPNominateTimer
    • mRecvSCPExternalizeTimer
    • mRecvSurveyRequestTimer
    • mRecvSurveyResponseTimer
    • mMessageDelayInWriteQueueTimer
    • mMessageDelayInAsyncWriteTimer
    • mOutboundQueueDelaySCP
    • mOutboundQueueDelayTxs

Non-aggregated metrics in OverlayMetrics

  • mItemFetcherNextPeer → Count the number of times we switched to a different peer.
  • mPendingPeersSize → It doesn’t really make sense to keep track of this for each peer.
  • mAuthenticatedPeersSize → It doesn’t make sense to do this for each peer.
  • mFlowControlPercent

@marta-lokhova What do you think?

@marta-lokhova
Copy link
Contributor

What do we want to do for the 4 metrics that we only keep track of for each peer and don’t aggregate?

Not sure I understand, why do we need to change this? The issue is around exposing this information in the peers endpoint. We don't need to rework the metrics, we just need to identify what we need to track. We shouldn't remove any of the PeerMetrics, because they are used in the survey.

I think these will change the approach (e.g., if we add many metrics to PeerMetrics, maybe we should stop using PeerMetrics as there'd be a lot of duplicates?)

I'm not sure I understand what you mean. Why will there be duplicates?

Regarding what information we need to expose in the peers endpoint, I'd add:

  • async read/write metrics, as well as the write queue delay (see overlay.delay.async-write and overlay.delay.write-queue)
  • flow control metrics (see overlay.outbound-queue.*)
  • message drop rate
  • I don't think we need to expose metrics per individual message type, as it's really verbose and not that helpful. We just want to give a general idea about how the connection is doing.

@hidenori-shinohara
Copy link
Contributor

hidenori-shinohara commented Apr 5, 2022

I'll work on adding those!

why do we need to change this?

We don't. I was just wondering!

Why will there be duplicates?

The PeerMetrics struct and the OverlayMetrics struct have a lot of members with the exact same. If we continued to add more members of the same name, I figured that it might make sense to consolidate those two structs.

@marta-lokhova
Copy link
Contributor

The PeerMetrics struct and the OverlayMetrics struct have a lot of members with the exact same. If we continued to add more members of the same name, I figured that it might make sense to consolidate those two structs.

I don't think we should consolidate those, as they track different things (per peer internal counters that are only available on demand vs aggregate metrics that are stored in the global registry)

@MonsieurNicolas MonsieurNicolas moved this from To do to In progress in v19.0.0 Apr 14, 2022
v19.0.0 automation moved this from In progress to Done Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
No open projects
3 participants