Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calculate and expose more individual peer info through peers endpoint #3393

Merged
merged 3 commits into from Apr 16, 2022

Conversation

hidenori-shinohara
Copy link
Contributor

Description

Resolves #3041

This PR calculates and exposes more metrics for each peer through the peers endpoint. The new output can be slightly verbose. If compact=true, the new output will be the same as the current output.

Checklist

  • Reviewed the contributing document
  • Rebased on top of master (no merge commits)
  • Ran clang-format v8.0.0 (via make format or the Visual Studio extension)
  • Compiles
  • Ran all tests
  • If change impacts performance, include supporting evidence per the performance document

docs/software/commands.md Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
a = std::chrono::duration_cast<std::chrono::milliseconds>(b);
}
};
updateMax(peerMetrics.mMaxMessageDelayInWriteQueue, qdelay);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use all-time max? Not sure it's helpful, as this data decays overtime and provides less insight into what's going on.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to show P99s from the last 300-second window

src/overlay/Peer.cpp Outdated Show resolved Hide resolved
src/overlay/Peer.cpp Outdated Show resolved Hide resolved
@hidenori-shinohara
Copy link
Contributor Author

hidenori-shinohara commented Apr 8, 2022

Here's a sample output for one peer when compact=false:

         {                                                                                                            
            "address" : "54.173.54.200:11625",
            "async_read" : 149,    
            "async_write" : 511,              
            "byte_read" : 353760,            
            "byte_write" : 315188,            
            "duplicate_fetch_message_recv" : 0,
            "duplicate_flood_message_recv" : 547,
            "elapsed" : 18,                    
            "flow_control" : {                    
               "local_capacity" : {
                  "flood" : 200, 
                  "reading" : 201                 
               },                                 
               "outbound_queue_delay_scp_p99" : 0,
               "outbound_queue_delay_txs_p99" : 0,
               "peer_capacity" : 198,        
               "state" : "enabled"
            },                                     
            "id" : "sdf_1",                        
            "latency" : 22,        
            "message_delay_in_async_write_p99" : 0,
            "message_delay_in_write_queue_p99" : 92,
            "message_drop" : 0,
            "message_read" : 748,                  
            "message_write" : 705,                 
            "olver" : 20,                                                                                             
            "unique_fetch_message_recv" : 0,
            "unique_flood_message_recv" : 0,
            "ver" : "stellar-core 18.5.0 (d387c6a710322135ac076804490af22c4587b96d)"                                  
         }

src/overlay/Peer.h Outdated Show resolved Hide resolved
Copy link
Contributor

@marta-lokhova marta-lokhova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me overall! One thing I realized is that reporting p99 for the timers maybe isn't as useful as, say, p75, so perhaps we should report that instead.

@@ -1007,7 +1064,12 @@ Peer::maybeSendNextBatch()
auto& timer = front.mMessage->type() == SCP_MESSAGE
? om.mOutboundQueueDelaySCP
: om.mOutboundQueueDelayTxs;
timer.Update(mApp.getClock().now() - front.mTimeEmplaced);
auto& peerTimer = front.mMessage->type() == SCP_MESSAGE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: maybe better to rename timer to aggregateTimer for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed!

@MonsieurNicolas MonsieurNicolas added this to In progress in v19.0.0 Apr 14, 2022
@hidenori-shinohara
Copy link
Contributor Author

I will squash the commits tomorrow

@hidenori-shinohara
Copy link
Contributor Author

I just finished squashing the commits.

@MonsieurNicolas
Copy link
Contributor

r+ 0064a59

@latobarita latobarita merged commit 9d0704e into stellar:master Apr 16, 2022
v19.0.0 automation moved this from In progress to Done Apr 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

expose individual peer information tracked by overlay via the peers endpoint
4 participants