Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - More metrics + RPC tweaks #2041

Closed

Conversation

divagant-martian
Copy link
Collaborator

@divagant-martian divagant-martian commented Dec 2, 2020

Issue Addressed

NA

Proposed Changes

This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here

  • Add metrics for rpc errors per client, error type and direction
  • Add metrics for downscoring events per source type, client and penalty type
  • Add metrics for gossip validation results per client for non-accepted messages
  • Make the RPC handler return errors and requests/responses in the order we see them
  • Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
  • Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
  • Remove some unused code in the PeerAction and the rpc handler
  • Remove the unused variant RateLimited. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])
image

RPC Errors: increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])
image

Unaccepted gossip message: increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])
image

@divagant-martian divagant-martian marked this pull request as draft December 2, 2020 18:13
@paulhauner paulhauner added the work-in-progress PR is a work-in-progress label Dec 3, 2020
@divagant-martian divagant-martian changed the title add rpc error metrics + remove unused rate limited err More metrics + RPC tweaks Dec 4, 2020
@divagant-martian divagant-martian marked this pull request as ready for review December 5, 2020 10:32
@divagant-martian divagant-martian removed the work-in-progress PR is a work-in-progress label Dec 5, 2020
@AgeManning
Copy link
Member

bors r+

bors bot pushed a commit that referenced this pull request Dec 8, 2020
## Issue Addressed

NA

## Proposed Changes
This was mostly done to find the reason why LH was dropping peers from Nimbus. It proved to be useful so I think it's worth it. But there is also some functional stuff here
- Add metrics for rpc errors per client, error type and direction
- Add metrics for downscoring events per source type, client and penalty type
- Add metrics for gossip validation results per client for non-accepted messages
- Make the RPC handler return errors and requests/responses in the order we see them
- Allow a small burst for the Ping rate limit, from 1 every 5 seconds to 2 every 10 seconds
- Send rate limiting errors with a particular code and use that same code to identify them. I picked something different to 128 since that is most likely what other clients are using for their own errors
- Remove some unused code in the `PeerAction` and the rpc handler
- Remove the unused variant `RateLimited`. tTis was never produced directly, since the only way to get the request's protocol is via de handler. The handler upon receiving from LH a response with an error (rate limited in this case) emits this event with the missing info (It was always like this, just pointing out that we do downscore rate limiting errors regardless of the change)

Metrics for Nimbus looked like this:
Downscoring events: `increase(libp2p_peer_actions_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210880-862bf280-3676-11eb-94c0-399f0bf5aa2e.png)

RPC Errors: `increase(libp2p_rpc_errors_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101210997-ba071800-3676-11eb-847a-f32405ede002.png)

Unaccepted gossip message: `increase(gossipsub_unaccepted_messages_per_client{client="Nimbus"}[5m])`
![image](https://user-images.githubusercontent.com/26765164/101211124-f470b500-3676-11eb-9459-132ecff058ec.png)
@bors bors bot changed the title More metrics + RPC tweaks [Merged by Bors] - More metrics + RPC tweaks Dec 8, 2020
@bors bors bot closed this Dec 8, 2020
@divagant-martian divagant-martian deleted the rpc-error-metrics branch January 4, 2021 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants