Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Drop peers when outbound message queue is full #2450

Closed
wants to merge 13 commits into from

Conversation

lrettig
Copy link
Member

@lrettig lrettig commented Jun 10, 2021

Motivation

Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

Changes

When the outbound message queue for a given connection is full, rather than blocking on Send, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

Test Plan

Includes a new unit test for both conn and msgconn types

TODO

  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • Update documentation as needed

DevOps Notes

  • This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
  • This PR does not affect public APIs
  • This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
  • This PR does not make changes to log messages (which monitoring infrastructure may rely on)

Prevents blocking further up the gossip stack
@dshulyak
Copy link
Contributor

This queue is not only for broadcast, but also for direct requests (msg server and fetcher), is it the case? If so we can't simply drop messages, we must disconnect peer and return with error.

@moshababo moshababo modified the milestones: Aug 2 Sprint, Aug 16 Sprint Aug 16, 2021
@lrettig lrettig modified the milestones: Aug 16 Sprint, Aug 30 Sprint Aug 16, 2021
@lrettig lrettig requested a review from dshulyak August 23, 2021 17:22
@lrettig
Copy link
Member Author

lrettig commented Aug 23, 2021

This queue is not only for broadcast, but also for direct requests (msg server and fetcher), is it the case? If so we can't simply drop messages, we must disconnect peer and return with error.

This is correct. I updated this to drop peers if the outbound queue is full. Could you please take a look and see if this is what you had in mind?

@lrettig lrettig changed the title Drop outbound messages when msgcon queue full Drop peers when outbound message queue is full Aug 23, 2021
@dshulyak
Copy link
Contributor

bors try

@lrettig
Copy link
Member Author

lrettig commented Aug 29, 2021

bors merge

bors bot pushed a commit that referenced this pull request Aug 29, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Aug 29, 2021

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Aug 29, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Aug 29, 2021

Build failed:

@lrettig
Copy link
Member Author

lrettig commented Aug 30, 2021

bors merge

bors bot pushed a commit that referenced this pull request Aug 30, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Aug 30, 2021

Build failed:

@lrettig
Copy link
Member Author

lrettig commented Aug 30, 2021

bors merge

bors bot pushed a commit that referenced this pull request Aug 30, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Aug 30, 2021

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Aug 30, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Aug 30, 2021

Build failed:

@lrettig lrettig assigned antonlerner and unassigned lrettig Aug 30, 2021
@lrettig lrettig modified the milestones: Aug 16 Sprint, Aug 30 Sprint Aug 30, 2021
@lrettig
Copy link
Member Author

lrettig commented Aug 30, 2021

This is done and ready to be merged, but stuck on flaky tests. Assigning to @antonlerner to make sure it gets merged. Let me know if I've missed anything or if there are any questions. Thanks!

@dshulyak
Copy link
Contributor

dshulyak commented Sep 2, 2021

bors try

bors bot added a commit that referenced this pull request Sep 2, 2021
@bors
Copy link

bors bot commented Sep 2, 2021

try

Build failed:

@lrettig
Copy link
Member Author

lrettig commented Sep 2, 2021

bors merge

bors bot pushed a commit that referenced this pull request Sep 2, 2021
## Motivation
Closely related to #2435
Partially addresses #1868, #2386 (and maybe #2405)

## Changes
When the outbound message queue for a given connection is full, rather than blocking on `Send`, begin dropping messages and printing errors (rather than allowing the blocking to propagate up the gossip stack)

It would be good to go one step further and drop peers that are hanging, but this is a quick, simple fix for the acute issue we're seeing on the testnets. See #2385.

## Test Plan
Includes a new unit test for both `conn` and `msgconn` types

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [ ] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)


Co-authored-by: Dmitry Shulyak <yashulyak@gmail.com>
@bors
Copy link

bors bot commented Sep 2, 2021

Pull request successfully merged into develop.

Build succeeded:

@bors bors bot changed the title Drop peers when outbound message queue is full [Merged by Bors] - Drop peers when outbound message queue is full Sep 2, 2021
@bors bors bot closed this Sep 2, 2021
@bors bors bot deleted the drop-p2p-messages branch September 2, 2021 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants