Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - Correct a race condition when dialing peers #4056

Closed
wants to merge 4 commits into from

Conversation

AgeManning
Copy link
Member

There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.

The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue.
This ends up displaying an error to the user:

ERRO Dialing an already dialing peer

Although this error is harmless it's not ideal.

There are two solutions to resolving this:

  1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue.
  2. We prevent duplicates in the dial queue

This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.

@AgeManning AgeManning added the ready-for-review The code is ready for review label Mar 7, 2023
@divagant-martian
Copy link
Collaborator

if looks like you need to bump our msrv, also if you think it's fine, could we bump the delay_map version of deposit_contract to avoid having two versions?

@AgeManning AgeManning added the v4.0.0 Mainnet Capella release expected late March 2023 label Mar 13, 2023
@AgeManning
Copy link
Member Author

Updated the MSRV, I dont think deposit contract uses delay_map. There is an older version used in discv5, but we will update that dep with a new release which we need soon anyway.

@divagant-martian
Copy link
Collaborator

oh yeah that was a gross misreading on my part, it's the discv5 one

@AgeManning AgeManning added ready-for-merge This PR is ready to merge. and removed ready-for-review The code is ready for review labels Mar 16, 2023
@AgeManning
Copy link
Member Author

bors r+

bors bot pushed a commit that referenced this pull request Mar 16, 2023
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.

The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue. 
This ends up displaying an error to the user: 
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal. 

There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue. 
2. We prevent duplicates in the dial queue

This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
@bors bors bot changed the title Correct a race condition when dialing peers [Merged by Bors] - Correct a race condition when dialing peers Mar 16, 2023
@bors bors bot closed this Mar 16, 2023
Woodpile37 pushed a commit to Woodpile37/lighthouse that referenced this pull request Jan 6, 2024
There is a race condition which occurs when multiple discovery queries return at almost the exact same time and they independently contain a useful peer we would like to connect to.

The condition can occur that we can add the same peer to the dial queue, before we get a chance to process the queue.
This ends up displaying an error to the user:
```
ERRO Dialing an already dialing peer
```
Although this error is harmless it's not ideal.

There are two solutions to resolving this:
1. As we decide to dial the peer, we change the state in the peer-db to dialing (before we add it to the queue) which would prevent other requests from adding to the queue.
2. We prevent duplicates in the dial queue

This PR has opted for 2. because 1. will complicate the code in that we are changing states in non-intuitive places. Although this technically adds a very slight performance cost, its probably a cleaner solution as we can keep the state-changing logic in one place.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v4.0.0 Mainnet Capella release expected late March 2023
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants