Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: avoid double conns, better state tracking #1505

Merged
merged 6 commits into from
Sep 21, 2023

Conversation

Frando
Copy link
Member

@Frando Frando commented Sep 20, 2023

Description

On main the test sync_full_basic is flakey. This PR (hopefully) fixes it.

The reason was: We have the situation that two peers initiate connections to each other at (roughly) the same time. In #1491 this was sometimes prevented, but not reliably.

This PR fixes it by:

  • When connecting, we set a SyncState::Dialing for the (namespace, peer)
  • When accepting, if our own state is Dialing for the incoming request for (namespace, peer) we compare our peer id with that of the incoming request, and abort if ours is higher (doesn't matter which way, we care about a predictable outcome only
  • Through this, only one of the two simoultanoues connections will survive
  • Also added a Abort frame to the wire protocol to transfer to inform the dialer about the reason of the declined request, which is either "we do double sync, and will take the other conn" (AlreadySyncing) or "this replica is not here" (NotAvailable)

This PR also:

  • Further improves logging
  • Improves errors

Notes & open questions

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

@Frando Frando force-pushed the sync-fix-parallel-connect branch 2 times, most recently from fe5d473 to 1ba1941 Compare September 20, 2023 23:47
@divagant-martian divagant-martian added this to the v0.6.0 milestone Sep 21, 2023
Copy link
Contributor

@divagant-martian divagant-martian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the only things that jump to me is doing explicit imports of std::result::Result if you are using anyhow::Result, setting the error will overwrite the generic so that's not necessary (if that's the reason)

iroh-sync/src/net.rs Outdated Show resolved Hide resolved
iroh-sync/src/net/codec.rs Outdated Show resolved Hide resolved
iroh/src/sync_engine/live.rs Outdated Show resolved Hide resolved
iroh/src/sync_engine/live.rs Outdated Show resolved Hide resolved
iroh/src/sync_engine/live.rs Outdated Show resolved Hide resolved
iroh/src/sync_engine/live.rs Outdated Show resolved Hide resolved
iroh/src/sync_engine/live.rs Outdated Show resolved Hide resolved
@divagant-martian divagant-martian added this pull request to the merge queue Sep 21, 2023
Merged via the queue into main with commit d8cc9df Sep 21, 2023
15 checks passed
@dignifiedquire dignifiedquire deleted the sync-fix-parallel-connect branch November 1, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants