Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sync): track incoming sync requests, allow subscriptions without sync, close inactive replicas #1491

Merged
merged 17 commits into from
Sep 20, 2023

Conversation

Frando
Copy link
Member

@Frando Frando commented Sep 18, 2023

Description

This PR adds a few missing parts for sync in iroh:

  • Allow to subscribe to replicas if they're not in the set of syncing replicas (this was not possible before).
  • Properly track open replicas in the LiveSync and close them once they're not needed anymore. Therefore adds iroh_sync::store::close_replica to remove a replica instance from the store-internal cache of open replicas. This is needed because otherwise it grows unbounded (for anchor nodes especially).
  • Pass incoming connections to the LiveSync actor and handle them there. Before, incoming connections for sync were always processed if a replica with the requested NamespaceId is found in the local replica store. This PR changes this mechanism so that incoming sync connections are passed into the sync engine first. This leads to the changes that motivate this PR:
    • Incoming sync requests are only answered if the document (replica) in question was opened for sync before (via a call to start_sync). Note that this is not persisted at the moment, but should be (separate change though).
    • We can track the sync status for outgoing and incoming sync flows. This will reduce the number of sync connections, because currently we will oftenly double-sync in both directions (unneeded, because the sync is always bidirectional). Now, we abort if a sync is already in progress.
  • Expose events about successfull and failed sync runs to the upper layer via the document subscriptions with a new LiveEvent:.SyncFinished that reports all available information

Notes & open questions

Do we want to track history of syncs internally or leave it up to the app via the events? Not sure myself.

The diff of sync_engine/live.rs is a little hard to read, because there's a change from TopicId to NamespaceId in a few places that is mostly unrelated. Sorry for the mixup, but it makes sense to track everything by NamespaceId and it ended up in a commit here.

Logging is much improved in this PR.

Maybe we can improve the code flow a little to make things more obvious.

Change checklist

  • Self-review.
  • Documentation updates if relevant.
  • Tests if relevant.

@Frando Frando changed the title (WIP) fix/refactor(sync): handle incoming conns in engine fix/refactor(sync): properly track incoming sync requests Sep 18, 2023
@Frando Frando changed the title fix/refactor(sync): properly track incoming sync requests fix/refactor(sync): properly track incoming requests Sep 18, 2023
@Frando Frando changed the title fix/refactor(sync): properly track incoming requests fix/refactor(sync): properly track sync requests Sep 18, 2023
|namespace, _| {
futures::future::ready(
bob_replica_store_task
.open_replica(&namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a blocking call, we should probably mark it as such

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deferred for later: #1502

&mut bob_writer,
&mut bob_reader,
bob_store,
|namespace, _| {
futures::future::ready(bob_store.open_replica(&namespace).map(Into::into))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

deferred to #1502

@Frando Frando changed the title fix/refactor(sync): properly track sync requests feat(sync): track incoming sync requests, allow subscriptions without sync, close inactive replicas Sep 20, 2023
…ve sync (#1495)

## Description

Based on #1491 

This adds two related features that were missing so far:

* Add `iroh_sync::store::close_replica` to remove a replica instance
from the store-internal cache of open replicas. This is needed because
otherwise it grows unbounded (for anchor nodes especially).
* Properly handle open replicas in the `LiveSync`, and through this
allow to subscribe to replica events for replicas that are not in the
set of actively syncing replicas. This was not possible before (you
could only subscribe to replicas that are active for sync)
* Add test for subscribing without sync

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
@Frando Frando added this pull request to the merge queue Sep 20, 2023
@Frando Frando removed this pull request from the merge queue due to a manual request Sep 20, 2023
@Frando Frando added this pull request to the merge queue Sep 20, 2023
Merged via the queue into main with commit 6c07ad3 Sep 20, 2023
15 checks passed
github-merge-queue bot pushed a commit that referenced this pull request Sep 21, 2023
## Description

On `main` the test `sync_full_basic` is flakey. This PR (hopefully)
fixes it.

The reason was: We have the situation that two peers initiate
connections to each other at (roughly) the same time. In #1491 this was
sometimes prevented, but not reliably.

This PR fixes it by:
* When connecting, we set a `SyncState::Dialing` for the `(namespace,
peer)`
* When accepting, if our own state is `Dialing` for the incoming request
for `(namespace, peer)` we compare our peer id with that of the incoming
request, and abort if ours is higher (doesn't matter which way, we care
about a predictable outcome only
* Through this, only one of the two simoultanoues connections will
survive
* Also added a `Abort` frame to the wire protocol to transfer to inform
the dialer about the reason of the declined request, which is either "we
do double sync, and will take the other conn" (`AlreadySyncing`) or
"this replica is not here" (`NotAvailable`)

This PR also:
* Further improves logging
* Improves errors

## Notes & open questions

<!-- Any notes, remarks or open questions you have to make about the PR.
-->

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.

---------

Co-authored-by: Diva M <divma@protonmail.com>
@dignifiedquire dignifiedquire deleted the sync-accept-in-engine branch November 1, 2023 14:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants