feat: add iroh-sync and integrate into iroh node #1333
Merged
Frando force-pushed the sync-integration branch from 1d8f2d9 to 4f2b9b7 (August 7, 2023 16:26)
Frando force-pushed the sync-integration branch from a0a0264 to 3f07efb (August 10, 2023 11:19)
Frando changed the title from "[WIP] Sync integration" to "feat: add iroh-sync and integrate into iroh node" (Aug 10, 2023)
This was referenced Aug 10, 2023
Closed
Frando force-pushed the sync-integration branch from c7170a5 to f626fae (August 10, 2023 12:18)
Arqu reviewed (Aug 11, 2023)
Frando force-pushed the sync-integration branch from 83656b8 to 23bacb4 (August 11, 2023 10:47)
* removes content support from iroh-sync
* adds a quick-and-dirty writable database to iroh-bytes (will be replaced with a better generic writable database soon)
* adds a `Downloader` to queue get requests for individual hashes from individual peers
* adds a `BlobStore` that combines the writable db with the downloader
* adds a `Doc` abstraction that combines an iroh-sync `Replica` with a `BlobStore` to download content from peers on-demand
* updates the sync repl example to plug it all together
* also adds very basic persistence to `Replica` (encode to byte string) and uses this in the repl example
* make the REPL in the sync example work properly with rustyline for editing and reading input, shell-style argument parsing and clap for parsing commands
* add a docs store for opening and closing docs
* add author to doc struct
uses flume channels to allow for combined sync and async usage
## Description

So far in #1333, if an RPC or in-memory client called `doc.subscribe()`, the event callback would never be dropped, even if the client dropped the event stream. This PR fixes this by having the event callbacks return whether the callback should stay active or not. We can't use the removal token here, because calling `LiveSync::unsubscribe` from within the event callback would deadlock the actor.

Also adds a `LiveStatus` to the doc info RPC call. For now it only contains the number of subscribers. More info, e.g. on peers, can come later.

## Notes & open questions

* As of #1333 and unchanged by this PR: `doc.subscribe` will fail for documents that are not in the `LiveSync` (they are added via `doc.import` or `doc.start_sync`). This is unfortunate, because you'd often want to set up a subscription before starting sync, to catch all events.

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [x] Tests if relevant.
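The callback contract described in this commit message (each event callback reports whether it should stay active) can be sketched roughly as follows. This is a minimal illustration using only the standard library; `Event`, `KeepCallback`, `Subscribers` and `channel_callback` are hypothetical names chosen for the example and are not taken from the PR's code.

```rust
use std::sync::mpsc;

/// Hypothetical event type; stands in for whatever the real actor emits.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Insert(String),
}

/// Return value of a callback: stay subscribed, or be removed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum KeepCallback {
    Keep,
    Drop,
}

type Callback = Box<dyn FnMut(&Event) -> KeepCallback + Send>;

/// Hypothetical subscriber list, owned by a single actor.
#[derive(Default)]
pub struct Subscribers {
    callbacks: Vec<Callback>,
}

impl Subscribers {
    pub fn subscribe(&mut self, cb: impl FnMut(&Event) -> KeepCallback + Send + 'static) {
        self.callbacks.push(Box::new(cb));
    }

    /// Calls every callback and removes those that returned `Drop`.
    /// Removal happens inside the owner, so a callback never has to call
    /// back into the owner to unsubscribe itself.
    pub fn emit(&mut self, event: &Event) {
        self.callbacks.retain_mut(|cb| cb(event) == KeepCallback::Keep);
    }

    pub fn len(&self) -> usize {
        self.callbacks.len()
    }
}

/// Forwards events into a channel; asks to be dropped once the receiver is gone.
pub fn channel_callback(tx: mpsc::Sender<Event>) -> impl FnMut(&Event) -> KeepCallback + Send {
    move |event: &Event| match tx.send(event.clone()) {
        Ok(()) => KeepCallback::Keep,
        Err(_) => KeepCallback::Drop,
    }
}

fn main() {
    let mut subs = Subscribers::default();
    let (tx, rx) = mpsc::channel();
    subs.subscribe(channel_callback(tx));

    subs.emit(&Event::Insert("a".to_string()));
    println!("received: {:?}", rx.recv().unwrap());

    // The client drops its event stream; the next emit removes the callback.
    drop(rx);
    subs.emit(&Event::Insert("b".to_string()));
    println!("subscribers left: {}", subs.len());
}
```

Because the decision travels back as a return value, the callback is cleaned up on the next event after the client goes away, without any re-entrant call into the actor.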
switches to also use tokio::codec under the hood now. Also introduces a max message length of 1GiB for now.
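To illustrate what a maximum message length means for a length-delimited wire format, here is a small sketch with the standard library only. It is not the codec used in the PR: the 4-byte big-endian length prefix, the function names `write_frame` and `read_frame`, and the constant name `MAX_MESSAGE_LEN` are assumptions made for the example. Only the 1 GiB figure comes from the commit message.

```rust
use std::io::{self, Read, Write};

/// Assumed constant name; the 1 GiB value mirrors the commit message above.
pub const MAX_MESSAGE_LEN: usize = 1024 * 1024 * 1024;

/// Writes one frame: a 4-byte big-endian length followed by the payload.
pub fn write_frame<W: Write>(mut w: W, payload: &[u8], max_len: usize) -> io::Result<()> {
    if payload.len() > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidInput, "message too large"));
    }
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "length does not fit in u32"))?;
    w.write_all(&len.to_be_bytes())?;
    w.write_all(payload)
}

/// Reads one frame, rejecting the announced length before allocating a buffer.
pub fn read_frame<R: Read>(mut r: R, max_len: usize) -> io::Result<Vec<u8>> {
    let mut len_buf = [0u8; 4];
    r.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > max_len {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }
    let mut buf = vec![0u8; len];
    r.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    let mut wire: Vec<u8> = Vec::new();
    write_frame(&mut wire, b"hello", MAX_MESSAGE_LEN)?;
    let msg = read_frame(&wire[..], MAX_MESSAGE_LEN)?;
    println!("{}", String::from_utf8_lossy(&msg));
    Ok(())
}
```

Checking the length on the read side before allocating is the point of the limit: a peer cannot make the receiver reserve an arbitrarily large buffer just by announcing a huge frame.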
dignifiedquire approved these changes (Aug 24, 2023)
a long time coming, and finally a big PR that is not from me <3
matheus23 pushed a commit that referenced this pull request (Nov 14, 2024)
## Description

This PR adds `iroh-sync`, a document synchronization protocol, to iroh, and integrates with `iroh-net`, `iroh-gossip` and `iroh-bytes`.

* At the core is the `iroh-sync` crate, with a set reconciliation algorithm implemented by @dignifiedquire. See [the old iroh-sync repo](https://github.com/n0-computer/iroh-sync/) for the prehistory and #1216 for the initial PR (fully included in this PR, and by now outdated)
* Iroh sync is integrated in the iroh node, with iroh-gossip, in the RPC interface, and the CLI.
* `LiveSync` is the handle to an actor that integrates sync with [gossip](#1149) to broadcast and receive document updates from peers. For each open document a gossip swarm is joined with a `TopicId` derived from the doc namespace.
* mod `download` contains the new downloader. It will be improved in #1344.
* mod `client` is the new high-level RPC client. It currently only has methods for dealing with docs and sync; other things should be added once we have merged this. CLI commands for sync are in `commands/sync.rs`. Will be much better with #1356.
* `examples/sync.rs` has a REPL to modify and sync docs. It does a full setup without using the iroh console. Also includes code to sync directories, and a hammer command for load testing.
* The PR also introduces `iroh::client::Iroh`, a wrapper around the RPC client, and `iroh::client::Doc`, a wrapper around the RPC client for a single document

## Notes & open questions

#### Should likely happen before merge:

* [x] Make `iroh_sync::Store::list_authors` and `list_replicas` return iterators `iroh-sync` *fixed in #1366*
* [ ] Add `iroh_sync::Store::close_replica`
* [x] `ContentStatus` in `on_insert` callback is reported as `Ready` if the content is still `baomap::PartialEntry` (in-process download) *fixed in a8e8093*

#### Can happen after merge, but before `0.6` release

* [ ] Implement `AuthorImport` and `AuthorShare` RPC & CLI commands
* [ ] sync store `list_namespaces` and `list_authors` internally collect, return iterator instead
* [ ] Fix cross-compiles to arm/android. See cross-rs/cross#1311
* [ ] Ensure that fingerprint calculation is efficient and/or cached for large documents. Currently calculating the initial fingerprint iterates once over all entries in a document.
* [ ] Make content downloads more reliable
* [ ] Add some way to download content from peers independent of the first insertion event for a remote entry. The downloader with retries is tracked in #1334 and #1344, but independent of that, we would currently only ever try to queue a download when the `on_insert` callback triggers, which is only once. There should be a way, even if manual for now, to try to download missing content in a replica from peers.
* [ ] during `iroh-sync` sync include info if content is available for each entry
* [ ] Add basic peer management and persistence. Currently live sync will stop doing anything after a node restart.
* [ ] Persist the addressbook of peers for a document, to reconnect when restarting the node
* [ ] Implement `PeerAdd` and `PeerList` RPC & CLI commands. The latter needs changes in `iroh-net` to expose information of currently-connected peers and their peer info.
* [ ] Make read-only replicas possible
* [ ] Improve reliability of replica sync.
  * sync is triggered on each `NeighborUp` event from gossip. check that we don't sync too much.
  * maybe include peer info in gossip messages, to queue syncs with those (but not all at once)
  * track and exchange the timestamp of last full sync for peers, to know if you missed gossiped messages and react accordingly
  * add more tests with peers coming and leaving

#### Open questions

* [ ] `iroh_sync::EntrySignature` should the signatures include a namespace prefix?
* [ ] do we want the 1:1 mapping of `NamespaceId` and gossip `TopicId`, or would the topic id as a hash be better?

#### Other TODOs collected from the code

* [ ] Port `hammer` and `fs` commands from REPL example to iroh cli
* [ ] docs/naming: settle terminology about keypairs, private/secret/signing keys, public keys/identifiers and make docs and symbols consistent
* [ ] Make `bytes_get` streaming in the RPC interface
* [ ] Allow to unset the subscription on a replica
* [ ] `iroh-sync` shouldn't depend on `iroh-bytes` only for `Hash` type -> #1354
* [ ] Move `sync::live::PeerSource` to iroh-net or even better -> #1354
* [ ] `StoreInstance::put` propagate error and verify timestamp is reasonable.
* [ ] `StoreInstance::get_range` implement inverted range
* [ ] `iroh_sync`: Remove some items only used in tests (marked with #[cfg(test)])
* [ ] `iroh_sync` fs store: verify get method fetches all keys with this namespace
* [ ] `ranger::SimpleStore::get_range`: optimize
* [ ] `ranger::Peer` avoid allocs?
* [ ] `fs::StoreInstance::get_fingerprint` optimize
* [ ] `SyncEngine::doc_subscribe` remove unwrap, handle error

## Change checklist

- [x] Self-review.
- [x] Documentation updates if relevant.
- [ ] Tests if relevant.

---------

Co-authored-by: dignifiedquire <me@dignifiedquire.com>
Co-authored-by: Asmir Avdicevic <asmir.avdicevic64@gmail.com>
Co-authored-by: Kasey <klhuizinga@gmail.com>
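The TODO above about fingerprint calculation (the initial fingerprint currently iterates once over all entries) can be illustrated with a small sketch of a cached fingerprint. Everything here is an assumption for illustration: the `Store` type, the choice of XOR over per-entry hashes, and the use of the standard library's `DefaultHasher` are not taken from `iroh-sync`, whose actual fingerprint function and store types are not shown in this conversation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

/// Hypothetical per-entry hash; a real implementation would use a
/// cryptographic hash rather than `DefaultHasher`.
fn entry_hash(key: &str, value: &str) -> u64 {
    let mut h = DefaultHasher::new();
    key.hash(&mut h);
    value.hash(&mut h);
    h.finish()
}

/// Hypothetical document store that keeps a running fingerprint.
#[derive(Default)]
pub struct Store {
    entries: BTreeMap<String, String>,
    cached: u64,
}

impl Store {
    /// Inserts or overwrites an entry and updates the cached fingerprint.
    /// XOR is its own inverse, so the old entry's hash is cancelled first.
    pub fn insert(&mut self, key: &str, value: &str) {
        if let Some(old) = self.entries.insert(key.to_string(), value.to_string()) {
            self.cached ^= entry_hash(key, &old);
        }
        self.cached ^= entry_hash(key, value);
    }

    /// O(n): walks every entry, as the TODO describes for the initial fingerprint.
    pub fn fingerprint_full_scan(&self) -> u64 {
        self.entries
            .iter()
            .fold(0, |acc, (k, v)| acc ^ entry_hash(k, v))
    }

    /// O(1): returns the value maintained by `insert`.
    pub fn fingerprint_cached(&self) -> u64 {
        self.cached
    }
}

fn main() {
    let mut store = Store::default();
    store.insert("k1", "v1");
    store.insert("k2", "v2");
    store.insert("k1", "v1-updated");
    // Both paths agree; only the cached one avoids the full iteration.
    assert_eq!(store.fingerprint_cached(), store.fingerprint_full_scan());
    println!("fingerprint: {:016x}", store.fingerprint_cached());
}
```

An order-independent combiner such as XOR makes the incremental update cheap, and two stores holding the same entries end up with the same fingerprint regardless of insertion order. Whether that trade-off suits the real store is a design question left open in the TODO list.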