feat(iroh): downloader #1420
Conversation
This is a very nice PR! The docs make it quite straightforward to read and understand. Thanks for that!
I did not get through to the end, will continue tomorrow or later tonight. I did not see anything that would block a merge, apart from minor nits. Will try to read through the tests still.
I think we will want to move away from the default 500ms delay ASAP; this just doesn't feel good. But that can come after merge, with #1470 I'd say.
```rust
    collection_parser,
};

let service = Service::new(getter, dialer, concurrency_limits, msg_rx);
```
nit: I usually call the `Service` an `Actor`. I think both terms work well, but maybe let's bikeshed and then (later) unify across the Iroh codebase? It makes the codebase easier to understand when reading/learning it if these "structs that have an async `run(self)` method and process events from different sources in a loop" are named similarly.
I prefer `Service` because it is not tied to the Actor model and is clear enough about intent, but I'm fine with adjusting if the team agrees.
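For context, the shape being discussed (whatever name wins the bikeshed) is roughly a struct that owns its inboxes and consumes itself in a `run` loop. Here is a minimal synchronous sketch of that pattern using std channels; the actual `Service` runs async and selects over several event sources, and all names below are illustrative, not the real API:

```rust
use std::sync::mpsc;

// Illustrative message type; the real downloader has richer commands.
enum Message {
    Queue(&'static str),
    Shutdown,
}

struct Service {
    msg_rx: mpsc::Receiver<Message>,
    handled: usize,
}

impl Service {
    /// Consume the service, processing events from its inbox in a loop
    /// until told to stop. Returns how many downloads were queued.
    fn run(mut self) -> usize {
        while let Ok(msg) = self.msg_rx.recv() {
            match msg {
                Message::Queue(_hash) => self.handled += 1,
                Message::Shutdown => break,
            }
        }
        self.handled
    }
}
```

The naming question is exactly about this shape: "actor" emphasizes the owned-state-plus-mailbox structure, "service" emphasizes what it does for callers.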
```rust
) {
    // this is simply INITIAL_REQUEST_DELAY * attempt_num where attempt_num (as an ordinal
    // number) is maxed at INITIAL_RETRY_COUNT
    let delay = INITIAL_REQUEST_DELAY
```
I've seen exponential back-off calculations used in places like this (e.g. for reconnects). Not sure if we'd gain much by that.
Also not sure; the current maximum accumulated delay amounts to 10 seconds if my math is right. This is accumulated delay, which does not take into account the time it takes to do a download. A max total delay of 10 seconds sounds OK to me in general.
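To make the comparison concrete, here is a sketch of the linear delay described in the code comment next to the exponential back-off alternative mentioned above. The constants are assumptions for illustration (500 ms initial delay, attempt number capped at 4), not necessarily the crate's actual values:

```rust
use std::time::Duration;

// Assumed constants; the real values live in the downloader module.
const INITIAL_REQUEST_DELAY: Duration = Duration::from_millis(500);
const INITIAL_RETRY_COUNT: usize = 4;

/// Linear delay as described in the code comment:
/// INITIAL_REQUEST_DELAY * attempt_num, with attempt_num capped.
fn linear_delay(attempt_num: usize) -> Duration {
    INITIAL_REQUEST_DELAY * attempt_num.min(INITIAL_RETRY_COUNT) as u32
}

/// Exponential back-off alternative:
/// INITIAL_REQUEST_DELAY * 2^(attempt_num - 1), clamped to a maximum.
fn exponential_delay(attempt_num: usize, max: Duration) -> Duration {
    let exp = attempt_num.saturating_sub(1).min(16) as u32;
    (INITIAL_REQUEST_DELAY * 2u32.pow(exp)).min(max)
}
```

With these assumed constants the linear variant tops out at 2 s per attempt, while the exponential one hits its ceiling after only a handful of attempts, which is the main thing it would buy here.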
```rust
Err(FailureAction::DropPeer(reason)) => {
    debug!(%peer, ?kind, %reason, "peer will be dropped");
    if let Some(_connection) = peer_info.conn.take() {
        // TODO(@divma): this will fail open streams, do we want this?
```
Let's at least log this in a way that lets us see whether it becomes an issue.
We log it in the previous line. Is there something else you think we should log, or that should be added?
Maybe the connection id?
```rust
}

/// Get a blob or collection
pub async fn get<D: Store, C: CollectionParser>(
```
@rklaehn it would be good if you could review the `get_*` functions below, given you are the most familiar with fetching blobs.
Very nice work. Left all the useful comments I can give for this round 😅
Looks great, and I did not find anything that @dignifiedquire did not yet mention.
I really like the test setup with check_invariants
and the mocked impls.
Let's get this in :-)
Description
Adds the `Downloader` as specified in #1334, plus some backchannel convos. Features include:
Notes & open questions
TODOs
I know I've said this a hundred times with a hundred different things, but I would love to test this under stress scenarios and with a large number of peers as well. Don't hate me. In reality, after abstracting away all the IO, most scenarios can be simulated easily. What would remain for a much later time, when the need and opportunity for a real-case testing scenario arises, is tuning the concurrency parameters.
Future work
Downloading Ranges
There was the requirement of downloading a hash, a range of a hash, a collection, and (not mentioned but potentially implied) ranges of collections. There is no support for ranges right now because of the great duplication of the `get` code required to take advantage of the proper errors added in #1362. In principle, adding ranges should be really easy: it's an extension of the `DownloadKind` and would simply need to calculate the missing ranges based not on the difference between what we have and the whole blob, but on the given range. I would prefer to find a way to deduplicate the `get` code before doing this extension. Also, as far as I can tell, there is no need for this yet.
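To illustrate why this should be an easy extension: the only real change is computing the gap against the requested range instead of the whole blob. A hypothetical byte-range sketch of that calculation follows; `missing_ranges` is an invented helper, and the real code would operate on verified chunk ranges rather than plain byte offsets:

```rust
use std::ops::Range;

/// Return the parts of `requested` not covered by the locally available
/// ranges in `have`. `have` may be unsorted; overlaps are tolerated.
fn missing_ranges(requested: Range<u64>, have: &[Range<u64>]) -> Vec<Range<u64>> {
    let mut sorted: Vec<_> = have.to_vec();
    sorted.sort_by_key(|r| r.start);
    let mut out = Vec::new();
    let mut cursor = requested.start;
    for r in sorted {
        // Gap between what we've covered so far and the next local range.
        if r.start > cursor && cursor < requested.end {
            out.push(cursor..r.start.min(requested.end));
        }
        cursor = cursor.max(r.end);
    }
    if cursor < requested.end {
        out.push(cursor..requested.end);
    }
    out
}
```

Downloading the whole blob is then just the special case where `requested` spans everything, which is why the `DownloadKind` extension is conceptually small even if the `get` duplication makes it tedious today.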
Prioritizing candidates per role: `Provider` and `Candidate`

A nice extension, as discussed at some point, is to differentiate candidates we know have the data from those that might have it. This has the added benefit that, when a peer is available to perform another download under the concurrency limits, a hash we know they have could be downloaded right away instead of waiting for the delay. At this point making this change doesn't make sense, because we will likely attempt a download before the peer has retrieved the data themselves. To implement this, we would first need to add notifications of fully downloaded hashes as available into gossip.
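A hypothetical sketch of that role split (the names and helper below are invented for illustration, not part of the PR): deriving `Ord` with `Provider` above `Candidate` gives peer selection a natural ordering for free.

```rust
/// Role of a peer with respect to a given hash.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum PeerRole {
    /// Might have the data (e.g. learned about via gossip).
    Candidate,
    /// Known to have the data.
    Provider,
}

/// Pick the best peer id: derived `Ord` ranks `Provider` above
/// `Candidate` because of variant declaration order.
fn pick_best(peers: &[(u64, PeerRole)]) -> Option<u64> {
    peers.iter().max_by_key(|(_, role)| *role).map(|(id, _)| *id)
}
```

A known `Provider` could then skip the initial request delay entirely, while `Candidate` peers keep the delay to avoid hammering peers that are still fetching the data themselves.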
Leveraging the info from gossip
When declaring that a hash `X` should be downloaded, it's also an option to query gossip for peers that are subscribed to the topic to which `X` belongs, to use them as candidates. This could be done by connecting the `ProviderMap` to `gossip`. For now I don't see the need to do this.

Open questions about Future work
Generally curious how this will end up. My question here is whether we should do this for every download, or just for data that we expect to be big. Is there any way to obtain such a hint without relying on a query every single time?
Change checklist