This repository has been archived by the owner on May 15, 2024. It is now read-only.
Retrieval Optimization To Singularity #143
Closed
Tracked by
#139
hannahhoward opened this issue
Oct 4, 2023
· 3 comments
· Fixed by data-preservation-programs/singularity#404 or #195
Comments
To achieve the actual benefits of this you also need to implement data-preservation-programs/singularity#366, which is nearly identical in nature, and there will likely be a lot of code you can share.
@gammazero is currently fixing a critical issue. The retrieval optimization work is ongoing.
gammazero
added a commit
to data-preservation-programs/singularity
that referenced
this issue
Oct 27, 2023
Optimize retrieval so that when requested retrieval ranges do not align with singularity file ranges, only the minimal number of retrieval requests are made. This is accomplished by creating a separate reader for each singularity file range. For reads that are larger than a range, multiple ranges are read until the read request is satisfied or until all data is read. For reads smaller than the amount of data remaining in the range, the range reader is maintained so that it can continue to be read from by subsequent reads.

This approach associates a reader with each Singularity file range, and not with the ranges requested via the API (in the HTTP range header). This avoids needing to parse the range header in order to create readers where each reads some number of Singularity ranges. Rather, as arbitrary requested ranges are read, an existing reader for the corresponding singularity range(s) is reused if the requested range falls on a singularity range from a previous read. This also means that there is only a single retrieval for each singularity range, whereas if readers were associated with requested ranges then multiple readers could overlap the same singularity range and require multiple retrievals of the same range.

Fixes #366
Fixes filecoin-project/motion#143

As an optimization, only one singularity range reader is maintained at a time. This works because once a new singularity range is selected by the requested range read, it is highly unlikely that a subsequent read request will fall on a singularity range that was already read from before the new one.

Additional changes:
- The `filecoinReader` implementation supports the `io.WriterTo` interface to allow direct copying to an `io.Writer`.
- The `FilecoinRetriever` interface supports the `RetrieveReader` function that returns an `io.ReadCloser` to read data from.
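The approach described in this commit message can be illustrated with a minimal Go sketch. This is not the code from the referenced commit: `rangeReader`, `fetchFunc`, `chunkFetcher`, and `readInSmallSteps` are hypothetical names, and an in-memory byte slice stands in for Singularity file ranges. It only shows the idea of keeping a single range reader open across small reads so that each backing range is fetched once.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// fetchFunc opens a reader over the i-th backing range. It is a hypothetical
// stand-in for one retrieval of a file range.
type fetchFunc func(i int) (io.ReadCloser, error)

// rangeReader reads sequentially across numRanges backing ranges and keeps at
// most one range reader open at a time.
type rangeReader struct {
	fetch     fetchFunc
	numRanges int
	next      int           // index of the next range to open
	cur       io.ReadCloser // open reader for the current range, or nil
}

// Read fills p from the current range, moving on to later ranges when a read
// spans a range boundary. A partially consumed range reader is kept for the
// next call instead of being fetched again.
func (r *rangeReader) Read(p []byte) (int, error) {
	total := 0
	for total < len(p) {
		if r.cur == nil {
			if r.next >= r.numRanges {
				if total > 0 {
					return total, nil
				}
				return 0, io.EOF
			}
			rc, err := r.fetch(r.next)
			if err != nil {
				return total, err
			}
			r.cur, r.next = rc, r.next+1
		}
		n, err := r.cur.Read(p[total:])
		total += n
		if err == io.EOF {
			r.cur.Close() // close error ignored in this sketch
			r.cur = nil
			continue
		}
		if err != nil || n == 0 {
			return total, err
		}
	}
	return total, nil
}

// chunkFetcher splits data into rangeSize-byte ranges and counts fetches.
func chunkFetcher(data []byte, rangeSize int, fetches *int) (fetchFunc, int) {
	n := (len(data) + rangeSize - 1) / rangeSize
	return func(i int) (io.ReadCloser, error) {
		*fetches++
		end := (i + 1) * rangeSize
		if end > len(data) {
			end = len(data)
		}
		return io.NopCloser(bytes.NewReader(data[i*rangeSize : end])), nil
	}, n
}

// readInSmallSteps drains r using a bufSize-byte buffer, mimicking a caller
// whose reads do not line up with the backing ranges.
func readInSmallSteps(r io.Reader, bufSize int) ([]byte, error) {
	var out []byte
	buf := make([]byte, bufSize)
	for {
		n, err := r.Read(buf)
		out = append(out, buf[:n]...)
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return out, err
		}
	}
}

func main() {
	data := []byte("0123456789")
	var fetches int
	fetch, n := chunkFetcher(data, 4, &fetches)
	out, err := readInSmallSteps(&rangeReader{fetch: fetch, numRanges: n}, 3)
	if err != nil {
		panic(err)
	}
	fmt.Printf("read %q with %d fetches\n", out, fetches)
}
```

In this sketch, ten bytes split into ranges of four are read three bytes at a time, and the fetch counter ends at three, one per backing range, even though the reads straddle range boundaries.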
Reopened - finishing optimization to minimize separate requests to singularity within the same range.
gammazero
added a commit
that referenced
this issue
Oct 30, 2023
Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches. Fixes #143
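A rough Go sketch of the first half of this idea, reading the range header up front so the whole span can be fetched in one request. It is an illustration only and not the code from the referenced commit: `parseSingleRange` is a hypothetical helper, it handles only a single `bytes=` range, and multi-range headers are rejected rather than supported.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

var errBadRange = errors.New("invalid or unsupported range header")

// parseSingleRange converts a single-range header value such as
// "bytes=100-199", "bytes=900-" or "bytes=-50" into an offset and a length
// within an object of the given size.
func parseSingleRange(header string, size int64) (offset, length int64, err error) {
	if !strings.HasPrefix(header, "bytes=") {
		return 0, 0, errBadRange
	}
	spec := strings.TrimPrefix(header, "bytes=")
	if strings.Contains(spec, ",") {
		return 0, 0, errBadRange // multiple ranges are out of scope here
	}
	first, last, ok := strings.Cut(spec, "-")
	if !ok {
		return 0, 0, errBadRange
	}
	first, last = strings.TrimSpace(first), strings.TrimSpace(last)

	if first == "" {
		// Suffix form: the final n bytes of the object.
		n, perr := strconv.ParseInt(last, 10, 64)
		if perr != nil || n <= 0 {
			return 0, 0, errBadRange
		}
		if n > size {
			n = size
		}
		return size - n, n, nil
	}

	start, perr := strconv.ParseInt(first, 10, 64)
	if perr != nil || start < 0 || start >= size {
		return 0, 0, errBadRange
	}
	end := size - 1 // open-ended form reads to the end of the object
	if last != "" {
		e, perr := strconv.ParseInt(last, 10, 64)
		if perr != nil || e < start {
			return 0, 0, errBadRange
		}
		if e < end {
			end = e
		}
	}
	return start, end - start + 1, nil
}

func main() {
	offset, length, err := parseSingleRange("bytes=100-199", 1000)
	if err != nil {
		panic(err)
	}
	// One upstream fetch can now cover [offset, offset+length).
	fmt.Println(offset, length)
}
```

With the offset and length known before any data is copied, a handler could issue one upstream request for the whole span instead of one per `Read` call.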
gammazero
added a commit
that referenced
this issue
Oct 31, 2023
Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches. Fixes #143
gammazero
added a commit
that referenced
this issue
Nov 2, 2023
Read HTTP range headers and do fetch for entire range instead of allowing the io.CopyN, used by http.ServeContent, to do multiple small fetches. Fixes #143
gammazero
added a commit
that referenced
this issue
Nov 3, 2023
Pass retrieval request through to singularity. Singularity will handle range requests. Fixes #143
Currently, retrieval through singularity is implemented through a read seeker abstraction. Since the io.ReadSeeker abstraction does not know about the overall range parameters of the incoming HTTP request to motion, each call to the io.Reader.Read method is implemented by making a separate range request into Singularity for the exact parameters of that individual read operation.
Under the hood, the implementation of http.ServeContent that we employ uses io.Copy, which in turn executes read operations in the neighborhood of 32K, an extremely small request. Worse, inside Singularity, we duplicate this issue, turning read requests into 32K Filecoin retrievals, which is extremely inefficient.
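The read pattern described above can be observed with a small Go sketch. It does not use motion or Singularity code: `countingReader`, `plainWriter`, and `copyStats` are hypothetical names, and the counting reader stands in for a reader whose every `Read` would become a separate range request.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// countingReader records how many Read calls it receives and the largest
// buffer it was handed.
type countingReader struct {
	src     io.Reader
	calls   int
	maxSize int
}

func (c *countingReader) Read(p []byte) (int, error) {
	c.calls++
	if len(p) > c.maxSize {
		c.maxSize = len(p)
	}
	return c.src.Read(p)
}

// plainWriter exposes only Write, hiding any io.ReaderFrom on the wrapped
// writer so that io.Copy falls back to its own internal buffer.
type plainWriter struct{ w io.Writer }

func (p plainWriter) Write(b []byte) (int, error) { return p.w.Write(b) }

// copyStats pushes size bytes through io.Copy and reports how many Read calls
// were made and the largest buffer size seen.
func copyStats(size int) (calls, maxSize int, err error) {
	cr := &countingReader{src: strings.NewReader(strings.Repeat("x", size))}
	_, err = io.Copy(plainWriter{io.Discard}, cr)
	return cr.calls, cr.maxSize, err
}

func main() {
	calls, maxSize, err := copyStats(1 << 20) // 1 MiB payload
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d Read calls, largest buffer %d bytes\n", calls, maxSize)
}
```

If each of those `Read` calls were served by its own upstream range request, a 1 MiB response would cost dozens of round trips, which is the inefficiency this issue describes.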
The proposed optimization here is as follows:
Sidebar: all this works but it seems like we're in the guts a bit, and I wonder if ReadSeeker + http.ServeContent is the right abstraction any more.