Improved Reprovider.Strategy for entity DAGs (HAMT/UnixFS dirs, big files) #8676
Comments
This seems reasonable. The biggest issue is that we unofficially special-case UnixFS files as single independent entities, which makes it harder to create interesting new cross-file features in the future. Two options that I would like to have would break with this:
What I would like to see
I would like to see some priority system.
Improving provider strategies was previously discussed in: #6221, #5774, ipfs-inactive/package-managers#84. In this issue I want to propose a well-scoped improvement of codec-aware strategy that could be shipped without refactoring the entire system.
TLDR
Problem statement
Right now, we support three values in `Reprovider.Strategy`, which tells the reprovider what should be announced. Valid strategies are: `all`, `pinned`, and `roots`.
If the repository gets too big, `all` and `pinned` are too expensive and folks are forced to use `roots`, which is codec-agnostic and will only announce the root block of a UnixFS DAG.
This means that for big UnixFS datasets, the user has to write additional orchestration code to go the extra mile and manually pin every file within a bigger DAG, and make sure those sub-pins are removed when the entire DAG is no longer needed.
Proposed solution: codec-aware (UnixFs) strategy
Depending on the codec, different blocks may have different importance. In the case of UnixFS, the important blocks are the manifest (root) blocks of directories and files. The sub-blocks of individual files, which hold the data itself, are not as critical as those manifest blocks: it is the CID of a manifest block that is looked up on the DHT first.
A big data provider may want to opt in to a codec-aware strategy as a "best-effort" way to provide something on the DHT rather than nothing: in the case of UnixFS, only provide these manifest blocks on the DHT, facilitating the initial lookup without the cost of announcing all the sub-blocks.
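To make the trade-off concrete, here is a minimal sketch on a toy in-memory DAG. It is not Kubo code: `Node`, `Kind`, `toAnnounce`, and `bigFile` are hypothetical names, the CIDs are plain labels, and `"entities"` is only a placeholder name for the proposed codec-aware strategy. A real implementation would have to decode each block with its codec to learn whether it is a directory, a file root, or file data.

```go
package main

import "fmt"

// Kind is a toy stand-in for what a codec-aware walker would learn by
// decoding a block (for example dag-pb + UnixFS). Hypothetical type.
type Kind int

const (
	Directory Kind = iota // UnixFS directory (or HAMT shard)
	FileRoot              // root ("manifest") block of a file
	FileChunk             // sub-block holding file data
)

// Node is a hypothetical in-memory DAG node; CID is just a label here.
type Node struct {
	CID   string
	Kind  Kind
	Links []*Node
}

// toAnnounce walks the DAG under root and returns the CIDs that a given
// strategy would announce. "entities" is a placeholder name for the
// codec-aware strategy proposed in this issue.
func toAnnounce(root *Node, strategy string) []string {
	var out []string
	var walk func(n *Node, isRoot bool)
	walk = func(n *Node, isRoot bool) {
		switch strategy {
		case "roots":
			// Codec-agnostic: only the root block, never descend.
			if isRoot {
				out = append(out, n.CID)
			}
			return
		case "entities":
			// Announce directories and file roots, but do not
			// descend into the data blocks below a file root.
			out = append(out, n.CID)
			if n.Kind == FileRoot {
				return
			}
		default: // "all": every block reachable from the root
			out = append(out, n.CID)
		}
		for _, child := range n.Links {
			walk(child, false)
		}
	}
	walk(root, true)
	return out
}

// bigFile builds a file root that links to the given number of chunks.
func bigFile(cid string, chunks int) *Node {
	f := &Node{CID: cid, Kind: FileRoot}
	for i := 0; i < chunks; i++ {
		f.Links = append(f.Links, &Node{
			CID:  fmt.Sprintf("%s-chunk%d", cid, i),
			Kind: FileChunk,
		})
	}
	return f
}

func main() {
	root := &Node{CID: "dir", Kind: Directory, Links: []*Node{
		bigFile("fileA", 3),
		bigFile("fileB", 3),
	}}
	for _, s := range []string{"all", "roots", "entities"} {
		cids := toAnnounce(root, s)
		fmt.Printf("%-8s %d announcements: %v\n", s, len(cids), cids)
	}
}
```

For this toy directory with two three-chunk files, `all` yields 9 announcements, `roots` yields 1, and the codec-aware walk yields 3 (the directory plus the two file roots). The sketch ignores deduplication of shared blocks and HAMT sharding details, both of which a real walker would need to handle.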
Open questions