ipfs pin add on large trees is extremely slow even when most of the content is already pinned #4985

Open
ToxicFrog opened this issue Apr 28, 2018 · 5 comments

Comments

@ToxicFrog

ToxicFrog commented Apr 28, 2018

Version information:

go-ipfs version: 0.4.13-
Repo version: 6
System version: amd64/linux
Golang version: go1.9.4

Type:

Enhancement

Description:

ipfs pin add is slow on large data sets, because it needs to walk the entire tree in order to verify that it exists on the node (and, if not, fetch it from elsewhere) before the pin is considered successful. This is expected.

However, it looks like when pinning something that itself contains other pinned objects, it re-checks all of the contents, even though it should know that the contents are already present.

Reproduction:

hash=$(ipfs add -r -Q <a whole lot of data>)  # -Q prints only the final (root) hash
ipfs pin ls --type=recursive  # should output hash
ipfs files mkdir /test
ipfs files cp /ipfs/$hash /test/foo
ipfs pin add $(ipfs files stat --hash /test)

The call to ipfs pin add will take as long as the original ipfs add did, or nearly so, even though all that data is already pinned and known to be present.

Ideally, pin add -- or whatever underlying function it calls to make sure the data is all available in the node -- should be smart enough to realize that when it encounters a recursively pinned object, it doesn't need to traverse that object's children, as they are guaranteed to be present.

@bonedaddy

bonedaddy commented May 19, 2018

Have you found any particular tree size or file size that causes issues?

@ToxicFrog
Author

ToxicFrog commented May 20, 2018

No, but I also haven't really systematically tested. I can say that a dozen files and <100MB is fine, 500 files and 1.5GB is not -- but it probably depends on the system in question.

@magik6k
Member

magik6k commented May 20, 2018

Note that if you know the previous hash, you can use ipfs pin update <from-path> <to-path>, which will be much faster in most cases. If you want to keep the previous hash pinned, specify --unpin=false.

@obo20

obo20 commented Nov 27, 2018

@magik6k Could you expand a bit on what the pin update command does? I've never encountered anybody who seems to fully understand it. Is it basically for when you update small parts of a large file and expect most child hashes to stay the same?

Can this be used with directories? The IPFS experimental filestore option?

@magik6k
Member

magik6k commented Nov 28, 2018

When you call pin add, it has to traverse the entire DAG to ensure that all nodes are present locally. Caching a list of all nodes in memory can be quite expensive, so we just don't do that for pinning (we do for GC, but that can't be easily reused).

pin update is an optimized version of pin add which assumes that the DAG referred to by <from-path> shares some structure and content with the <to-path> DAG. Basically, when traversing the new DAG, we also check the matching paths in the old DAG, and if the CIDs at some path are the same, we know that whatever is under that path is already pinned, so we don't have to check it.

> Can this be used with directories?

Yep

> The IPFS experimental filestore option?

I don't think so; AFAIK filestore doesn't provide an easy way to update referenced files. cc @kevina

@momack2 momack2 added this to Inbox in ipfs/go-ipfs May 9, 2019