
IPFS Repo GC Is Unrealistic At Non-Trivial Scale: rm -rf + resync is faster #7213

Closed
bonedaddy opened this issue Apr 25, 2020 · 5 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@bonedaddy
Contributor

Version information:

go-ipfs version: 0.4.23-
Repo version: 7
System version: amd64/linux
Golang version: go1.13.7

Description:

At any non-trivial scale, when an IPFS node has several hundred thousand pins, or even a million or more, running garbage collection on a go-ipfs node is essentially an impossible task. While running GC on a node with ~750k pins, not a single pin was removed in 30 minutes, and the on-disk size of the go-ipfs repository did not shrink by a single byte.

In practice, running rm -rf /path/to/datastore and resyncing the node is astronomically faster than a full garbage collection pass. Not only is this faster, it also doesn't block the IPFS node to the extent that garbage collection does. I can't think of an alternative that doesn't involve waiting hours, if not days, for a full garbage collection to complete on an IPFS node.

This is a pretty big concern, and it makes go-ipfs largely infeasible to use outside of hobby and test environments.
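
For context, here is a rough sketch of why a full mark-and-sweep style GC pass scales with the size of the entire repo rather than with the amount of garbage. The types and data below are toy stand-ins for illustration only, not go-ipfs's actual APIs:

```go
package main

import "fmt"

type cid = string

// Toy "repo": every block in the datastore, plus the blocks it links to.
var blocks = map[cid][]cid{
	"root-1": {"a", "b"},
	"a":      {"c"},
	"b":      nil,
	"c":      nil,
	"orphan": nil, // unpinned garbage
}

// Toy pin set: the recursive pin roots.
var pinRoots = []cid{"root-1"}

func main() {
	// Mark phase: walk the DAG under every pin root. With hundreds of
	// thousands of pins, this alone touches every reachable block before
	// a single byte can be reclaimed.
	marked := map[cid]bool{}
	stack := append([]cid{}, pinRoots...)
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if marked[c] {
			continue
		}
		marked[c] = true
		stack = append(stack, blocks[c]...)
	}

	// Sweep phase: enumerate every key in the datastore and delete the
	// unmarked ones. This is a full scan of the blockstore no matter how
	// little garbage there actually is.
	for c := range blocks {
		if !marked[c] {
			fmt.Println("deleting", c)
			delete(blocks, c)
		}
	}
}
```

With hundreds of thousands of pin roots, the mark phase alone has to visit every reachable block before anything can be deleted, and the sweep phase then enumerates every key in the datastore, which lines up with the behaviour described above.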

@bonedaddy added the kind/bug (A bug in existing code, including security flaws) label on Apr 25, 2020
@Stebalien
Member

This is an experience report, not a bug report.

@Stebalien
Member

Expanding on that, we are very well aware that go-ipfs's garbage collection system does not scale to large repos. However, debating it to death in yet another bug report isn't going to bring us closer to fixing the issue.

@bonedaddy
Contributor Author

I disagree that this is an "experience report". Taking days to run GC is a bug, but to each their own.

@Stebalien
Member

I agree we need to fix this issue. However, we have a limited amount of developer bandwidth split across a massive project. I closed this issue because we have many open issues on the same topic, and yet another "this sucks" issue isn't going to get us any closer to fixing it.

If you do manage to fix this issue, I'd be happy to accept a patch.

@vaultec81

Why couldn't we have a graph-style index that keeps track of all the IPFS object refs/descendants, rather than having to query the entire datastore for GC?
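
Roughly, the idea would be an index that is updated whenever pins or links are added or removed, so GC only ever looks at blocks whose reference count has dropped to zero. A minimal sketch, with made-up names and APIs just to illustrate the idea (not go-ipfs code):

```go
package main

import "fmt"

type cid = string

// refIndex keeps a per-block reference count, updated as pins and DAG links
// are added or removed, plus the set of blocks whose count has hit zero.
type refIndex struct {
	counts map[cid]int
	zero   map[cid]bool // candidates for deletion
}

func newRefIndex() *refIndex {
	return &refIndex{counts: map[cid]int{}, zero: map[cid]bool{}}
}

// incr is called whenever a pin, or a link from a parent block, starts
// referencing c.
func (r *refIndex) incr(c cid) {
	r.counts[c]++
	delete(r.zero, c)
}

// decr is called whenever a pin or parent link stops referencing c.
func (r *refIndex) decr(c cid) {
	r.counts[c]--
	if r.counts[c] <= 0 {
		delete(r.counts, c)
		r.zero[c] = true
	}
}

// collect returns only the blocks known to be unreferenced; no scan of the
// whole datastore is needed.
func (r *refIndex) collect() []cid {
	var dead []cid
	for c := range r.zero {
		dead = append(dead, c)
		delete(r.zero, c)
	}
	return dead
}

func main() {
	idx := newRefIndex()
	idx.incr("root")  // pin root
	idx.incr("child") // root links to child
	idx.decr("root")  // unpin root
	idx.decr("child") // root's link to child no longer counts
	fmt.Println("garbage:", idx.collect())
}
```

The trade-off is that every add, pin, and unpin has to keep the index consistent, but collection becomes proportional to the amount of garbage rather than to the size of the whole datastore.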
