
IPFS Repo GC Is Unrealistic At Non-Trivial Scale: rm -rf + resync is faster #7213

Closed
bonedaddy opened this issue Apr 25, 2020 · 5 comments
Labels
kind/bug A bug in existing code (including security flaws)

Comments

@bonedaddy
Contributor

Version information:

go-ipfs version: 0.4.23-
Repo version: 7
System version: amd64/linux
Golang version: go1.13.7

Description:

At any non-trivial scale, when an IPFS node has several hundred thousand pins, or even a million or more, running garbage collection on a go-ipfs node is essentially an impossible task. While running GC on a node with ~750k pins, not a single pin was removed in 30 minutes, and the on-disk size of the go-ipfs repository did not shrink by a single byte.

In practice, running rm -rf /path/to/datastore and resyncing the node is astronomically faster than a full garbage collection pass. Not only is this faster, it also doesn't block the IPFS node to the extent that garbage collection does. I can't think of an alternative that doesn't involve waiting hours, if not days, for a full garbage collection to complete on an IPFS node.

This is a pretty big concern, and it makes go-ipfs largely infeasible to use outside of hobby and test environments.
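
For context, here is a rough sketch of why a full mark-and-sweep style GC pass scales with the size of the entire repo rather than with the amount of garbage. The types and data below are toy stand-ins for illustration only, not go-ipfs's actual APIs:

```go
package main

import "fmt"

type cid = string

// Toy "repo": every block in the datastore, plus the blocks it links to.
var blocks = map[cid][]cid{
	"root-1": {"a", "b"},
	"a":      {"c"},
	"b":      nil,
	"c":      nil,
	"orphan": nil, // unpinned garbage
}

// Toy pin set: the recursive pin roots.
var pinRoots = []cid{"root-1"}

func main() {
	// Mark phase: walk the DAG under every pin root. With hundreds of
	// thousands of pins, this alone touches every reachable block before
	// a single byte can be reclaimed.
	marked := map[cid]bool{}
	stack := append([]cid{}, pinRoots...)
	for len(stack) > 0 {
		c := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if marked[c] {
			continue
		}
		marked[c] = true
		stack = append(stack, blocks[c]...)
	}

	// Sweep phase: enumerate every key in the datastore and delete the
	// unmarked ones. This is a full scan of the blockstore no matter how
	// little garbage there actually is.
	for c := range blocks {
		if !marked[c] {
			fmt.Println("deleting", c)
			delete(blocks, c)
		}
	}
}
```

With hundreds of thousands of pin roots, the mark phase alone has to visit every reachable block before anything can be deleted, and the sweep phase then enumerates every key in the datastore, which lines up with the behaviour described above.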

@bonedaddy added the kind/bug (A bug in existing code, including security flaws) label on Apr 25, 2020
@Stebalien
Member

This is an experience report, not a bug report.

@Stebalien
Member

Expanding on that, we are very well aware that go-ipfs's garbage collection system does not scale to large repos. However, debating it to death in yet another bug report isn't going to bring us closer to fixing the issue.

@bonedaddy
Contributor Author

I disagree that this is an "experience report". Taking days to run GC is a bug, but to each their own.

@Stebalien
Member

I agree we need to fix this issue. However, we have a limited amount of developer bandwidth split across a massive project. I closed this issue because we have many open issues on the same topic, and yet another "this sucks" issue isn't going to get us any closer to fixing it.

If you do manage to fix this issue, I'd be happy to accept a patch.

@vaultec81

Why couldn't we have a graph-style index that keeps track of all the IPFS object refs/descendants, rather than having to query the entire datastore for GC?
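
Roughly, the idea would be an index that is updated whenever pins or links are added or removed, so GC only ever looks at blocks whose reference count has dropped to zero. A minimal sketch, with made-up names and APIs just to illustrate the idea (not go-ipfs code):

```go
package main

import "fmt"

type cid = string

// refIndex keeps a per-block reference count, updated as pins and DAG links
// are added or removed, plus the set of blocks whose count has hit zero.
type refIndex struct {
	counts map[cid]int
	zero   map[cid]bool // candidates for deletion
}

func newRefIndex() *refIndex {
	return &refIndex{counts: map[cid]int{}, zero: map[cid]bool{}}
}

// incr is called whenever a pin, or a link from a parent block, starts
// referencing c.
func (r *refIndex) incr(c cid) {
	r.counts[c]++
	delete(r.zero, c)
}

// decr is called whenever a pin or parent link stops referencing c.
func (r *refIndex) decr(c cid) {
	r.counts[c]--
	if r.counts[c] <= 0 {
		delete(r.counts, c)
		r.zero[c] = true
	}
}

// collect returns only the blocks known to be unreferenced; no scan of the
// whole datastore is needed.
func (r *refIndex) collect() []cid {
	var dead []cid
	for c := range r.zero {
		dead = append(dead, c)
		delete(r.zero, c)
	}
	return dead
}

func main() {
	idx := newRefIndex()
	idx.incr("root")  // pin root
	idx.incr("child") // root links to child
	idx.decr("root")  // unpin root
	idx.decr("child") // root's link to child no longer counts
	fmt.Println("garbage:", idx.collect())
}
```

The trade-off is that every add, pin, and unpin has to keep the index consistent, but collection becomes proportional to the amount of garbage rather than to the size of the whole datastore.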
