IPFS Garbage Collection v2 #8

Open · wants to merge 4 commits into base: main

Conversation

@gammazero

No description provided.

@momack2 (Contributor) left a comment

Nice! Let’s do an estimate of the additional pinning cost incurred for a large datastore like Pinata.

proposals/new-ipfs-gc.md
- Ability for GC to remove content identified by CID
- New `--cid` option: `ipfs repo gc --cid=<cid>`

Removing pins will be slower because it requires walking the DAG to decrement reference counts. Adding items to MFS and removing items from MFS will be slower due to needing to update reference counts. Hopefully any increase in execution time will be insignificant.
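
To make that cost concrete, here is a minimal sketch of the kind of DAG walk described above for unpinning, assuming a hypothetical `RefCountStore` and `NodeGetter` (these names and interfaces are illustrative only, not the go-ipfs API):

```go
package gc

import "context"

// Hypothetical interfaces for illustration; not the go-ipfs API.
type RefCountStore interface {
	Decrement(cid string) (remaining int64, err error) // decrement and return the new count
	Delete(cid string) error                           // drop the count entry for a removable block
}

type NodeGetter interface {
	Links(ctx context.Context, cid string) ([]string, error) // child CIDs of a block
}

// DecrementDAG walks the DAG rooted at root after an unpin, decrementing each
// block's reference count. Children are only visited when their parent's count
// reaches zero, since otherwise the parent (and its references) remain live.
func DecrementDAG(ctx context.Context, root string, refs RefCountStore, ng NodeGetter) error {
	stack := []string{root}
	for len(stack) > 0 {
		cid := stack[len(stack)-1]
		stack = stack[:len(stack)-1]

		remaining, err := refs.Decrement(cid)
		if err != nil {
			return err
		}
		if remaining > 0 {
			continue // still pinned elsewhere, in MFS, or referenced by another parent
		}
		// Block is now unreferenced: each of its children loses one reference too.
		links, err := ng.Links(ctx, cid)
		if err != nil {
			return err
		}
		stack = append(stack, links...)
		// GC (for example `ipfs repo gc --cid=<cid>`) can now remove the block data.
		if err := refs.Delete(cid); err != nil {
			return err
		}
	}
	return nil
}
```
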

Seems like we should use Pinata as an example here and estimate what the impact would be (since they add pins all the time, but don’t GC as often now)

@jnthnvctr commented Feb 19, 2021

I caught up with Matt to try and grab some numbers here:
https://www.notion.so/Pinata-5157fa6d2a4741a593bb8ff32bfdb08e

@gammazero (Author)

@momack2 I will add some rough estimates by timing operations that I expect to take a similar amount of time as the new GC, and comparing with the old GC.

@jnthnvctr - I added a link to this in the doc, and brought some of the content into the doc. Thank you!

proposals/new-ipfs-gc.md
@momack2 added this to New Proposals in Project Proposals Feb 18, 2021
@BigLep (Contributor) commented Feb 23, 2021

Associating a solution proposal: https://hackmd.io/bgU7ti6_THK5gAGcrAhoEw?both

@BigLep previously approved these changes Feb 24, 2021
@BigLep (Contributor) left a comment

Are there any metrics around this so we can show the before-and-after effect? I assume for each GC run we'd want to emit:

  1. How long it took
  2. How many objects we traversed
  3. How many were gc'd
    (I assume there is really good insight we could draw from languages that do automatic GC, like the JVM)

If we need to implement some metrics, should we perhaps implement them first to capture the current state on PL gateways, so that after we make the improvements we can show the impact?
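
As a rough illustration of what such per-run metrics might look like (purely hypothetical names, not existing go-ipfs instrumentation):

```go
package gc

import "time"

// GCRunStats is a hypothetical record a GC run could emit for before/after
// comparison; field names are illustrative only.
type GCRunStats struct {
	Duration        time.Duration // 1. how long the run took
	BlocksTraversed uint64        // 2. how many objects were traversed
	BlocksRemoved   uint64        // 3. how many were gc'd
	BytesReclaimed  uint64        // optional: reclaimed space, useful for gateways
}
```
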


## Purpose and Impact

### Background and Intent

I think more anecdotes/justification from the conversation in https://protocollabs.slack.com/archives/G01KZD3FETY/p1613779249428300 can be added. The transitive impact that this has on NFTs is important, as we don't want to hamper that use case.

@gammazero (Author)

This link did not work for me, but I added it to the "Related discussions" section anyway.

@BigLep self-requested a review February 24, 2021 18:18
@BigLep dismissed their stale review February 24, 2021 18:18

I accidentally approved

@BigLep (Contributor) left a comment

Thanks for putting this together!

proposals/new-ipfs-gc.md
- Maintaining reference counts for blocks will make it quick to determine if a block can be removed, because it is not necessary to load all pinned CIDs into memory and search them.
- Specific content can be removed by performing GC on the blocks associated with an IPLD DAG identified by CID.
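
A minimal sketch of the first point, assuming a hypothetical per-block reference count store (not an existing go-ipfs interface):

```go
package gc

// RefCounter is an assumed store of per-block reference counts
// (pins + MFS entries + parent links).
type RefCounter interface {
	Get(cid string) (int64, error) // returns 0 when no references are recorded
}

// Removable reports whether GC may delete a block with a single lookup,
// rather than loading and searching the full set of pinned/MFS CIDs.
func Removable(rc RefCounter, cid string) (bool, error) {
	n, err := rc.Get(cid)
	if err != nil {
		return false, err
	}
	return n == 0, nil
}
```
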

### What does done look like?

Related to other comments about metrics, should "done" also include metrics about GC time and the time it takes to pin/unpin now, given the reference counting?

@gammazero (Author)

I don't think it makes sense to have time metrics for pinning, because the pinning process may also need to fetch content from the network as it is being pinned.

I don't think that metrics for unpin become any more important, as this should be equivalent to walking a DAG.

proposals/new-ipfs-gc.md
Comment on lines 81 to 83
- Removing pins will be slower because it requires walking the DAG to decrement reference counts.
- Adding items to MFS and removing items from MFS will be slower due to needing to update reference counts.
- Keeping reference counts will consume more storage space.

Will we have metrics for all of these? I think it would be good to add the instrumentation before the improvements so we can compare before/after.

@gammazero (Author)

I think that eventually there should be metrics for this, but probably not for the initial implementation. Reference counts will be stored in a specific location, and a user should be able to see how much space is used on disk in this location.

The project could fail, or fail to be deliverable in the proposed time, if reference counting incurs unforeseen complexity in its implementation or requires changes to other subsystems. This risk will be mitigated by early review of the reference counting design and by prototyping.

### Alternatives
There do not currently appear to be alternative ways to solve these issues. There may be tools that are better suited to implementing portions of GC, such as a specific key-value store that is better suited to managing reference counts.

I know we'll have folks from the Stewards team take a look at the design, but are there others at PL or in the PL community who have expertise in "GC in practice"? Is there a channel we can post to, to find out? For example, if there is anyone in our midst who worked on tuning GC at Sun/Oracle, they likely have valuable high-level guidance on doing efficient GC in practice.

@gammazero (Author)

I have reached out to some people already, and names can be discovered by looking at previous work here: ipfs/kubo#7752

I think reposting the design on the IPFS Slack channel, and inviting people to early reviews of PRs, will be good.


Do share the design doc in our internal #engineering channel.

@warpfork

I wrote up some thoughts on other approaches to GC (n.b., first draft warning): https://hackmd.io/0T8z0ex8RWmI8BmFNPSafQ

The draft idea in that document is a GC approach, but not a reference counting one.

I'm fairly dubious of reference counting approaches for a disk-backed GC. Reference counting requires a strong degree of consistency, and a great deal of mutexing, which makes it both relatively fragile and high cost at runtime. Metrics evaluated on a full implementation would of course shed more light than my bets, but I'm just not optimistic on that whole angle of approach.

@warpfork

I like that this issue also touches on the user story of wanting to actively remove specific content. However, I wonder if that's directly connected to GC. I think we could provide direct remedies for that user story without involving GC at all.

(Although on second thought, active targeted removal would actually be connected, and somewhat more difficult, in a reference counting world: active removal would have to trigger walks to make sure orphaned subgraphs aren't left behind, and orphaned subgraphs with incorrectly undecremented refcounts would become effectively unremediable in a refcount world.)

@gammazero (Author)

@warpfork I read your thoughts on an approach to GC, linked them in the pitch doc, and suggested we at least look at a prototype for comparison.

I generally agree with your feelings about reference count resource management, but an advantage of reference counting is that it should work especially well for Pinata's use case: they only need GC for removing content that has become unpinned. The reference-counted implementation allows GC to walk the DAG for only the unpinned content and remove the associated blocks without loading all the other pinned content.

@RubenKelevra

This is very interesting, as it's touching some points I noted a while back here.

But in addition to "just" counting references, my note includes tracking of accesses and access time in a compressed, combined metric for stored blocks which are not pinned.

Throwing all unpinned blocks away makes sense for a storage provider, but not for end-user clients, gateways, etc., which can anticipate to a degree, based on the past, which blocks might be useful in the future.

I haven't had time yet to work on it, sadly, because of pandemic reasons - but that's still the plan.

@gammazero (Author)

@RubenKelevra I agree it would be very useful to be able to be more discriminating about what unpinned content gets removed. That would provide cache management behavior. There are a number of complexities with efficiently maintaining and accessing LRU/LFU data, and the reference counting approach is intended to be a hopefully simpler solution that addresses a specific need: a way for GC to determine if a block is pinned or in MFS without having to load and search a set of all pinned blocks.

The immediate goal is to provide faster bulk GC, and more importantly, provide very fast removal of specific content (remove blocks from unpinned DAG). The key feature of ref-counting is that it gives O(blocks-in-dag) performance without having to load other content into memory. The ref-counting also does not incur any overhead for most data access, since it is only used during pinning/unpinning, MFS add/remove, and GC.

More sophisticated data eviction/retention strategies are not precluded by reference counting and should be explored.

@RubenKelevra

> @RubenKelevra I agree it would be very useful to be able to be more discriminating about what unpinned content gets removed. That would provide cache management behavior. There are a number of complexities with efficiently maintaining and accessing LRU/LFU data, and the reference counting approach is intended to be a hopefully simpler solution that addresses a specific need: a way for GC to determine if a block is pinned or in MFS without having to load and search a set of all pinned blocks.

I'm just referencing my note, since I planned to do both in one approach; it's basically the same problem, with:

  • the need to store an intention log of pinning/unpinning operations
  • storing a database in memory with a value for each block
  • writing the database to disk in larger chunks
  • background analysis of pinning/unpinning operations
  • time-based locking to avoid slowing down any operation by blocking it

The advantages of combining a cache retention strategy and reference counting are:

  • lower total complexity
  • deadlocks/data loss become less likely
  • more code can be shared without refactoring
  • much lower memory consumption

> The immediate goal is to provide faster bulk GC, and more importantly, provide very fast removal of specific content (remove blocks from unpinned DAG). The key feature of ref-counting is that it gives O(blocks-in-dag) performance without having to load other content into memory. The ref-counting also does not incur any overhead for most data access, since it is only used during pinning/unpinning, MFS add/remove, and GC.

I don't think bulk GC is the way to go; instead, just have a tight limit on the amount of data stored in the unpinned area, like "use only 1% of the free storage for unpinned stuff".

This allows for easier locking, since we can make some assumptions that help us avoid having to lock the whole datastore. Roughly:

  • If a block gets added we put it on a list with a timestamp.
  • If a pinning operation starts, we put it on a list of active operations with a timestamp.
  • If a block in this list has a timestamp which is older than the oldest running pinning operation -> move it to the cache area, since it can be safely discarded.

If a pinning operation calls for a block in the cache area, just add a reference and put it in the "pinned" list; there was never any guarantee that a block stays in the datastore when it's not pinned. Those assumptions are safe and avoid any locks on the datastore.

So the cleanup would run continuously as a background task, flagging the oldest and least-accessed blocks for removal, not as a bulk cleanup.
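
A sketch of that timestamp rule with assumed names (this is not existing go-ipfs code): a newly added block can be moved to the evictable cache area once it is older than the oldest pin operation still in flight, so no datastore-wide lock is needed.

```go
package cache

import (
	"sync"
	"time"
)

// Tracker records recently added blocks and in-flight pin operations.
type Tracker struct {
	mu          sync.Mutex
	pendingPins map[string]time.Time // pin operation ID -> start time
	newBlocks   map[string]time.Time // CID -> time the block was added
}

// SweepToCache returns the CIDs that can safely be moved to the unpinned
// "cache area": blocks added before the oldest still-running pin operation
// started cannot be claimed by any in-flight pin.
func (t *Tracker) SweepToCache() []string {
	t.mu.Lock()
	defer t.mu.Unlock()

	var oldestPin time.Time
	havePins := false
	for _, started := range t.pendingPins {
		if !havePins || started.Before(oldestPin) {
			oldestPin, havePins = started, true
		}
	}

	var safe []string
	for cid, added := range t.newBlocks {
		if !havePins || added.Before(oldestPin) {
			safe = append(safe, cid)
			delete(t.newBlocks, cid)
		}
	}
	return safe
}
```
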

@olizilla (Contributor) commented Mar 2, 2021

On behalf of the ipfs.io gateways, I implore this endeavour to consider an LRU eviction strategy as the default for the IPFS repo, framing this work as a "cache retention strategy", as @RubenKelevra points to, rather than a "garbage collection improvement". Unpinned blocks in IPFS are not unreachable objects in the traditional GC sense, and are much closer to "what files should my CDN keep hot". We have limited bandwidth so should avoid scope creep, but that is also why it's even more important to get the framing of this project right. If we make GC less painful, it will be a while before we get time to focus on making it what we need.

A relevant paper that is worth a read came up from the first ResNetLab event: Caching the Internet: A View from a Global Multi-tenant CDN

> In this paper, we explore the efficacy of popular caching policies in a large-scale, global, multi-tenant CDN. We examine the client behaviors observed in a network of over 125 high-capacity Points of Presence (PoPs). Using production data from the Edgecast CDN, we show that for such a large-scale and diverse use case, simpler caching policies dominate. We find that LRU offers the best compromise between hit-rate and disk I/O, providing 60% fewer writes than FIFO, while maintaining high hit-rates. We further observe that at disk sizes used in a large-scale CDN, LRU performs on par with complex polices like S4LRU.
https://link.springer.com/chapter/10.1007%2F978-3-030-15986-3_5

@RubenKelevra commented Mar 2, 2021

@olizilla thanks for the paper, will check it out.

My idea is to balance access counts with the time the object has stayed in the cache. In short, a number will be increased on each access and decreased after a fixed duration.

This way we avoid cache thrashing if many objects get accessed for a short while but the access pattern changes back afterwards.

It's also important to have a rough idea of how large a block is, so we know, without accessing the datastore, how far we've come toward meeting our target for clearing space.

Additionally, it makes sense to measure the number of bytes added to the datastore and send this information to a process which can delete old cache objects from the datastore at the same rate. This allows us to maintain a constant fill level on the datastore with minimal memory usage.

The data structure in my note would squeeze this all into a 32-bit integer as additional metadata for each checksum.

While there's probably quite some overhead from the tree structures for finding the lowest number in the cache section and a specific hash in the reference counter section, the total memory consumption should be pretty minimal.
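
A rough sketch of that combined metric (the exact encoding in the linked note may differ; this is only illustrative): a 32-bit counter incremented on each access and periodically decayed by a background task, so briefly popular blocks do not stay "hot" forever.

```go
package cache

import "sync/atomic"

// heat is the per-block metric, stored alongside the CID as 32 bits of metadata.
type heat struct{ v uint32 }

// touch is called on each access and increments the metric, saturating at max.
func (h *heat) touch() {
	for {
		old := atomic.LoadUint32(&h.v)
		if old == ^uint32(0) {
			return // already saturated
		}
		if atomic.CompareAndSwapUint32(&h.v, old, old+1) {
			return
		}
	}
}

// decay runs on a fixed interval (e.g. from a background ticker) and halves
// the metric, so time spent in the cache is weighed against raw access counts.
func (h *heat) decay() {
	for {
		old := atomic.LoadUint32(&h.v)
		if old == 0 || atomic.CompareAndSwapUint32(&h.v, old, old/2) {
			return
		}
	}
}
```
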

@RubenKelevra commented Mar 2, 2021

Well, I don't know; I don't see why LRU should be better.

The most similar algorithms to my note would be ARC (which is used in ZFS, but is patented) and CAR.

Here's a paper comparing them to LRU:

https://www.usenix.org/conference/fast-04/car-clock-adaptive-replacement

The File: /ipfs/Qmaksm3FYJupAzkZK6EPpaxjyc6zocJvzufyo6AFSMkRBm

If the cache is too small for the workload, LRU is basically useless (cache thrashing occurs), while ARC/CAR outperform it with 3-4 times more hits.

@gammazero (Author)

@olizilla @RubenKelevra I want to be clear that, yes, we do want a cache management approach to garbage collection eventually. It should work in harmony with reference counting. Reference counting is solving the specific problem of needing to know if blocks can be removed, without having to load and search other pinned/mfs content. It addresses a situation that is severely impacting pinning service providers.

Gateways, and many general users, have a definite need for cache management. In summary, that means removing blocks that have a lower calculated probability of reuse than others that are competing for storage space (regardless of the algorithm used: LRU, LFU, ARC, 2-queue, etc.). I am treating that (which blocks to delete first/incrementally) as a separate, independent problem from the problem that ref-counting addresses (efficiently determining if a block can be removed).

A cache management strategy should be a separate proposal/project. If it does happen to share some part of its implementation with reference counting, that is an implementation detail and does not mean these are the same problem. There are existing designs for a cache management approach to GC, and these may be useful/informative for creating that solution.

@RubenKelevra

@gammazero I agree that they are somewhat separate in purpose.

But there's an advantage to having the GC hold two lists: pinned (or saved in the MFS) and unpinned (not saved in the MFS) blocks. This allows for much quicker removal from the datastore, since it can just randomly drop blocks from the "unpinned" list until the list is empty or the storage requirements are met again. If you add a size approximation of each block to the CID, you could probably even drop blocks without having to request any additional information from the database and just issue removal requests for the CIDs.

While the CIDs in the refcount data structure need to be accessible via the CID, so a b-tree is probably required, the CIDs in the unpinned list just need a memory-efficient data structure. This way you would just drop the oldest entries added, like a FIFO.
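
A small sketch of that two-list layout under the stated assumptions (illustrative names only): a refcount index keyed by CID for pinned/MFS blocks, and a FIFO of unpinned entries carrying a size estimate, so space can be reclaimed oldest-first without querying the datastore.

```go
package cache

// unpinnedEntry is a FIFO record for a block that is neither pinned nor in MFS.
type unpinnedEntry struct {
	cid       string
	sizeGuess int64 // approximate size recorded when the block arrived
}

// Index holds the two lists discussed above.
type Index struct {
	refcounts map[string]int64 // pinned / MFS blocks, keyed by CID (a b-tree in practice)
	unpinned  []unpinnedEntry  // oldest entries first
}

// Reclaim pops unpinned entries oldest-first until roughly `bytes` of
// estimated space is freed, returning the CIDs whose blocks can be deleted.
func (ix *Index) Reclaim(bytes int64) []string {
	var victims []string
	var freed int64
	for len(ix.unpinned) > 0 && freed < bytes {
		e := ix.unpinned[0]
		ix.unpinned = ix.unpinned[1:]
		victims = append(victims, e.cid)
		freed += e.sizeGuess
	}
	return victims
}
```
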

@mikeal added this to Needs Review in Project Pipeline Mar 15, 2021
@mikeal added the confidence:low, ease:low, and impact:low labels Mar 23, 2021
@mikeal self-requested a review March 23, 2021 21:08
@mikeal (Contributor) commented Mar 23, 2021

Would like to see a review approval from @Stebalien on this one since he’s talked a bit about this already.

@mikeal requested a review from Stebalien March 23, 2021 21:10
@Stebalien (Member) left a comment

In terms of a "we should work on this proposal", this is fine. But can we link to the GC system we discussed?

cc @gammazero

@mikeal moved this from Needs Review to Needs Owner in Project Pipeline Mar 25, 2021
@mikeal moved this from Needs Owner to Grant in Project Pipeline Mar 31, 2021
@BigLep added the Steward Priority label (Stewards priority project due to enabling us to move faster and/or safer) Apr 19, 2021
@momack2 (Contributor) commented Apr 21, 2021

I don't see this obviously linked here - so bumping the updated design doc on GC implementation based on @gammazero's great research and prototyping: https://hackmd.io/bgU7ti6_THK5gAGcrAhoEw
