
Implement a simple GC #71

samoht opened this issue Jun 26, 2014 · 3 comments



@samoht commented Jun 26, 2014

If we want Irmin to run on real systems, we need a story for reclaiming data:

  1. unreachable blocks in the store should be collected
  2. very old objects might need to be removed

For simplicity, 2. can be reduced to 1. by rebasing the history: the old objects then become unreachable and can be collected like any other garbage. The downside is that two rebased stores can no longer sync if they pruned different parts of their history. Solving 2. without reducing it to 1. means supporting partial fetch/push, which the Git protocol doesn't handle very well (it is not yet clear whether this is a limitation of the git command-line tool or of the protocol itself; this needs more investigation).
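Point 1 is a reachability problem. The following is a minimal stop-the-world mark-and-sweep sketch over a hypothetical in-memory model (the `store` table, `reachable`, `sweep` and `gc` are illustrative names, not Irmin's API) in which each object lists the keys it points to and the roots are the branch heads. After a history rebase, the pre-rebase commits are simply never marked, which is how 2. turns into 1.

```ocaml
(* Hypothetical model, not Irmin's API: a content-addressed store where
   each object records the keys of the objects it points to. *)
type key = string
type store = (key, key list) Hashtbl.t

(* Mark: collect every key reachable from the roots (e.g. branch heads). *)
let reachable (store : store) (roots : key list) : (key, unit) Hashtbl.t =
  let marked = Hashtbl.create 64 in
  let rec visit k =
    if not (Hashtbl.mem marked k) then begin
      Hashtbl.replace marked k ();
      match Hashtbl.find_opt store k with
      | Some children -> List.iter visit children
      | None -> () (* dangling link: nothing to follow *)
    end
  in
  List.iter visit roots;
  marked

(* Sweep: remove every unmarked object and return the removed keys.
   Dead keys are collected first so the table is not mutated mid-fold. *)
let sweep (store : store) (marked : (key, unit) Hashtbl.t) : key list =
  let dead =
    Hashtbl.fold
      (fun k _ acc -> if Hashtbl.mem marked k then acc else k :: acc)
      store []
  in
  List.iter (Hashtbl.remove store) dead;
  List.sort compare dead

let gc (store : store) (roots : key list) : key list =
  sweep store (reachable store roots)
```

All writers must be paused between the mark and the sweep, otherwise an object added during the scan could be swept while still referenced.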

@samoht added the enhancement label Jul 21, 2014
@samoht modified the milestone: Next Jan 5, 2015

@samoht (member, author) commented Feb 16, 2015

A first step would be to call git gc regularly in the Git backend...
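One way to read "regularly" is a commit-count trigger. The sketch below assumes a hypothetical policy (run GC every `every` commits) and hypothetical names `make_trigger`, `on_commit` and `gc_command`; only the command itself, `git -C <dir> gc --auto`, is real Git. With `--auto`, Git decides whether any housekeeping is needed, and unreachable loose objects are only pruned once they are older than `gc.pruneExpire` (two weeks by default).

```ocaml
(* Hypothetical policy: run [git gc --auto] after every [every] commits.
   [run] executes a shell command and returns its exit code; it defaults
   to [Sys.command], which blocks the caller. *)
type gc_trigger = {
  repo : string;          (* path to the Git repository *)
  every : int;            (* commits between two GC runs *)
  mutable pending : int;  (* commits since the last GC run *)
  run : string -> int;
}

let make_trigger ?(run = Sys.command) ~repo ~every () =
  if every <= 0 then invalid_arg "make_trigger: every must be positive";
  { repo; every; pending = 0; run }

let gc_command t = Printf.sprintf "git -C %s gc --auto" (Filename.quote t.repo)

(* Call after each commit; returns [Some exit_code] when GC was run. *)
let on_commit t =
  t.pending <- t.pending + 1;
  if t.pending >= t.every then begin
    t.pending <- 0;
    Some (t.run (gc_command t))
  end
  else None
```

Passing `run` as a parameter keeps the policy testable; an Lwt-based backend would rather spawn the command asynchronously than block on `Sys.command`.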



@samoht referenced this issue Mar 1, 2016

@samoht (member, author) commented Mar 7, 2017

Off the top of my head, a few bits are still missing from the API before we can even start implementing a GC:

  • Irmin.Type should have an explicit combinator for internal keys. This would let the GC follow all the links, even those between objects that are leaves from Git's point of view.
  • The append-only store (AO) should expose a delete operation, and that operation should be atomic.
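The two API additions can be sketched together. Below, `ty` is a hypothetical GADT of value-type descriptions with a dedicated `Key` case, so a generic `links` traversal can find every internal key inside a value, even one that Git stores as an opaque blob. `AO` and `Mem_AO` are hypothetical stand-ins for an append-only store extended with `delete`; none of these names are Irmin's actual combinators or signatures.

```ocaml
type key = string

(* Hypothetical type descriptions; [Key] marks positions holding keys. *)
type _ ty =
  | String : string ty
  | Int : int ty
  | Key : key ty
  | List : 'a ty -> 'a list ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty

(* Generic traversal: collect every internal key stored in a value. *)
let rec links : type a. a ty -> a -> key list =
 fun ty v ->
  match ty with
  | String -> []
  | Int -> []
  | Key -> [ v ]
  | List t -> List.concat_map (links t) v
  | Pair (ta, tb) ->
      let x, y = v in
      links ta x @ links tb y

(* Hypothetical append-only store signature extended with [delete]. *)
module type AO = sig
  type t
  type value
  val add : t -> value -> key
  val find : t -> key -> value option
  val delete : t -> key -> unit
end

(* In-memory instance keyed by the MD5 of the contents. *)
module Mem_AO : AO with type t = (key, string) Hashtbl.t and type value = string =
struct
  type t = (key, string) Hashtbl.t
  type value = string

  let add t v =
    let k = Digest.to_hex (Digest.string v) in
    Hashtbl.replace t k v;
    k

  let find t k = Hashtbl.find_opt t k

  (* A single table removal here; a persistent backend would need this to
     be atomic with respect to concurrent readers and writers. *)
  let delete t k = Hashtbl.remove t k
end
```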

Once these API changes are in place, we will need to design a proper architecture for the GC. We could start with a "stop-the-world" GC, where all concurrent operations are paused while the GC scans the object graph (similar to git gc), but this is not really an option for interactive applications. Alternatively, since the object graph has no cycles, we could implement a basic reference-counting GC. But then, where do we store the counters? The Git format has no space for additional metadata, and how would this work if someone uses the normal Git tools to edit the store?
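To make the reference-counting option concrete, here is a minimal sketch that keeps the counters in a hypothetical side table outside the object format (one possible answer to "where do we store the counters?"). The names `rc`, `retain` and `release` are illustrative. Because the graph is acyclic, dropping the last reference to an object cascades to its children and frees everything that became unreachable, with no global pause.

```ocaml
type key = string

(* Hypothetical collector: [children] maps each object to the keys it
   points to; [counts] is a side table kept next to the store. *)
type rc = {
  children : (key, key list) Hashtbl.t;
  counts : (key, int) Hashtbl.t;
}

let create () = { children = Hashtbl.create 16; counts = Hashtbl.create 16 }

let count t k = Option.value ~default:0 (Hashtbl.find_opt t.counts k)

(* Take one reference on [k] (e.g. a branch head pointing at a commit). *)
let retain t k = Hashtbl.replace t.counts k (count t k + 1)

(* Adding an object takes one reference on each of its children. *)
let add t k cs =
  if not (Hashtbl.mem t.children k) then begin
    Hashtbl.replace t.children k cs;
    List.iter (retain t) cs
  end

(* Drop one reference; at zero the object is deleted and the references
   it held are dropped in turn. No cycles, so the cascade terminates. *)
let rec release t k =
  let n = count t k - 1 in
  if n > 0 then Hashtbl.replace t.counts k n
  else begin
    Hashtbl.remove t.counts k;
    match Hashtbl.find_opt t.children k with
    | None -> ()
    | Some cs ->
        Hashtbl.remove t.children k;
        List.iter (release t) cs
  end
```

The second open question remains: any object written or deleted with the plain Git tools bypasses `retain` and `release`, so the side table would drift out of sync with the store.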

@kayceesrk probably has cleverer ideas, as usual :-)
