If we want Irmin to run on real systems, we need a story for reclaiming data:
1. unreachable blocks in the store should be collected
2. very old objects might need to be removed
For simplicity, 2. can be reduced to 1. by rebasing the history. The downside is that two rebased stores can no longer sync if they pruned different parts of their history. Solving 2. without reducing it to 1. means supporting partial fetch/push, which Git doesn't handle very well (I'm not sure whether this is a limitation of the git command-line tool or of the protocol itself; I need to investigate a bit more).
Off the top of my head, there are still a few missing bits in the API before we can even start implementing a GC:
- Irmin.Type should have an explicit combinator for internal keys. This lets the GC follow all the links, even the ones between objects that are leaves from Git's point of view.
- AO should expose a delete operation. This should be atomic.
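To make the two missing bits concrete, here is a minimal sketch in plain OCaml. Everything in it is hypothetical: the `ty` type, its `Key` constructor, the `links` function, the `AO_WITH_DELETE` signature and the `Mem_ao` module are illustrative names, not Irmin's actual API. It only shows the shape of the idea: a type description that marks which fields are internal keys, so a generic traversal can extract them, and an append-only store extended with `delete`.

```ocaml
(* Hypothetical type descriptions. [Key] is the proposed explicit marker
   for a field that holds the key of another object in the store. *)
type _ ty =
  | Text : string ty
  | Key : string ty
  | Pair : 'a ty * 'b ty -> ('a * 'b) ty
  | Many : 'a ty -> 'a list ty

(* Generic traversal: return every internal key found in a value,
   guided only by its type description. *)
let rec links : type a. a ty -> a -> string list =
 fun ty v ->
  match ty with
  | Text -> []
  | Key -> [ v ]
  | Pair (a, b) ->
      let x, y = v in
      links a x @ links b y
  | Many a -> List.concat_map (links a) v

(* Hypothetical append-only store signature extended with [delete]. *)
module type AO_WITH_DELETE = sig
  type t
  val create : unit -> t
  val add : t -> string -> string (* returns the content-derived key *)
  val find : t -> string -> string option
  val delete : t -> string -> unit
end

(* In-memory toy implementation. A single [Hashtbl.remove] is trivially
   atomic here; an on-disk backend would need real care. *)
module Mem_ao : AO_WITH_DELETE = struct
  type t = (string, string) Hashtbl.t
  let create () = Hashtbl.create 16
  let add t v =
    let k = Digest.to_hex (Digest.string v) in
    Hashtbl.replace t k v;
    k
  let find t k = Hashtbl.find_opt t k
  let delete t k = Hashtbl.remove t k
end
```

With a description such as `Pair (Text, Many Key)`, a GC could call `links` on any stored value to find its outgoing edges without knowing the concrete object type.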
Once this is done, we will need to design a proper architecture for the GC. We can start with a "stop-the-world" GC, where all concurrent operations are stopped while the GC scans the object graph (similar to git gc), but this is not really an option for interactive applications. Or we can implement a basic reference-counting GC, since the object graph doesn't have cycles. In that case, where do we store the counters? The Git format doesn't have space for additional metadata, and how would this work if someone uses the normal Git tools to edit the store?
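As a rough illustration of the two options, here is a toy sketch over an in-memory object graph. It is not a design proposal: the `store` and `counts` types and all function names are made up for this example, the store maps each key to the keys it links to, and the reference counters live in a separate side table, which is exactly the part that has no obvious home in the Git format.

```ocaml
module S = Set.Make (String)

(* Toy object graph: each key maps to the keys it links to. *)
type store = (string, string list) Hashtbl.t

let children (store : store) k =
  Option.value (Hashtbl.find_opt store k) ~default:[]

(* Option 1, stop-the-world. Mark: collect every key reachable from
   the roots (e.g. branch heads). Nothing else may write meanwhile. *)
let mark (store : store) (roots : string list) : S.t =
  let rec visit seen k =
    if S.mem k seen then seen
    else List.fold_left visit (S.add k seen) (children store k)
  in
  List.fold_left visit S.empty roots

(* Sweep: delete every key that was not marked; return how many. *)
let sweep (store : store) (live : S.t) : int =
  let dead =
    Hashtbl.fold
      (fun k _ acc -> if S.mem k live then acc else k :: acc)
      store []
  in
  List.iter (Hashtbl.remove store) dead;
  List.length dead

let gc store roots = sweep store (mark store roots)

(* Option 2, reference counting. Counters are kept in a side table
   (an assumption of this sketch, not something Git provides). *)
type counts = (string, int) Hashtbl.t

(* Drop one reference to [k]. When the last reference goes away, delete
   the object and release its children. This terminates because the
   graph has no cycles. *)
let rec decr_ref (store : store) (counts : counts) k =
  match Hashtbl.find_opt counts k with
  | None -> ()
  | Some n when n > 1 -> Hashtbl.replace counts k (n - 1)
  | Some _ ->
      let cs = children store k in
      Hashtbl.remove counts k;
      Hashtbl.remove store k;
      List.iter (decr_ref store counts) cs
```

The mark-and-sweep version needs no extra metadata but has to see a frozen graph. The counting version works incrementally, but its side table goes stale as soon as another tool edits the store without updating it.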
@kayceesrk probably has more clever ideas, as usual :-)