Implement a GC #6

Open
samoht opened this issue Mar 1, 2016 · 4 comments

@samoht
Member

samoht commented Mar 1, 2016

There is no GC in Irmin (we just use git gc for now). This needs to be fixed if we want to version control everything. Moreover, we need to work out which hooks the storage backend must expose, so that we can register the ones a high-performance GC requires.

See mirage/irmin#71 and http://lists.xenproject.org/archives/html/mirageos-devel/2015-10/msg00040.html

@samoht
Member Author

samoht commented May 23, 2016

@kayceesrk you expressed interest in this. You are still very welcome to have a look at it :-)

@kayceesrk

Yes. I was looking at this a few weeks ago, and I am still interested in doing this. I did have a few questions regarding this:

  • How/when do objects become unreachable in a typical git workflow?
  • What is the root set for the GC? Given that the data structures are persistent (I am imagining merge-queues for simplicity, and because I understand how they work), there is always a way to reach objects in the past by checking out a previous commit.
  • In the case of persistent data structures, references are embedded in the object in an AO store. How does the GC distinguish between values and references?

It would help to have an example of a scenario where objects become unreachable.

@samoht
Member Author

samoht commented May 23, 2016

I think there are various levels of complexity for that task.

  1. GC is done off-line, blobs do not contain pointers to other objects and roots are the Git references. Basically, it amounts to running git gc on process start (pretty easy) or re-implementing something similar in Irmin (a bit more involved, especially if we are interested in the pack compression, but doable).
  2. GC is done online, blobs do not contain pointers to other objects and roots are the Git references and temporary branches. Very similar to what is described above; we just need to be careful with locking -- we also need to register temporary roots for the temporary (anonymous) branches.
  3. GC is done online, blobs can contain pointers to other objects and roots are the Git references and temporary branches. This needs a change in the Irmin data model, so probably API breakage, and I suspect a bigger impact.
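
To make level 1 concrete, here is a minimal mark-and-sweep sketch over a toy in-memory store. It is not Irmin's API: the `obj` type, the `store` hash table and the `gc` function are all hypothetical names chosen for illustration, and it assumes, as in levels 1 and 2, that blobs hold no pointers and that the roots are the branch heads.

```ocaml
(* Toy model, not Irmin's API: every name below is hypothetical. *)
module SS = Set.Make (String)

(* An object in the store. Blobs carry no pointers (levels 1 and 2). *)
type obj =
  | Blob of string                 (* opaque contents *)
  | Tree of (string * string) list (* (path step, child hash) *)
  | Commit of { tree : string; parents : string list }

(* The store maps a hash to an object. *)
type store = (string, obj) Hashtbl.t

(* Hashes directly referenced by an object. *)
let children = function
  | Blob _ -> []
  | Tree entries -> List.map snd entries
  | Commit { tree; parents } -> tree :: parents

(* Mark: collect every hash reachable from the roots. *)
let mark (store : store) (roots : string list) : SS.t =
  let rec visit seen h =
    if SS.mem h seen then seen
    else
      match Hashtbl.find_opt store h with
      | None -> seen (* dangling hash: ignored in this sketch *)
      | Some o -> List.fold_left visit (SS.add h seen) (children o)
  in
  List.fold_left visit SS.empty roots

(* Sweep: drop every unmarked object and return the removed hashes, sorted. *)
let sweep (store : store) (live : SS.t) : string list =
  let dead =
    Hashtbl.fold
      (fun h _ acc -> if SS.mem h live then acc else h :: acc)
      store []
  in
  List.iter (Hashtbl.remove store) dead;
  List.sort compare dead

(* Offline GC: the roots are the heads of the named references. *)
let gc (store : store) (refs : (string * string) list) : string list =
  sweep store (mark store (List.map snd refs))
```

In this model, level 2 would amount to adding the heads of the temporary (anonymous) branches to `refs` and taking a lock around `gc`. The recursive `visit` is not tail-recursive, so a real implementation would need an explicit work list for long histories.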

We need to ship 1. pretty soon, but I'd be interested in having a PoC for 2. as well. 3. needs a design discussion about API changes for Blobs.

@samoht
Member Author

samoht commented May 23, 2016

Objects usually become unreachable when people rebase or delete a branch.
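
As a rough illustration of both cases, the sketch below models a tiny commit graph and computes which commits no branch can reach. The hashes (`c1`, `c2`, `f1`, `f1'`), the branch names and the functions are all made up for the example; this is not Irmin or Git code.

```ocaml
(* Toy commit graph, not Irmin's API: all names are hypothetical. *)
module SS = Set.Make (String)

(* Each commit hash maps to the hashes of its parents. *)
let parents_of = function
  | "c2" -> [ "c1" ]  (* second commit on master *)
  | "f1" -> [ "c1" ]  (* feature work, forked from c1 *)
  | "f1'" -> [ "c2" ] (* f1 replayed on top of c2 by a rebase *)
  | _ -> []           (* c1 is the initial commit *)

(* Commits reachable from the given branch heads, following parents. *)
let reachable heads =
  let rec visit seen h =
    if SS.mem h seen then seen
    else List.fold_left visit (SS.add h seen) (parents_of h)
  in
  List.fold_left visit SS.empty heads

(* Stored commits that no branch reaches any more: candidates for the GC. *)
let unreachable commits refs =
  let live = reachable (List.map snd refs) in
  SS.elements (SS.diff (SS.of_list commits) live)

(* Initial state: both branches exist, so everything is live. *)
let initial = unreachable [ "c1"; "c2"; "f1" ]
    [ ("master", "c2"); ("feature", "f1") ]

(* Rebase: feature now points at the rewritten commit f1'. *)
let after_rebase = unreachable [ "c1"; "c2"; "f1"; "f1'" ]
    [ ("master", "c2"); ("feature", "f1'") ]

(* Delete: the feature branch is removed without being merged. *)
let after_delete = unreachable [ "c1"; "c2"; "f1" ] [ ("master", "c2") ]
```

In both the rebase and the delete case only `f1` is left without a path from any branch head. `c1` stays live throughout because it is still an ancestor of `master`, which is why checking out an older commit in a branch's history keeps working.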


3 participants