Garbage collection service/task for document views #486

Closed
sandreae opened this issue Aug 5, 2023 · 0 comments · Fixed by #496 or #500

Comments

@sandreae
Member

sandreae commented Aug 5, 2023

We need this for blobs, but the same mechanism will also be used to clean up stale views for any document type. So it's worth discussing this in general, not only with reference to blobs.

This issue only relates to garbage collecting document views, not entries/operations/documents. That is a slightly different concern and should be discussed elsewhere.

We want a garbage collection service to remove views which are no longer required. A view is no longer required when both of the following are true:

  • it isn't the "current view" of any document
  • it isn't the target of a pinned relation from another document
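
The two conditions can be written as a single predicate. The following is a minimal sketch over a hypothetical in-memory model, not the actual storage layer: `View`, `Store` and `is_required` are illustrative names, and it assumes that each view knows which document it belongs to and which views it pins.

```python
from dataclasses import dataclass, field


@dataclass
class View:
    # Hypothetical model: the document this view belongs to and the
    # ids of the views it targets through pinned relations.
    document_id: str
    pinned: set[str] = field(default_factory=set)


@dataclass
class Store:
    # document id -> id of that document's current view
    current: dict[str, str] = field(default_factory=dict)
    # view id -> view
    views: dict[str, View] = field(default_factory=dict)


def is_required(store: Store, view_id: str) -> bool:
    """A view is kept if it is a current view or is pinned by another document."""
    view = store.views[view_id]
    is_current = store.current.get(view.document_id) == view_id
    is_pinned = any(
        other.document_id != view.document_id and view_id in other.pinned
        for other in store.views.values()
    )
    return is_current or is_pinned


# Document A has three views, document B pins the middle one.
store = Store(
    current={"A": "A3", "B": "B1"},
    views={
        "A1": View("A"),
        "A2": View("A"),
        "A3": View("A"),
        "B1": View("B", {"A2"}),
    },
)
garbage = [view_id for view_id in store.views if not is_required(store, view_id)]
```

In this example only `A1` ends up in `garbage`: `A3` and `B1` are current views, and `A2` is still pinned by `B1`.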

I can see two ways to implement garbage collection:

  1. a stand-alone service which periodically removes from the database any document views which match the above conditions
  2. a "garbage_collection" task in the materialization service which is issued whenever a "reduce" task completes. The input for the task would be a specific document. The task itself would check for the above conditions, clean up the old views for that document, and likely issue more "garbage_collection" tasks for the targets of any pinned relations, which may now also need garbage collecting
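
To compare the two options, here is a rough sketch of both over the same hypothetical in-memory model. None of these names (`Store`, `sweep`, `garbage_collection_task`, `run_from`) come from the codebase, and the real task would go through the materialization service's queue rather than a local `deque`.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class View:
    document_id: str
    pinned: set = field(default_factory=set)  # ids of pinned target views


@dataclass
class Store:
    current: dict = field(default_factory=dict)  # document id -> current view id
    views: dict = field(default_factory=dict)    # view id -> View

    def is_required(self, view_id):
        view = self.views[view_id]
        if self.current.get(view.document_id) == view_id:
            return True
        return any(
            other.document_id != view.document_id and view_id in other.pinned
            for other in self.views.values()
        )


def sweep(store):
    """Option 1: one pass of a periodic service over every stored view."""
    garbage = [v for v in store.views if not store.is_required(v)]
    for view_id in garbage:
        del store.views[view_id]
    return garbage


def garbage_collection_task(store, document_id):
    """Option 2: clean up one document and return the documents to check next."""
    follow_ups = set()
    own = [v for v, view in store.views.items() if view.document_id == document_id]
    for view_id in own:
        if not store.is_required(view_id):
            removed = store.views.pop(view_id)
            # Views pinned by the removed view may have lost their last pin.
            follow_ups.update(
                store.views[t].document_id for t in removed.pinned if t in store.views
            )
    return follow_ups


def run_from(store, document_id):
    """Stand-in for the task queue: process tasks until no follow-ups remain."""
    queue = deque([document_id])
    while queue:
        queue.extend(garbage_collection_task(store, queue.popleft()))


def example_store():
    # Old view B1 pins old view A1; A2 and B2 are the current views.
    return Store(
        current={"A": "A2", "B": "B2"},
        views={
            "A1": View("A"),
            "A2": View("A"),
            "B1": View("B", {"A1"}),
            "B2": View("B"),
        },
    )


store = example_store()
run_from(store, "B")  # as if a "reduce" task for document B just completed
```

In this sketch the task-based variant removes `B1` and then `A1` in one chain of tasks, while `sweep` needs two passes to reach the same state because `A1` only becomes removable once `B1` is gone. That says nothing about which is cheaper on a real database, where the cost depends on how the two conditions are queried.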

I like both approaches. The second better matches our current materialisation task patterns, but the first seems less complex in some ways. I don't know which would be more computationally efficient.

One potential challenge I see is dealing with cyclical relations, e.g. A1 -> B1 -> A1. If these were all document views containing pinned relations, they would never be garbage collected, even if they weren't wanted anymore.
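
The cycle problem can be reproduced in a few lines. The sketch below uses a hypothetical dictionary model (view id mapped to its document id and pinned targets) and contrasts the rule as stated with a reachability check starting from current views. The reachability check is not something proposed in this issue; it is only one conventional way to deal with cycles, shown here as an assumption for discussion.

```python
def required_by_rule(view_id, current, views):
    """The rule from this issue: current view, or pinned by another document."""
    document_id, _ = views[view_id]
    if current.get(document_id) == view_id:
        return True
    return any(
        other_doc != document_id and view_id in pinned
        for other_doc, pinned in views.values()
    )


def reachable_from_current(current, views):
    """Alternative (assumption): keep only views reachable from a current view."""
    seen = set()
    stack = list(current.values())
    while stack:
        view_id = stack.pop()
        if view_id in seen or view_id not in views:
            continue
        seen.add(view_id)
        stack.extend(views[view_id][1])
    return seen


# A1 and B1 are old views which pin each other; A2 and B2 are current.
current = {"A": "A2", "B": "B2"}
views = {
    "A1": ("A", {"B1"}),
    "B1": ("B", {"A1"}),
    "A2": ("A", set()),
    "B2": ("B", set()),
}

kept_by_rule = {v for v in views if required_by_rule(v, current, views)}
kept_by_reachability = reachable_from_current(current, views)
```

Under the stated rule all four views are kept, because `A1` and `B1` each count as pinned by another document. The reachability variant keeps only `A2` and `B2`, since nothing current leads to the cycle.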
