Task for automatic garbage collection of unused documents and views #500

Merged
merged 52 commits into from
Aug 31, 2023

Conversation

sandreae
Member

@sandreae sandreae commented Aug 12, 2023

I think this PR is made up of just 10% actual new code and 90% tests...

I reverted changes implemented in #499 and #496 in order to combine them into one PR (this one) as pruning and purging documents can be handled more elegantly in a single task, now called "garbage_collection".

Introduces a new "garbage_collection" task to the materializer service. It is concerned with two connected but discrete functions:

  1. Deleting document views: when a document view is no longer pinned by any document, and is not any document's current view, it is deleted from the store.
  2. Purging blobs: when a blob document is no longer related to by any other document (pinned or unpinned relation), its views, operations, and entries are deleted from the store.
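
The two functions above can be sketched roughly as follows. This is a minimal in-memory model, not the actual store code: `Store`, its fields, `prune_views` and `purge_blobs` are all hypothetical names chosen for illustration, and the real task works against SQL tables rather than hash maps.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified identifiers (the real store uses typed ids).
type DocumentId = String;
type ViewId = String;

/// Illustrative in-memory stand-in for the database tables.
#[derive(Default)]
struct Store {
    /// Every materialized view, mapped to the document it belongs to.
    views: HashMap<ViewId, DocumentId>,
    /// The current (latest) view of each document.
    current_view: HashMap<DocumentId, ViewId>,
    /// Pinned relations: (referencing document, pinned view).
    pinned: HashSet<(DocumentId, ViewId)>,
    /// Unpinned relations: (referencing document, referenced document).
    relations: HashSet<(DocumentId, DocumentId)>,
    /// Documents which follow the blob schema.
    blobs: HashSet<DocumentId>,
}

impl Store {
    /// A view is in use if some document pins it or it is a current view.
    fn is_view_in_use(&self, view: &ViewId) -> bool {
        let pinned = self.pinned.iter().any(|(_, v)| v == view);
        let current = self.current_view.values().any(|v| v == view);
        pinned || current
    }

    /// Function 1: delete every view that is neither pinned nor current.
    /// Returns the deleted view ids in sorted order.
    fn prune_views(&mut self) -> Vec<ViewId> {
        let mut unused: Vec<ViewId> = self
            .views
            .keys()
            .filter(|v| !self.is_view_in_use(v))
            .cloned()
            .collect();
        unused.sort();
        for view in &unused {
            self.views.remove(view);
        }
        unused
    }

    /// A blob is kept alive by an unpinned relation to the document
    /// or by a pinned relation to any of its views.
    fn is_blob_related(&self, blob: &DocumentId) -> bool {
        let unpinned = self.relations.iter().any(|(_, target)| target == blob);
        let pinned = self
            .pinned
            .iter()
            .any(|(_, view)| self.views.get(view) == Some(blob));
        unpinned || pinned
    }

    /// Function 2: purge blobs which no document relates to any more.
    /// Operations and entries are omitted from this model for brevity.
    fn purge_blobs(&mut self) -> Vec<DocumentId> {
        let mut orphaned: Vec<DocumentId> = self
            .blobs
            .iter()
            .filter(|b| !self.is_blob_related(b))
            .cloned()
            .collect();
        orphaned.sort();
        for blob in &orphaned {
            self.blobs.remove(blob);
            self.current_view.remove(blob);
            self.views.retain(|_, doc| *doc != *blob);
        }
        orphaned
    }
}
```

In this model a blob's current view survives `prune_views` (it is still a current view) and is only removed once `purge_blobs` finds that nothing relates to the blob.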

The purging only occurs for blobs, but this functionality could be re-used in the future to handle other kinds of "dependent" documents which should only be kept alive when related to by other documents.

One point to note is that no rows are removed from the logs table as we still need to block off used log ids.
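
To illustrate why the log rows stay, here is a small sketch under stated assumptions: `LogStore`, `create_log`, `purge_entries` and `next_log_id` are hypothetical names, and the model assumes the next free log id for an author is derived from the rows in the logs table. If a purge also removed the log row, a previously used id could be handed out again.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical, simplified types for illustration only.
type PublicKey = String;
type LogId = u64;

#[derive(Default)]
struct LogStore {
    /// Log ids ever claimed per author. Rows are never removed here.
    logs: HashSet<(PublicKey, LogId)>,
    /// Entries per (author, log id), removed when a document is purged.
    entries: HashMap<(PublicKey, LogId), Vec<String>>,
}

impl LogStore {
    /// Next free log id for an author, based on the logs table only
    /// (an assumption of this sketch).
    fn next_log_id(&self, author: &str) -> LogId {
        self.logs
            .iter()
            .filter(|(a, _)| a == author)
            .map(|(_, id)| id + 1)
            .max()
            .unwrap_or(0)
    }

    /// Claim a new log for the author and store a first entry in it.
    fn create_log(&mut self, author: &str, entry: &str) -> LogId {
        let log_id = self.next_log_id(author);
        self.logs.insert((author.to_string(), log_id));
        self.entries
            .insert((author.to_string(), log_id), vec![entry.to_string()]);
        log_id
    }

    /// Purge the entries of a log but keep its row in `logs`,
    /// so the log id stays blocked for this author.
    fn purge_entries(&mut self, author: &str, log_id: LogId) {
        self.entries.remove(&(author.to_string(), log_id));
    }
}
```

After purging the first log of an author, `next_log_id` in this model still returns the following id rather than reusing the purged one.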

Next steps:

📋 Checklist

  • Add tests that cover your changes
  • Add this PR to the Unreleased section in CHANGELOG.md
  • Link this PR to any issues it closes
  • New files contain a SPDX license header

@codecov

codecov bot commented Aug 12, 2023

Codecov Report

Patch and project coverage have no change.

Comparison is base (71208cc) 92.05% compared to head (71208cc) 92.05%.

❗ Current head 71208cc differs from pull request most recent head 2c35f30. Consider uploading reports for the commit 2c35f30 to get more accurate results

Additional details and impacted files
@@             Coverage Diff              @@
##           development     #500   +/-   ##
============================================
  Coverage        92.05%   92.05%           
============================================
  Files              104      104           
  Lines            16629    16629           
============================================
  Hits             15308    15308           
  Misses            1321     1321           


This was referenced Aug 12, 2023
@sandreae
Member Author

More PostgreSQL woes... This time it doesn't like how we do triggers: it wants us to define a function, but SQLite doesn't do functions...

@sandreae sandreae marked this pull request as ready for review August 15, 2023 14:51
@sandreae sandreae changed the title "garbage_collection" task "garbage_collection" materializer task Aug 15, 2023
@adzialocha adzialocha changed the title "garbage_collection" materializer task Task for automatic garbage collection of unused documents and views Aug 18, 2023
CHANGELOG.md (outdated, resolved)
aquadoggo/src/db/errors.rs (resolved)
aquadoggo/src/db/stores/document.rs (outdated, resolved)
@sandreae sandreae merged commit 6c5d477 into development Aug 31, 2023
10 checks passed
@sandreae sandreae deleted the garbage-collection branch August 31, 2023 18:04
adzialocha added a commit that referenced this pull request Sep 8, 2023
* development:
  Make sure `/tmp` directory does not run out of scope before application ends (#557)
  Integrate `Bytes` value (#554)
  Stream blob data in chunks to files to not occupy too much memory (#551)
  Blobs directory configuration (#549)
  Use correct MAX_BLOB_PIECE_LENGTH from p2panda_rs
  Build a byte buffer over paginated pieces when assembling blobs (#547)
  HTTP routes to serve files with correct content type and etag headers (#544)
  Task for automatic garbage collection of unused documents and views (#500)
  Refactor tmp blob dir creation after rebase
  Fix after rebase
  "blob" materializer task (#493)
  Add static file server to `http` service (#483)
  Enable deletion of dangling `document_views` and related `document_view_fields` from db  (#491)
  BlobStore for retrieving raw blob data from the db (#484)