Where and how to persist blobs? #479

Closed
sandreae opened this issue Aug 2, 2023 · 2 comments · Fixed by #493

Comments

@sandreae
Member

sandreae commented Aug 2, 2023

The actual data of a blob is published and replicated in pieces (max 256KB in size). These pieces need to be persisted, both for replication and for serving the resulting blob files over an HTTP server. I can see two possible solutions for persisting blobs and pieces.

  1. Store pieces in the database: during replication we can easily retrieve the pieces from the database; when a request for the blob arrives at the HTTP server, we retrieve all the pieces and concatenate them into the complete blob.
  2. Store pieces in the database and blobs on the file system: for replication the pieces come from the database; for serving via the HTTP server the blobs come from the file system.

The downside of 2) is that the blobs are stored twice, taking up double the space on a device. The downside of 1) is that we need to handle the file serving logic ourselves, whereas if the files are directly on the file system we can use https://docs.rs/tower-http/latest/tower_http/services/struct.ServeDir.html.
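
As a rough illustration of the extra serving logic option 1) implies, here is a minimal sketch of reassembling a blob from its pieces before it can be returned from an HTTP handler. The names `assemble_blob` and `MAX_PIECE_SIZE` are hypothetical, the pieces are assumed to be already decoded and in the correct order, and 256KB is assumed to mean 256 * 1024 bytes.

```rust
/// Maximum piece size mentioned in the issue (assumed to be 256 * 1024 bytes).
const MAX_PIECE_SIZE: usize = 256 * 1024;

/// Concatenate ordered, already-decoded pieces into the complete blob.
/// Returns an error if any piece is larger than the assumed maximum.
fn assemble_blob(pieces: &[Vec<u8>]) -> Result<Vec<u8>, String> {
    // Reserve the full size up front to avoid repeated reallocation.
    let total: usize = pieces.iter().map(|piece| piece.len()).sum();
    let mut blob = Vec::with_capacity(total);

    for (index, piece) in pieces.iter().enumerate() {
        if piece.len() > MAX_PIECE_SIZE {
            return Err(format!("piece {} exceeds the maximum piece size", index));
        }
        blob.extend_from_slice(piece);
    }

    Ok(blob)
}
```

With this approach every request for a blob pays for a database read of all its pieces plus the concatenation, which is the work a static file service would otherwise avoid.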

@sandreae
Member Author

sandreae commented Aug 2, 2023

Maybe there's an option 3:

  3. Store complete blobs on the file system and don't store pieces in the database: during replication we'd need to break them up into pieces again....
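
The "break them up again" step could look roughly like the sketch below. It is only an assumption about how splitting might work: `split_into_pieces` and `MAX_PIECE_SIZE` are hypothetical names, 256KB is taken as 256 * 1024 bytes, and the sketch produces raw byte chunks only, without the encoding step that real pieces would still need before replication.

```rust
/// Maximum piece size mentioned in the issue (assumed to be 256 * 1024 bytes).
const MAX_PIECE_SIZE: usize = 256 * 1024;

/// Split a complete blob into raw pieces of at most `max_piece_size` bytes.
/// The last piece may be shorter; an empty blob yields no pieces.
fn split_into_pieces(blob: &[u8], max_piece_size: usize) -> Vec<Vec<u8>> {
    // `chunks` panics on a chunk size of zero, so fail with a clearer message.
    assert!(max_piece_size > 0, "piece size must be greater than zero");

    blob.chunks(max_piece_size)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

Doing this on every replication session means re-reading and re-chunking the whole file each time, which is the cost this option trades for not storing the data twice.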

@sandreae
Member Author

sandreae commented Aug 2, 2023

OK, it's likely we'll go with option 2), but to clarify: we'll store the blob pieces in their encoded form in the database (for easy replication) and store the full blobs on the file system (for serving over HTTP). This still means some duplication, but it seems worth it to avoid regularly encoding and decoding blob data during replication.
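
A minimal sketch of the dual-write this option describes, under several assumptions: `PieceStore` is a hypothetical in-memory stand-in for the database, `persist_blob` and the `blob_id` file name are made up for illustration, and the pieces are treated as plain bytes, so the encoding of pieces for the database is left out.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the database table holding pieces, keyed by blob id.
#[derive(Default)]
struct PieceStore {
    pieces: HashMap<String, Vec<Vec<u8>>>,
}

/// Keep the pieces in the store (for replication) and write the complete
/// blob to `blobs_dir` (for serving over HTTP). Returns the blob's file path.
fn persist_blob(
    store: &mut PieceStore,
    blobs_dir: &Path,
    blob_id: &str,
    pieces: Vec<Vec<u8>>,
) -> io::Result<PathBuf> {
    fs::create_dir_all(blobs_dir)?;

    // Assumption: `blob_id` is a safe file name (no path separators).
    let path = blobs_dir.join(blob_id);

    // Write the concatenated blob first, so a failed write leaves no
    // pieces in the store without a matching file.
    fs::write(&path, pieces.concat())?;
    store.pieces.insert(blob_id.to_string(), pieces);

    Ok(path)
}
```

The directory passed as `blobs_dir` would then be the one handed to a static file service such as the `ServeDir` linked earlier in the thread.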

@sandreae sandreae added this to the Blobs milestone Aug 17, 2023
@sandreae sandreae linked a pull request Aug 17, 2023 that will close this issue