Defer file reading until package build time #107
Comments
@drahnr @cmeister2 Does this sound OK to you? Can you see any problems with the idea?
My main issue with the outlined approach is the fact that we're not able to provide a reader but make it internal to the We don't necessarily have to read the files when passed to
Sure, I'm not tied strongly to specific implementation details, so long as we're not storing the entire contents of files in buffers in the builders. Could you elaborate on this aspect? I'm not sure I understand what you mean.
My assumption was you wanted to use
If by internally you mean on the builder struct then no, I just want to store the path, so that
No immediate issue with the design. If I squinted enough with a security hat on, I could potentially see an issue with storing a path and then reading from that path at a later time, if there was a sufficient gap between storing and reading (and if that file can be replaced in the meantime), but I don't think that's a sufficient worry.
That's what I meant, you store the path, but then you already looked into using
In theory you would have a (Separately) I'm still not sure I appreciate the point of async for file IO though.
On point 2, I don't have hard data on this, and I should probably do some profiling so that I'm basing this on evidence. I just want to call out that the tokio docs suggest not doing anything compute-bound inside an async runtime, and I have a feeling this may qualify.
My main concern is the fact hardcoding I also believe we should consider splitting the sync and async builder to some extent, or provide a splitting function.
`with_file()` and `with_file_async()` currently read out the contents of the file into a buffer and persist them inside of `RPMFileEntry` structs, which are stored in the builder struct, until the package is built. This is simple and convenient but terribly inefficient, as all of the uncompressed file contents will be stored in memory. Additionally, this API allows little possibility of parallelism when building a single package. Since each package can be dealing with dozens, hundreds or thousands of files and only a few packages are generally built at a time, this is probably a bad trade.

Instead of storing a collection of eagerly-processed `RPMFileEntry`s inside the builder, we should probably try to store only the filename and `RPMFileOptions`, and process all of the files at once during build time, which would make it easier to parallelize using a threadpool or async primitives.

This will allow the files to be read and written directly into the archive using smaller buffers, and we can calculate details like file metadata, digests, signatures and such at the same time.
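
To make the proposal concrete, here is a rough sketch of the deferred approach using only the standard library. It is not the crate's actual API: `LazyBuilder`, `FileOptions` and `FileSummary` are hypothetical names invented for illustration, the "archive" is just any `Write` sink rather than a real cpio payload, and `DefaultHasher` is a stand-in for a real digest such as SHA-256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs::File;
use std::hash::Hasher;
use std::io::{self, ErrorKind, Read, Write};
use std::path::PathBuf;

/// Hypothetical stand-in for the per-file options stored next to each path.
#[derive(Debug, Clone)]
pub struct FileOptions {
    pub destination: String,
    pub mode: u32,
}

/// Details gathered while a file is streamed into the archive.
#[derive(Debug)]
pub struct FileSummary {
    pub destination: String,
    pub size: u64,
    /// Placeholder checksum, NOT a cryptographic digest.
    pub checksum: u64,
}

/// Hypothetical builder that records paths instead of file contents.
#[derive(Default)]
pub struct LazyBuilder {
    files: Vec<(PathBuf, FileOptions)>,
}

impl LazyBuilder {
    /// Records the source path and options; no file I/O happens here.
    pub fn with_file(mut self, source: impl Into<PathBuf>, options: FileOptions) -> Self {
        self.files.push((source.into(), options));
        self
    }

    /// Streams each recorded file into `archive` through one small reusable
    /// buffer, computing size and checksum in the same pass.
    pub fn build<W: Write>(self, archive: &mut W) -> io::Result<Vec<FileSummary>> {
        let mut buf = [0u8; 8192];
        let mut summaries = Vec::with_capacity(self.files.len());
        for (source, options) in self.files {
            let mut file = File::open(&source)?;
            let mut hasher = DefaultHasher::new();
            let mut size = 0u64;
            loop {
                let n = match file.read(&mut buf) {
                    Ok(0) => break, // end of file
                    Ok(n) => n,
                    Err(e) if e.kind() == ErrorKind::Interrupted => continue,
                    Err(e) => return Err(e),
                };
                hasher.write(&buf[..n]);
                archive.write_all(&buf[..n])?;
                size += n as u64;
            }
            summaries.push(FileSummary {
                destination: options.destination,
                size,
                checksum: hasher.finish(),
            });
        }
        Ok(summaries)
    }
}
```

With this shape, peak memory is bounded by the buffer size rather than by the total size of the packaged files. Because all paths are known up front when `build()` runs, the loop is also the natural place to hand files out to a threadpool later, although this sketch stays sequential.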