I was looking at the source for the Rust InputsPackager (sccache/src/compiler/rust.rs, lines 1316 to 1462 in fc256ff), which is responsible for packaging up all the inputs necessary for a single compile to ship to the build server.
I realized that distributed Rust compiles are probably doing a lot of redundant work. Every time a Rust compilation is distributed, sccache has to package up any files referenced on the command line, which includes all the rlib files for crates the current crate uses, shared libraries for proc macros, etc. For a single cargo invocation this likely means that the same files get packaged up over and over again.
A nice optimization here would be to give the build server an API for a content-addressable store, where clients could query the server for a file hash to find out whether the server already has that file, and upload the files it doesn't have. The server could simply store them on disk, similar to how the existing DiskCache works.
Then, instead of packaging up all compile inputs, the client would hash them all (this happens as part of cache lookup anyway), ask the server which ones it already has, and upload any that are not already present. To keep this to a single round trip, the server should presumably provide an API that accepts a list of hashes and returns two lists: the hashes that are already present and the hashes that are missing.
When the client asks the build server to execute a compilation, instead of providing a tarball of all the inputs it would provide a list of mappings from path -> hash, and the server would take care of retrieving all of those files from its local cache and placing them at the desired paths in the build filesystem (this could likely be done with hardlinks to avoid copying). A further optimization would be to preemptively store any outputs of Rust compilation in the cache, since they are likely to be used as inputs to another Rust compile.
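The proposal above could be sketched roughly as follows. This is only an illustration and not sccache code: `InputStore`, `content_hash`, `query`, `upload` and `materialize` are hypothetical names, the store is a flat directory of blobs named by hash, and `DefaultHasher` stands in for a real cryptographic hash only so that the example needs nothing beyond the standard library.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::Hasher;
use std::io;
use std::path::{Path, PathBuf};

/// Hex digest used as the store key.
/// ASSUMPTION: DefaultHasher is a std-only stand-in; a real store would
/// need a collision-resistant hash.
pub fn content_hash(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    hasher.write(bytes);
    format!("{:016x}", hasher.finish())
}

/// Hypothetical server-side store: one file per blob, named by its hash.
pub struct InputStore {
    root: PathBuf,
}

impl InputStore {
    pub fn new(root: impl Into<PathBuf>) -> io::Result<Self> {
        let root = root.into();
        fs::create_dir_all(&root)?;
        Ok(Self { root })
    }

    fn blob_path(&self, hash: &str) -> PathBuf {
        self.root.join(hash)
    }

    /// Batched query: splits `hashes` into (present, missing).
    pub fn query(&self, hashes: &[String]) -> (Vec<String>, Vec<String>) {
        hashes
            .iter()
            .cloned()
            .partition(|hash| self.blob_path(hash).is_file())
    }

    /// Stores a blob the client reported as missing and returns its hash.
    pub fn upload(&self, bytes: &[u8]) -> io::Result<String> {
        let hash = content_hash(bytes);
        let path = self.blob_path(&hash);
        if !path.exists() {
            fs::write(&path, bytes)?;
        }
        Ok(hash)
    }

    /// Places each (relative path, hash) input under `build_root`,
    /// hardlinking where possible and copying otherwise.
    /// Paths are assumed to be relative and already validated.
    pub fn materialize(
        &self,
        build_root: &Path,
        inputs: &[(PathBuf, String)],
    ) -> io::Result<()> {
        for (rel_path, hash) in inputs {
            let dest = build_root.join(rel_path);
            if let Some(parent) = dest.parent() {
                fs::create_dir_all(parent)?;
            }
            let blob = self.blob_path(hash);
            if fs::hard_link(&blob, &dest).is_err() {
                // e.g. the store and build root are on different filesystems
                fs::copy(&blob, &dest)?;
            }
        }
        Ok(())
    }
}
```

A client would call `query` with the hashes of all its inputs, `upload` only the missing ones, and then send the path -> hash list that the server passes to `materialize`. One thing the sketch glosses over is that hardlinked inputs share storage with the store, so the build sandbox would have to treat them as read-only.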
Yes, I have a side interest in content-addressable stores (actually in rolling checksums, but they're mainly interesting in that context) and sometimes think about them, since there's a multitude of use cases, including:
- this issue (build inputs)
- allowing build machines to collaborate on receiving toolchains from slow clients
- permitting clients to retrieve just the changed parts of outputs
It might also be worth looking at the "Remote Execution API", as it uses a CAS, with the caveat that it may be optimised for different use cases than sccache - #358 (comment)