Much faster symlinking, safer interactions with rustc #386

Closed
wants to merge 2 commits into from

Commits on Sep 16, 2023

  1. Much faster symlinking, safer interactions with rustc

    This PR represents a large practical performance improvement in
    simple Crane situations.
    
    For some unscientific numbers, I tested in the following fashion.
    
    For each of the two versions of crane (master and this branch), using my
    simple closed-source project (about 1kloc and 170 crate deps, mostly just hyper):
    1. I ran `nix flake check` to ensure that any crate downloads were done
       and any supporting derivations were complete.
    1. I then added a new crate (anyhow) and ran `nix flake check` again.
       This times the whole flow after a dependency is added.
    1. I then changed a constant and re-ran, simulating the usual flow.
    
    Because this project is relatively small, I would expect it to
    represent a 'worst case' scenario. For example, when uncompressed,
    it contains 500MB of dependencies, whereas another project I work on
    has 3.6GB.
    
    The results were as follows:
    Before this PR:
    - build with new dep: 2m15s
    - build with new code: 1m19s
    After this PR:
    - build with new dep: 1m22s
    - build with new code: 32s
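
    As a rough check on the figures above, the following shell arithmetic
    converts the reported times to seconds and computes the percentage
    reduction. It uses only the four timings listed; the variable names are
    illustrative.

    ```shell
    # Convert the reported wall-clock times to seconds.
    before_dep=$((2 * 60 + 15))   # 2m15s
    after_dep=$((1 * 60 + 22))    # 1m22s
    before_code=$((1 * 60 + 19))  # 1m19s
    after_code=32                 # 32s

    # Integer percentage reduction, rounded down.
    dep_saving=$(( (before_dep - after_dep) * 100 / before_dep ))
    code_saving=$(( (before_code - after_code) * 100 / before_code ))

    echo "new dep:  ${before_dep}s -> ${after_dep}s (${dep_saving}% less)"
    echo "new code: ${before_code}s -> ${after_code}s (${code_saving}% less)"
    ```

    That is roughly 39% less time after adding a dependency and roughly 59%
    less after a code-only change.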
    
    In addition, it is more robust to crate rebuilds.
    
    How it works/why it's better:
    
    1. Drop the diffing behaviour when doing symlinking. This is an
       explicit tradeoff: if one is symlinking on inheritance, we would
       expect any duplicate data to be in the form of symlinks, for which
       diffing file content is unhelpful. Diffing only helps when we are
       neither symlinking on inheritance nor archiving on install, so it
       seems reasonable for that case to be potentially slower. I say
       potentially slower because, with target dirs of 1GB, diffing trades
       2GB of reads for up to 1GB fewer writes. I'd note here that Nix
       store optimisation will cover the space savings. The main argument
       is that the common case should be archival or symlinking, and we
       can boost its performance by removing this behaviour.
    1. Instead, we build a `symlinks.tar` containing symlinks to the outputs
       of this derivation.
    1. When inheriting, instead of traversing the tree and creating
       symlinks, we just extract this tar. This is great because it means
       that on both derivation end and derivation start, we avoid forking
       O(num files produced by cargo build) processes, and even small
       projects emit thousands of files (my own has 2033 output files).
       Effectively, GNU tar is much more optimised than the pre-existing
       bash script.
    1. At this point, we still have the problem that rustc may try to write
       to a (symlinked) file. Instead, a `RUSTC_WRAPPER` makes rustc write
       to a temporary directory, and once the command has finished we copy
       the artifacts from that out dir back to the target location. There
       is a potential (small) slowdown caused by this: I observe that cargo
       uses rustc's stderr to kick off new builds as soon as it can, and so
       I had to capture rustc's stdout. However, this effect is most likely
       very minor.
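
    Steps 2 to 4 above can be sketched roughly as follows. This is a minimal
    illustration, not the code in this PR: the scratch paths, the
    `run_with_tmp_out` helper and the `fake_rustc` stand-in are all
    hypothetical, and it assumes GNU `cp` (for `-s`) and `tar`. A real
    `RUSTC_WRAPPER` would also need to rewrite rustc's output arguments,
    which is omitted here.

    ```shell
    set -eu

    # Hypothetical scratch layout; none of these paths come from crane itself.
    work="$(mktemp -d)"
    prev="$work/prev/target"     # stands in for the previous derivation's output
    links="$work/links"          # staging tree that will hold only symlinks
    target="$work/target"        # target dir of the inheriting build
    mkdir -p "$prev/debug/deps" "$target"
    echo "old-a" > "$prev/debug/deps/liba.rlib"
    echo "old-b" > "$prev/debug/libb.rlib"

    # Step 2: mirror the outputs as symlinks in one process (GNU `cp -s` needs
    # an absolute source path), then archive the links themselves.
    cp -rs "$prev" "$links"
    tar -cf "$work/symlinks.tar" -C "$links" .

    # Step 3: the inheriting build restores every link with a single tar call.
    tar -xf "$work/symlinks.tar" -C "$target"

    # Step 4 (simplified): run a command against a temporary out dir, then copy
    # its outputs back, replacing links rather than writing through them.
    run_with_tmp_out() {
      local dest="$1"; shift
      local tmp_out f
      tmp_out="$(mktemp -d)"
      "$@" "$tmp_out"                    # the command writes only into tmp_out
      for f in "$tmp_out"/*; do
        [ -e "$f" ] || continue          # nothing was produced
        rm -f "$dest/$(basename "$f")"   # drop the symlink, not its target
        cp "$f" "$dest/"
      done
      rm -rf "$tmp_out"
    }

    # Stand-in for rustc: writes a fresh libb.rlib into the directory it is given.
    fake_rustc() { echo "new-b" > "$1/libb.rlib"; }

    run_with_tmp_out "$target/debug" fake_rustc
    ```

    After this runs, `liba.rlib` in the new target dir is still a symlink to
    the previous output, while `libb.rlib` is a regular file and the previous
    output is left unmodified.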
    j-baker committed Sep 16, 2023
    8e0d6e4
  2. api docs

    j-baker committed Sep 16, 2023
    835b90e