
[WIP] Accelerate (and simplify) buildcache relocation #32583

Closed
wants to merge 3 commits

Conversation

blue42u
Contributor

@blue42u blue42u commented Sep 9, 2022

REWRITTEN FOR BETTER EXPLANATION

Unpacking and relocating packages that come out of a buildcache is a significant bottleneck for CI jobs, since they always start from a clean install. This PR speeds up the relocation step, for an overall speedup of ~2.9x when installing LLVM (not including dependencies) from cache. With this change, the main bottleneck becomes the unpacking step (tarfile.extractall).

The main culprit is patchelf, which is very slow for unknown reasons. #27610 reimplements a limited feature set of patchelf in native Python. This PR takes the more aggressive (and faster, and simpler) approach of "just use string replacement!"

This approach is already implemented in relocate_text_bin to support buildcaches created with -a/--allow-root (i.e. from spack ci), so this patch primarily removes the patchelf pass (relocate_elf_binaries). This works (and works well) because:

  • The RPATH/RUNPATH is always stored as a null-terminated string, just like a path in the .bss would be stored. No special handling is required by the ELF format; relocate_text_bin will recognize and replace it just fine.
  • relocate_text_bin inserts padding made up of os.sep to fill the gap, whereas patchelf does not add padding. RPATH/RUNPATH is only read when loading binaries, and the loader's path-manipulation code is highly optimized, so performance will not be hurt (see the sketch after this list).
  • Binaries need to fit in system memory to be usable in practice, so loading the entire file and using Python's optimized native bytes.replace should always be possible without running out of memory.
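For illustration, a minimal sketch of the padded replacement idea (replace_prefix_padded is a hypothetical name, not the actual Spack helper; the real logic lives in Spack's binary relocation code):

import os

def replace_prefix_padded(data: bytes, old: bytes, new: bytes) -> bytes:
    """Swap `old` for `new` everywhere in a binary blob, padding `new` with
    os.sep bytes so every occurrence keeps its original byte length and no
    offsets inside the ELF shift."""
    if len(new) > len(old):
        raise ValueError("new prefix must not be longer than the old one")
    padded = new + os.sep.encode() * (len(old) - len(new))
    # bytes.replace is implemented in C, so this is a single fast pass.
    return data.replace(old, padded)

# Example: with old=b"/old/spack/prefix" and new=b"/new/prefix",
# b"/old/spack/prefix/lib/libz.so\0" becomes b"/new/prefix///////lib/libz.so\0":
# same length, still a valid path, and the extra slashes are harmless.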

Additionally, this PR optimizes relocate_text_bin to use a process Pool instead of a ThreadPool for concurrency. This avoids serialization from GIL contention between threads (a minimal sketch follows).
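A minimal sketch of the pool swap, reusing the hypothetical helper from the sketch above (relocate_one/relocate_all are illustrative names, not the actual Spack API):

import multiprocessing

def relocate_one(path, old_prefix, new_prefix):
    # Hypothetical worker: rewrite one binary in place using the padded
    # replacement sketched above.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(replace_prefix_padded(data, old_prefix, new_prefix))

def relocate_all(paths, old_prefix, new_prefix, jobs=None):
    # A process pool sidesteps the GIL: each worker runs in its own
    # interpreter, so the replacements proceed truly in parallel instead of
    # being serialized by lock contention.
    with multiprocessing.Pool(processes=jobs) as pool:
        pool.starmap(relocate_one, [(p, old_prefix, new_prefix) for p in paths])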

We (the HPCToolkit folks) have been using this patch in our own CI for a few weeks now with no known issues, so I think this is safe to consider more seriously.

Known issues and concerns:

  • The original implementation allowed the new prefix to be longer than the old prefix, provided the prefix was only used in the RPATHs and nowhere else. This new implementation does not (currently) fall back to patchelf in that case, which is the primary reason the clingo checks fail. This could be avoided by fixing "No padding in rpaths?" (alalazo/spack-bootstrap-mirrors#10).
  • The original implementation passed --force-rpath to patchelf to ensure RPATHs were used (instead of RUNPATHs). This PR removes that logic and instead keeps whichever the buildcache was generated with. Fixing this requires minimally parsing the ELF, which would slow down relocation. Alternatively, this could become almost moot with "Experimental binding of shared ELF libraries" (#31948).
  • The relocate_text_bin code path has a latent bug: if the projection in the "new" case differs from the original, the binary may not relocate properly (the layout root is replaced but not the projection suffix). Using it for RPATHs makes this bug much more obvious (see the toy example after this list).
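A contrived illustration of that latent bug (the paths and projections are made up):

# Hypothetical: buildcache built with projection "{name}-{version}", but the
# installing site uses "{name}/{version}". Plain prefix replacement only swaps
# the layout root, so the projection suffix still follows the old scheme.
old = b"/old/root/zlib-1.2.12/lib/libz.so\0"
relocated = old.replace(b"/old/root", b"/new/root")
assert relocated == b"/new/root/zlib-1.2.12/lib/libz.so\0"
# ...but with the new projection the library actually lives at
# /new/root/zlib/1.2.12/lib/libz.so, so the relocated path is dangling.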

@spackbot-app spackbot-app bot added the binary-packages, core (PR affects Spack core functionality), and gitlab (Issues related to gitlab integration) labels on Sep 9, 2022
@alalazo
Member

alalazo commented Sep 9, 2022

See #27610 for a more defensive implementation of the same idea. I'll add @haampie as a reviewer here.

@alalazo alalazo requested a review from haampie September 9, 2022 07:56
@blue42u
Contributor Author

blue42u commented Sep 9, 2022

Of course @haampie has a fix already. 👍

#27610 is definitely more defensive than this approach, but AFAICT it doesn't provide much practical benefit for the added complexity (the resulting RPATHs have fewer slashes... but that's it). And it is significantly slower than this approach (see below). Feel free to correct me if I'm missing something.

It could very well be that the speed issue can be fixed with a bit more optimization; if so, I'm all for doing it "right" with the defensive approach. I'm curious, for both PRs, whether using direct bytearray-like access through an mmap.mmap would provide a noticeable performance boost (rough sketch below).
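For reference, a rough sketch of what that mmap experiment might look like (mmap_replace is a hypothetical helper, not code from either PR):

import mmap

def mmap_replace(path, old: bytes, new_padded: bytes):
    # Rewrite occurrences in place through a writable memory map, avoiding a
    # full read()/write() round trip. Assumes both strings have equal length
    # so the file size never changes.
    assert len(old) == len(new_padded)
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), 0) as mm:
        start = mm.find(old)
        while start >= 0:
            mm[start:start + len(old)] = new_padded
            start = mm.find(old, start + len(old))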


Quick performance comparison (using #32136 to remove download times), baseline:

$ time spack install --cache-only -f llvm.json
...
real    6m19.236s
user    3m1.719s
sys     1m31.231s

With #27610:

$ time spack install --cache-only -f llvm.json
...
real    4m16.527s
user    2m40.794s
sys     0m48.068s

With this PR:

$ time spack install --cache-only -f llvm.json
...
real    2m52.447s
user    1m55.706s
sys     0m16.172s

@scottwittenburg
Contributor

@spackbot run pipeline

@spackbot-app

spackbot-app bot commented Sep 9, 2022

I've started that pipeline for you!

The GIL tends to get in the way of actually exploiting the parallelism here;
using a process-based pool separates the address spaces enough that it's no
longer an issue.
Not all packages need a lot of relocations, after all.
@blue42u blue42u changed the title from "[WIP] Silly idea to speed up relocation" to "[WIP] Accelerate (and simplify) buildcache relocation" on Oct 12, 2022
@blue42u
Contributor Author

blue42u commented Oct 12, 2022

Squashed and rebased, and rewrote the OP to give a bit more explanation.

Updated benchmark results (installing LLVM from cache, not including dependencies):

x Baseline (develop)
+ This PR
    N           Min           Max        Median           Avg        Stddev
x   5        229.04       242.408       237.759      236.8616     5.3901121
+   5        75.456        86.747        81.264       81.3248     4.6657018

@haampie
Member

haampie commented Oct 31, 2022

@blue42u the idea is nice, but I think we do want to have human-readable RPATHs, so that ldd and friends don't dump (padding size × number of dependencies) bytes of mostly /'s per library; it'd be very messy.

In the meantime:

  1. "_replace_prefix_bin performance improvements" (#33590) was merged to make the binary replacement a single pass
  2. "Relocation regex single pass" (#33496) dropped text files that don't contain install dir paths from the relocatable files recorded at install time (so we don't have to consider the majority of the files)
  3. "binary_distribution: compress level 9 -> 6" (#33513) made creating tarballs faster
  4. "Speed up relocation and fix bug" (#33608) improves performance again by dropping a redundant serial filter step
  5. "Relocation should take hardlinks into account" (#33460) dropped hardlinks so we relocate only one of each set of duplicates instead of all of them

I find it hard to believe that #27610 would result in such a big overhead, though 🤔 it should be touching the absolute minimum number of bytes per binary. I think what happens is this: you're also dropping the serial step of #33608 in this PR but not in the patchelf PR, and that step is, as far as I can see, the bottleneck, so it's not really apples to apples.


FWIW: for me, LLVM using current develop + #27610 results in 0m41.120s; the bottleneck right now is decompression (gzip 😠)

Only 10% is spent on relocation:

- rpaths: 0.16s
- links:  0.00s
- text:   0.01s
- bin:    4.56s

So, it'd be nice to still get #27610 merged, and then focus on using zstd instead of gzip. That would make LLVM install in ~10s instead (including decompression, excluding download time)


Can you run your $ time spack install --cache-only -f llvm.json again using current develop? If text relocation is slow, you should create a new tarball ;)

@blue42u
Contributor Author

blue42u commented Nov 6, 2022

I finally found some time to check; this PR indeed no longer gives a performance improvement on the latest develop. Thanks for all the work!

@blue42u blue42u closed this Nov 6, 2022