[WIP] Accelerate (and simplify) buildcache relocation #32583
Conversation
force-pushed from 854a1db to 691a573
Of course @haampie has a fix already. 👍 #27610 is definitely more defensive than this approach, but AFAICT it doesn't provide much practical benefit for the added complexity (the resulting RPATHs have fewer slashes... but that's it). And it is significantly slower than this approach (see below). Feel free to correct me if I'm missing something. It could very well be that the speed issue can be fixed with a bit more optimization; if so, I'm all for doing it "right" with the defensive approach. I'm curious for both PRs whether using direct

Quick performance comparison (using #32136 to remove download times), baseline:

```
$ time spack install --cache-only -f llvm.json
...
real    6m19.236s
user    3m1.719s
sys     1m31.231s
```

With #27610:

```
$ time spack install --cache-only -f llvm.json
...
real    4m16.527s
user    2m40.794s
sys     0m48.068s
```

With this PR:

```
$ time spack install --cache-only -f llvm.json
...
real    2m52.447s
user    1m55.706s
sys     0m16.172s
```
@spackbot run pipeline
I've started that pipeline for you!
force-pushed from 691a573 to 16a20a1
The GIL tends to get in the way of actually exploiting the parallelism here; using a process-based pool separates the address spaces enough that it's no longer an issue.
Not all packages need a lot of relocations, after all.
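As an illustration of the idea, here is a minimal sketch (not the actual Spack code) of moving the per-binary replacement work from a `ThreadPool` to a process-based `multiprocessing.Pool`; the `relocate_one` helper, the prefixes, and the file list are made up for the example:

```python
# Minimal sketch (not the Spack implementation): relocating many binaries
# concurrently. A ThreadPool serializes the CPU-bound bytes.replace() work
# behind the GIL; a process Pool runs it in separate address spaces.
import multiprocessing

OLD_ROOT = b"/old/spack/opt"  # hypothetical old layout root
NEW_ROOT = b"/new/spack/opt"  # hypothetical new layout root (same length here)


def relocate_one(path):
    """Replace the old root with the new root in one binary (hypothetical helper)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(OLD_ROOT, NEW_ROOT))


def relocate_all(paths):
    # Process-based pool: each worker is a separate interpreter, so the GIL
    # no longer serializes the byte-level search/replace.
    with multiprocessing.Pool() as pool:
        pool.map(relocate_one, paths)


if __name__ == "__main__":
    relocate_all(["/tmp/example-binary"])  # hypothetical file list
```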
force-pushed from 3b1a5c0 to d0ef697
Squashed and rebased, and rewrote the OP to give a bit more explanation. Updated benchmark results (installing LLVM from cache, not including dependencies):
@blue42u the idea is nice, but I think we do want to have human-readable rpaths. In the meantime:
I find it hard to believe that #27610 would result in such a big overhead though 🤔; it should be touching the absolute minimum number of bytes per binary. I think what happens is this: you're also dropping the serial step of #33608 in this PR but not in the patchelf PR, which as far as I can see is the bottleneck, so it's not really apples to apples. FWIW, for me LLVM using current develop + #27610 results in

Only 10% is spent on relocation:
So, it'd be nice to still get #27610 merged, and then focus on using zstd instead of gzip. That would make LLVM install in ~10s instead (including decompression, excluding download time).

Can you run your
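For reference, a minimal sketch of what reading a zstd-compressed tarball could look like from Python, assuming the third-party `zstandard` package is available; the archive name and destination are hypothetical, and this is not Spack's buildcache code:

```python
# Minimal sketch, assuming the third-party `zstandard` package:
# stream-decompress a .tar.zst archive with tarfile.
import tarfile
import zstandard


def extract_tar_zst(archive_path, dest_dir):
    with open(archive_path, "rb") as fh:
        dctx = zstandard.ZstdDecompressor()
        with dctx.stream_reader(fh) as reader:
            # "r|" = streaming mode, which works with a non-seekable reader.
            with tarfile.open(fileobj=reader, mode="r|") as tar:
                tar.extractall(dest_dir)


extract_tar_zst("llvm.tar.zst", "/tmp/llvm-extracted")  # hypothetical paths
```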
I finally found some time to check: this PR indeed no longer gives a performance improvement in the latest
REWRITTEN FOR BETTER EXPLANATION
Unpacking and relocating packages that come out of a buildcache is a significant bottleneck for CI jobs, since they always start from a clean install. This PR speeds up the relocation step, for an overall speedup of ~2.9x when installing LLVM (not including dependencies) from cache. With this, the main bottleneck is now the unpacking step (`tarfile.extractall`).

The main culprit is `patchelf`, which is very slow for unknown reasons. #27610 reimplements a limited feature set of `patchelf` in native Python. This PR takes the more aggressive (and faster, and simpler) approach of "just use string replacement!"

This approach is already implemented in `relocate_text_bin` to support buildcaches created with `-a/--allow-root` (i.e. from `spack ci`), so this patch primarily removes the `patchelf` pass (`relocate_elf_binaries`); a rough sketch of the idea appears after this description. This works (and works well) because:

- An RPATH is just another string in the binary, so `relocate_text_bin` will recognize and replace it just fine.
- When the new prefix is shorter than the old one, `relocate_text_bin` inserts padding made up of `os.sep` to fill in the gap, whereas `patchelf` does not add padding. Since RPATH/RUNPATH is only consulted when loading binaries and the path-manipulation code is highly optimized, performance will not be hurt.
- Loading a whole binary into memory for the `bytes.replace` method should always be possible without running out of memory.

Additionally, this PR optimizes `relocate_text_bin` to use a process-based `Pool` instead of a `ThreadPool` for concurrency. This avoids serialization from GIL contention between threads.

We (the HPCToolkit folks) have been using this patch in our own CI for a few weeks now with no known issues, so I think this is safe to consider more seriously.
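As the rough sketch promised above (not the actual `relocate_text_bin` implementation; the prefixes and file path are hypothetical), the core of the string-replacement idea is a same-length `bytes.replace`, padding the shorter new root with `os.sep`:

```python
# Minimal sketch of the "just use string replacement" idea (not the real
# relocate_text_bin code; prefixes and paths below are hypothetical).
import os

OLD_ROOT = b"/old/padded/layout/root/opt/spack"
NEW_ROOT = b"/new/opt/spack"


def relocate_bytes(data: bytes) -> bytes:
    # Pad the new root with os.sep so the replacement keeps the same length:
    # offsets inside the binary are not shifted, and the extra separators are
    # harmless when the path is used (e.g. as an RPATH entry).
    pad = len(OLD_ROOT) - len(NEW_ROOT)
    if pad < 0:
        raise ValueError("new root is longer than the (padded) old root")
    padded_new = NEW_ROOT + os.sep.encode() * pad
    return data.replace(OLD_ROOT, padded_new)


with open("/tmp/example-binary", "rb") as f:  # hypothetical binary
    relocated = relocate_bytes(f.read())
```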
Known issues and concerns:

- Relocating the bootstrap buildcache still requires `patchelf` in this case, which is the primary reason the `clingo` checks fail. This could be avoided by fixing "No padding in rpaths?" (alalazo/spack-bootstrap-mirrors#10).
- Previously, Spack passed `--force-rpath` to `patchelf` to ensure RPATHs were used (instead of RUNPATHs). This PR removes that logic, instead using whichever the buildcache was generated with. Fixing this requires minimally parsing the ELF, which will slow down the relocation. Alternatively, this could become almost moot with "Experimental binding of shared ELF libraries" (#31948).
- The `relocate_text_bin` code path has a latent bug where, if the projection in the "new" case differs from the original, it may not relocate properly (the layout root is replaced but not the projection suffix). Using it for RPATHs makes this bug much more obvious, as illustrated in the sketch below.
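To make that last point concrete, here is a hypothetical illustration (all paths invented) of how a root-only replacement leaves the old projection suffix behind:

```python
# Hypothetical illustration of the projection bug mentioned above: replacing
# only the layout root leaves the original projection suffix in place.
old_rpath = b"/old/root/linux-ubuntu20.04-x86_64/gcc-9.4.0/zlib-1.2.12-abcdef"
new_root = b"/new/root"

# Root-only replacement, analogous to what the current code path does:
relocated = old_rpath.replace(b"/old/root", new_root)
print(relocated)
# b'/new/root/linux-ubuntu20.04-x86_64/gcc-9.4.0/zlib-1.2.12-abcdef'
# If the new install tree uses a different projection (say, a flat
# "{name}-{version}-{hash}" layout), this relocated path does not exist.
```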