Various binary cache improvements #34371
Comments
Note that Spack end-users generally have a persistent cache on disk. Note also that Spack end-users almost always concretize (more than) once in their normal operations, which already requires the startup cost to be paid to fully implement buildcache reuse.

For the above users, paying the startup cost and using the mirror order optimization provides overall benefits, which can be significant with moderate-to-large specs, multiple mirrors, or high-latency "slow" mirrors. The only "other Spack users" for whom paying this startup cost causes overall slowdowns are users who (a) don't have a persistent cache and (b) don't concretize during their operations. The known (and likely most common) case of this is Spack in CI, where clean-room containers are the norm, specifically the build jobs Spack generates for its CI pipelines.

The great irony here is that I implemented #32137 to speed up our own use of Spack in CI. And we reaped benefits, because for us 12 requests * 50 packages was significantly more expensive than fetching the index up front. However, since the worst-case scenario is indeed worse now that #32137 has landed, I have given my blessing to #34326, which reverts the behavior it adds that causes problems for Spack's CI.
I was thinking something like this too. What (I think) that will also require is some way to get spack to run […]
To add a note here - it used to work to add a filesystem mirror without […]
Issue description

Over the last week(s) there's been quite some discussion about binary caches. This issue is meant to give an overview of the discussions, the previous & current problems, and a suggested way forward.
The main problems that triggered these discussions:
1. `s3://` URLs are very slow, let's say it takes 1-3 seconds per request. Compare this to ~150ms for the equivalent `https://` URL for our spack-binaries bucket.
2. For every spec, Spack makes up to three requests per mirror: `spec.yaml`, `spec.json`, and `spec.json.sig`.
3. This leads to a significant overhead to fetch binaries in CI; @blue42u reported a lot of time wasted (let's say about 30 minutes) just trying to fetch binaries from mirrors.
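To make the amplification concrete, here is a back-of-envelope sketch. The numbers are illustrative assumptions taken from the points above, not measurements:

```python
# Worst-case cache-miss cost; all numbers are illustrative assumptions.
specs = 50            # packages to install (roughly the CI scale mentioned)
mirrors = 4           # configured binary mirrors (assumed)
files_per_spec = 3    # spec.yaml, spec.json, spec.json.sig
s3_latency = 2.0      # seconds per s3:// request (1-3 s observed)
https_latency = 0.15  # seconds per https:// request (~150 ms observed)

requests = specs * mirrors * files_per_spec
print(f"{requests} requests -> "
      f"{requests * s3_latency / 60:.0f} min via s3://, "
      f"{requests * https_latency / 60:.1f} min via https://")
# 600 requests -> 20 min via s3://, 1.5 min via https://
```

With those assumptions the `s3://` path lands right in the "about 30 minutes wasted" range reported above.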
We already had a small optimization to reduce the number of requests:

4. Spack keeps a local cache of each mirror's `index.json`.
5. That cached index is used to order mirrors, so that mirrors known to have the spec are tried first (the "mirror order optimization" from @blue42u's #32137).
However, the optimization has a bug: if the spec cannot be located in any local cache (either because none of the remotes have the spec at all, or because we don't have a local cache for the mirror), Spack does a partial update of the cache. Partial in the sense that it queries each mirror for the spec by directly fetching the relevant `spec.json` files. So, in this case, the optimization does strictly more damage: all mirrors are queried before starting a download, whereas without the "optimization" Spack would simply stop at the first mirror it can download from.
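In code, the difference between the two behaviors looks roughly like the sketch below. The helper names (`remote_has_spec`, `download`) are hypothetical stand-ins for Spack internals, not its real API:

```python
# Hypothetical stand-ins for Spack internals, for illustration only.
def remote_has_spec(mirror, spec): ...
def download(mirror, spec): ...

def fetch_with_partial_update(spec, mirrors, local_index):
    """Buggy path: on a local-index miss, probe *every* mirror up front."""
    candidates = [m for m in mirrors if spec in local_index.get(m, set())]
    if not candidates:
        # The "partial update": one spec.json request per mirror, all paid
        # before any download even starts.
        candidates = [m for m in mirrors if remote_has_spec(m, spec)]
    return any(download(m, spec) for m in candidates)

def fetch_first_hit(spec, mirrors):
    """Pre-optimization behavior: stop at the first mirror that works."""
    return any(download(m, spec) for m in mirrors)
```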
However, point 5 only makes sense for terribly slow mirrors, since there is a high startup cost to fetching an `index.json` with, say, 100K specs. Slow mirrors are not the norm (think `file://` mirrors or mirrors on a low-latency local network), so #32137 makes the Spack experience worse for all other Spack users. For fast mirrors, we'd really like to do direct fetches (and also use the fully offline mirror order optimization). In fact, we never had any issues with the https://mirror.spack.io URLs for sources; it would be absurd if Spack first downloaded an index of all sources available on mirror.spack.io just to consult it when installing packages from source.
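Whether paying that startup cost is worth it reduces to a simple break-even estimate. Both numbers below are assumptions for the sake of the argument, not measurements:

```python
# Break-even point for prefetching a large index.json (assumed numbers).
index_fetch_s = 60.0    # assumed cost of downloading/parsing a ~100K-spec index
saved_per_spec_s = 4.0  # assumed seconds saved per spec on a slow mirror
                        # (a few avoided requests at 1-3 s each)

print(f"prefetching pays off above ~{index_fetch_s / saved_per_spec_s:.0f} specs")
# On a fast https:// or file:// mirror the per-spec savings approach zero,
# so the break-even point diverges and prefetching never pays off.
```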
What has not really been looked into is why these `s3://` requests are so slow in the first place, and it turns out it's because of various trivial issues:
6. On a failed fetch, Spack would additionally check `<failing url>/index.html`; this was fixed in "Stop checking for {s3://path}/index.html" (#34325).

Next, what had not really been addressed: instead of three requests per spec, for `spec.yaml`, `spec.json`, and `spec.json.sig`, we can reduce this to one:

7. `spec.yaml` was deprecated, so it is removed in "remove legacy yaml from buildcache fetch" (#34347).
8. There's no need for a separate `spec.json.sig` extension; we can just stick to `spec.json` and have Spack peek into the file to see if it's signed or not, so I submitted "binary cache: do not create separate `spec.json.sig` files" (#34350); a minimal sketch of such a check is shown below. (The only problem here is that it's not forward compatible; it may need backporting to 0.19 if we're nice about it.)

When 6-8 are all addressed, I expect it would reduce the overhead (especially in the unhappy cache-miss path) by at least a factor of 10.
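The "peek into the file" check can be as cheap as inspecting the first bytes: a GPG-clearsigned document starts with a fixed header, while plain JSON starts with `{`. A minimal sketch of that idea, assuming it matches what #34350 does (the function name is mine, not the PR's):

```python
PGP_HEADER = b"-----BEGIN PGP SIGNED MESSAGE-----"

def is_clearsigned(path: str) -> bool:
    """Return True if `path` is a GPG-clearsigned file rather than plain JSON."""
    with open(path, "rb") as f:
        head = f.read(64)  # the header is 34 bytes; read a little extra
    return head.lstrip().startswith(PGP_HEADER)
```

An unsigned `spec.json` begins with `{`, so this check cannot misfire on valid JSON.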
Going forward, I think the highest priority is to fix point 7.
Then we should ensure the mirror order optimization is always offline, which means partially reverting @blue42u's PR and passing `index_only=True` in the relevant place where a spec is searched for. To make @blue42u happy, it could be useful to have a command like `spack mirror update` (or something like that) that updates the local binary index if necessary, which he can then run before `spack install` in CI.
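Putting those two ideas together, here is a sketch of the intended flow; all names are hypothetical stand-ins, not Spack's actual internals:

```python
# Sketch of the proposed behavior; names are hypothetical, not Spack's API.

def find_in_buildcache(spec, mirrors, local_index):
    """index_only=True semantics: consult only locally cached indices and
    never issue network requests just to locate a spec."""
    return [m for m in mirrors if spec in local_index.get(m, set())]

def mirror_update(mirrors, local_index, fetch_index):
    """What a `spack mirror update` command could do: refresh the cached
    index.json of each mirror, e.g. once before `spack install` in CI."""
    for m in mirrors:
        local_index[m] = fetch_index(m)  # the only place the network is hit
```

This keeps the expensive index refresh explicit and opt-in, while the per-spec lookup stays fully offline.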