Replies: 6 comments 6 replies
-
Spack's concretization is still far from perfect (which deserves a separate discussion), so I often have to rebuild my environment from scratch (otherwise Spack insists that everything is installed and it has nothing to do). To reduce the build time, I keep a local buildcache: basically a directory on a mounted file system. I don't want to use web/S3 for that, since it's easier to remove and update a local file while you are polishing a recipe. So, at least for me, removing directory-based buildcaches would be bad. Directory-based buildcaches are also used on so-called "online" systems, which reside in an air-gapped "technical" network, where standing up even a simple HTTP server, let alone an S3 one, is problematic.
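For context, the local-directory workflow described above only needs a `file://` mirror entry. A minimal sketch (the mirror name and path are examples, not anything Spack-specific):

```yaml
# mirrors.yaml: a directory-based buildcache on a mounted filesystem.
# "local_cache" and the path are placeholders; adjust to your setup.
mirrors:
  local_cache: file:///mnt/shared/spack-buildcache
```

Such a mirror is populated with `spack buildcache create` and its index refreshed with `spack buildcache update-index`, which is exactly the workflow that would disappear if directory-based buildcaches were removed.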
-
@tgamblin Last week we discussed how "spackbot rebuild everything" should really prefer to consume dependency binaries built in the pipeline instead of binaries from develop. The approach we discussed was to simply place a higher priority on mirror precedence rather than always preferring signed json files: so, kind of (though it isn't quite that simple) reversing the looping order here, a couple of lines of code @blue42u singled out above as "blind handling for deprecated and un/signed buildcache layout formats". So for the "rebuild everything" use case, we are basically proposing that the unsigned binary built in the pipeline is preferable to the signed one in the main mirror. Since I don't want "rebuild everything" to have to wait until everything here is addressed, I wonder if we shouldn't just go with the other alternative we discussed last week: just remove the main mirror from the configuration when we know we're doing a "rebuild everything" pipeline.
-
Does this mean that buildcaches will have to be homogeneous? So no mixing of the old and new layouts within a single buildcache?
-
While this assumption would make life a lot better (really, a huge amount, in my opinion), it's kind of hard for me to imagine in practice. Because a lot of mirrors are populated by pipelines where concurrent jobs would need to edit the index at the same time, I don't see how it could work. From time to time, we've discussed making mirrors "smarter": instead of brainless S3 buckets or directories in a file system, if mirrors were REST endpoints (or similar), we could ensure with each push of a binary package that the index reflects the true state of the mirror, at least in theory. As it is now, though, how do we prevent the inevitable race conditions between parallel jobs updating the index simultaneously? Not to mention that with brainless mirrors, updating the index can take forever, because we have to iterate over all the items, fetching and ingesting each one into a database, before we can finally write the index back to the mirror. Don't get me wrong though: as I stated initially, I think it's a great goal to be able to assume the index is always up to date.
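The read-everything/write-once pattern described above can be sketched in a few lines. This is a toy model (the function and index shape are invented for illustration, not Spack's actual `binary_distribution.py` code), but it shows both why a full index update scales with mirror size and where two concurrent writers can clobber each other:

```python
import json
from pathlib import Path

def rebuild_index(mirror_root: str) -> dict:
    """Toy model of a 'brainless' mirror index rebuild: walk every
    *.spec.json file, ingest it, then write the index back in one shot.
    A sketch of the pattern described above, not Spack's real code."""
    root = Path(mirror_root)
    index = {"database": {"installs": {}}}
    # Every item must be fetched and parsed before the index can be
    # written, so a full update scales with the size of the mirror.
    for spec_file in sorted(root.rglob("*.spec.json")):
        spec = json.loads(spec_file.read_text())
        index["database"]["installs"][spec["hash"]] = spec
    # Two concurrent jobs doing this read-all/write-once cycle can
    # interleave; the last writer silently discards the other's entries.
    (root / "index.json").write_text(json.dumps(index))
    return index
```

A "smarter" REST-endpoint mirror could instead accept one push at a time and update the index transactionally, which is exactly what a bare bucket or directory cannot do.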
-
Do I understand correctly that the above statement is aimed at making sure Spack never accesses the contents of a mirror without all the context you propose should go along with that mirror?
-
FYI @zackgalbreath @kwryankrattiger for visibility, and in case you are interested in participating in this discussion.
-
Buildcaches have a number of micro-issues (security and otherwise) that all stem from the same core design flaw: Spack allows no configuration for used buildcaches outside of their URL. Using Spack in CI often aggravates these issues. This is not sustainable as Spack continues to roll forward and encourage using buildcaches in more scenarios.

### Observed issues

- Buildcaches are fundamentally divorced from their signing keys. If any one buildcache is unsigned, every `spack install` that uses it needs to include `--no-check-signature`. This significantly weakens security for the buildcaches that are signed and encourages highly insecure behavior in users.
- Buildcaches share a single set of signing keys, which are not pruned when buildcaches are removed from the `mirrors.yaml`. This means a "cracked" buildcache signing key could be used to maliciously inject arbitrary code via a user's current mirror set, even if they aren't using the compromised buildcache anymore. Although good keys are hard to crack, very few of us (users) are security professionals, and trusting us to properly manage signing keys is not a good idea. As such, this is highly insecure behavior. (Debian deprecated `apt-key` for basically this reason.)
- Buildcaches are fundamentally divorced from their access credentials. For buildcaches hosted in the cloud (e.g. AWS S3) in particular, this meant all the buildcaches needed to share a single set of credentials. If one buildcache was private while others were public, the credentials would need to have access to all buckets, breaking the principle of least privilege. Multiple private buildcaches would all need to be accessible via a single set of access credentials, and a user would be charged if a buildcache was maliciously converted to a "requester pays" bucket. This was highly insecure behavior and was fixed by Bug/fix credentials s3 buildcache update #31391, but since it deeply relates I bring it up again here.
- Buildcaches blindly and aggressively probe URLs. In a practical pipeline this means thousands of requests, almost all of which result in 404s (see image below). AFAICT this expensive behavior is unnecessary; some of it likely relates to Spack's blind handling of deprecated and un/signed buildcache layouts. This is detrimental behavior that could be fixed with a modicum of configuration.
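To make the cost of blind probing concrete, here is a toy model of the try-every-variant pattern. The function, its parameters, and the URL shapes are invented for illustration (they are not Spack's actual layout or code); `exists` stands in for an HTTP request that returns 200 or 404:

```python
def blind_probe(exists, mirrors, hashes, variants):
    """Count the requests issued by try-every-variant probing.

    Toy model of the behavior described above: for each package hash,
    try every mirror and every layout/extension variant in order,
    stopping at the first hit. Every miss is a wasted request (a 404).
    """
    requests, found = 0, {}
    for h in hashes:
        for url in (f"{m}/{h}{v}" for m in mirrors for v in variants):
            requests += 1
            if exists(url):
                found[h] = url
                break
    return requests, found
```

With, say, 2 mirrors, 3 layout variants, and 500 uncached packages, this issues 3000 requests and every one of them is a 404, which matches the pipeline behavior shown in the image. A configured layout would cut the variant loop to a single known URL per mirror.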
### Proposed solution

Buildcaches should always be mirrors, and never a bare URL. Specifically:

- `mirrors.yaml` contains all the configuration needed to use the associated buildcache.[^1]
- `spack install` never probes for files outside of the configured layout; if files appear to be missing, `spack mirror update` (see below) will be recommended.
- `spack install` assumes the `index.json` is up to date. This can be enabled on a per-mirror basis.
- `spack mirror` provides user-facing commands to transparently manage buildcache subscriptions:
  - `spack mirror add` automagically fills in the buildcache configuration by probing the buildcache URL and downloading keys.
  - `spack mirror update` repeats the above and updates the configuration. If the configuration is ever found to be wrong during a `spack install` (e.g. a signature verification failure), this command is suggested as a solution.
- `spack buildcache` and `spack ci` always use mirror names instead of directories or URLs:
  - `spack buildcache keys` is removed (now handled by `spack mirror (add|update)`).
  - `spack buildcache (add|create|update-index|...)` only accept a mirror name for their destination; URLs and directories are not allowed.
  - `spack ci generate` requires a destination mirror (i.e. `--buildcache-destination` is required and takes a mirror name instead of a URL). This mirror is not added to the concrete environment by default (to prevent security leakage).
  - `spack ci rebuild` takes a destination mirror as its only argument, which should (almost always) be added via `spack mirror add` in the `before_script`.

I think this could also significantly simplify `binary_distribution.py`, since it could pass around first-class `Mirror` objects instead of bare URLs.

[^1]: Layout `1` might mean the current layout with `index.json`, `index.json.hash`, and `*.spec.json`/`*.spec.json.sig`/`*.spack` files named by arch/package/hash.
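To visualize the proposal, a mirror entry might grow from a bare URL into a self-describing record. None of these keys exist in Spack today; every field name below is invented purely as an illustration of "all the configuration needed to use the associated buildcache":

```yaml
# Hypothetical mirrors.yaml under this proposal (all keys are invented).
mirrors:
  my-buildcache:
    url: https://example.com/buildcache
    layout: 1                  # pinned layout; no probing for other formats
    signed: true
    keys:                      # per-mirror keys, pruned with the mirror
      - fingerprint: "0123ABCD..."   # placeholder fingerprint
    index-assumed-fresh: true  # spack install trusts index.json as-is
```

`spack mirror add` would fill this in automatically, and `spack mirror update` would refresh it when verification fails.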