Separate Hashes for Install Directories vs. Modules #3513
I'm not sure if I'm correct in this, but I thought of the hashes as a unique identifier of the contents of an installation directory. In that case it would only need to hash all build-dependencies. The generated modules then would have to incorporate the link- and run-dependencies as well as the hashes of those dependencies. The output of `spack find` should then only contain the latter sort of hashes (effectively moving the install-hashes to be "internal use only"). Obviously I'm missing something important here...
I'm not sure if you really want to do this. Spack was intended to provide reproducible results, and this would make reproducibility largely impossible, wouldn't it?
I'm not sure if I'm correct in this, but I thought of the hashes as a
unique identifier of the contents of an installation directory. In that
case it would only need to hash all build-dependencies.
yes
The generated modules then would have to incorporate the link- and
run-dependencies as well as the hashes of those dependencies.
yes
The output of spack find should then only contain the latter sort of
hashes (effectively moving the install-hashes to be "internal use only").
yes. Although `spack find`, `spack load`, etc. are weak and should be
replaced with more useful ways of finding stuff that's been built (see
Spack Environments).
Obviously I'm missing something important here...
To really get it right, it seems we will need to give users control over
whether individual dependencies are/are not hashed, for each hash
algorithm. Sure, we can have defaults based on the deptype. But those
defaults will need to be overridden 5% of the time.
I think this is orthogonal to the issues you're dealing with, and it's a
somewhat minor detail. The idea is that in general, Spack would now need to
know whether each declared dependency (`depends_on()`) does or does not add
to (a) the install dir hash, (b) the module hash. I'm concluding that
users would probably have to have the option to declare this information
directly in `depends_on()`, rather than relying on the deptype to divine it
(although relying on deptype would probably work in most cases).
My problem is that I cannot (easily) load my installed module in a working
fashion, once I have multiple versions (including broken ones). If one
would want to reduce the number of builds I guess this would be a way.
Yes, that is exactly why `spack load` is irretrievably broken; and it is
orthogonal to these other issues. Spack Environments provide a way out of
that mess. Basically... you tell Spack what you're interested in as part
of a complete environment you want to use. Spack will then build it and
assemble the resulting packages either into a Spack View, or a bunch of
`module load` commands. Then, you never have to use `spack load` or go
through the ambiguity of more than one package installed.
https://github.com/LLNL/spack/wiki/Elizabeth%27s-Conceptual-Framework-for-Environments
For the time being, you might want to check out #2698. It is a "poor
man's" Spack Environment, and is a bit arcane to use. Therefore, it will
change when we do the "real" Spack Environments. But it does get the job
done, and I rely on it.
@scheibelp This relates to your work on deptypes.
Specifically, that is referring to #2548. To update this thread with discussion from the telecon: #2548 was more focused on how deptypes affect environment setup for a build; IMO this is a different issue. My initial read-through of #3501 is that it is also a different issue: it started out potentially related to #2548 in that deptypes determine which modules are loaded; it then turned into a discussion of whether there should be support for changing deptypes in package.py without a reinstall.
Regarding the concept of removing run dependencies (or rather run-only dependencies) from the hash, I think there may be special cases which make this difficult: what happens if a top-level dependency requires an output format from a run dependency, and that format changes in some new version? Or if newer versions of a run dependency add support for new commands? There do seem to be certain constraints on run dependencies, although they are likely not as strict as for link dependencies (where, for example, variants are much more likely to alter compatibility).
Closing the issue as stale. Feel free to reopen if you think something still needs to be discussed.
@tgamblin @healther @adamjstewart
Spack currently creates install directories and modules, both based on a fully concretized spec (FCS). Install directories and modules are identified by hash, and the same hash algorithm is currently used to generate the hash for both types of objects. Using the same hash for both seems intuitively correct. However, recent discussion on #3501 suggests that it might actually be incorrect, and that Spack might benefit from decoupling the hashes used to identify install directories and modules.
Hashes are useful for efficiently labelling two things as "same" or "possibly different." If two install directories have the same hash, then we can surmise that their contents are the same. Similarly, if FCS-A hashes to hash-A, and Spack finds an install directory labelled with hash-A, then Spack can (and does) surmise that building FCS-A would result in a directory that is "the same" as the install directory it just found, and so it does not need to rebuild. Note that our notion of "the same" has been left fuzzy; we care about functional equivalence, not byte-for-byte equivalence of every file in the install directory.
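To make the "same hash means no rebuild" reasoning concrete, here is a minimal sketch. It is not Spack's actual hashing code: the dict-based spec and the `spec_hash` and `needs_build` helpers are hypothetical names made up for illustration, and SHA-1 over canonical JSON is just an assumed stand-in for whatever Spack really hashes.

```python
import hashlib
import json


def spec_hash(spec):
    """Hash a toy fully concretized spec (a plain dict) via canonical JSON."""
    # sort_keys makes the hash independent of dict insertion order.
    canonical = json.dumps(spec, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def needs_build(spec, installed_hashes):
    """Return True unless an install directory with this hash already exists."""
    return spec_hash(spec) not in installed_hashes


# Hypothetical specs; the package name and versions are illustrative only.
fcs_a = {"name": "libfoo", "version": "1.2", "compiler": "gcc@6.3.0"}
fcs_b = {"compiler": "gcc@6.3.0", "version": "1.2", "name": "libfoo"}  # same spec as fcs_a
fcs_c = {"name": "libfoo", "version": "1.3", "compiler": "gcc@6.3.0"}

# Pretend fcs_a has already been installed under its hash.
installed_hashes = {spec_hash(fcs_a)}

print(needs_build(fcs_b, installed_hashes))
print(needs_build(fcs_c, installed_hashes))
```

In this toy model `fcs_b` is treated as already installed because it hashes to the same value as `fcs_a`, while `fcs_c` differs in version and would trigger a build.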
Hashes are not always perfect. It is OK if FCS-A and FCS-B hash to something different, even if their install directories are the same. That will result in unnecessary builds and annoyed users, but won't break anything. However, the converse is not OK. If FCS-A and FCS-B hash to the same thing, then their install directories must be the same. (This is why packaging systems like `pip` cause problems for Spack; they modify their install directory after Spack is done installing them.)
Now suppose fully concretized spec X involves a run dependency Y. Should Y be included in the hash for X? Looking at the install directory... If we accept for now that run dependencies do not affect the contents of the install directory, then clearly Y should not be included in the hash. BUT looking at modules... run dependencies do materially affect the contents of the generated module. Therefore, run dependencies do need to be included in the hash used to label the module.
Conclusion: To be fully correct, the hash algorithms used for modules and install directories need to be different. This of course might be user-unfriendly: it is convenient for them to be the same. But it seems that the simplest, "purest" system would have the two use different hashes. Maybe there's some clever way to hide this from the user. Or maybe the algorithm laid out in #3501 can be seen more simply with this understanding.
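A rough sketch of what two hashes over one spec could look like follows. Everything here is an assumption for illustration: `make_spec`, `dep`, `hash_spec`, and the two deptype tuples are hypothetical and not part of Spack, and the choice of build plus link for the install-dir hash is only one possible default.

```python
import hashlib
import json


def make_spec(name, version, deps=()):
    """Build a toy concretized spec as a plain dict."""
    return {"name": name, "version": version, "deps": list(deps)}


def dep(spec, *deptypes):
    """Attach a set of deptypes to a dependency spec."""
    return {"spec": spec, "deptypes": set(deptypes)}


def hash_spec(spec, deptypes):
    """Hash a spec, following only dependencies that have one of `deptypes`."""
    kept = sorted(
        (d["spec"]["name"], hash_spec(d["spec"], deptypes))
        for d in spec["deps"]
        if d["deptypes"] & set(deptypes)
    )
    payload = json.dumps([spec["name"], spec["version"], kept])
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Assumed defaults: run dependencies only feed the module hash.
INSTALL_DEPTYPES = ("build", "link")
MODULE_DEPTYPES = ("build", "link", "run")

# X with run dependency Y, at two different versions of Y.
x_with_y1 = make_spec("x", "1.0", [dep(make_spec("y", "1.0"), "run")])
x_with_y2 = make_spec("x", "1.0", [dep(make_spec("y", "2.0"), "run")])

same_install = hash_spec(x_with_y1, INSTALL_DEPTYPES) == hash_spec(x_with_y2, INSTALL_DEPTYPES)
same_module = hash_spec(x_with_y1, MODULE_DEPTYPES) == hash_spec(x_with_y2, MODULE_DEPTYPES)
print(same_install, same_module)
```

Under these assumptions, changing only the version of run dependency Y leaves the install-dir hash of X unchanged (so no rebuild) but changes the module hash (so a new module is generated).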
Unfortunately, it doesn't stop there. Once we start removing some dependencies from the install-dir or module hash, we will want to keep others. For example, build dependencies should be hashed (e.g., if they're a compiler), except when they shouldn't be (for example, if Bison is used, and the parser generated by any Bison version is functionally equivalent; or if the dependency is doxygen and we just don't want docs to affect the hash). Similarly, run dependencies shouldn't be hashed for install directories, except when they should be; maybe a full path to a run dependency snuck in there somehow.
To really get it right, it seems we will need to give users control over whether individual dependencies are/are not hashed, for each hash algorithm. Sure, we can have defaults based on the deptype. But those defaults will need to be overridden 5% of the time.
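One way to picture "defaults from the deptype, with per-dependency overrides" is the sketch below. The `Dependency` class, the `hash_install`/`hash_module` fields, and the `DEFAULTS` table are all hypothetical; nothing like them is claimed to exist in `depends_on()` today, and the package names are only examples.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional

# Assumed defaults: which deptypes contribute to each hash.
DEFAULTS = {
    "install": frozenset({"build", "link"}),
    "module": frozenset({"build", "link", "run"}),
}


@dataclass(frozen=True)
class Dependency:
    name: str
    deptypes: FrozenSet[str]
    # None means "use the deptype default"; True/False is an explicit override.
    hash_install: Optional[bool] = None
    hash_module: Optional[bool] = None


def is_hashed(dependency, which):
    """Decide whether `dependency` feeds the 'install' or 'module' hash."""
    override = dependency.hash_install if which == "install" else dependency.hash_module
    if override is not None:
        return override
    return bool(dependency.deptypes & DEFAULTS[which])


def hashed_names(dependencies, which):
    """Names of the dependencies that contribute to the given hash."""
    return [d.name for d in dependencies if is_hashed(d, which)]


deps = [
    Dependency("cmake", frozenset({"build"})),
    # Any Bison version is assumed to generate an equivalent parser: never hash it.
    Dependency("bison", frozenset({"build"}), hash_install=False, hash_module=False),
    # A run dependency whose full path is assumed to be baked into the install.
    Dependency("python", frozenset({"run"}), hash_install=True),
    Dependency("perl", frozenset({"run"})),
]

print(hashed_names(deps, "install"))
print(hashed_names(deps, "module"))
```

Here the deptype decides for `cmake` and `perl`, while `bison` and `python` show the two kinds of override described above: a build dependency excluded from both hashes, and a run dependency forced into the install-dir hash.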
In the meantime, without distinct hashes for modules vs. install directories, and without fine-grained control over what goes in the hash, Spack errs on the side of caution. It puts everything in the hash, and now and then annoys users with unnecessary rebuilds.