setup_dependent_environment overwritten by module-defined external compiler #8632
* rename `_version_yearlike` to `version_yearlike` - needed in studio package
* for spack#8632: debugging output in `setup_dependent_package`/`setup_dependent_environment`
* custom `TMPDIR` for `install.sh`, to identify files for saving. When done, save installer log alongside `silent.cfg`. Absorb `preserve_cfg()` into `install()` so we need not bother to carry the `mkdtemp` name across functions.
At the moment the order is hard-coded (to load compiler modules after …)
As I wrote:
> The current Spack strategy will backfire for any omnibus development package that provides more than just a compiler. Today, perhaps only Parallel Studio fits that bill, but tomorrow other packages may come along that want to be all things to all people. With compilers loaded last (and in the form of modules), all such omnibus packages will silently overshadow alternative sub-packages that a user may legitimately want, as shown for Intel-MPI and MKL in the intro. I am aware that changing the order touches on perhaps deep architectural choices made for Spack that I don't have insight into. If the default order is not changed, a configurable order provides at least a way out, but it puts the burden on the user and could make issue replication tricky.
The order of compiler vs. module loads is not particularly difficult to change and I don't think there would be too much controversy around it. As an example, #8245 (which was closed in favor of #8346) changed the order. @alalazo do you see a major reason to maintain the current order? I know in #8245 you mention the line
but I think that was primarily because of the environment cleaning actions like unsetting …
@mgsternberg @scheibelp The problem:
seems to be a generalization of #8346 (there we were only interested in …)

Would this solution work for you? Do you see major drawbacks?
That's what I described by:
… and it certainly won't tackle the problem "with the fewest assumptions and least a priori knowledge". The relevant env. var names include … (*)

(*) For goodness' sake, don't include that list in the hash definition of the compiler spec.
I think you misunderstood my proposal (or I was not clear in explaining it). What I want to do is to use the knowledge already in …
Sorry, yes, I missed the core idea. I commented at #8688, which I fear is going too far. What about the idea of deferring the …
I am not opposed to it, though it may come with its own issues (an incompatibility like …)
Thinking about it: we already have a factory for environment modifications called … The added value is that if all the modifications are in the same object, they can be analyzed together. It won't magically solve the issue, but at least in your case you would have seen a message like:

```console
==> Warning: Suspicious requests to set or unset 'CC' found
==> Warning: env.set('CC', join_path(link_dir, compiler.link_paths['cc'])) at /git/spack/lib/spack/spack/build_environment.py:146
==> Warning: ---> spack_env.set('CC', spack_cc) at /git/spack/var/spack/repos/builtin/packages/gdbm/package.py:49
```

pointing to a clear incompatibility between the two requests.
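The kind of combined analysis described here could look like the following minimal Python sketch. The class name `EnvModifications`, its methods, and the `caller` bookkeeping are illustrative assumptions, not Spack's actual API:

```python
# Sketch of an environment-modification recorder that keeps every
# request in one object so that conflicts can be detected afterwards.
# Hypothetical names; NOT Spack's real implementation.
from collections import defaultdict


class EnvModifications:
    def __init__(self):
        # variable name -> list of (action, value, caller) records
        self._records = defaultdict(list)

    def set(self, name, value, caller="<unknown>"):
        self._records[name].append(("set", value, caller))

    def unset(self, name, caller="<unknown>"):
        self._records[name].append(("unset", None, caller))

    def suspicious(self):
        """Variables that were set or unset more than once."""
        return {name: recs for name, recs in self._records.items()
                if len(recs) > 1}


env = EnvModifications()
env.set("CC", "/spack/lib/spack/env/icc", caller="build_environment.py:146")
env.set("CC", "icc", caller="packages/gdbm/package.py:49")

for name, records in env.suspicious().items():
    print("==> Warning: Suspicious requests to set or unset %r found" % name)
    for action, value, caller in records:
        print("==> Warning:   %s(%r, %r) at %s" % (action, name, value, caller))
```

Because all requests live in one object, the conflicting `set('CC', ...)` calls can be reported together with their call sites, as in the warning above.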
If I understand correctly, the application of env. modifications in #8245 is already in the right order, done after the compiler setup at the very last step of …

I'd say #8245 needs to be reconsidered.
### Problem

In Spack, a compiler definition that uses a `modules:` token can interfere with or nullify any environment variable defined by `setup_dependent_environment()` or `setup_dependent_package()` of a package upstream in the DAG.

### Question
How does one get around this, with the fewest assumptions and least a priori knowledge?

* Can one defer calling the `setup_dependent_*` hooks until after the compiler is loaded? If I'm not mistaken, that's the canonical way of doing things outside of Spack. I can see why Spack would load compilers last, perhaps to largely (though not entirely) guarantee a consistent compiler environment?
* Blacklist "known" variables for `module load` in the specific situation of loading a compiler module? – I think that could only cure specific symptoms.
* Have the user modify the module that defines the compiler? – Ditto for symptoms-only, and I'd say that's impractical for many users.
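The clobbering at the heart of this issue is purely a matter of ordering, which can be shown with a toy model in plain Python (made-up paths; real Spack loads modulefiles rather than calling a function):

```python
# Toy model of the current ordering (illustrative paths only, not
# Spack code): dependency hooks run first, the compiler's modulefile
# is processed afterwards and silently wins.

def setup_dependent_environment(env):
    # An upstream package (here: a hypothetical intel-mpi install)
    # pins its own root directory.
    env["I_MPI_ROOT"] = "/opt/spack/intel-mpi/2018.2"

def load_compiler_module(env):
    # The compiler's modulefile (e.g. generated from psxevars.sh)
    # exports the studio-wide value, clobbering the earlier setting.
    env["I_MPI_ROOT"] = "/opt/intel/parallel_studio/impi/2018.1"

env = {}
setup_dependent_environment(env)  # package hook runs first...
load_compiler_module(env)         # ...module load runs last, and wins
print(env["I_MPI_ROOT"])          # the intel-mpi value is gone
```

Reversing the two calls, i.e. deferring the `setup_dependent_*` hooks until after the compiler module is loaded, would let the dependency's value survive.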
On my own cluster, that's what I had done, though unrelated to Spack: I create modulefiles for `intel-parallel-studio` by a script from `psxevars.sh`, but excise any MPI variables and paths and spirit those away into a separate `impi` package. This avoided a hit from the main example that follows.

### Main Example: `I_MPI_ROOT`
The present issue crystallized from the root cause of the failure in issue #8410, which concerns `$I_MPI_ROOT`, defined by both `intel-parallel-studio` (any edition) and `intel-mpi`.

Consider the scenario where a user runs:

and where the user has in `compilers.yaml`:

Then, there are two failure modes:
### Failure mode 1: MPI Runtime vs. MPI Development version

When `intel-parallel-studio` is `@composer.*` or `@professional.*`, it includes an MPI runtime and therefore sets `I_MPI_ROOT`. This value supersedes any specific value (or the internal default, if unset) for any prerequisite `^intel-mpi` version.

When compiling a client package, `mpicc` etc. from `intel-mpi` will error out due to missing static libs, found only under the `$I_MPI_ROOT` of `intel-mpi` but not that of `intel-parallel-studio`. Sadly, that information is buried in `config.log`.
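How the wrong `I_MPI_ROOT` turns into missing static libs can be sketched in plain Python (the paths, directory layout, and library name are made-up illustrations of the wrapper's behavior, not Intel's actual install tree):

```python
import os.path

def static_mpi_lib(i_mpi_root):
    # An MPI compiler wrapper derives its link paths from I_MPI_ROOT,
    # so static libraries are searched for only under that tree
    # (illustrative layout, not Intel's actual directory structure).
    return os.path.join(i_mpi_root, "lib64", "libmpi.a")

# The value intended by the ^intel-mpi dependency:
wanted = static_mpi_lib("/opt/spack/intel-mpi/2018.2")
# The value actually in effect after the studio module load:
actual = static_mpi_lib("/opt/intel/parallel_studio/impi_runtime/2018.1")

# A runtime-only tree ships no static libs, so the path in 'actual'
# does not exist and the link step fails; the evidence lands in
# config.log while the user sees a far more generic error.
print(wanted)
print(actual)
```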
The user sees the rather more dramatic message:

### Failure mode 2: Silent version overshadowing
When `intel-parallel-studio` is `@cluster.*`, a version mismatch occurs for the MPI components: in the eyes of the user and of Spack, the client package is compiled with `intel-mpi@foo`, whereas `mpicc` (which is from `intel-mpi@foo`) will take its libs etc. from `intel-parallel-studio`.

Disconcertingly, this will occur silently.

Now, a user would not typically need or use `^intel-mpi` when one has `%intel-parallel-studio@cluster.*`, but Spack will let you do it. It may not be clear to a new user that this is typically not needed, and it's bad for advanced users who specifically choose a different MPI version.

### Other affected packages: MKL, …
The same is true for the other `intel-foo` library-style packages, of which `intel-mkl` is the most critical.

The MKL is provided by all studio versions, so failure mode 1 does not occur, but failure mode 2 might, especially when the user has no access to newer compilers but does for the MKL, which has been available under a "simplified" (no-cost) license since 2017.
### Variable tracing

Here's a more detailed tracing of the MPI variables for the case discussed in #8410:

Gives (formatted for readability):

… and fails with:

(formatted for readability):

### Location where `I_MPI_ROOT` is used

This occurs right on top of `mpiicc`:

namely,

The lib missing in #8410 is specifically mentioned as well; it is available as a static lib only, and is linked explicitly even for shared linkage: