
Better detection of the module command #6451

Closed
wants to merge 16 commits

Conversation

@skosukhin (Member) commented Nov 24, 2017

Some improvements for the module command detection and execution:

  1. The function get_module_cmd_from_bash() is now agnostic to the naming format of the environment variables used to store shell functions (i.e. $BASH_FUNC_module() or $BASH_FUNC_module%%).
  2. Added support for the case where the shell function module is defined with shell variables that are set in the bash initialization scripts but not exported.
  3. Added support for old versions of Tcl Environment Modules, which generate incorrect Python scripts (e.g. exec '/tmp/modulescript_12345_00').
  4. The argument bashopts, which was used only for testing, is removed from the functions get_module_cmd_from_bash() and get_module_cmd().
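Item 3 can be illustrated with a small sketch. Old Tcl Environment Modules emit a Python 2 statement like exec '/tmp/modulescript_12345_00', which tries to exec the file *name* as code instead of the file's contents; one workaround is to detect that pattern and substitute the contents of the temporary script. The helper name and regular expression below are hypothetical illustrations, not the PR's actual code:

```python
import os
import re


def fix_modulescript_output(output):
    """If the module tool emitted ``exec '/tmp/modulescript_...'`` instead
    of real Python, return the contents of that temporary script instead.
    Hypothetical helper sketching item 3 of the PR description."""
    match = re.match(r"^exec\s+'([^']+)'\s*$", output.strip())
    if match and os.path.isfile(match.group(1)):
        with open(match.group(1)) as script:
            return script.read()
    return output
```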

@davydden (Member)

I don't know how modules are set up on HPC clusters, but this PR might fix #2563 (comment)

@skosukhin (Member, Author)

I would like to work on this a bit more.

@skosukhin skosukhin changed the title Better detection of the module command [WIP] Better detection of the module command Nov 28, 2017
@skosukhin skosukhin changed the title [WIP] Better detection of the module command Better detection of the module command Nov 29, 2017
@skosukhin (Member, Author)

Travis has some problems with testing on macOS, but the tests should pass. I think this is ready for merging.
@alalazo @becker33 @scheibelp @junghans @adamjstewart @tgamblin

@adamjstewart (Member)

Restarted the tests on macOS.

@f-sim (Contributor) commented Jan 11, 2018

Is there any chance that this PR will be merged?

@mgsternberg (Contributor)

I would need this PR as well. Being a relative Spack noob, I'm not sure how to deal with the code here, but I'd like to ask that the following use case for Modules-Tcl be accommodated:

$ type module
module is a function
module () 
{ 
    _moduleraw $* 2>&1
}

$ type _moduleraw
_moduleraw is a function
_moduleraw () 
{ 
    eval `/lib64/ld-linux-x86-64.so.2 --library-path "" /usr/bin/tclsh /usr/share/modules-tcl/libexec/modulecmd.tcl bash $*`
}

Background: This is from my HPC system where I use Modules-Tcl 1.962 (2017-08-09) and provide the functions above to my users. I implemented the ld-linux-x86-64.so.2 hack to isolate /usr/bin/tclsh from perfectly respectable modules that contain some version of libtcl*.so, which is exposed by design or unintentionally via LD_LIBRARY_PATH and thus could destabilize /usr/bin/tclsh, in effect poisoning the module command. (Such a possible interference has been an inherent and age-old issue for Modules.)

@mamelara (Member)

@skosukhin any news on this?

@mgsternberg (Contributor)

I'd like to sound a note of caution here about adding complexity to the existing module detection and use. Let me say that the actual code for expanding and handling the variables in get_module_cmd_from_bash(), as done here, is quite clever in its details. I am, however, skeptical of the need for such rather raw interventions. In fact, if I'm not mistaken, the step of expanding variables in the function definition is fundamentally flawed, which, I hasten to add, is not the fault of the present PR but is already present in develop.

(NB: I submitted #7536 which overlaps an edit with the present PR but does not address the same issues.)

Trouble with current module interface

Spack assumes one of the following:

  1. That there indeed is a command called modulecmd in PATH. Moreover, Spack assumes knowledge of how it works.
    • This was not the case in a pre-release version of Environment Modules 4.x, as I commented above.
  2. The module function definition contains an eval-type line using backticks.
    • $( ... ) might be used instead.
    • The sub-command might extend over several lines.
    • The function definition may not contain eval at all. – That is in fact the case in the normal deployment of Environment Modules 4.x (but see Env mod 4 #7536), for the purpose of redirecting stderr, such as to make output directly grep-able:
      $ type module
      module is a function
      module () 
      { 
          _moduleraw $* 2>&1
      }
      
    • Variables are expanded – see the section below.

Ideal(ized?) interface

Ideally, Spack would interface with modules through the one and only interface that is documented and common across Modules-{C/Tcl} 3.x, Modules-4.x, Lmod, (others?), namely, the module command, either through Bash natively or in any other supported interface of the locally active Modules implementation. I realize that the hitch here is in identifying and correctly using said implementation.

Variable expansion pitfalls

The motivation for the current PR seems to be to avoid evaluating the BASH_FUNC_foo environment variables (item 1 of the PR) and accessing variables inside the module function definition (item 2; though I wonder if I understand correctly: if an internally used variable hasn't been exported but the module function itself has been, would the latter even be usable in child shells?).

Consider the following example for how a module function might plausibly be defined, followed by (a) a demo that it works, and (b) ways of inspecting its definition:

$ cat /tmp/demo.sh 
function module () {
    local tcl=/usr/bin/tclsh
    eval `$tcl /usr/share/Modules/libexec/modulecmd.tcl bash $*`
}

echo -e '\n== Demo =='
module -l avail dot 

echo -e '\n== Definition, plain =='
typeset -f module

echo -e '\n== Definition, neutered =='
typeset -f module | envsubst

Running this, we get:

$ bash /tmp/demo.sh

== Demo ==
- Package/Alias -----------------------.- Versions --------.- Last mod. -------
/usr/share/Modules/modulefiles:
dot                                                         2018/02/07 16:05:07

== Definition, plain ==
module () 
{ 
    local tcl=/usr/bin/tclsh;
    eval `$tcl /usr/share/Modules/libexec/modulecmd.tcl bash $*`
}

== Definition, neutered ==
module () 
{ 
    local tcl=/usr/bin/tclsh;
    eval ` /usr/share/Modules/libexec/modulecmd.tcl bash $*`
}

The point is that trying to inspect the eval from the last line would be futile. The example here is for Env-Mod 4.x, but even with a modulecmd available as expected, module could be defined with a similar construct that would likewise elude being captured in get_module_cmd_from_bash:

    …
    cmd=/usr/bin/modulecmd
    eval `$cmd bash $*`
    …

In summary, the question is: Can and should(?) Spack interface with a locally active module command in a manner that is more documented and has fewer assumptions?

@skosukhin (Member, Author)

@mgsternberg, thank you for the analysis of the issue. Let me give a short history of the development of this feature.

The function module is a shell function and we can't use it because it would get called in a child process of Spack and the environment wouldn't get changed. Thus, we have to call modulecmd executable directly to get Python scripts for changing the environment. Correct me if I'm wrong about this.
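The constraint described here — environment changes made in a child process are lost when it exits — is why the Python emitted by modulecmd has to be exec'd inside the Spack process itself. A minimal sketch of that idea (the helper name is hypothetical, and real Spack code differs):

```python
import os
import subprocess


def run_module_command(argv):
    """Run a modulecmd-style command line (e.g.
    ['/usr/bin/modulecmd', 'python', 'load', 'gcc']) and exec the Python
    it prints, so the environment changes land in *this* process instead
    of dying with a child shell. Hypothetical sketch, not Spack's code."""
    output = subprocess.run(argv, stdout=subprocess.PIPE,
                            universal_newlines=True).stdout
    # The emitted script is expected to mutate os.environ.
    exec(output, {'os': os})
```

For testing purposes, any command printing Python assignments to os.environ can stand in for modulecmd.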

So, the first implementation was introduced with the assumption that modulecmd is in the $PATH. Unfortunately, this is not always the case, so I tried to retrieve the path to modulecmd from the definition of the module function but gave up.

Later, the idea was implemented (#3250). Although I was quite sceptical in the beginning, it turned out that it just worked in many cases (I have the opportunity to test Spack on a comparatively large number of machines).

Now, there are cases, in which a really old version of Environment Modules is used. The first problem is that such old versions don't generate correct Python scripts. This PR targets this issue (not in the best way but I would be glad to learn about a better way).

The second problem is related to the initialization scripts of Environment Modules. Depending on the mode, bash gets initialized by running different scripts (and there are cases in which bash doesn't follow the documentation). In the ideal case, the module function gets defined in all possible scenarios (except, probably, for the POSIX mode, which I think is rather exotic). This is why child shells "know" the function module too: not because they inherit its definition from their parents via environment variables. The problem here is that the function is often defined in terms of variables that are not exported (e.g. $TCLSH), which means we can't expand them in Spack, i.e. in a Python child process of a bash process. So, our best chance is to call bash in the interactive login mode (assuming that at least in this case the module function gets defined correctly), retrieve the values of all defined variables, even non-exported ones (compgen -v), and use them to expand the variables used in the definition of the module function. This is also implemented in this PR.
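The expansion step just described could be sketched roughly as follows. The function names are illustrative, not the PR's actual code, and the interactive-login invocation follows the comment's outline:

```python
import re
import subprocess


def login_shell_variables():
    """Collect all variables known to an interactive login bash, exported
    or not, by iterating over ``compgen -v``. Illustrative sketch only;
    NUL separators keep values containing newlines intact."""
    script = 'for v in $(compgen -v); do printf "%s=%s\\0" "$v" "${!v}"; done'
    dump = subprocess.run(['bash', '-l', '-i', '-c', script],
                          stdout=subprocess.PIPE,
                          universal_newlines=True).stdout
    return dict(entry.split('=', 1)
                for entry in dump.split('\0') if '=' in entry)


def expand_variables(text, variables):
    """Expand $VAR / ${VAR} references in a shell function body, leaving
    unknown references (and non-name tokens like $*) untouched."""
    return re.sub(r'\$\{?(\w+)\}?',
                  lambda m: variables.get(m.group(1), m.group(0)), text)
```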

Another issue targeted by this PR is that we can't retrieve the definition of the module function by just reading the environment variable $BASH_FUNC_module(), because (probably due to the Shellshock patches) the variable that stores the definition can instead be named $BASH_FUNC_module%%. Instead of checking both names, I decided to get the definition by calling typeset -f module.
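The typeset -f approach might look like this sketch (assuming a post-Shellshock bash that imports exported functions from either variable naming scheme; not the actual PR code):

```python
import subprocess


def module_function_definition():
    """Ask bash itself to print the definition of the ``module`` function
    with ``typeset -f``, sidestepping the $BASH_FUNC_module() vs.
    $BASH_FUNC_module%% naming difference. Returns None when the function
    is not defined in the spawned shell."""
    result = subprocess.run(['bash', '-c', 'typeset -f module'],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    return result.stdout if result.returncode == 0 else None
```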

I didn't change the approach that is already implemented in develop (BTW, it seems to take care of the case in which the module function is defined with $(...)) but polished it a bit. It works in a larger number of cases now and hopefully keeps working in all the real cases in which it functioned before.

I admit that the number of cases in which this won't work is large. Environment Modules are often configured in ways that simply don't account for scenarios in which modules are loaded from something other than bash. In any case, I think we should give users an option to specify, in Spack's config file, a command that generates correct Python module-loading scripts.

@mgsternberg (Contributor)

Sergey,

Thank you for illuminating the history!

On interfacing with a module shell function

@skosukhin:

The function module is a shell function and we can't use it because it would get called in a child process of Spack and the environment wouldn't get changed.

Not quite – EnvironmentModifications.from_sourcing_file() uses subprocess.Popen to determine changes of environment variables. It does, in essence:

/bin/bash -c 'source filename &> /dev/null \
    && python -c "import os, json; print(json.dumps(dict(os.environ)))"'

I have some quibbles with the code details (discussed at #7536), but the general approach is reasonable. That said, using Python to dump the environment could introduce an "observer effect". I would prefer env, as Orion Poplawski's createmodule.py uses.
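That environment-capture approach, using env instead of a Python dump, might look like this sketch (assuming GNU env with the -0 flag, so NUL separators keep multi-line values intact; the function name is illustrative):

```python
import subprocess


def environment_after(bash_snippet):
    """Run a bash snippet, silence its own output, then dump the resulting
    environment with ``env -0``. Sketch of the env-based capture discussed
    above, not Spack's actual implementation."""
    command = '{ %s ; } >/dev/null 2>&1; env -0' % bash_snippet
    dump = subprocess.run(['bash', '-c', command],
                          stdout=subprocess.PIPE,
                          universal_newlines=True).stdout
    return dict(entry.split('=', 1)
                for entry in dump.split('\0') if '=' in entry)
```

Diffing the returned dict against os.environ would then yield the modifications the snippet made.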

On the Module installation and versions landscape

As you explain, to interface with Modules on different systems, Spack deals with the following issues:

A. The module function may be initialized incorrectly and therefore may not be available in child processes.

Putting a grumpy hat on, I must say that this should not be Spack's problem to solve in the first place.

But I realize that the cause may not be mere inexperience on the part of a local admin or user but could also be the need for a conservative or controlled approach in critical environments like handling beamline or accelerator data. That, of course, wouldn't help a user who has to work with an uncooperative module installation.

B. Older Modules versions generate incorrect Python.

OK; I'll review the implications of that.

modulecmd avenue

Short of the ideal(ized) approach mentioned of only using a documented module interface, I advocate that Spack do a few simple steps to attempt to identify modulecmd, and entirely abandon attempts at low-level parsing of the module function internals, which in my opinion introduces far too much complexity and bug potential, invites future incompatibilities, and is going to make maintenance harder.

Although I think that we should give users an option to specify a command that generates correct Python module loading scripts in Spack's configure file.

I thought about this as well. It would entirely avoid item A, and for item B (your issue 3) it would put the burden on the user (not unreasonably so). Also, that particular issue would presumably fade in the coming years. (Though the Spack project is still wet behind the ears compared to Modules.)

  • Read a Spack config item supplying a command that acts like modulecmd python $*
  • Locate modulecmd through which() and use when found.
    • Fix up the missing last step for older Modules-Tcl to generate Python like you proposed.
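The lookup step in the list above might amount to no more than this sketch (the candidate names and the config-override parameter are illustrative assumptions, not Spack's actual configuration schema):

```python
import os


def find_modulecmd(config_override=None,
                   candidates=('modulecmd', 'modulecmd.tcl')):
    """Return a user-configured module command if one was given, otherwise
    search PATH for a modulecmd-like executable, which()-style. Candidate
    names are illustrative; returns None when nothing is found."""
    if config_override:
        return config_override
    for name in candidates:
        for directory in os.environ.get('PATH', '').split(os.pathsep):
            path = os.path.join(directory, name)
            if os.path.isfile(path) and os.access(path, os.X_OK):
                return path
    return None
```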

module avenue

I wouldn't give up on trying to use a module function directly. After all, obtaining Python code to compile and exec is only needed for the 'load' and 'unload' sub-commands. For any other sub-command ('avail', 'show', ...), the "shell" language argument passed to modulecmd makes no difference, as only stdout is of interest.

In fact, the effects of module load foo and module unload foo could just as well be determined by running these commands in a manner similar to EnvironmentModifications.from_sourcing_file(), just with source foo replaced by module subcmd foo.

@skosukhin (Member, Author)

@alalazo @adamjstewart do you think this PR has a chance to be merged? I admit that the approach suggested by @mgsternberg is definitely worth trying, but perfect is the enemy of the good. I believe that this PR already improves what we have in develop.

Take a look at my modifications of class Executable. I am not sure that this is how it should be done, but I couldn't come up with a better solution.

BTW, #7884 touches the same part of the code.

@adamjstewart (Member)

I don't like to merge non-package PRs myself. A PR like this definitely requires a lot of testing and unit tests. I agree that a fix is much needed, and this has been a long-standing issue. I just don't know enough about it to merge it myself.

@adamjstewart (Member)

@skosukhin I'm trying this out for the first time and I'm seeing the following error message:

$ spack uninstall zlib
==> Warning: Failed to detect module function as a shell function in the current environment: Bash function 'module' is not defined or exported. Trying to detect in the interactive login shell.
Traceback (most recent call last):
  File "/u/sciteam/stewart1/spack/bin/spack", line 80, in <module>
    import spack.main  # noqa
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/__init__.py", line 84, in <module>
    import spack.repository
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/repository.py", line 52, in <module>
    import spack.spec
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/spec.py", line 119, in <module>
    import spack.compilers as compilers
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/compilers/__init__.py", line 36, in <module>
    import spack.config
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/config.py", line 212, in <module>
    _platform = spack.architecture.platform().name
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/llnl/util/lang.py", line 182, in __call__
    self.cache[args] = self.func(*args)
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/architecture.py", line 499, in platform
    return platform_cls()
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/platforms/cray.py", line 69, in __init__
    for target in self._avail_targets():
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/platforms/cray.py", line 146, in _avail_targets
    module = get_module_cmd()
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/util/module_cmd.py", line 88, in get_module_cmd
    result(error=str)) is not None:
  File "/mnt/a/u/sciteam/stewart1/spack/lib/spack/spack/util/executable.py", line 184, in __call__
    proc.returncode, cmd_line)
spack.util.executable.ProcessError: Command exited with status 1:
    '/opt/modules/3.2.10.4/bin/modulecmd' 'python'

@becker33 (Member)

@skosukhin @mgsternberg @adamjstewart I've submitted a fix with @mgsternberg's approach in #8570. Can you confirm that it fully obviates the need for this PR?

@tgamblin tgamblin added this to To do in Modules Improvement Jul 6, 2018
Modules Improvement automation moved this from To do to Done May 9, 2019
tgamblin pushed a commit that referenced this pull request May 9, 2019
Use new `module` function instead of `get_module_cmd`

Previously, Spack relied on either examining the bash `module()` function or using the `which` command to find the underlying executable for modules. More complicated module systems do not allow for the sort of simple analysis we were doing (see #6451).

Spack now uses the `module` function directly and copies environment changes from the resulting subprocess back into Spack. This should provide a future-proof implementation for changes to the logic underlying the module system on various HPC systems.