Execution in login shell (or not) #201
Some additional complications have come up. In some cases, [...]. Note that on my Ubuntu system, the default [...]. It's not clear to me whether there's a difference between [...].
Just encountered another issue: if we don't run in a login shell, then we'll inherit the environment from the parent, which is the manager or the node agent. The manager has in turn inherited the environment from the shell that started it, but not any functions defined in it. Subshells do "inherit" functions, because they're forked subprocesses of the parent shell, but when starting a Python interpreter those are lost, because Python doesn't know anything about shell functions and there's no mechanism to pass them. The problem is that the `module` command is exactly such a shell function, so it's gone by the time we start the implementation.

So we need a login shell, but the problem with that is that it can also contain other stuff that we don't want (giant active banners, commands to move to a different directory, anything really).

It seems that lmod defines a few environment variables that we may be able to use. Sourcing `$LMOD_PKG/init/bash` and `$SPACK_ROOT/share/spack/setup-env.sh` would restore the shell functions. We could add those commands to the run script if we detect that LMOD_PKG and SPACK_ROOT are set, but what if we're using environment modules, and what if we're using EasyBuild or Nix with lmod? We'd have to figure out all those situations and add support for them one by one. And test them too...

Question: how does SLURM do this? It starts the job script, and inside the job script you can do `module load ...`.
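As a rough illustration of the detection idea above, the manager could prepend the appropriate `source` lines to the generated run script. This is only a sketch: it assumes the usual init-script locations (`$LMOD_PKG/init/bash` for Lmod, `$SPACK_ROOT/share/spack/setup-env.sh` for Spack), and the function names are made up rather than existing MUSCLE3 code:

```python
import os
from pathlib import Path


def environment_setup_lines():
    """Return shell commands that re-create the module system's shell
    functions in a fresh (non-login) bash, based on environment
    variables inherited from the manager's environment.
    """
    lines = []
    lmod_pkg = os.environ.get('LMOD_PKG')
    if lmod_pkg:
        # Lmod's bash init file defines the `module` shell function
        lines.append(f'source {lmod_pkg}/init/bash')
    spack_root = os.environ.get('SPACK_ROOT')
    if spack_root:
        # Spack's setup script defines the `spack` shell function
        lines.append(f'source {spack_root}/share/spack/setup-env.sh')
    # Plain Environment Modules, EasyBuild, Nix, ... would each need
    # their own detection branch here, which is exactly the
    # maintenance problem described above.
    return lines


def write_run_script(path: Path, command: str) -> None:
    """Write a run script that restores the environment and then
    runs the given command."""
    script = '\n'.join(
            ['#!/bin/bash'] + environment_setup_lines() + [command, ''])
    path.write_text(script)
    path.chmod(0o755)
```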
https://lists.schedmd.com/pipermail/slurm-users/2021-January/006675.html

According to the bash man page, login shells read /etc/profile and then the first of ~/.bash_profile, ~/.bash_login and ~/.profile that exists. Of course, we have no idea where the cluster administrators source the module environment script, so that doesn't help. The link above says that the [...]. So then why was [...]?
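One way to probe this on a given cluster is to compare a login shell with a plain one and see which of the two defines `module`. This is just a hypothetical diagnostic snippet, not anything MUSCLE3 does:

```python
import subprocess

# Compare what a login shell and a plain non-interactive shell know
# about `module`; `type module` reports a function, an alias, or an
# error if nothing by that name is defined.
for args in (['bash', '-l', '-c', 'type module'],
             ['bash', '-c', 'type module']):
    result = subprocess.run(args, capture_output=True, text=True)
    print(args, '->', (result.stdout + result.stderr).strip())
```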
Model components are currently executed in a login shell. This is nice, because it means the environment is the same as what you have on the command line, so there are fewer unexpected differences. On the other hand, there may be cases where different models require different things, and you want a clean environment to explicitly add modules and variables to. Also, the first case may create unexpected conflicts, because the shell scripts loaded for a login shell may fail in the presence of environment variables injected from the environment in which the manager was started by QCG-PJ.
QCG-PJ currently seems to copy various bits of the environment from that in which it is running to the jobs it runs, but as I recall it's not the same locally as on a cluster. It also always runs in a login shell, at least on a cluster, but not when running locally. MUSCLE3 currently manually adds a `bash -l -c` wrapper to local runs to at least make it consistent.

Both of the above cases actually seem reasonable, so the solution is probably to add another key to the implementations section in the yMMSL file that specifies whether we want a login shell or a normal one, and/or a clean environment or one with passthrough from the host environment.
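To make the proposal a bit more concrete, here is a rough sketch of how such per-implementation settings could translate into the actual launch. The option names (`login_shell`, `clean_env`) and the list of preserved variables are invented for illustration and are not part of the current yMMSL schema or MUSCLE3 API:

```python
import os
import subprocess


def start_implementation(command, login_shell=False, clean_env=False):
    """Start an implementation's command line, with two hypothetical
    per-implementation options that could come from the yMMSL file.
    """
    # Login shell: bash reads the profile scripts; normal shell: it doesn't.
    argv = ['bash'] + (['-l'] if login_shell else []) + ['-c', command]
    if clean_env:
        # Keep only a minimal environment; modules and variables must
        # then be set explicitly by the implementation's own script.
        keep = ('PATH', 'HOME', 'USER', 'LANG')
        env = {k: v for k, v in os.environ.items() if k in keep}
    else:
        # Pass the manager's (inherited) environment through unchanged.
        env = dict(os.environ)
    return subprocess.Popen(argv, env=env)
```

Keeping the two switches independent would cover all four combinations discussed above: login vs. normal shell, and clean vs. passed-through environment.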
Thanks to @peter-t-fox for the report and discussion.