-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Only load root specs in env (de)activate #25755
base: develop
Are you sure you want to change the base?
Only load root specs in env (de)activate #25755
Conversation
d99b4b4
to
a62af99
Compare
Partially undoes spack#23485 The current logic of `spack env activate` is as follows: 1. collect root specs 2. add root specs + transient run type deps of root specs to a list 3. call `environment_modifications_for_spec` for every spec of this list However, `environment_modifications_for_spec` also processes run type dependencies of the spec, so we're doing a lot of redundant work, resulting in O(n^2) calls to `modifications_from_dependencies` if we have a chain of deps `a` <- `b` <- ... <- `z` of deptype `run`. This PR drops step 2, so that we call `environment_modifications_for_spec` only on the root specs, and this function will process the run type for us anyways. Given an environment like this: ```yaml spack: specs: - py-flake8 - py-matplotlib - py-isort - py-sphinx - py-six - py-ipython ``` This is the time spent on `activate` before and after: **Before**: ``` In [1]: from spack.main import SpackCommand In [2]: env = SpackCommand('env') In [3]: %timeit env('activate', '--sh', '.') 2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` **After**: ``` In [1]: from spack.main import SpackCommand In [2]: env = SpackCommand('env') In [3]: %timeit env('activate', '--sh', '.') 761 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) ``` which is 70% less.
a62af99
to
627a79c
Compare
@@ -956,6 +956,8 @@ def add_modifications_for_dep(dep): | |||
dpkg.setup_dependent_build_environment(env, spec) | |||
else: | |||
dpkg.setup_dependent_run_environment(env, spec) | |||
if dep in exe_deps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@scheibelp can you comment on this? I generally struggle to follow the logic in modifications_from_dependencies
, but this does the job.
My issue though is that one line up we're calling setup_dependent_run_environment
on all deps in custom_mod_deps
which is run_and_supporting_deps
, which includes link deps too, which looks like a mistake? Hence I added if dep in exe_deps
. Generally though this function is much more complicated then it needs to be, but I don't want to modify more than necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).
"this package has already been added".format( | ||
spec.format("{name}/{hash:7}") | ||
) | ||
if root_spec.name in visited: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I just also undo the visited
part @scheibelp? It now only applies to root specs anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.
I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited
list entirely.
Before$ time spack env activate .
real 0m18.318s
user 0m15.929s
sys 0m2.228s
$ time spack env deactivate
real 0m24.808s
user 0m20.695s
sys 0m3.949s After$ time spack env activate .
real 0m11.325s
user 0m9.979s
sys 0m1.241s
$ time spack env deactivate
real 0m14.652s
user 0m12.286s
sys 0m2.188s |
Still two orders of magnitude too slow :p but an improvement nonetheless... |
We should just store the environment in a bash script and source that instead. I assume that's how conda does things in less than 0.1 sec. |
Yeah, that'd be a useful shortcut, we can hash spack.lock and verify in the shell script if it was derived from the same concrete env. |
@scheibelp this shouldn't really require the upcoming proposal about loading behavior, considering #23368 quoted todd So hope you can review :) |
@@ -956,6 +956,8 @@ def add_modifications_for_dep(dep): | |||
dpkg.setup_dependent_build_environment(env, spec) | |||
else: | |||
dpkg.setup_dependent_run_environment(env, spec) | |||
if dep in exe_deps: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).
"this package has already been added".format( | ||
spec.format("{name}/{hash:7}") | ||
) | ||
if root_spec.name in visited: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.
I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited
list entirely.
This PR partially reverts #23485 as it introduced a performance regression
in
spack env activate
. The test from that PR is still there and passing.The current logic of
spack env activate
is as follows:environment_modifications_for_spec
for every spec of this listHowever,
environment_modifications_for_spec
also processes run typedependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) processed specs given
n
of them ina chain like
a
<-b
<- ... <-z
This PR drops step 2, so that we call
environment_modifications_for_spec
only on the root specs, and thisfunction will process the run type deps for us anyways.
Given an environment like this:
This is the time spent on
activate
before and after:Before:
After:
which is 68% less.
The diff of
spack env activate --sh .
is the same except for the order of paths inPYTHONPATH
, but note that the logic introduced in #23485 didn't walk over the deps in post order:root_spec.traverse(deptype='run', root=True)
which seems to be a mistake? (@scheibelp). Anyways, that's gone with this pr.