Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Only load root specs in env (de)activate #25755

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

haampie
Copy link
Member

@haampie haampie commented Sep 2, 2021

This PR partially reverts #23485 as it introduced a performance regression
in spack env activate. The test from that PR is still there and passing.

The current logic of spack env activate is as follows:

  1. collect root specs
  2. add root specs + transitive run type deps of root specs to a list
  3. call environment_modifications_for_spec for every spec of this list

However, environment_modifications_for_spec also processes run type
dependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) processed specs given n of them in
a chain like a <- b <- ... <- z

This PR drops step 2, so that we call
environment_modifications_for_spec only on the root specs, and this
function will process the run type deps for us anyways.

Given an environment like this:

spack:
  specs:
  - py-flake8
  - py-matplotlib
  - py-isort
  - py-sphinx
  - py-six
  - py-ipython

This is the time spent on activate before and after:

Before:

In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

After:

In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
820 ms ± 6.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

which is 68% less.

The diff of spack env activate --sh . is the same except for the order of paths in PYTHONPATH, but note that the logic introduced in #23485 didn't walk over the deps in post order: root_spec.traverse(deptype='run', root=True) which seems to be a mistake? (@scheibelp). Anyways, that's gone with this pr.

@haampie haampie force-pushed the fix/speedup-spack-env-(de)-activation branch from d99b4b4 to a62af99 Compare September 2, 2021 13:18
Partially undoes spack#23485

The current logic of `spack env activate` is as follows:

1. collect root specs
2. add root specs + transient run type deps of root specs to a list
3. call `environment_modifications_for_spec` for every spec of this list

However, `environment_modifications_for_spec` also processes run type
dependencies of the spec, so we're doing a lot of redundant work,
resulting in O(n^2) calls to `modifications_from_dependencies` if we
have a chain of deps `a` <- `b` <- ... <- `z` of deptype `run`.

This PR drops step 2, so that we call
`environment_modifications_for_spec` only on the root specs, and this
function will process the run type for us anyways.

Given an environment like this:

```yaml
spack:
  specs:
  - py-flake8
  - py-matplotlib
  - py-isort
  - py-sphinx
  - py-six
  - py-ipython
```

This is the time spent on `activate` before and after:

**Before**:

```
In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
2.56 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

**After**:

```
In [1]: from spack.main import SpackCommand
In [2]: env = SpackCommand('env')
In [3]: %timeit env('activate', '--sh', '.')
761 ms ± 4.36 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

which is 70% less.
@haampie haampie force-pushed the fix/speedup-spack-env-(de)-activation branch from a62af99 to 627a79c Compare September 2, 2021 14:03
@@ -956,6 +956,8 @@ def add_modifications_for_dep(dep):
dpkg.setup_dependent_build_environment(env, spec)
else:
dpkg.setup_dependent_run_environment(env, spec)
if dep in exe_deps:
Copy link
Member Author

@haampie haampie Sep 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scheibelp can you comment on this? I generally struggle to follow the logic in modifications_from_dependencies, but this does the job.

My issue though is that one line up we're calling setup_dependent_run_environment on all deps in custom_mod_deps which is run_and_supporting_deps, which includes link deps too, which looks like a mistake? Hence I added if dep in exe_deps. Generally though this function is much more complicated then it needs to be, but I don't want to modify more than necessary.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).

"this package has already been added".format(
spec.format("{name}/{hash:7}")
)
if root_spec.name in visited:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I just also undo the visited part @scheibelp? It now only applies to root specs anyways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.

I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.

@adamjstewart
Copy link
Member

Before

$ time spack env activate .

real	0m18.318s
user	0m15.929s
sys	0m2.228s
$ time spack env deactivate

real	0m24.808s
user	0m20.695s
sys	0m3.949s

After

$ time spack env activate .

real	0m11.325s
user	0m9.979s
sys	0m1.241s
$ time spack env deactivate

real	0m14.652s
user	0m12.286s
sys	0m2.188s

@haampie
Copy link
Member Author

haampie commented Sep 3, 2021

Still two orders of magnitude too slow :p but an improvement nonetheless...

@adamjstewart
Copy link
Member

We should just store the environment in a bash script and source that instead. I assume that's how conda does things in less than 0.1 sec.

@haampie
Copy link
Member Author

haampie commented Sep 3, 2021

Yeah, that'd be a useful shortcut, we can hash spack.lock and verify in the shell script if it was derived from the same concrete env. 0.1s is less than the python startup time, so it's likely ;)

@haampie
Copy link
Member Author

haampie commented Sep 26, 2021

@scheibelp this shouldn't really require the upcoming proposal about loading behavior, considering #23368 quoted todd _env_modifications_for_default_view doesn’t currently traverse run deps, but it really should and this PR does the same but is more efficient.

So hope you can review :)

@@ -956,6 +956,8 @@ def add_modifications_for_dep(dep):
dpkg.setup_dependent_build_environment(env, spec)
else:
dpkg.setup_dependent_run_environment(env, spec)
if dep in exe_deps:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do worry a little here about link deps whose behavior may be modified by environment variables (like libraries that use environment variables to find their config files).

"this package has already been added".format(
spec.format("{name}/{hash:7}")
)
if root_spec.name in visited:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to cause problems -- what if you have hdf5^mpich and openmpi in your environment, and openmpi is loaded first. Then the hdf5 dependency mpich will be loaded later than openmpi, and will shadow its environment variables.

I think we may be able to get around that problem by sorting the root specs by size -- if we always load bigger specs first, then they can't be "sneakily" pulling in something that will shadow a dependency (and then I think we could get rid of the visited list entirely.

@adamjstewart adamjstewart removed their assignment May 6, 2022
@becker33 becker33 added this to In progress in Spack v0.19.0 release via automation May 24, 2022
@tgamblin tgamblin added this to the v0.20.0 milestone Nov 7, 2022
@tgamblin tgamblin removed this from In progress in Spack v0.19.0 release Nov 7, 2022
@alalazo alalazo modified the milestones: v0.20.0, v0.21.0 May 10, 2023
@alalazo alalazo removed this from the v0.21.0 milestone Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants