
WIP, ENH: Linux simultaneous install locking #13217

Closed

Conversation

tylerjereddy
Contributor

Linux-only for now

  • adds experimental support for "locking" attempts to run `spack install <package>` multiple times simultaneously against the same Spack instance (this can be a problem leading to errors in certain CI testing situations, for example)
  • the regression test reproducibly fails locally before the fix and passes after it, but it is a little gruesome with its subprocess shell usage; there may be better ways to use your mocking test infrastructure here?

Here's a simple example of this working for me locally on a Linux box, along with output:

```
tyler@machine:$ spack install bzip2 & spack install bzip2 && wait
[1] 32050
Acquired spack install Linux lock
spack is currently locked from additional install commands to protect against undesirable behavior with simultaneous installs
==> libiconv is already installed in /home/tyler/github_projects/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-7.4.0/libiconv-1.16-ajkkpczfc3ijoubpfvksfghjg6hy2ovq
==> diffutils is already installed in /home/tyler/github_projects/spack/opt/spack/linux-ubuntu18.04-skylake_avx512/gcc-7.4.0/diffutils-3.7-woopak5zo6qrczxc37b4r6wvwm5o5snx
==> Installing bzip2
==> Searching for binary cache of bzip2
==> No binary for bzip2 found: installing from source
==> Using cached archive: /home/tyler/github_projects/spack/var/spack/cache/bzip2/bzip2-1.0.8.tar.gz
==> Staging archive: /tmp/tyler/spack-stage/bzip2-1.0.8-x4qtjiesmosoaosdibkhxsacpovckflu/bzip2-1.0.8.tar.gz
==> Created stage in /tmp/tyler/spack-stage/bzip2-1.0.8-x4qtjiesmosoaosdibkhxsacpovckflu
==> Ran patch() for bzip2
==> Building bzip2 [Package]
==> Executing phase: 'install'
==> Successfully installed bzip2
  Fetch: 0.00s.  Build: 1.92s.  Total: 1.93s.
```
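For context, here is a minimal sketch of the kind of advisory Linux lock this branch experiments with; the lock file location, helper name, and messages below are illustrative only, not the code in this PR:

```python
import errno
import fcntl
import os
import sys

# Hypothetical per-instance lock file location (not the path used in the PR).
LOCK_PATH = os.path.join(os.path.expanduser("~"), ".spack-install.lock")


def acquire_install_lock():
    """Try to take an exclusive, non-blocking advisory lock.

    Returns the open file object on success (keep it open for the lifetime
    of the install), or None if another install already holds the lock.
    """
    lock_file = open(LOCK_PATH, "w")
    try:
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError as exc:
        if exc.errno in (errno.EACCES, errno.EAGAIN):
            lock_file.close()
            return None
        raise
    return lock_file


if __name__ == "__main__":
    lock = acquire_install_lock()
    if lock is None:
        sys.exit("spack is currently locked from additional install commands")
    print("Acquired spack install Linux lock")
    # ... run the install; the advisory lock is released when `lock` is closed.
```

Because the lock is advisory, it only constrains processes that cooperate by checking it, which is the main limitation of this approach.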

cc @junghans

* draft a new regression test for handling simultaneous installs in the same Spack instance; this is currently a Linux-only test
* the test, test_simultaneous_installs(), is currently rather ugly, running through subprocess, but for now it at least fails reproducibly locally
* add locking to "spack install" on Linux; this allows the regression test to pass and appears to work well locally in practical testing
* flake8 cleanups
@tylerjereddy
Contributor Author

The CI failure is just subprocess.check_output not existing in Python 2.6, but I won't aim to fix that for now, because I suspect we'll want a regression test that doesn't use subprocess in any case. That may require expert input to design anyway, since we basically need to invoke spack multiple times simultaneously; maybe multiprocessing with some SpackCommand magic or something.
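For illustration, a rough sketch of that multiprocessing plus SpackCommand idea; this is untested and assumes `SpackCommand` can be driven from worker processes and that the `fail_on_error` keyword behaves as in Spack's other command tests:

```python
import multiprocessing

from spack.main import SpackCommand


def _install(spec):
    # Each worker drives `spack install` in-process via SpackCommand
    # instead of shelling out through subprocess.
    install = SpackCommand("install")
    return install(spec, fail_on_error=False)


def test_simultaneous_installs():
    # Kick off two concurrent installs of the same spec and check that
    # they coordinate instead of tripping over each other.
    with multiprocessing.Pool(processes=2) as pool:
        outputs = pool.map(_install, ["bzip2", "bzip2"])
    for out in outputs:
        assert "Traceback" not in out
```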

@junghans
Contributor

ping @becker33

@alalazo
Member

alalazo commented Oct 25, 2019

Just reading the description I think this might be in direct conflict with a feature on the 0.15 roadmap 😟 The WIP is in #13100 and should ensure that multiple invocations of the same Spack instance coordinate via filesystem locks.

Let me know if you think I am wrong or misunderstood the scope of the PR.

@alalazo
Member

alalazo commented Oct 25, 2019

@tldahlgren

@tylerjereddy
Contributor Author

If you have a better solution in progress, that is more than fine by me. Filesystem locks can be advisory only, but presumably there's a plan to enforce them there. I can just focus on #13213 and #13139 as review comments roll in, then.

tldahlgren mentioned this pull request Oct 25, 2019
@becker33
Member

becker33 commented Nov 5, 2019

Spack already takes write locks on the package prefix and read locks on all of its dependencies before building, as well as appropriate locking of internal databases and file caches. If you're seeing bugs in the locking code I would be more inclined to work on debugging that than to add another layer of locking, unless you can make an argument that you're locking something of a fundamentally different type here.

As far as bugs in the locking mechanism go, hopefully they will be addressed as part of @tldahlgren's current work on locking. Can you give some more details about what hasn't been working for you that this is designed to solve?

@tylerjereddy
Contributor Author

The bug and the target of the fix are pretty clear here; there's a minimal working reproducer built right into the regression test I've added.

Perhaps you can make more specific suggestions as to what needs to be changed. It seems to me that I've identified a clear bug and written a test for it along with the fix.

Add the regression test to the competing pull request if you want to prove that it resolves this issue.

tgamblin pushed a commit that referenced this pull request Feb 19, 2020
Fixes #9394
Closes #13217.

## Background
Spack provides the ability to enable/disable parallel builds through two options: package `parallel` and configuration `build_jobs`.  This PR changes the algorithm to allow multiple, simultaneous processes to coordinate the installation of the same spec (and specs with overlapping dependencies).

The `parallel` (boolean) property sets the default for its package, though the value can be overridden in the `install` method.

Spack's current parallel builds are limited to build tools supporting `jobs` arguments (e.g., `Makefiles`).  The number of jobs actually used is calculated as `min(config:build_jobs, # cores, 16)`, which can be overridden in the package or on the command line (i.e., `spack install -j <# jobs>`).
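As a rough illustration of that calculation (the argument below stands in for the `config:build_jobs` value; this is not Spack's actual config API):

```python
import multiprocessing


def effective_build_jobs(configured_build_jobs):
    # jobs = min(config:build_jobs, number of cores, 16), unless
    # overridden per package or via `spack install -j <# jobs>`.
    return min(configured_build_jobs, multiprocessing.cpu_count(), 16)
```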

This PR adds support for distributed (single- and multi-node) parallel builds.  The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.

## Approach
### File System Locks
Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks.  The runs can be a combination of interactive and batch processes affecting the same file system.  Exclusive prefix locks are required to install a package while shared prefix locks are required to check if the package is installed.

Failures are communicated through a separate exclusive prefix failure lock, for concurrent processes, combined with a persistent store, for separate, related build processes.  The resulting file contains the failing spec to facilitate manual debugging.
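To illustrate the shared-versus-exclusive distinction above, here is a plain `fcntl` sketch; Spack's real prefix locks live in its own locking layer, so treat the path and helper name as hypothetical:

```python
import contextlib
import fcntl


@contextlib.contextmanager
def prefix_lock(lock_path, exclusive):
    """Hold a shared (read) or exclusive (write) advisory lock on a prefix.

    An exclusive lock is needed to install into a prefix; a shared lock
    is enough to check whether the package is already installed.
    """
    with open(lock_path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX if exclusive else fcntl.LOCK_SH)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


# Installing writes to the prefix, so it takes the exclusive lock:
#   with prefix_lock("/path/to/prefix.lock", exclusive=True): ...
# Checking install status only reads, so a shared lock suffices:
#   with prefix_lock("/path/to/prefix.lock", exclusive=False): ...
```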

### Priority Queue
Management of dependency builds changed from reliance on recursion to use of a priority queue where the priority of a spec is based on the number of its remaining uninstalled dependencies.  

Using a queue required a change to dependency build exception handling, with the most visible issue being that the `install` method *must* install something in the prefix.  Consequently, packages can no longer get away with an install method consisting of `pass`, for example.
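A stripped-down illustration of the queue ordering, keyed on the number of remaining uninstalled dependencies with plain `heapq` (the actual implementation is more involved, e.g. priorities are updated as dependency installs finish):

```python
import heapq


def build_order(remaining_deps):
    """Pop spec names in order of fewest remaining uninstalled dependencies.

    `remaining_deps` maps a spec name to its count of uninstalled
    dependencies; specs with zero remaining dependencies build first.
    """
    heap = [(count, name) for name, count in remaining_deps.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order


# Example: bzip2 depends on diffutils, which depends on libiconv.
print(build_order({"libiconv": 0, "diffutils": 1, "bzip2": 2}))
# -> ['libiconv', 'diffutils', 'bzip2']
```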

## Caveats
- This still only parallelizes a single-rooted build.  Multi-rooted installs (e.g., for environments) are TBD in a future PR.

Tasks:
- [x] Adjust package lock timeout to correspond to value used in the demo
- [x] Adjust database lock timeout to reduce contention on startup of concurrent
    `spack install <spec>` calls
- [x] Replace (test) package's `install: pass` methods with file creation since post-install 
    `sanity_check_prefix` will otherwise error out with `Install failed .. Nothing was installed!`
- [x] Resolve remaining existing test failures
- [x] Respond to alalazo's initial feedback
- [x] Remove `bin/demo-locks.py`
- [x] Add new tests to address new coverage issues
- [x] Replace built-in package's `def install(..): pass` to "install" something
    (i.e., only `apple-libunwind`)
- [x] Increase code coverage