Skip to content

Commit

Permalink
Distributed builds (#13100)
Browse files Browse the repository at this point in the history
Fixes #9394
Closes #13217.

## Background
Spack provides the ability to enable/disable parallel builds through two options: package `parallel` and configuration `build_jobs`.  This PR changes the algorithm to allow multiple, simultaneous processes to coordinate the installation of the same spec (and specs with overlapping dependencies.).

The `parallel` (boolean) property sets the default for its package though the value can be overridden in the `install` method.

Spack's current parallel builds are limited to build tools supporting `jobs` arguments (e.g., `Makefiles`).  The number of jobs actually used is calculated as`min(config:build_jobs, # cores, 16)`, which can be overridden in the package or on the command line (i.e., `spack install -j <# jobs>`).

This PR adds support for distributed (single- and multi-node) parallel builds.  The goals of this work include improving the efficiency of installing packages with many dependencies and reducing the repetition associated with concurrent installations of (dependency) packages.

## Approach
### File System Locks
Coordination between concurrent installs of overlapping packages to a Spack instance is accomplished through bottom-up dependency DAG processing and file system locks.  The runs can be a combination of interactive and batch processes affecting the same file system.  Exclusive prefix locks are required to install a package while shared prefix locks are required to check if the package is installed.

Failures are communicated through a separate exclusive prefix failure lock, for concurrent processes, combined with a persistent store, for separate, related build processes.  The resulting file contains the failing spec to facilitate manual debugging.

### Priority Queue
Management of dependency builds changed from reliance on recursion to use of a priority queue where the priority of a spec is based on the number of its remaining uninstalled dependencies.  

Using a queue required a change to dependency build exception handling with the most visible issue being that the `install` method *must* install something in the prefix.  Consequently, packages can no longer get away with an install method consisting of `pass`, for example.

## Caveats
- This still only parallelizes a single-rooted build.  Multi-rooted installs (e.g., for environments) are TBD in a future PR.

Tasks:
- [x] Adjust package lock timeout to correspond to value used in the demo
- [x] Adjust database lock timeout to reduce contention on startup of concurrent
    `spack install <spec>` calls
- [x] Replace (test) package's `install: pass` methods with file creation since post-install 
    `sanity_check_prefix` will otherwise error out with `Install failed .. Nothing was installed!`
- [x] Resolve remaining existing test failures
- [x] Respond to alalazo's initial feedback
- [x] Remove `bin/demo-locks.py`
- [x] Add new tests to address new coverage issues
- [x] Replace built-in package's `def install(..): pass` to "install" something
    (i.e., only `apple-libunwind`)
- [x] Increase code coverage
  • Loading branch information
tldahlgren committed Feb 19, 2020
1 parent 2f4881d commit f2aca86
Show file tree
Hide file tree
Showing 100 changed files with 2,954 additions and 963 deletions.
2 changes: 1 addition & 1 deletion etc/spack/defaults/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -137,7 +137,7 @@ config:
# when Spack needs to manage its own package metadata and all operations are
# expected to complete within the default time limit. The timeout should
# therefore generally be left untouched.
db_lock_timeout: 120
db_lock_timeout: 3


# How long to wait when attempting to modify a package (e.g. to install it).
Expand Down
Loading

0 comments on commit f2aca86

Please sign in to comment.