Error path in spec.py causes mainline path to use phantom provider #8059

Closed
djfitzgerald opened this issue May 8, 2018 · 1 comment
Labels
bug Something isn't working power

Comments

@djfitzgerald
Contributor

Issue #7901 exposes a bug in the way that _expand_virtual_packages() in spack/lib/spack/spack/spec.py handles exceptions raised by copy.normalize(force=True).

My environment is a POWER9 node running RHEL 7.4, with a clean Spack installation and no external packages installed. My ~/.spack/linux/packages.yaml file contains the following:

packages:
  spectrum-mpi:
    paths: 
        spectrum-mpi@10.02.00 arch=linux-rhel7-ppc64le: /opt/ibm/spectrum_mpi
    version: [10.02.00]
    buildable: false
  essl:
    paths:
        essl@6.1%xl_r arch=linux-rhel7-ppc64le: /opt/ibmmath/essl/6.1
    version: [6.1]
    buildable: false
  all:
    providers:
      mpi: [spectrum-mpi]
      blas: [essl]

The spectrum-mpi and essl external packages were both installed via spack install. The essl package provides blas but unlike most other blas providers does not also provide lapack.

When executing spack spec petsc%xl_r^essl, I get the following error message:

--> spack spec petsc%xl_r^essl
Input spec
--------------------------------
petsc%xl_r
    ^essl

Concretized
--------------------------------
==> Error: Multiple providers found for 'blas': ['essl@6.1%xl_r@16.1 fflags="-qzerosize" ~cuda~ilp64 threads=openmp arch=linux-rhel7-ppc64le', 'veclibfort@0.4.2%xl_r@16.1 fflags="-qzerosize" +shared arch=linux-rhel7-ppc64le']

This error message does not accurately describe the actual problem: essl provides blas but not lapack, petsc requires both virtual dependencies, and Spack doesn't seem to be able to take blas from one package and lapack from another (issue #7901). Furthermore, the lapack package hasn't even been installed.

Examining the Spack source code, I was able to determine the cause of this incorrect error message. When executing spack spec petsc%xl_r^essl, control was eventually passed to the concretize(self) function in spack/lib/spack/spack/spec.py. At the beginning of the function is this code block:

        changed = True
        force = False

        while changed:
            pdb.set_trace()
            changes = (self.normalize(force),
                       self._expand_virtual_packages(),
                       self._concretize_helper())
            changed = any(changes)
            force = True

        for s in self.traverse():
                . . . .

On entry, self holds the spec that the user passed to spack spec, in this case petsc%xl_r^essl.

Key to what's going on is that while loop. On its first iteration, it creates a 3-tuple named changes. Each component of this tuple is a boolean variable indicating whether the call to its corresponding function resulted in self changing.
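As a rough illustration of that loop shape, here is a minimal standalone sketch. It is not Spack code: run_until_stable, add_dependency, and noop are hypothetical names, and a plain set stands in for the spec.

```python
def run_until_stable(state, passes):
    """Apply every pass repeatedly until none of them reports a change."""
    changed = True
    rounds = 0
    while changed:
        # Building the tuple first means every pass runs in every round,
        # even when an earlier pass has already returned True.
        changes = tuple(p(state) for p in passes)
        changed = any(changes)
        rounds += 1
    return rounds


def add_dependency(state):
    # Hypothetical pass: add "zlib" once, and report whether anything changed.
    if "zlib" in state:
        return False
    state.add("zlib")
    return True


def noop(state):
    # Hypothetical pass that never changes anything.
    return False


spec = {"petsc"}
rounds = run_until_stable(spec, [add_dependency, noop])
```

The loop always runs one extra round after the last change, because it only stops once a full round reports no changes at all. That extra round is where the second call to normalize() in this report comes from.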

self.normalize(force) begins by normalizing petsc%xl_r^essl into petsc%xl_r^bzip2^essl^lapack^ncurses^openssl^pkgconfig^python@2.6:2.8^readline^sqlite^zlib and returns True.

Next, self._expand_virtual_packages() is called to recursively process the virtual packages in that now-normalized spec, replacing them with providers and normalizing the result again to include each provider's (possibly virtual) dependencies.

When it goes to process lapack, its call to spack.concretizer.choose_virtual_or_external(spec) returns the following list of candidates from the repository: [openblas, atlas, intel-mkl, intel-parallel-studio+mkl, netlib-lapack, veclibfort]. None of these are currently installed on the system. _expand_virtual_packages() processes each of these candidates, substituting it for lapack in a copy of the top-level spec and then calling normalize() to consolidate any duplicate providers or duplicate provider dependencies and merge their constraints. But each of these candidates provides both lapack and blas, and blas is already being supplied by the installed essl. So the call to normalize() will fail for each and every candidate.
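To make the conflict concrete, here is a toy model of it. The PROVIDES table, check_providers, and viable are hypothetical stand-ins rather than Spack APIs, and the table only encodes what is stated above (essl provides blas alone; the lapack candidates provide both).

```python
class MultipleProviderError(Exception):
    """Raised when two packages in one spec provide the same virtual."""


# Simplified, assumed provider table; the real data lives in Spack's
# package repository. Only three of the six candidates are listed.
PROVIDES = {
    "essl": {"blas"},
    "openblas": {"blas", "lapack"},
    "netlib-lapack": {"blas", "lapack"},
    "veclibfort": {"blas", "lapack"},
}


def check_providers(packages):
    """Map each virtual to its provider, failing on a duplicate."""
    seen = {}
    for pkg in packages:
        for virtual in sorted(PROVIDES[pkg]):
            if virtual in seen:
                raise MultipleProviderError(
                    "Multiple providers found for %r: %s"
                    % (virtual, [seen[virtual], pkg]))
            seen[virtual] = pkg
    return seen


def viable(candidates, already_chosen):
    """Return the candidates that do not clash with already_chosen."""
    ok = []
    for candidate in candidates:
        try:
            check_providers(already_chosen + [candidate])
        except MultipleProviderError:
            continue
        ok.append(candidate)
    return ok


lapack_candidates = ["openblas", "netlib-lapack", "veclibfort"]
survivors = viable(lapack_candidates, ["essl"])
```

With essl already in the spec, survivors is empty: every candidate clashes on blas, which matches the behavior described above.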

Now consider the following code from _expand_virtual_packages():

                    # Try the replacements in order, skipping any that cause
                    # satisfiability problems.
                    for replacement in candidates:
                        if replacement is spec:
                            break

                        # Replace spec with the candidate and normalize
                        copy = self.copy()
                        copy[spec.name]._dup(replacement, deps=False)

                        try:
                            # If there are duplicate providers or duplicate
                            # provider deps, consolidate them and merge
                            # constraints.
                            copy.normalize(force=True)
                            break
                        except SpecError:
                            # On error, we'll try the next replacement.
                            continue

When the loop is on the last member of the candidates list generated by spack.concretizer.choose_virtual_or_external(spec) and an exception is raised, we still hit the continue statement, even though there are no further replacements to consider. The loop simply ends, and replacement keeps the last value checked (in this case, veclibfort), which then gets substituted for the spec being checked in the top-level spec. We return to the while loop back up in concretize(), and call self._concretize_helper() to concretize the spec, veclibfort and all. changes = (True, True, True), so changed becomes True and we find ourselves repeating the while loop.
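The fall-through can be reproduced outside Spack. The sketch below uses hypothetical helpers (try_candidate, pick_buggy, pick_checked) and a local SpecError class. The for/else guard in pick_checked is one possible way to detect the case, not necessarily the fix Spack adopted.

```python
class SpecError(Exception):
    """Local stand-in for Spack's SpecError."""


def try_candidate(name, conflicting):
    # Hypothetical stand-in for copy.normalize(force=True).
    if name in conflicting:
        raise SpecError(name)


def pick_buggy(candidates, conflicting):
    replacement = None
    for replacement in candidates:
        try:
            try_candidate(replacement, conflicting)
            break
        except SpecError:
            continue
    # If every candidate failed, replacement still names the last one tried.
    return replacement


def pick_checked(candidates, conflicting):
    for replacement in candidates:
        try:
            try_candidate(replacement, conflicting)
            break
        except SpecError:
            continue
    else:
        # The else clause runs only when the loop ended without a break,
        # i.e. when no candidate survived.
        raise SpecError("no viable provider among %s" % list(candidates))
    return replacement


candidates = ["openblas", "netlib-lapack", "veclibfort"]
phantom = pick_buggy(candidates, set(candidates))
```

Here phantom ends up as "veclibfort" even though that candidate failed, while pick_checked(candidates, set(candidates)) raises SpecError at the point where the real problem occurs.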

The second time around the loop, we call self.normalize(force=True), except that self is now a fully concretized spec containing veclibfort. self.normalize() follows a path similar to the earlier one that raised the MultipleProviderError exceptions. But where _expand_virtual_packages() caught and ignored the exception, concretize() propagates it back to its caller, and it eventually gets displayed on the user's terminal:

==> Error: Multiple providers found for 'blas': ['essl@6.1%xl_r@16.1 fflags="-qzerosize" ~cuda~ilp64 threads=openmp arch=linux-rhel7-ppc64le', 'veclibfort@0.4.2%xl_r@16.1 fflags="-qzerosize" +shared arch=linux-rhel7-ppc64le']
@alalazo
Member

alalazo commented Apr 5, 2023

This bug was tightly related to the old concretizer. Closing as outdated.

@alalazo alalazo closed this as completed Apr 5, 2023