Provider conditionals with "when: '%oneapi'" break Spack: not all root specs are concretized, and no error messages are shown. #43350

Open
3 tasks done
pbisbal1 opened this issue Mar 25, 2024 · 3 comments
Labels
bug Something isn't working triage The issue needs to be prioritized

Comments

@pbisbal1

Steps to reproduce

I'm trying to have Spack use intel-oneapi-mkl as the provider for BLAS and LAPACK when the compiler is oneapi, and amdblis and amdlibflame when using aocc or gcc. With @becker33's help I arrived at the following solution (only the BLAS section is shown; the LAPACK section uses the same arrangement):

  packages:
    blas:
      require:
      - spec: 'amdblis'
        when: '%aocc'
      - spec: 'amdblis'
        when: '%gcc'
      - spec: 'intel-oneapi-mkl'
        when: '%oneapi'
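
For reference, the LAPACK section that is not shown would presumably look like the sketch below. It is not copied from the reporter's configuration; it assumes amdlibflame simply takes the place of amdblis, as described in the opening paragraph.

```yaml
packages:
  lapack:
    require:
    # Assumed mirror of the BLAS block; not taken verbatim from the reporter's config.
    - spec: 'amdlibflame'
      when: '%aocc'
    - spec: 'amdlibflame'
      when: '%gcc'
    - spec: 'intel-oneapi-mkl'
      when: '%oneapi'
```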

When I run spack concretize -f, the concretizer appears to complete without error, but a number of packages are missing from the list of concretized packages. If I run spack install, the installation doesn't complete, but no errors are shown. For example, when installing 100 packages, the last package installed reports something like [77/100], indicating that Spack stopped before all 100 packages were installed, yet there is no obvious error message. If I comment out the lines above pertaining to intel-oneapi-mkl, like this:

  packages:
    blas:
      require:
      - spec: 'amdblis'
        when: '%aocc'
      - spec: 'amdblis'
        when: '%gcc'
      # - spec: 'intel-oneapi-mkl'
      #   when: '%oneapi'

The concretize/install process works as expected, but the packages compiled with %oneapi don't use the desired BLAS provider (openblas is used instead, most likely because that's the default provider in etc/spack/defaults/packages.yaml). Here is the minimal spack.yaml I'm using to reproduce the problem:

spack:

  definitions:
  - serial_packages:
    - amdblis
    - amdlibm
  - mpi_packages:
    - hpl
    - intel-oneapi-mkl

  specs:
  - matrix:
    - ["$mpi_packages"]
    - ["%aocc@4.1.0"]
    - ["^openmpi%aocc@4.1.0"]
    exclude:
    - intel-oneapi-mkl
  - matrix:
    - ["$mpi_packages"]
    - ["%gcc@13.1.0"]
    - ["^openmpi%gcc@13.1.0"]
    exclude:
    - intel-oneapi-mkl
  - matrix:
    - ["$mpi_packages"]
    - ["%oneapi@2023.2.0"]
    - ["^openmpi%oneapi@2023.2.0"]
  view: false

  concretizer:
    unify: when_possible
    reuse: dependencies

  packages:
    blas:
      require:
      - spec: 'amdblis'
        when: '%aocc'
      - spec: 'amdblis'
        when: '%gcc'
      - spec: 'intel-oneapi-mkl'
        when: '%oneapi'
    hwloc:
      require:
      - ~netloc
      - ~rocm
    hpl: 
      require: 
      - '@2.3'
    mpi:
      require: openmpi
    openmpi:
      require:
      - '@4.1.6'
      - fabrics=hcoll,ucx
      - ~internal-hwloc
      - ~internal-pmix
      - ~rsh
      - schedulers=slurm

  modules:
    default:
      enable:
      - lmod
      roots:
        lmod: modules
      lmod:
        hierarchy:
        - mpi
        - lapack
        hash_length: 0
        include:
        - gcc
        - aocc
        - intel-oneapi
        exclude:
        - '%gcc@11.3.1'
        all:
          environment:
            set:
              '{name}_ROOT': '{prefix}'
        projections:
          all: '{name}/{version}'
        core_compilers:
        - gcc@=11.3.1

Error message

Without using spack -d, there are no obvious error messages from the concretizer. The only way to notice the failure is to look at the concretizer output and check which packages were concretized. When I run the concretizer with -d, I see these messages:

==> [2024-03-25-11:44:45.542524] UNKNOWN SYMBOL: attr("virtual_on_incoming_edges", NodeArgument(id='0', pkg='libiconv'), iconv)
==> [2024-03-25-11:44:45.542797] UNKNOWN SYMBOL: attr("external_conditions_hold", NodeArgument(id='0', pkg='ucx'), 0)
==> [2024-03-25-11:44:45.542864] UNKNOWN SYMBOL: attr("external_conditions_hold", NodeArgument(id='0', pkg='slurm'), 0)
==> [2024-03-25-11:44:45.542899] UNKNOWN SYMBOL: attr("virtual_on_incoming_edges", NodeArgument(id='0', pkg='zlib-ng'), zlib-api)
==> [2024-03-25-11:44:45.543262] UNKNOWN SYMBOL: attr("external_conditions_hold", NodeArgument(id='0', pkg='openssl'), 0)
==> [2024-03-25-11:44:45.543322] UNKNOWN SYMBOL: attr("virtual_on_incoming_edges", NodeArgument(id='0', pkg='intel-tbb'), tbb)
==> [2024-03-25-11:44:45.543889] UNKNOWN SYMBOL: attr("virtual_on_incoming_edges", NodeArgument(id='0', pkg='openmpi'), mpi)
==> [2024-03-25-11:44:45.543979] UNKNOWN SYMBOL: attr("external_conditions_hold", NodeArgument(id='0', pkg='hcoll'), 0)

But I see similar messages even when I comment out the %oneapi lines, so I don't think they're related to this problem. I've attached the output of running spack -d concretize -f in both cases for you to look at:
concretizer_debug_output_w_oneapi.txt
concretizer_debug_output_wo_oneapi.txt

Information on your system

  • Spack: 0.21.1 (e30feda)
  • Python: 3.9.16
  • Platform: linux-rhel9-zen2
  • Concretizer: clingo

General information

  • I have run spack debug report and reported the version of Spack/Python/Platform
  • I have searched the issues of this repo and believe this is not a duplicate
  • I have run the failing commands in debug mode and reported the output
@pbisbal1 pbisbal1 added bug Something isn't working triage The issue needs to be prioritized labels Mar 25, 2024
@pbisbal1
Author

As a workaround, I did the following, which seems to work. In etc/spack/packages.yaml, I added these lines to make intel-oneapi-mkl the default provider for blas and lapack:

packages:
  all:
    providers:
      blas: [intel-oneapi-mkl, amdblis]
      lapack: [intel-oneapi-mkl, amdlibflame]

This works in my test environment (the spack.yaml shown above). I haven't tested it in my production environment yet. In addition to providing a usable workaround, this also seems to confirm that the problem lies in using 'intel-oneapi-mkl' in a when: clause.

Since this is an AMD-based cluster, I'd prefer being able to make amdblis/amdlibflame the defaults and make intel-oneapi-mkl the exception when compiling with oneapi.
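
The preferred defaults described here would amount to reversing the provider order used in the workaround, roughly as in the sketch below. This is an assumed illustration, not a tested configuration. A providers list only expresses a preference order, so on its own it would not select intel-oneapi-mkl for %oneapi builds; that exception is what the conditional requirement was meant to supply.

```yaml
packages:
  all:
    providers:
      # Assumed ordering: AMD libraries preferred, MKL listed as the fallback.
      blas: [amdblis, intel-oneapi-mkl]
      lapack: [amdlibflame, intel-oneapi-mkl]
```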

Prentice

@pbisbal1
Author

pbisbal1 commented Mar 28, 2024

Correction to that workaround... The concretizer seems to concretize everything when using that workaround, but things are NOT being concretized as desired. hpl%oneapi is being concretized with amdblis as the blas provider instead of intel-oneapi-mkl.

@scheibelp
Member

> If I run spack install, the installation doesn't complete, but no errors are shown.

If you update to e78484f, the errors will no longer be silent.

(That commit won't make the concretization succeed, but it will prevent the failure from being silent.)

#43475 (once a PR is created for it) should make use cases like this easier.

@scheibelp scheibelp self-assigned this Apr 17, 2024