Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clingo should allow two providers in the same DAG #27157

Open
3 tasks done
mwkrentel opened this issue Nov 2, 2021 · 5 comments
Open
3 tasks done

clingo should allow two providers in the same DAG #27157

mwkrentel opened this issue Nov 2, 2021 · 5 comments

Comments

@mwkrentel
Copy link
Member

mwkrentel commented Nov 2, 2021

Steps to reproduce

@alalazo Could we rethink the fix from #26718 and #26740 ?

The problem is that the fix breaks another concretization that I need,
namely, hpctoolkit +rocm. The problem is that hpctoolkit +rocm has
two dependencies, llvm-amdgpu and hsa-rocr-dev that both require
specifically libelf, and hpctoolkit requires elfutils.

I understand this is an advanced use case. elfutils and libelf
are both providers for the virtual package elf. The original
concretizer would never allow both of these in the same DAG. But in
this case, the two uses of elf are separate. The part of hpctoolkit
that uses elf does not call ROCM and there is no actual conflict.

Therefore, this should be allowed, at least by some means, even if
hpctoolkit has to sign a waiver that says, "I promise there is no
actual conflict with the two elf providers."

I can see where DAGs get bigger and virtual providers become more
common that this issue will become more common.

ping @becker33 @tgamblin

Error message

I tracked this down to the commit from #26740.

commit eded8f48dc629a5f3a72acf9c134731f8b89606c
Author: Massimiliano Culpo <massimiliano.culpo@gmail.com>
Date:   Thu Oct 14 23:06:41 2021 +0200

    ASP-based solver: add a rule for version uniqueness in virtual packages (#26740)

commit d9d0ceb726b301d68fd649f395fb94134b6f7943
Author: Christoph Junghans <junghans@lanl.gov>
Date:   Thu Oct 14 13:17:02 2021 -0600

    add  py-pyh5md and update py-espressopp (#26746)

You can see the problem with spack spec hpctoolkit +rocm.
(I'm leaving out large parts of the output.)

$ spack spec hpctoolkit +rocm    # d9d0ceb726b3
...
hpctoolkit@2021.05.15%gcc@8.4.1
    ^dyninst@11.0.1%gcc@8.4.1
        ^elfutils@0.185%gcc@8.4.1
    ^hip@4.3.1%gcc@8.4.1
        ^comgr@4.3.1%gcc@8.4.1
            ^llvm-amdgpu@4.3.1%gcc@8.4.1
                ^libelf@0.8.13%gcc@8.4.1
        ^hip-rocclr@4.3.1%gcc@8.4.1
            ^hsa-rocr-dev@4.3.1%gcc@8.4.1
                ^libelf@0.8.13%gcc@8.4.1

Note: the above includes both elfutils and libelf.

$ spack spec hpctoolkit +rocm    # eded8f48dc62
...
  condition(1377)
  condition(419)
  imposed_constraint(1377,"version_satisfies","elf","0")
  imposed_constraint(419,"version_satisfies","elf","1")
  virtual("elf")
==> Error: hpctoolkit+rocm does not satisfy unknown
error: spack spec hpctoolkit +rocm failed

Information on your system

  • Spack: 0.16.3-5133-f45ef21e37
  • Python: 3.6.8
  • Platform: linux-rhel8-zen2
  • Concretizer: clingo

General information

  • I have run spack debug report and reported the version of Spack/Python/Platform
  • I have searched the issues of this repo and believe this is not a duplicate
  • I have run the failing commands in debug mode and reported the output
@mwkrentel mwkrentel added bug triage The issue needs to be prioritized labels Nov 2, 2021
@alalazo alalazo added this to To do in Spack v0.18.0 release via automation Nov 2, 2021
@mwkrentel
Copy link
Member Author

I've taken a closer look at this and I think there is a reasonable solution.

First, to review the problem, libelf and elfutils are both
providers of the virtual package elf. hpctoolkit +rocm links with
dyninst which uses elfutils and also uses (but not directly links with)
hip which uses (transitively) libelf. (libelf stopped development in 2009,
but that's a separate problem.)

The path for elfutils is:

[    ]  hpctoolkit@2021.10.15%gcc@8.4.1~all-static~cray~cuda~debug~mpi+papi~rocm+viewer arch=linux-rhel8-x86_64
[bl  ]      ^dyninst@12.0.1%gcc@8.4.1~ipo+openmp~stat_dysect~static build_type=RelWithDebInfo arch=linux-rhel8-x86_64
[ l  ]          ^elfutils@0.186%gcc@8.4.1+bzip2~debuginfod~nls+xz arch=linux-rhel8-x86_64
                 ...

And for hip and libelf:

[    ]  hip@4.3.1%gcc@8.4.1~ipo build_type=Release patches=2a4190477b7d9206b9cd8d70770ba0bc007273cbe54772efb12f9ca2e37c0392,99190b4616edb362d48f9b265c3018a3c6339481b0729d9fe46185fca25bc54b,e276c4acf3d37712b6bea306fea34f539d3c4f743471e9da208b5eb17b16ae67 arch=linux-rhel8-x86_64
[bl  ]      ^comgr@4.3.1%gcc@8.4.1~ipo build_type=Release arch=linux-rhel8-x86_64
[bl  ]          ^llvm-amdgpu@4.3.1%gcc@8.4.1~ipo+openmp+rocm-device-libs build_type=Release patches=d999f3b235e655ee07f6dd2590302082feaa06d32c5c6b53aae9c5cf1e45b644 arch=linux-rhel8-x86_64
[ l  ]              ^libelf@0.8.13%gcc@8.4.1 arch=linux-rhel8-x86_64
                    ...

Although these are both paths of link dependencies, hpctoolkit doesn't
actually need a link depends on hip. (It currently does, but doesn't
need to.)

So the solution is:

  1. Spack needs to allow libelf and elfutils, both providers of elf
    in the same DAG.

  2. Hpctoolkit is a large, complex package with several subpackages.
    That is, not all dependencies are used in all places. We use hip
    header files and some roctracer files, we don't actually link with
    anything from hip, so we don't need a link dependency. Plus, hip is
    used in a different subpackage from dyninst/elfutils.

That is, spack should add -L elfutils (because of dyninst), but not
-L libelf. I can fix this in the hpctoolkit package.py file.

I tested this. I removed elf as a virtual package and set everything
to libelf or elfutils, and I changed the hip depends_on to build + run.
Then hpctoolkit built cleanly.

ping @scheibelp and @alalazo
I'd like to discuss this at Wed's spack meeting,
and I'd like to move this up in the priority list.

@scheibelp
Copy link
Member

(Note: this initial part predates #27157 (comment))

I added this to discuss for tomorrow's meeting (https://github.com/spack/spack/wiki/Telcon%3A-2021-12-15) but a few points:

  • I believe it has been stated before that eded8f4 addressed a mistake: it happened to work well in your case but in general could produce incorrect results
  • Specifically: we don't want a packages to link to different instances of a library that is intended for the same purpose (e.g. for MPI this would likely not work)
  • The challenge will be identifying precisely how this case is distinct from other virtuals (for example I could imagine that a pair of stateless libraries implementing the same API might actually interoperate)

(This part was written after #27157 (comment))

The modeling of the problem is simpler given #27157 (comment) (but unfortunately still challenging - I think in this case the good news is that we know how to move forward with it): if hpctoolkit doesn't have a link dependency on hip, then we may be able to reuse some of the work we are doing to allow separate concretization of build dependencies and also apply it to to other non-link dependencies.

@mwkrentel
Copy link
Member Author

Yes, I'm saying that in this case for hpctoolkit, it's safe to change
the dependency on hip, etc to build + run. Then, all I need is for
spack to allow libelf and elfutils to exist in the same DAG.

And yes, eventually there will be harder cases where a package really
needs link dependencies on two packages that don't coexist, like the
MPI test program that someone mentioned.

@scheibelp
Copy link
Member

@mwkrentel
Copy link
Member Author

Notes from today's spack meeting.

  1. For this specific use case, we may be able to replace uses of
    libelf with the virtural elf. That way, the entire DAG would use
    elfutils and not libelf.

I (krentel) will investigate this.

  1. Spack, hopefully, will investigate allowing multiple packages that
    would otherwise conflict in the same DAG (different providers of the
    same virtual, or different versions of the same package) when the
    paths to their LCA (least common ancestor) contain a non-link
    dependency, eg, build only or build+run.

As package DAGs grow ever larger, this case will become more common.

Note: In a DAG (as opposed to a tree), the LCA need not be unique.
So, I guess the requirement must be that the paths to any ancestor must
be broken by a non-link dependency (on one side or other).

In this case of hpctoolkit, (2) can be tested by changing the three
hip/rocm dependencies to:

depends_on('hip', type=('build', 'run'), when='+rocm')
depends_on('rocm-dbgapi', type=('build', 'run'), when='+rocm')
depends_on('roctracer-dev', type=('build', 'run'), when='+rocm')

It's likely that (1) will happen before (2) in which case, you'll have
to undo the fix for (1) to test a fix for (2) in this case. (Or just
construct something artificial.)

Good meeting, thanks @scheibelp !

@alalazo alalazo removed this from To do in Spack v0.18.0 release Apr 25, 2022
@alalazo alalazo added this to To do in Spack v0.19.0 release via automation Apr 25, 2022
@tgamblin tgamblin added this to the v0.20.0 milestone Nov 7, 2022
@tgamblin tgamblin removed this from To do in Spack v0.19.0 release Nov 7, 2022
@alalazo alalazo added feature proposal and removed bug triage The issue needs to be prioritized labels Apr 7, 2023
@alalazo alalazo removed this from the v0.20.0 milestone May 2, 2023
@alalazo alalazo added the revisit label May 2, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants