AMDGPU/TeraScale: Node emitted out of order - late assertion with clpeak #55698

Open
illwieckz opened this issue May 25, 2022 · 4 comments

@illwieckz
Contributor

illwieckz commented May 25, 2022

I'm currently running LLVM at commit 69f7f15 (very recent, but old enough to avoid the opaque pointer mismatch with Mesa and other components) and Mesa at commit b2b810ebff657b3d24d93a1fdbd6adc79bc38153.

When running the clpeak OpenCL benchmark on R600/TeraScale devices, I get this assertion:

llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:285: llvm::Register llvm::InstrEmitter::getVR(llvm::SDValue, llvm::DenseMap<llvm::SDValue, llvm::Register>&): Assertion `I != VRBaseMap.end() && "Node emitted out of order - late"' failed.
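
For context, the assertion text shows what is being checked: `getVR` looks an operand up in `VRBaseMap` and aborts when there is no entry. What follows is only a toy Python model of that kind of check, not LLVM code. The names `get_vr`, `emit_node` and `emit_in_order`, the string node names, and the `%vregN` naming are all made up for illustration, and it assumes the map gains an entry each time a node is emitted, so a node that uses an operand before that operand has been emitted finds nothing in the map.

```python
# Toy model (NOT LLVM code): maps already-emitted nodes to register names.
# All names here are hypothetical and exist only for illustration.
vr_base_map = {}


def get_vr(operand):
    """Return the register recorded for `operand`, or fail like the assertion."""
    if operand not in vr_base_map:
        raise AssertionError("Node emitted out of order - late")
    return vr_base_map[operand]


def emit_node(name, operands):
    """Resolve every operand first, then record a fresh register for `name`."""
    operand_regs = [get_vr(op) for op in operands]
    vr_base_map[name] = f"%vreg{len(vr_base_map)}"
    return vr_base_map[name], operand_regs


def emit_in_order(schedule):
    """Emit (name, operands) pairs in the given order, starting from an empty map."""
    vr_base_map.clear()
    return [emit_node(name, ops) for name, ops in schedule]


# The operand is emitted before its user: every lookup succeeds.
good = [("load", []), ("add", ["load"])]
# The user is emitted before its operand: the lookup for "load" fails.
bad = [("add", ["load"]), ("load", [])]

emit_in_order(good)
try:
    emit_in_order(bad)
except AssertionError as err:
    print(err)  # prints: Node emitted out of order - late
```

In this model the failure only says that the emission order and the operand dependencies disagree; it does not say which earlier step produced the bad order.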

The source code of the OpenCL benchmark:

I reproduced it on:

  • AMD Radeon HD 6970 RV970 Cayman XT (TeraScale 3)
  • AMD Radeon HD 5870 RV870 Cypress XT (TeraScale 2)

Those are the errors I get:

$ clpeak -d 0

Platform: Clover
  Device: AMD CAYMAN (DRM 2.50.0 / 5.13.0-44-generic, LLVM 15.0.0)
    Driver version  : 22.2.0-devel (Linux x64)
    Compute units   : 12
    Clock frequency : 880 MHz
clpeak: llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:285: llvm::Register llvm::InstrEmitter::getVR(llvm::SDValue, llvm::DenseMap<llvm::SDValue, llvm::Register>&): Assertion `I != VRBaseMap.end() && "Node emitted out of order - late"' failed.
Aborted

$ clpeak -d 1

Platform: Clover
  Device: AMD CYPRESS (DRM 2.50.0 / 5.13.0-44-generic, LLVM 15.0.0)
    Driver version  : 22.2.0-devel (Linux x64)
    Compute units   : 10
    Clock frequency : 850 MHz
clpeak: llvm-project/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:285: llvm::Register llvm::InstrEmitter::getVR(llvm::SDValue, llvm::DenseMap<llvm::SDValue, llvm::Register>&): Assertion `I != VRBaseMap.end() && "Node emitted out of order - late"' failed.
Aborted

For reference, this is what I get on GCN1 and GCN2, where it appears to work:

$ clpeak -d 0

Platform: Clover
  Device: AMD Radeon R9 390 Series (hawaii, LLVM 15.0.0, DRM 3.41, 5.13.0-45-generic)
    Driver version  : 22.2.0-devel (Linux x64)
    Compute units   : 44
    Clock frequency : 1080 MHz

    Global memory bandwidth (GBPS)
      float   : 11.29
      float2  : 11.28
      float4  : 11.27
      float8  : 10.57
      float16 : 4.77

    Single-precision compute (GFLOPS)
      float   : 5881.14
      float2  : 5895.50
      float4  : 5880.94
      float8  : 5824.35
      float16 : 5799.34

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 747.48
      double2  : 749.21
      double4  : 744.47
      double8  : 742.23
      double16 : 738.79

    Integer compute (GIOPS)
      int   : 1946.12
      int2  : 1960.41
      int4  : 1975.52
      int8  : 1945.99
      int16 : 1977.48

    Integer compute Fast 24bit (GIOPS)
      int   : 5566.71
      int2  : 5400.64
      int4  : 5528.86
      int8  : 5435.56
      int16 : 5298.62

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 18.55
      enqueueReadBuffer               : 19.62
      enqueueWriteBuffer non-blocking : 18.71
      enqueueReadBuffer non-blocking  : 19.38
      enqueueMapBuffer(for read)      : 32317.29
        memcpy from mapped ptr        : 19.53
      enqueueUnmap(after write)       : 32987.46
        memcpy to mapped ptr          : 18.72

    Kernel launch latency : 67.25 us

$ clpeak -d 1

Platform: Clover
  Device: AMD Radeon HD 8500 series (oland, LLVM 15.0.0, DRM 3.41, 5.13.0-45-generic)
    Driver version  : 22.2.0-devel (Linux x64)
    Compute units   : 5
    Clock frequency : 780 MHz

    Global memory bandwidth (GBPS)
      float   : 5.53
      float2  : 5.53
      float4  : 5.53
      float8  : 5.33
      float16 : 3.41

    Single-precision compute (GFLOPS)
      float   : 62.14
      float2  : 62.10
      float4  : 61.99
      float8  : 61.77
      float16 : 61.53

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 31.09
      double2  : 31.07
      double4  : 31.02
      double8  : 30.92
      double16 : 30.83

    Integer compute (GIOPS)
      int   : 24.88
      int2  : 24.86
      int4  : 24.82
      int8  : 24.74
      int16 : 24.81

    Integer compute Fast 24bit (GIOPS)
      int   : 122.75
      int2  : 122.29
      int4  : 121.36
      int8  : 119.55
      int16 : 116.07

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 18.78
      enqueueReadBuffer               : 19.24
      enqueueWriteBuffer non-blocking : 18.70
      enqueueReadBuffer non-blocking  : 18.95
      enqueueMapBuffer(for read)      : 21941.06
        memcpy from mapped ptr        : 19.30
      enqueueUnmap(after write)       : 15561.46
        memcpy to mapped ptr          : 18.81

    Kernel launch latency : 48.15 us
@llvmbot
Collaborator

llvmbot commented May 25, 2022

@llvm/issue-subscribers-backend-amdgpu

@llvmbot
Collaborator

llvmbot commented May 25, 2022

@llvm/issue-subscribers-opencl

@jayfoad
Contributor

jayfoad commented Jan 12, 2023

Is this still a problem? If so can you give some more instructions on how to reproduce it from scratch?

@lorn10

lorn10 commented May 23, 2023

@jayfoad
Well, clpeak has had problems with TeraScale-based GPUs in Mesa Clover practically forever. 😉

For example, the following old Mesa bugs might be related to this or to other LLVM issues:

Mesa bug #586 AMD JUNIPER: "clpeak causes a GPU hang"
Mesa bug #610 AMD CAYMAN: "segfault when running clpeak --global-bandwidth"
Mesa bug #638 AMD PALM: "clpeak: Bus error (core dumped) & lots of GPU lockup"

I can also confirm Mesa bug #610 on an AMD TURKS-based Radeon HD 6770M GPU. In any case, there may be a chance that a recent R600 LLVM fix will have some impact here as well; see 1706960. More information can be found in #55679.
