Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

One loop hangs Amoeba on Windows HIP #4194

Open
bdenhollander opened this issue Aug 18, 2023 · 5 comments
Open

One loop hangs Amoeba on Windows HIP #4194

bdenhollander opened this issue Aug 18, 2023 · 5 comments

Comments

@bdenhollander
Copy link
Contributor

After hundreds of display driver resets, I finally tracked down what hangs in solveDIISMatrix when iteration > 0 on Windows HIP.

for (k = 0; k < rank; k++) {
real t = b[p][k];
b[p][k] = b[j][k];
b[j][k] = t;
}

The issue is resolved by adding #pragma unroll 1. This hang only happens on Windows HIP. Ubuntu 20.04 compiles kernels using Clang 15.x and Windows HIP bundles Clang 17.0.0 and could affect how the kernel is optimized by /O3. Since this is OS dependent and the code is in common, I'm not sure what the best way to implement the fix would be.

RX 6600 on Windows HIP. The massive amoebagk improvement is inline with previously reported Linux HIP results.

Benchmark OpenCL ns/day HIP ns/day Ratio
amoebagk 1.38817 15.4375 1112.08%
amoebapme 3.61985 6.11883 169.04%
@bdenhollander bdenhollander changed the title One loop hang Amoeba on Windows HIP One loop hangs Amoeba on Windows HIP Aug 18, 2023
@peastman
Copy link
Member

That sounds like a compiler bug. It's a really simple loop, and unrolling should never affect correctness. We should probably notify AMD about it.

@ex-rzr do you have any insight about this?

@ex-rzr
Copy link
Contributor

ex-rzr commented Aug 18, 2023

I spent some time trying to build and run OpenMM-HIP on Windows with an older pre-release HIP SDK version a few months ago.
I didn't finish my attempt then because it was my personal initiative (OpenMM-HIP is supported by AMD developers currently) and needed to switch to another project.
And I definitely remember some issues with AMOEBA but I'm not sure whether they were in tests or benchmarks and what kind of issues (hangs or just failures).

@bdenhollander I see that you've done a lot of work on supporting Windows: https://github.com/bdenhollander/openmm-hip/commits/windows-compatibility
I'll try to find time next week and build and run your branch on our GPU. Unfortunately it's also RDNA. If it's indeed a compiler bug then trying it on another architecture may provide some additional info.

Good job with narrowing it down, btw! Compiler issues are incredibly hard to debug because they often disappear with a minor change in code like adding printf or commenting something.

Further steps to understand what happens may be running with OPENMM_SAVE_TEMPS=pathwheresavetempsto on Linux and Windows and comparing corresponding .s files. If you are going to do this, please upload them, I'm also curious.

As a workaround I have a crazy idea to "patch" a source code before compiling in HipContext::createModule: find for (k = 0; k < rank; k++) { in a string and add #pragma unroll 1\n before it. It sounds a bit stupid but it does not require modifying main sources and hence depending on a very recent version of OpenMM.

@jdmaia Have you heard about similar issues on HIP SDK for Windows?

@ex-rzr
Copy link
Contributor

ex-rzr commented Aug 18, 2023

Deja vu. This swapping code reminds me one compiler bug. We had an issue with one test in rocPRIM:
https://github.com/ROCmSoftwarePlatform/rocPRIM/blob/develop/rocprim/include/rocprim/warp/detail/warp_sort_stable.hpp#L105

As you can see, the code is very similar. So I found the exact place in a compiled code where the compiler missed one instructions (or more precisely: incorrectly removed one instruction). Then AMD's compiler developer fixed this bug: https://reviews.llvm.org/rGdf1782c2a2af9938ba4c5bacfab20d1ddebc82dd

I wonder if this is indeed the same compiler bug and if the compiler from HIP SDK for Windows includes the fix.
@bdenhollander Could you check clang's version (with a commit hash) with hipconfig?

@bdenhollander
Copy link
Contributor Author

I generated .s files with and without #pragma unroll 1 on Windows. Diffing ignoring whitespace and numbers shows differences in lines 3561-4090.

unroll-windows-hip-amdgcn-amd-amdhsa-gfx1032.s.txt
hang-windows-hip-amdgcn-amd-amdhsa-gfx1032.s.txt

hipcc.bin.exe prints clang version 17.0.0 (git@github.amd.com:Compute-Mirrors/llvm-project e3201662d21c48894f2156d302276eb1cf47c7be) when it launches.

@bdenhollander
Copy link
Contributor Author

Retested this with Windows HIP SDK 5.7.1 and #pragma unroll 1 is still needed. The clang hash reported by hipcc.bin.exe has changed since 5.5.1.

C:\AMD\ROCm\5.7\bin\hipcc.bin.exe --version
HIP version: 5.7.32000-193a0b56e
clang version 17.0.0 (git@github.amd.com:Compute-Mirrors/llvm-project 6e709f613348e5258188527d11ee8d78376f26b7)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: C:\AMD\ROCm\5.7\bin

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants