One loop hangs Amoeba on Windows HIP #4194
Comments
That sounds like a compiler bug. It's a really simple loop, and unrolling should never affect correctness. We should probably notify AMD about it. @ex-rzr do you have any insight about this?
I spent some time trying to build and run OpenMM-HIP on Windows with an older pre-release HIP SDK version a few months ago. @bdenhollander I see that you've done a lot of work on supporting Windows: https://github.com/bdenhollander/openmm-hip/commits/windows-compatibility Good job narrowing it down, btw! Compiler issues are incredibly hard to debug because they often disappear after a minor change in the code, like adding a printf or commenting something out. Further steps to understand what happens may be running with As a workaround I have a crazy idea to "patch" the source code before compiling in @jdmaia Have you heard about similar issues on HIP SDK for Windows?
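To make the "patch the source before compiling" idea concrete, here is a minimal sketch of what such a step could look like. The helper name `insertNoUnrollPragma` and the way the loop is located (a plain substring match on the loop header) are assumptions for illustration only; nothing like this is claimed to exist in OpenMM or OpenMM-HIP.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of OpenMM): inserts "#pragma unroll 1" on its
// own line directly above the first occurrence of `loopHeader` in `source`,
// reusing the leading whitespace of the loop's line.
// Returns the source unchanged if the loop header is not found.
std::string insertNoUnrollPragma(const std::string& source, const std::string& loopHeader) {
    const std::size_t pos = source.find(loopHeader);
    if (pos == std::string::npos)
        return source;

    // Find the start of the line that contains the loop header.
    const std::size_t prevNewline = source.rfind('\n', pos);
    const std::size_t lineStart = (prevNewline == std::string::npos) ? 0 : prevNewline + 1;

    // Copy only the leading spaces and tabs of that line as indentation.
    std::size_t indentEnd = lineStart;
    while (indentEnd < pos && (source[indentEnd] == ' ' || source[indentEnd] == '\t'))
        ++indentEnd;
    const std::string indent = source.substr(lineStart, indentEnd - lineStart);

    std::string patched = source;
    patched.insert(lineStart, indent + "#pragma unroll 1\n");
    return patched;
}
```

A platform could apply a step like this to the kernel source string only when it detects the affected configuration, which would leave the shared source file untouched. Matching on a literal loop header is fragile, so this is better read as a stopgap than as a fix.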
Deja vu. This swapping code reminds me of a compiler bug. We had an issue with one test in rocPRIM: As you can see, the code is very similar. So I found the exact place in the compiled code where the compiler missed one instruction (or more precisely: incorrectly removed one instruction). Then AMD's compiler developer fixed this bug: https://reviews.llvm.org/rGdf1782c2a2af9938ba4c5bacfab20d1ddebc82dd I wonder if this is indeed the same compiler bug and if the compiler from HIP SDK for Windows includes the fix.
I generated .s files with and without unroll-windows-hip-amdgcn-amd-amdhsa-gfx1032.s.txt hipcc.bin.exe prints
Retested this with Windows HIP SDK 5.7.1 and
After hundreds of display driver resets, I finally tracked down what hangs in solveDIISMatrix when iteration > 0 on Windows HIP.
openmm/plugins/amoeba/platforms/common/src/kernels/multipoleInducedField.cc
Lines 765 to 769 in 065e34a
The issue is resolved by adding #pragma unroll 1. This hang only happens on Windows HIP. Ubuntu 20.04 compiles kernels using Clang 15.x, while Windows HIP bundles Clang 17.0.0, which could affect how the kernel is optimized by /O3. Since this is OS dependent and the code is in common, I'm not sure what the best way to implement the fix would be.
RX 6600 on Windows HIP. The massive amoebagk improvement is in line with previously reported Linux HIP results.
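One possible way to keep the fix out of the way on other platforms is to hide the pragma behind a macro that only the affected platform turns on. The sketch below is an assumption, not the actual OpenMM code: `WORKAROUND_DISABLE_UNROLL`, `LOOP_NO_UNROLL`, and `swapRows` are hypothetical names, and the row-swap loop is only a stand-in for the loop at the lines referenced above.

```cpp
#include <cstddef>

// Hypothetical switch: a platform that needs the workaround would define
// WORKAROUND_DISABLE_UNROLL when it compiles the kernel source.
#ifdef WORKAROUND_DISABLE_UNROLL
#define LOOP_NO_UNROLL _Pragma("unroll 1")
#else
#define LOOP_NO_UNROLL
#endif

// Illustrative stand-in for a small swap loop; not the actual OpenMM kernel code.
// Swaps rows rowA and rowB of a row-major matrix with rowLength columns.
inline void swapRows(float* matrix, std::size_t rowLength, std::size_t rowA, std::size_t rowB) {
    LOOP_NO_UNROLL
    for (std::size_t j = 0; j < rowLength; j++) {
        const float tmp = matrix[rowA * rowLength + j];
        matrix[rowA * rowLength + j] = matrix[rowB * rowLength + j];
        matrix[rowB * rowLength + j] = tmp;
    }
}
```

With the macro left undefined, the loop compiles exactly as before, so the other platforms keep their current code generation. Whether the common kernel sources should carry a platform-specific macro like this is a design question for the maintainers.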