Incorrect compute kernel from evaluator on WSL2 #284
Here is more info, if needed: the GPU is an RX580.
I was able to get the ROCm build of torch installed, but running the Antares samples uses the CPU.
As we said in #269, Antares is meant to launch ROCm device code through the native Windows ROCm driver and to help port device code to standard Win64 applications. It is not meant to restore the full ROCm stack and make PyTorch work in Linux mode. This may be possible in theory, but making it happen would definitely be costly, and I am not sure it is worth spending that much time on. Still, your logs https://gist.github.com/3c77d7003a0a212d3f30abea8ee2b9d8 and your statement that "running the antares samples will use the CPU" are not expected. I am not sure whether you have installed the Windows AMDGPU driver correctly. So can you paste the log by running
The AMDGPU drivers are installed correctly on Windows; the DLL mentioned in the docs is present.
If I'm reading the error correctly, I might need to install an older ROCm version? But it could totally be related to the fact
Your log clearly shows the root cause: no existing
My AMDGPU is a Radeon VII, which should match gfx906. If I remove argument
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
..\..\..\hip_code_object.cpp:92: guarantee(false && "hipErrorNoBinaryForGpu: Coudn't find binary for current devices!")
Once any
+ /opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 --amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out
[EvalAgent] Results = {"K/0": 1504583185.0, "TPR": 0.000401532}
This was also tested on an AMD Navi 10, which should match gfx1010. Besides, some GPUs of the same type may still have a special suffix like
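The failure above comes down to whether the compiled code object contains the GPU's gfx target. Here is a minimal sketch of that check, assuming you have the hipcc command line at hand. The `GPU_TO_GFX` table and both helper names are hypothetical illustrations, not part of Antares; the table only covers the cards mentioned in this thread.

```python
import re
import shlex
from typing import List

# Hypothetical lookup table for the cards discussed here, using their
# LLVM AMDGPU processor names.
GPU_TO_GFX = {
    "RX580": "gfx803",       # Polaris 20
    "Radeon VII": "gfx906",  # Vega 20
    "Navi 10": "gfx1010",    # e.g. RX 5700 XT
}

TARGET_FLAG = re.compile(r"--amdgpu-target=(\S+)")


def amdgpu_targets(hipcc_command: str) -> List[str]:
    """List the gfx targets passed to hipcc via --amdgpu-target=..., in order."""
    targets = []
    for token in shlex.split(hipcc_command):
        match = TARGET_FLAG.fullmatch(token)
        if match:
            targets.append(match.group(1))
    return targets


def binary_covers_gpu(hipcc_command: str, gpu: str) -> bool:
    """True if the code object this command builds includes the GPU's gfx target.

    When this is False, loading the module is expected to fail with
    hipErrorNoBinaryForGpu, as in the log above.
    """
    return GPU_TO_GFX[gpu] in amdgpu_targets(hipcc_command)


# The two command lines quoted above, without the leading "+ " from shell tracing.
WITHOUT_GFX906 = (
    "/opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 "
    "--amdgpu-target=gfx900 --amdgpu-target=gfx908 --amdgpu-target=gfx1010 "
    "--genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out"
)
WITH_GFX906 = (
    "/opt/rocm/bin/hipcc .antares-module-tempfile.cu --amdgpu-target=gfx803 "
    "--amdgpu-target=gfx900 --amdgpu-target=gfx906 --amdgpu-target=gfx908 "
    "--amdgpu-target=gfx1010 --genco -Wno-ignored-attributes -O2 -o .antares-module-tempfile.cu.out"
)

print(binary_covers_gpu(WITHOUT_GFX906, "Radeon VII"))  # False
print(binary_covers_gpu(WITH_GFX906, "Radeon VII"))     # True
```

gfx803 already appears in both command lines, so this particular check passes for an RX580. That is consistent with the later finding that the RX580 problem is the runtime dropping gfx803, not a missing compile target.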
I'm running an RX580; I have no idea which one that is in this naming scheme.
@Column01 Just found this news: https://www.videogames.ai/2021/01/07/RX580-ROCM-40.html It seems that ROCm >= 4.0 no longer supports gfx803 (the RX580 is gfx803), and the same is true of ROCm on Linux. If you still want to use the card for acceleration, you may need to consider "Linux + ROCm < 4.0" or "Windows + DirectX12 (over BACKEND=c-hlsl_win64)".
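The advice above amounts to a small decision rule. The sketch below encodes it, assuming, as the linked article reports, that gfx803 is unsupported from ROCm 4.0 onward. `DROPPED_IN`, `rocm_supports`, and `choose_win64_backend` are hypothetical names. The `rocm_version` argument is the ROCm runtime version your driver provides, which you may have to look up yourself. The printed command assumes the DirectX12 backend is built the same way as the ROCm one in the original report.

```python
from typing import Tuple

# Assumption taken from the linked article: gfx803 (e.g. RX580) is not
# supported starting with ROCm 4.0.
DROPPED_IN = {"gfx803": (4, 0)}


def rocm_supports(gfx_arch: str, rocm_version: Tuple[int, int]) -> bool:
    """False if this gfx target was dropped at or before the given ROCm version."""
    dropped = DROPPED_IN.get(gfx_arch)
    return dropped is None or rocm_version < dropped


def choose_win64_backend(gfx_arch: str, rocm_version: Tuple[int, int]) -> str:
    """Pick one of the two Antares BACKEND values mentioned in this thread."""
    if rocm_supports(gfx_arch, rocm_version):
        return "c-rocm_win64"
    # Fall back to the DirectX12 path suggested above.
    return "c-hlsl_win64"


backend = choose_win64_backend("gfx803", (4, 1))
print(f"sudo BACKEND={backend} make")  # sudo BACKEND=c-hlsl_win64 make
```

The table-driven form makes it easy to add other dropped targets later without touching the selection logic.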
Ugh, this is extremely frustrating. AMD keeps making stupid decisions like this and pissing consumers off. I wholeheartedly regret buying my AMD card,
@Column01 Windows AMDGPU support for the ROCm runtime was first released at the end of 2020, and ROCm 4.0 came out after that? Maybe an older version of the Windows AMD driver still supports gfx803? But... it is indeed annoying. Fortunately, the DirectX12 runtime can always use the RX580 for acceleration, and I think its performance is not far from that of the ROCm runtime.
I will try 4.0 tomorrow and see if it works.
Tried older ROCm versions, no dice (3.9, 4.0, and 4.1 all give a longer error about
So the ROCm repository lists gfx8 GPUs as compatible, but full support is not guaranteed. I think this is more likely because it is ROCm inside WSL2 and not WSL1. I'm going to dual-boot Linux and run my workflows there, and hopefully ROCm will work properly...
After a lot of tearing my hair out, I gave up. This is not an Antares issue; it's an "AMD being stupid" issue. gfx803 is not supported and just straight-up broke at some point. RIP AMD consumer cards for compute...
Following the other issue about this (#269), I installed Antares hoping to get ROCm working on WSL2 with Ubuntu 20.04, but it does not seem to work.
When running
sudo BACKEND=c-rocm_win64 make
to install the ROCm backend on Windows in WSL2, it tries to evaluate a custom kernel at the end and fails. The AMD HIP driver is present (C:\Windows\System32\amdhip64.dll), and running
sudo apt install rocm-dev
shows that it is already installed. Antares is visible in Windows (/mnt/c/Users/Colin/ubuntu_stuff/antares/).
Here is the log during the evaluation: https://gist.github.com/3c77d7003a0a212d3f30abea8ee2b9d8
It should be noted that running
/opt/rocm/bin/rocminfo
reports: ROCk module is NOT loaded, possibly no GPU devices
AMD has closed all issues regarding WSL2 and this error message... The kernel version is:
Linux Colin-Desktop 5.10.16.3-microsoft-standard-WSL2 #1 SMP Fri Apr 2 22:23:49 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
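The two observations above can be gathered into one quick diagnostic. This is a minimal sketch that only matches the exact rocminfo message quoted in this report and the "microsoft-standard-WSL2" suffix in the kernel release string. It does not inspect the driver itself. The function names are hypothetical. rocminfo's output is read from both stdout and stderr because it is not certain which stream the message goes to.

```python
import platform
import subprocess

ROCK_NOT_LOADED = "ROCk module is NOT loaded"  # message quoted in the report above


def is_wsl2_kernel(kernel_release: str) -> bool:
    """WSL2 kernels carry a suffix like 5.10.16.3-microsoft-standard-WSL2."""
    return "microsoft-standard-WSL2" in kernel_release


def rocminfo_reports_rock_not_loaded(rocminfo_output: str) -> bool:
    """True if rocminfo's output contains the "ROCk module is NOT loaded" message."""
    return ROCK_NOT_LOADED in rocminfo_output


def diagnose(kernel_release: str, rocminfo_output: str) -> str:
    """Summarise the two observations from this report as a one-line hint."""
    if rocminfo_reports_rock_not_loaded(rocminfo_output):
        where = "this WSL2 kernel" if is_wsl2_kernel(kernel_release) else "this kernel"
        return f"rocminfo reports the ROCk module is not loaded in {where} (possibly no GPU devices)"
    return "rocminfo did not report a missing ROCk module"


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["/opt/rocm/bin/rocminfo"], capture_output=True, text=True
        )
        output = result.stdout + result.stderr
    except OSError:
        output = ""
        print("could not run /opt/rocm/bin/rocminfo")
    print(diagnose(platform.release(), output))
```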