Usage with Rocm windows for hip code compilation and documentation #342

Closed
LuisB79 opened this issue Apr 22, 2022 · 49 comments

@LuisB79

LuisB79 commented Apr 22, 2022

First, where is the documentation? After installation in WSL2, it told me the command antares didn't exist. Second, I have my kernel.hip.cpp and source .cpp files; how can I compile them? Do I need to install ROCm to compile for gfx1031, or can Antares compile them?

LuisB79 changed the title from "Usage with Rocm windows" to "Usage with Rocm windows for hip code compilation and documentation" on Apr 23, 2022
@ghostplant
Contributor

ghostplant commented Apr 23, 2022

Have you installed it with pip3 install --upgrade antares in WSL2? After that, the antares command will be available in PATH. Secondly, you don't need to compile it from source. Just run BACKEND=c-rocm_win64 antares for a trial.

BTW, in WSL2 you need to install the ROCm compiler according to 'https://sep5.readthedocs.io/en/latest/Installation_Guide/Installation-Guide.html#performing-an-opencl-only-installation-of-rocm', and make sure that /opt/rocm/bin/hipcc is installed in WSL successfully.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

It seems restarting Ubuntu in WSL2 helped. Running BACKEND=c-rocm_win64 antares gave me [Antares] Incorrect compute kernel from evaluator. I haven't installed the ROCm compiler yet; I will proceed to do so.

@ghostplant
Contributor

Is this command working?

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

@LuisB79
Author

LuisB79 commented Apr 23, 2022

I haven't installed ROCm yet. Do I need to install an earlier version, or should I install the newest one, 5.1.1, which has support for gfx1030?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image

@ghostplant
Contributor

These two packages, dkms and rocm-dkms, are not needed for WSL. Please install the remaining packages and finally make sure that /opt/rocm/bin/hipcc works.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

OK, I installed rocm-opencl-dev and rocm-dev. After that I got this:
image

@LuisB79
Author

LuisB79 commented Apr 23, 2022

/opt/rocm/bin/hipcc ~/.cache/antares/cache/_/my_kernel.cc --genco -O2 --amdgpu-target=gfx1031 -Wno-ignored-attributes -o /tmp/out.hsaco

image

Am I supposed to get no output?

@ghostplant
Contributor

That's great. Next, you need to install an upgraded antares version.

pip3 install --upgrade antares==0.3.13.1 # If it fails, it means the PyPI repo is not up to date; please re-run this command until it succeeds.

# Then, please show the output of this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares

@LuisB79
Author

LuisB79 commented Apr 23, 2022

I have successfully updated; however, I got a hip error, no binary for gpu :c
image

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

gfx1031 is possibly not the correct spec name for your GPU.

Or your AMD driver for Windows is not up to date,

or, in the worst case, it does not support this spec.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

how can i know?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

My GPU is an RX 6800M, which is pretty much an RX 6700 XT with a TDP limit of 145 W. Ubuntu said it was gfx1031 when I installed it.

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

Please guess and try other numbers like 1030, 1010, etc.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

I have 2 AMD GPUs though: the RX 6800M and an integrated Vega one. Do I need to specify that?

@ghostplant
Contributor

What is the gfx number for the vega one?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

is there a command to check that?

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

What is the full model name of Vega GPU?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

It literally just says AMD Radeon(TM) Graphics; according to the wiki, it's a Vega 8 GPU.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

the rx6800m says it's navi22 XTM

@ghostplant
Contributor

Maybe you can temporarily disable the Vega GPU in windows device manager for this test.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

i do not have a mux switch, i will try to select ubuntu or wsl to use the rx6800m

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
image
New error?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
?

@ghostplant
Contributor

No, rocminfo is not a suitable test command. I don't think the "new error" is related to our topic. After you disable the Vega GPU in "Windows Device Manager", please try:

antares clean
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares

@LuisB79
Author

LuisB79 commented Apr 23, 2022

i fear that if i disable it i won't have a screen input, since the vega 8 gpu is the one connected to the laptop display, not the rx6800m

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

OK, since you don't know which GPU is the enabled one, first please check which of the following typical spec settings works:

antares clean
AMDGFX=gfx803 BACKEND=c-rocm_win64 antares
AMDGFX=gfx900 BACKEND=c-rocm_win64 antares
AMDGFX=gfx902 BACKEND=c-rocm_win64 antares
AMDGFX=gfx906 BACKEND=c-rocm_win64 antares
AMDGFX=gfx908 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1010 BACKEND=c-rocm_win64 antares
AMDGFX=gfx1030 BACKEND=c-rocm_win64 antares
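
The trial commands above could also be run as a loop. This is only a sketch: the function name try_gfx_targets and the DRY_RUN switch are made up for illustration, and it assumes antares returns a non-zero exit status when a spec does not work, which has not been confirmed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: clean the antares cache, then try each gfx spec in turn.
# With DRY_RUN=1 the commands are only printed, nothing is executed.
try_gfx_targets() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "antares clean"
  else
    antares clean
  fi
  for gfx in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "AMDGFX=$gfx BACKEND=c-rocm_win64 antares"
      continue
    fi
    # Assumption: a zero exit status means this spec worked.
    if AMDGFX="$gfx" BACKEND=c-rocm_win64 antares; then
      echo "OK: $gfx"
      return 0
    fi
    echo "FAILED: $gfx" >&2
  done
  # In dry-run mode nothing was executed, so report success; otherwise
  # reaching this point means every spec failed.
  [ "${DRY_RUN:-0}" = "1" ]
}
```

For example, DRY_RUN=1 try_gfx_targets gfx1010 gfx1030 prints the commands without running them; dropping DRY_RUN=1 runs them and stops at the first spec that succeeds.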

@LuisB79
Author

LuisB79 commented Apr 23, 2022

Nothing, not a single one worked

@LuisB79
Author

LuisB79 commented Apr 23, 2022

Does it matter that I'm in WSL2?

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

Please open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

For the content, please fully replace the original init function with this updated function:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    int gpu_count = -1;
    LOAD_ONCE(hipGetDeviceCount, int (*)(int*));
    CHECK(0 == hipGetDeviceCount(&gpu_count), "Failed to run hipGetDeviceCount().");
    fprintf(stderr, "@@ hipGetDeviceCount = %d\n", gpu_count);

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(dev), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares and show the log output, which will indicate whether the Windows ROCm driver detects at least 1 GPU.

@ghostplant
Contributor

If you see @@ hipGetDeviceCount = 0, it means the current Windows ROCm driver doesn't recognize even one of the two GPUs you have. If it is @@ hipGetDeviceCount = 1, it means a GPU is supported, but the gfx number is incorrect.
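
As a rough aid for reading that log line, something like the following could be used. It is only a sketch: the function name explain_device_count is hypothetical, the messages for 0 and 1 paraphrase the explanation above, and the message for larger counts is an assumption (with several GPUs, the device ID probably has to be chosen as well).

```shell
#!/bin/sh
# Hypothetical helper: read antares log text on stdin and explain the
# "@@ hipGetDeviceCount = N" line printed by the patched init function.
explain_device_count() {
  # Take the number from the first matching line; anything after the
  # digits (e.g. a trailing carriage return) is ignored.
  count="$(sed -n 's/.*@@ hipGetDeviceCount = \([0-9][0-9]*\).*/\1/p' | head -n 1)"
  case "$count" in
    "") echo "no hipGetDeviceCount line found" ;;
    0)  echo "driver recognizes no GPU" ;;
    1)  echo "one GPU recognized; check the gfx number" ;;
    *)  echo "$count GPUs recognized; check the device ID and the gfx number" ;;
  esac
}
```

Since the patched function prints to stderr, the log would be piped in with something like AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares 2>&1 | explain_device_count.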

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
What does 2 mean?

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

2 means both GPUs are supported (Vega 8 and RX6700). Thus, you need to select the correct GPU ID and the correct GFX number:

Please re-open this file via vim:

vi ~/.local/lib/python3.8/site-packages/antares_core/backends/c-rocm_win64/include/backend.hpp

Similarly, for the content, please fully replace the original init function with this updated function:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(1), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

After saving, please re-run AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares.

The main difference is that this will use the 2nd GPU for a trial.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
i don't know if this helps but this is what the rx6800m says in the amd driver

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
Same error.

@ghostplant
Contributor

How did you get "a hip error no binary"? The new change will, in the worst case, throw that error again.
https://user-images.githubusercontent.com/95400651/164879095-88be9121-a4c3-4afb-8f94-491f5447adae.png

Can you re-update antares with "pip3 install antares --upgrade" (possibly multiple times if it fails)?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

To get that error, I followed these earlier steps:

pip3 install --upgrade antares==0.3.13.1 # If it fails, it means the PyPI repo is not up to date; please re-run this command until it succeeds.

# Then, please show the output of this command:
AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares

@LuisB79
Author

LuisB79 commented Apr 23, 2022

I managed to recreate it: I ran "pip3 install antares --upgrade", then AMDGFX=gfx1031 BACKEND=c-rocm_win64 antares, and I got this:
image

(ghostplant said:) OK, this is a good state, but the init function has been reverted as well, so you need to re-edit it into:

  void init(int dev) {
    ab::hLibDll = LoadLibrary(AMDHIP64_LIBRARY_PATH);
    CHECK(hLibDll, "Cannot find `" AMDHIP64_LIBRARY_PATH "` !\n");

    LOAD_ONCE(hipSetDevice, int (*)(int));
    CHECK(0 == hipSetDevice(1), "Failed initialize AMD ROCm device with `" AMDHIP64_LIBRARY_PATH "` (No AMDGPU installed or enabled?).");
    _current_device = dev;
  }

@LuisB79
Author

LuisB79 commented Apr 23, 2022

doing that worked?
image

@ghostplant
Contributor

Yes, it successfully utilizes the RX6700 for computation.

@LuisB79
Author

LuisB79 commented Apr 23, 2022

i have my kernel.hip.cpp and source .cpp files, how can i compile them? (i'm literally new to this)

@LuisB79
Author

LuisB79 commented Apr 23, 2022

For example, with ROCm on Ubuntu (no WSL) this would be the command:

/opt/rocm/hip/bin/hipcc source.cpp kernel.hip.cpp -o libbm3dhip.so -shared -fPIC -std=c++17 -O3 -I/home/comp/vapoursynth/include -Wno-unused-result --offload-arch=gfx1031 $(/opt/rocm/hip/bin/hipconfig --cxx_config)

What do I change to do it with Antares?

@LuisB79
Author

LuisB79 commented Apr 23, 2022

image
image

hip info fails even though it compiled.

@ghostplant
Contributor

ghostplant commented Apr 23, 2022

This allows you to execute source code based on __global__ functions on Windows. It means you can use driver-level programming for Win64 execution to make ROCm kernels run efficiently with just the native RX6700 driver for Windows.

Antares treats ROCm for Windows as a special-hardware backend, and can generate any efficient IR-based kernel to build up that __global__ function, e.g. MatMul/Transpose/Conv, etc.

@LuisB79
Author

LuisB79 commented Apr 27, 2022

Can native hip kernels be compiled in antares to run hip code in windows without the need of wsl?

@ghostplant
Contributor

You need to compile the HIP kernels in WSL, since hipcc is available from WSL only. hipcc will then produce HSACO binary code for the AMDGPU; this file can be loaded directly by a Win64 program, with no need for WSL.

In brief, you need WSL to compile all HIP kernels into HSACO files; then you can drop WSL and write a clean Win64 host program that interacts with the AMDGPU using these HSACOs.
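
The compile step in WSL could be scripted roughly like this. It is a sketch only: it reuses the hipcc flags from the command quoted earlier in this thread, the function name compile_hsaco and the DRY_RUN switch are hypothetical, and each input file is assumed to have a file extension and to contain only kernel code that builds with --genco.

```shell
#!/bin/sh
# Hypothetical helper: compile each given HIP kernel source into a HSACO
# file next to it. HIPCC and AMDGFX can be overridden from the environment;
# DRY_RUN=1 only prints the commands.
compile_hsaco() {
  hipcc="${HIPCC:-/opt/rocm/bin/hipcc}"
  gfx="${AMDGFX:-gfx1031}"
  for src in "$@"; do
    # Strip the last extension: kernel.hip.cpp -> kernel.hip.hsaco
    out="${src%.*}.hsaco"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$hipcc $src --genco -O2 --amdgpu-target=$gfx -Wno-ignored-attributes -o $out"
    else
      "$hipcc" "$src" --genco -O2 "--amdgpu-target=$gfx" \
        -Wno-ignored-attributes -o "$out" || return 1
    fi
  done
}
```

For example, DRY_RUN=1 compile_hsaco kernel.hip.cpp shows the command that would be run. The resulting .hsaco files are what the Win64 host program would load.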

@LuisB79
Author

LuisB79 commented Apr 28, 2022

Please add more documentation.

LuisB79 closed this as completed on Apr 28, 2022
@Looong01

Looong01 commented Jan 2, 2023

Hey, these are my problems:
image

And when I input this:
image
It returns this:
image

And these packages are installed (most of them automatically):
rocm-clang-ocl/focal,now 0.5.0.50401-84~20.04 amd64 [installed,automatic]
rocm-cmake/focal,now 0.8.0.50401-84~20.04 amd64 [installed,automatic]
rocm-core/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocm-dbgapi/focal,now 0.68.0.50401-84~20.04 amd64 [installed,automatic]
rocm-debug-agent/focal,now 2.0.3.50401-84~20.04 amd64 [installed,automatic]
rocm-dev/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-device-libs/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-dkms/focal,now 5.4.1.50401-84~20.04 amd64 [installed]
rocm-gdb/focal,now 12.1.50401-84~20.04 amd64 [installed,automatic]
rocm-llvm/focal,now 15.0.0.22465.50401-84~20.04 amd64 [installed,automatic]
rocm-ocl-icd/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-opencl-dev/focal,now 2.0.0.50401-84~20.04 amd64 [installed]
rocm-opencl/focal,now 2.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-smi-lib/focal,now 5.0.0.50401-84~20.04 amd64 [installed,automatic]
rocm-utils/focal,now 5.4.1.50401-84~20.04 amd64 [installed,automatic]
rocminfo/focal,now 1.0.0.50401-84~20.04 amd64 [installed,automatic]

I succeeded in using this:
image

@Looong01

Looong01 commented Jan 2, 2023

Ok, I use this:
image

And then this:
image
It returns this:
image
