
Adding ROCm support for AMD GPUs #638

Merged
merged 33 commits into from
May 30, 2024

Conversation

hisohara
Contributor

This PR adds ROCm support with HIP for AMD GPUs. The CUDA code is converted to the HIP API. Per a past discussion with the maintainer, #ifdef __HIP_PLATFORM_AMD__ guards the HIP-specific code, and this region is compiled only when the HIP compiler is used. The CUDA code is unchanged.

To compile for ROCm, add USE_HIP=Yes alongside USE_GPU=Yes as follows:
$ USE_GPU=Yes USE_HIP=Yes pip install .

Tested hardware environment:
MI250, MI300A

Tested software environment:
ROCm 6.0.2 and ROCm 6.1

References

Contributor

@KowerKoint KowerKoint left a comment


Thank you for the first PR!

  1. I'm not sure about AMD HIP, but the APIs of the two platforms look alike. I think wrapping the types and functions of both architectures, as you are already doing with GTYPE, is better than adding so many ifdefs and typing almost identical code twice.
  2. I'm thinking of using cudaOccupancyMaxPotentialBlockSize to adapt to many NVIDIA GPUs in Flexible and Effienct Blocksize and Loopdim #628. Is there a similar function in HIP?
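The wrapping idea in point 1 can be sketched as a small header that maps one alias to whichever backend is being compiled. This is only an illustrative host-side mock, not the actual header the PR introduces: the names gpuMalloc/gpuFree and the plain-malloc fallback are made up here so the pattern compiles and runs on a CPU-only machine.

```cpp
#include <cstdlib>

// Illustrative only: in real GPU code the first two branches would call
// hipMalloc/hipFree or cudaMalloc/cudaFree. The host fallback exists so
// this sketch compiles anywhere; it returns 0 (like the GPU APIs' success
// status) when allocation succeeds.
#if defined(__HIP_PLATFORM_AMD__)
#define gpuMalloc(ptr, size) hipMalloc((ptr), (size))
#define gpuFree(ptr) hipFree((ptr))
#elif defined(__CUDACC__)
#define gpuMalloc(ptr, size) cudaMalloc((ptr), (size))
#define gpuFree(ptr) cudaFree((ptr))
#else  // host fallback, for demonstration only
#define gpuMalloc(ptr, size) ((*(ptr) = std::malloc(size)) == nullptr)
#define gpuFree(ptr) std::free(ptr)
#endif
```

Call sites then spell a single name, e.g. `gpuMalloc(&buf, n)`, instead of duplicating every call under #ifdef.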

@hisohara
Contributor Author

hisohara commented Apr 28, 2024

Thanks for your review!

  1. I agree with you. The current #ifdef/#endif approach shows the straightforward conversion from CUDA to HIP, but it would be hard to read and maintain. I'll consider wrapping the types and functions. Another option is to rely on HIP itself: HIP provides not only the compiler for AMD GPUs but also a thin layer over CUDA, and the environment variable HIP_PLATFORM chooses the compiler, either hipcc (clang++) or nvcc. That approach keeps the code simple, but since HIP may not be familiar to you, I assume you prefer the wrapper approach, right?
  2. Yes, we have hipOccupancyMaxPotentialBlockSize. This kind of information is summarized in the Supported CUDA API documentation.

@KowerKoint
Contributor

  1. Yes. Most users and maintainers (including me) are familiar with CUDA, so I prefer wrapping types and functions. Could you make the change?
  2. Thank you for the information. I'll try to make the change and test on MI100 and/or MI210 if Flexible and Effienct Blocksize and Loopdim #628 is accepted. In this PR, you don't need to change the policy for determining block size.

@hisohara
Contributor Author

  1. No problem. Let me change the structure.
  2. Sure. When your block-size change is accepted, I'll incorporate it. Thanks for letting me know.

@hisohara
Contributor Author

hisohara commented May 7, 2024

Wrapping types and functions has been applied for both CUDA and HIP; for this purpose I created gpu_wrapping.h. Could you please take a look again?
Regarding __shfl_sync(), it still relies on #ifdef __HIP_PLATFORM_AMD__. When HIP supports it, I'll open another PR.
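The same guard pattern mentioned for __shfl_sync() can be illustrated with a host-only mock. The functions below are stand-ins, not the real warp intrinsics (which only exist in device code); they exist solely so the compile-time dispatch can run on a CPU:

```cpp
// Stand-ins for the two backends (NOT the real intrinsics): a warp
// shuffle from lane `src` trivially returns `var` in this single-thread mock.
static inline int fake_shfl(int var, int /*src_lane*/) { return var; }
static inline int fake_shfl_sync(unsigned /*mask*/, int var, int /*src_lane*/) {
    return var;
}

// One macro name, two backends, chosen at compile time:
#ifdef __HIP_PLATFORM_AMD__
#define WARP_SHFL(mask, var, src) fake_shfl((var), (src))  // HIP: no mask argument
#else
#define WARP_SHFL(mask, var, src) fake_shfl_sync((mask), (var), (src))
#endif
```

In real device code the two branches would expand to __shfl() and __shfl_sync() respectively; the point is that the #ifdef lives in one header instead of at every call site.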

(CC: @ckime-amd)

@ckime-amd

Thanks for the notification, @hisohara. This looks like a good contribution to enable AMD GPU support. 👍

Contributor

@KowerKoint KowerKoint left a comment


Thank you for defining the aliases.
It looks great.
I have one change request.

@@ -1,4 +1,43 @@
cmake_minimum_required(VERSION 3.0)
project(qulacs LANGUAGES HIP)
Contributor


When building for a CUDA GPU, this line causes the following error.

-- The HIP compiler identification is unknown
CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineHIPCompiler.cmake:102 (message):
  Failed to find ROCm root directory.
Call Stack (most recent call first):
  src/gpusim/CMakeLists.txt:2 (project)
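One common way to avoid this error is to enable the HIP language only when explicitly requested. This is a sketch, assuming a USE_HIP option is passed through to CMake; it is not the exact fix applied in the branch, and the language list for the CUDA branch is an assumption:

```cmake
# Sketch: enable HIP only when requested, so CUDA-only builds never
# invoke CMakeDetermineHIPCompiler (which requires a ROCm install).
if(USE_HIP)
    project(qulacs LANGUAGES HIP)
else()
    project(qulacs LANGUAGES CXX CUDA)
endif()
```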

@hisohara
Contributor Author

Thanks for taking a look at it again. I should have realized this earlier.
I updated the rocm branch. Could you please check it?

Contributor

@KowerKoint KowerKoint left a comment


Thank you!


codecov bot commented May 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.13%. Comparing base (ac5da89) to head (75150b2).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #638      +/-   ##
==========================================
+ Coverage   85.98%   88.13%   +2.15%     
==========================================
  Files         127      137      +10     
  Lines       13253    16273    +3020     
  Branches     1695     2174     +479     
==========================================
+ Hits        11396    14343    +2947     
- Misses       1824     1837      +13     
- Partials       33       93      +60     


@KowerKoint
Contributor

  1. The macOS CI is currently failing. I fixed it in Issue of brew upgrade #641. Could you merge origin/main to reflect it?
  2. Flexible and Effienct Blocksize and Loopdim #628 is accepted and will be merged soon. We should wrap cudaOccupancyMaxPotentialBlockSize and resolve conflicts.

@hisohara
Contributor Author

Thanks for your approval!

  1. OK, already done.
  2. Sure. When it is reflected in main, I'll add the modification on my branch.

@KowerKoint
Contributor

Thank you. Now it's reflected.

Adopt hipOccupancyMaxPotentialBlockSize()
@hisohara
Contributor Author

Let me give a status update. When hipOccupancyMaxPotentialBlockSize() is called, a SEGV occurs; the function pointer does not seem to be passed correctly. With a simple test application, passing the kernel directly to hipOccupancyMaxPotentialBlockSize() works, but passing it indirectly through get_block_size_to_maximize_occupancy does not. Let me discuss this with my colleagues internally.

Meanwhile, is hard-coding like the following acceptable?

template <typename F>
inline unsigned int get_block_size_to_maximize_occupancy(F func,
    unsigned int dynamic_s_mem_size = 0, unsigned int block_size_limit = 0) {
    int block_size, min_grid_size;
#ifdef __HIP_PLATFORM_AMD__
    block_size = 512;
//    TODO: Investigate SEGV issue
//    hipOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
//        dynamic_s_mem_size, block_size_limit);
#else
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
        dynamic_s_mem_size, block_size_limit);
#endif
    return block_size;
}

@hisohara
Contributor Author

hisohara commented May 22, 2024

Please ignore my last comment. I've found a workaround using a #define macro as follows:

#ifdef __HIP_PLATFORM_AMD__
#define get_block_size_to_maximize_occupancy(x) ({ \
    int min_grid_size, block_size; \
    hipOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, (x), 0, 0); \
    block_size; \
})
#else
template <typename F>
inline unsigned int get_block_size_to_maximize_occupancy(F func,
    unsigned int dynamic_s_mem_size = 0, unsigned int block_size_limit = 0) {
    int block_size, min_grid_size;
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
        dynamic_s_mem_size, block_size_limit);
    return block_size;
}
#endif

I've confirmed there is no major performance impact; on QCBMopt4 with 25 qubits it is actually slightly faster, by 2.4%. I hope this change is acceptable.
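The workaround above leans on a GCC/Clang statement expression: a ({ ... }) compound evaluates to its last expression, which lets a macro "return" a value. A minimal host-only illustration of the mechanism (MAX_OF is a made-up example, not from the PR):

```cpp
// GNU statement expression: the ({ ... }) block yields the value of its
// last expression. Supported by GCC and Clang (hipcc is Clang-based),
// but it is a compiler extension, not standard C++.
#define MAX_OF(a, b) ({ int _a = (a), _b = (b); _a > _b ? _a : _b; })
```

This is why the macro form can wrap hipOccupancyMaxPotentialBlockSize and still hand back block_size to the caller, sidestepping the template-based function-pointer passing that triggered the SEGV.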

Contributor

@KowerKoint KowerKoint left a comment


Thank you. That seems nice.

@hisohara
Contributor Author

Thanks for your approval!

@KowerKoint
Contributor

@hisohara I'm sorry, could you run script/format.sh and commit again?
We require cpp/cuda sources to be formatted in the specified style.

@hisohara
Contributor Author

I updated the rocm branch by applying clang-format, and also caught up with the latest main commit. Could you please check?

@KowerKoint
Contributor

The formatting was broken while merging the main branch.
Please re-format 🙏

@hisohara
Contributor Author

I executed format.sh again, but it produces no changes:

hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ git log -1
commit 5e031c4306d3f219c6a75199191fabba57cbf17d (HEAD -> hip-format, origin/rocm, AMD-HPC-qulacs/rocm, rocm, hip-format-v2)
Merge: e3a8e531 ac5da893
Author: Hisaki Ohara <Hisaki.Ohara@amd.com>
Date:   Tue May 28 04:03:59 2024 +0000

    Merge branch 'main' into hip-format
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ ./script/format.sh
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ git diff
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$

Could you please tell me which file is broken on your side?

@hisohara
Contributor Author

hisohara commented May 28, 2024

The clang-format I was using was 14.0.0 on Ubuntu 22.04, while the GitHub Action uses 10.0-50. I switched to 11.1.0-6, the oldest version available for Ubuntu 22.04. It did reformat src/cppsim/circuit.cpp, where the test had failed before. I hope the issue is now resolved.

@hisohara
Contributor Author

Hi, please let me know if there is anything I can do from my side to move this forward.

@KowerKoint
Contributor

Sorry for late reply.
Now it's all OK.
I'll merge this.
Thank you for your contribution.

@KowerKoint KowerKoint merged commit bea30ac into qulacs:main May 30, 2024
12 checks passed
@hisohara
Contributor Author

Finally! Thanks so much for your continuous support and for accepting this big change. I'm very happy to contribute to this project.
