
Adding ROCm support for AMD GPUs #638

Merged
merged 33 commits into from
May 30, 2024

Conversation

hisohara
Contributor

This PR adds ROCm support with HIP for AMD GPUs. The CUDA code is converted to the HIP API. Per a past discussion with the maintainer, #ifdef __HIP_PLATFORM_AMD__ guards the HIP-specific code, and this region is compiled only when the HIP compiler is used. The CUDA code is unchanged.

To compile for ROCm, add USE_HIP=Yes alongside USE_GPU=Yes as follows:
$ USE_GPU=Yes USE_HIP=Yes pip install .

Tested hardware environment:
MI250, MI300A

Tested software environment:
ROCm 6.0.2 and ROCm 6.1

References

Contributor

@KowerKoint KowerKoint left a comment


Thank you for the first PR!

  1. I'm not sure about AMD HIP, but the APIs of the two platforms look alike. I think wrapping the types and functions of both architectures, as you are already doing with GTYPE, is better than adding so many ifdefs and typing almost identical code twice.
  2. I'm thinking of using cudaOccupancyMaxPotentialBlockSize to adapt to many NVIDIA GPUs in Flexible and Effienct Blocksize and Loopdim #628. Is there a similar function in HIP?
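The wrapping idea in point 1 can be sketched as a small header that maps one alias to whichever backend is being compiled. This is only an illustrative host-side mock, not the actual header the PR introduces: the names gpuMalloc/gpuFree and the plain-malloc fallback are made up here so the pattern compiles and runs on a CPU-only machine.

```cpp
#include <cstdlib>

// Illustrative only: in real GPU code the first two branches would call
// hipMalloc/hipFree or cudaMalloc/cudaFree. The host fallback exists so
// this sketch compiles anywhere; it returns 0 (like the GPU APIs' success
// status) when allocation succeeds.
#if defined(__HIP_PLATFORM_AMD__)
#define gpuMalloc(ptr, size) hipMalloc((ptr), (size))
#define gpuFree(ptr) hipFree((ptr))
#elif defined(__CUDACC__)
#define gpuMalloc(ptr, size) cudaMalloc((ptr), (size))
#define gpuFree(ptr) cudaFree((ptr))
#else  // host fallback, for demonstration only
#define gpuMalloc(ptr, size) ((*(ptr) = std::malloc(size)) == nullptr)
#define gpuFree(ptr) std::free(ptr)
#endif
```

Call sites then spell a single name, e.g. `gpuMalloc(&buf, n)`, instead of duplicating every call under #ifdef.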

@hisohara
Contributor Author

hisohara commented Apr 28, 2024

Thanks for your review!

  1. I agree with you. The current #ifdef/#endif approach shows the straightforward conversion from CUDA to HIP, but it would be hard to read and maintain. I'll consider wrapping the types and functions. Another option is to rely on HIP itself: HIP provides not only the compiler for AMD GPUs but also a thin layer over CUDA, and the environment variable HIP_PLATFORM chooses the compiler, either hipcc (clang++) or nvcc. That approach keeps the code simple, but since HIP may not be familiar to you, I assume you prefer the wrapper approach, right?
  2. Yes, we have hipOccupancyMaxPotentialBlockSize. This kind of information is summarized in the Supported CUDA API documentation.

@KowerKoint
Contributor

  1. Yes. Most users and maintainers (including me) are familiar with CUDA, so I prefer wrapping types and functions. Could you make the change?
  2. Thank you for the information. I'll try to make the change and test on MI100 and/or MI210 if Flexible and Effienct Blocksize and Loopdim #628 is accepted. In this PR, you don't need to change the policy for determining block size.

@hisohara
Contributor Author

  1. No problem. Let me change the structure.
  2. Sure. When your block-size change is accepted, I'll incorporate it. Thanks for letting me know.

@hisohara
Contributor Author

hisohara commented May 7, 2024

Wrapping types and functions has been applied for both CUDA and HIP; for this purpose I created gpu_wrapping.h. Could you please take a look again?
Regarding __shfl_sync(), it still relies on #ifdef __HIP_PLATFORM_AMD__. When HIP supports it, I'll open another PR.
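The same guard pattern mentioned for __shfl_sync() can be illustrated with a host-only mock. The functions below are stand-ins, not the real warp intrinsics (which only exist in device code); they exist solely so the compile-time dispatch can run on a CPU:

```cpp
// Stand-ins for the two backends (NOT the real intrinsics): a warp
// shuffle from lane `src` trivially returns `var` in this single-thread mock.
static inline int fake_shfl(int var, int /*src_lane*/) { return var; }
static inline int fake_shfl_sync(unsigned /*mask*/, int var, int /*src_lane*/) {
    return var;
}

// One macro name, two backends, chosen at compile time:
#ifdef __HIP_PLATFORM_AMD__
#define WARP_SHFL(mask, var, src) fake_shfl((var), (src))  // HIP: no mask argument
#else
#define WARP_SHFL(mask, var, src) fake_shfl_sync((mask), (var), (src))
#endif
```

In real device code the two branches would expand to __shfl() and __shfl_sync() respectively; the point is that the #ifdef lives in one header instead of at every call site.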

(CC: @ckime-amd)

@ckime-amd

Thanks for the notification, @hisohara. This looks like a good contribution to enable AMD GPU support. 👍

Contributor

@KowerKoint KowerKoint left a comment


Thank you for defining the aliases.
It looks great.
I have one change request.

@@ -1,4 +1,43 @@
cmake_minimum_required(VERSION 3.0)
project(qulacs LANGUAGES HIP)
Contributor


When building for a CUDA GPU, this line causes the following error.

-- The HIP compiler identification is unknown
CMake Error at /usr/share/cmake-3.22/Modules/CMakeDetermineHIPCompiler.cmake:102 (message):
  Failed to find ROCm root directory.
Call Stack (most recent call first):
  src/gpusim/CMakeLists.txt:2 (project)
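One common way to avoid this error is to enable the HIP language only when explicitly requested. This is a sketch, assuming a USE_HIP option is passed through to CMake; it is not the exact fix applied in the branch, and the language list for the CUDA branch is an assumption:

```cmake
# Sketch: enable HIP only when requested, so CUDA-only builds never
# invoke CMakeDetermineHIPCompiler (which requires a ROCm install).
if(USE_HIP)
    project(qulacs LANGUAGES HIP)
else()
    project(qulacs LANGUAGES CXX CUDA)
endif()
```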

@hisohara
Contributor Author

Thanks for taking a look at it again. I should have realized this earlier.
I updated the rocm branch. Could you please check it?

Contributor

@KowerKoint KowerKoint left a comment


Thank you!


codecov bot commented May 14, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.13%. Comparing base (ac5da89) to head (75150b2).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #638      +/-   ##
==========================================
+ Coverage   85.98%   88.13%   +2.15%     
==========================================
  Files         127      137      +10     
  Lines       13253    16273    +3020     
  Branches     1695     2174     +479     
==========================================
+ Hits        11396    14343    +2947     
- Misses       1824     1837      +13     
- Partials       33       93      +60     


@KowerKoint
Contributor

  1. The macOS CI is currently failing. I fixed it in Issue of brew upgrade #641. Could you merge origin/main to reflect it?
  2. Flexible and Effienct Blocksize and Loopdim #628 is accepted and will be merged soon. We should wrap cudaOccupancyMaxPotentialBlockSize and resolve conflicts.

@hisohara
Contributor Author

Thanks for your approval!

  1. OK, already done.
  2. Sure. When it is reflected in main, I'll add the modification on my branch.

@KowerKoint
Contributor

Thank you. Now it's reflected.

Adopt hipOccupancyMaxPotentialBlockSize()
@hisohara
Contributor Author

Let me give a status update. When hipOccupancyMaxPotentialBlockSize() is called, a SEGV occurs; the function pointer does not seem to be passed correctly. With a simple test application, passing the kernel directly to hipOccupancyMaxPotentialBlockSize() works, but passing it indirectly through get_block_size_to_maximize_occupancy does not. Let me discuss this with my colleagues internally.

Meanwhile, is hard-coding like the following acceptable?

template <typename F>
inline unsigned int get_block_size_to_maximize_occupancy(F func,
    unsigned int dynamic_s_mem_size = 0, unsigned int block_size_limit = 0) {
    int block_size, min_grid_size;
#ifdef __HIP_PLATFORM_AMD__
    block_size = 512;
//    TODO: Investigate SEGV issue
//    hipOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
//        dynamic_s_mem_size, block_size_limit);
#else
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
        dynamic_s_mem_size, block_size_limit);
#endif
    return block_size;
}

@hisohara
Contributor Author

hisohara commented May 22, 2024

Please ignore my last comment. I've found a workaround using a #define macro as follows:

#ifdef __HIP_PLATFORM_AMD__
#define get_block_size_to_maximize_occupancy(x) ({ \
    int min_grid_size, block_size; \
    hipOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, (x), 0, 0); \
    block_size; \
})
#else
template <typename F>
inline unsigned int get_block_size_to_maximize_occupancy(F func,
    unsigned int dynamic_s_mem_size = 0, unsigned int block_size_limit = 0) {
    int block_size, min_grid_size;
    cudaOccupancyMaxPotentialBlockSize(&min_grid_size, &block_size, func,
        dynamic_s_mem_size, block_size_limit);
    return block_size;
}
#endif

I've confirmed there is no major performance impact; on QCBMopt4 with 25 qubits it is actually slightly faster, by 2.4%. I hope this change is acceptable.
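The workaround above leans on a GCC/Clang statement expression: a ({ ... }) compound evaluates to its last expression, which lets a macro "return" a value. A minimal host-only illustration of the mechanism (MAX_OF is a made-up example, not from the PR):

```cpp
// GNU statement expression: the ({ ... }) block yields the value of its
// last expression. Supported by GCC and Clang (hipcc is Clang-based),
// but it is a compiler extension, not standard C++.
#define MAX_OF(a, b) ({ int _a = (a), _b = (b); _a > _b ? _a : _b; })
```

This is why the macro form can wrap hipOccupancyMaxPotentialBlockSize and still hand back block_size to the caller, sidestepping the template-based function-pointer passing that triggered the SEGV.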

Contributor

@KowerKoint KowerKoint left a comment


Thank you. That seems nice.

@hisohara
Contributor Author

Thanks for your approval!

@KowerKoint
Contributor

@hisohara I'm sorry, could you run script/format.sh and commit again?
We require cpp/cuda sources to be formatted in the specified style.

@hisohara
Contributor Author

I updated the rocm branch by applying clang-format, and also caught up with the latest main commit. Could you please check?

@KowerKoint
Contributor

The formatting was broken while merging the main branch.
Please re-format 🙏

@hisohara
Contributor Author

I executed format.sh again, but it produces no changes:

hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ git log -1
commit 5e031c4306d3f219c6a75199191fabba57cbf17d (HEAD -> hip-format, origin/rocm, AMD-HPC-qulacs/rocm, rocm, hip-format-v2)
Merge: e3a8e531 ac5da893
Author: Hisaki Ohara <Hisaki.Ohara@amd.com>
Date:   Tue May 28 04:03:59 2024 +0000

    Merge branch 'main' into hip-format
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ ./script/format.sh
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$ git diff
hisohara@gbtrocm:~/Projects/GITHUB/qulacs$

Could you please tell me which file is broken on your side?

@hisohara
Contributor Author

hisohara commented May 28, 2024

The clang-format I was using was 14.0.0 on Ubuntu 22.04, while the GitHub Action uses 10.0-50. I switched to 11.1.0-6, the oldest version available for Ubuntu 22.04. It did reformat src/cppsim/circuit.cpp, where the test had failed before. I hope the issue is now resolved.

@hisohara
Contributor Author

Hi, please let me know if there is anything I can do from my side to move this forward.

@KowerKoint
Contributor

Sorry for late reply.
Now it's all OK.
I'll merge this.
Thank you for your contribution.

@KowerKoint KowerKoint merged commit bea30ac into qulacs:main May 30, 2024
12 checks passed
@hisohara
Contributor Author

Finally! Thanks so much for your continuous support and for accepting this big change. I'm very happy to contribute to this project.
