Fix template kernels with ROCm #33
Given that the issues reported in #34 seem independent from this PR, I would address them in another PR and mark this one ready for review. |
@mlazzarin do you really need to add the thrust:: prefix? |
@scarrazza It seems to be required in |
do you have an example? I would prefer to keep kernels as light as possible. |
I implemented the workaround suggested in this comment cupy/cupy#5436 (comment) . In particular, in the first commit e2a83eb of this PR I replaced In the second commit 9f4c059 I replaced Actually, I've no idea why this works, I just followed the discussion of that thread. |
Ok thanks, I am fine with the current implementation if the performance on NVIDIA does not change. |
Ok, I'll double-check the performance. |
I performed some benchmarks (EDIT: on an NVIDIA GPU), including a third branch for comparison. The simulation times are quite similar across the three branches.
[Plots not preserved: simulation times and dry run overhead for the qft, variational, bv, supremacy and qv circuits.] |
@mlazzarin thanks. All these numbers refer to the Radeon VII, correct? |
Sorry I didn't mention it. These numbers refer to an NVIDIA GPU, to see if the performance on NVIDIA changes or not. |
Ok, good, so NVIDIA performance is unaffected when compared to the multiqubitgpu branch. |
Yes, exactly. |
Codecov Report
@@ Coverage Diff @@
## multiqubitgpu #33 +/- ##
===============================================
Coverage 100.00% 100.00%
===============================================
Files 9 9
Lines 758 758
===============================================
Hits 758 758
|
@scarrazza shall we merge this? |
In this PR I implemented the workaround for the template kernels with ROCm, as suggested in cupy/cupy#5436.
In particular, I replaced `<complex<double>>` with `<thrust::complex<double> >` and `<complex<float>>` with `<thrust::complex<float> >`. I also replaced `complex` with `thrust::complex` in the `__device__` functions of `gates.cu.cc` for consistency.
Then, I removed the duplicated file with the ROCm kernels, which is now redundant.
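The template-argument replacement described above can be illustrated with a minimal Python sketch. It assumes the kernel name expressions are built as strings on the Python side and later handed to something like CuPy's `RawModule(name_expressions=...)`; the kernel names and the `name_expression` helper are hypothetical and are not taken from this PR.

```python
# Hypothetical kernel base names, used here only for illustration.
KERNEL_NAMES = ("apply_gate_kernel", "apply_x_kernel")

# Template arguments after the workaround: fully qualified thrust::complex,
# with a trailing space so the closing brackets read "> >" instead of ">>".
TEMPLATE_ARGS = {
    "complex64": "thrust::complex<float> ",
    "complex128": "thrust::complex<double> ",
}


def name_expression(kernel, dtype):
    """Build the C++ name expression for one template instantiation."""
    return f"{kernel}<{TEMPLATE_ARGS[dtype]}>"


# All name expressions that would be requested at compile time.
ALL_EXPRESSIONS = [
    name_expression(kernel, dtype)
    for kernel in KERNEL_NAMES
    for dtype in TEMPLATE_ARGS
]
```

Under this assumption, the same strings would be used both when compiling the module and when looking up each instantiated kernel, so the spelling has to match exactly in both places.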
I ran the tests of this repository and they pass. However, the tests in the qibo repository fail, but they also fail with the `main` branch. I will open a separate issue concerning this.