Fix template kernels with ROCm #33

mlazzarin · 2021-10-25T08:46:18Z

In this PR I implemented the workaround for the template kernels with ROCm, as suggested in cupy/cupy#5436.

In particular, I replaced <complex<double>> with <thrust::complex<double> > and <complex<float>> with <thrust::complex<float> >. I also replaced complex with thrust::complex in the __device__ functions of gates.cu.cc for consistency.
Then, I removed the duplicated file with the ROCm kernels, which is now redundant.
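
For reference, a minimal sketch of the renaming described above. The helper `name_expression` and the kernel name `apply_gate_kernel` are hypothetical and chosen only for illustration; the actual strings live in backends.py and may be built differently.

```python
# Hypothetical helper: builds the template name expression for a kernel.
# The kernel name used below is illustrative, not taken from gates.cu.cc.
def name_expression(kernel, ctype, rocm_safe=True):
    """Return e.g. 'kernel<thrust::complex<double> >' for the given C type."""
    if rocm_safe:
        # Workaround form: explicit thrust:: prefix and a space between
        # the two closing angle brackets.
        return f"{kernel}<thrust::complex<{ctype}> >"
    # Original form, which this PR replaces.
    return f"{kernel}<complex<{ctype}>>"


for ctype in ("float", "double"):
    print(name_expression("apply_gate_kernel", ctype))
```

The only point of the sketch is that the change is confined to the template argument string; the kernel bodies are untouched.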

I ran the tests of this repository and they pass. However, the tests in the qibo repository fail, although they also fail with the main branch. I will open a separate issue for this.

mlazzarin · 2021-10-25T12:27:52Z

Given that the issues reported in #34 seem independent of this PR, I will address them in another PR and mark this one as ready for review.

scarrazza · 2021-10-25T12:36:06Z

@mlazzarin do you really need to add the thrust:: prefix?

mlazzarin · 2021-10-25T12:40:45Z

@scarrazza It seems to be required in backends.py but not in the __device__ functions of gates.cu.cc. Let me know what you prefer.

scarrazza · 2021-10-25T12:45:13Z

> @scarrazza It seems to be required in backends.py

Do you have an example? I would prefer to keep the kernels as light as possible.

mlazzarin · 2021-10-25T12:54:08Z

I implemented the workaround suggested in this comment: cupy/cupy#5436 (comment).

In particular, in the first commit e2a83eb of this PR I replaced <complex<double>> with <complex<double> >, and did the same for float. On my configuration this does not work and raises the following error: cupy_backends.cuda.api.driver.CUDADriverError: hipErrorNotFound: hipErrorNotFound.

In the second commit 9f4c059 I replaced <complex<double> > with <thrust::complex<double> >, and did the same for float. With this change the error no longer appears.

Actually, I have no idea why this works; I just followed the discussion in that thread.
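
A sketch of how the three spellings from the two commits could be tried in turn. Everything here is an assumption made for illustration: `first_loadable` is a hypothetical helper, `load` stands in for a real kernel lookup (for example the `get_function` method of a compiled `cupy.RawModule`), and `fake_rocm_load` only mimics the behaviour I observed on my configuration.

```python
def candidate_names(kernel, ctype):
    """The three spellings discussed in this thread, in commit order."""
    return [
        f"{kernel}<complex<{ctype}>>",           # before this PR
        f"{kernel}<complex<{ctype}> >",          # commit e2a83eb
        f"{kernel}<thrust::complex<{ctype}> >",  # commit 9f4c059
    ]


def first_loadable(load, kernel, ctype):
    """Return the first spelling that `load` accepts, or None.

    `load` is any callable that raises on an unknown name; it is a
    stand-in for a real kernel lookup and is an assumption of this sketch.
    """
    for name in candidate_names(kernel, ctype):
        try:
            load(name)
        except Exception:
            continue
        return name
    return None


# Fake loader mimicking the behaviour reported above on ROCm:
# only the thrust:: spelling is found.
def fake_rocm_load(name):
    if "thrust::" not in name:
        raise RuntimeError("hipErrorNotFound")
    return name


print(first_loadable(fake_rocm_load, "apply_gate_kernel", "double"))
```

This does not explain why the thrust:: prefix is needed; it only records which spelling is accepted.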

scarrazza · 2021-10-25T12:57:46Z

Ok thanks, I am fine with the current implementation if the performance on NVIDIA does not change.

mlazzarin · 2021-10-25T17:57:04Z

Ok, I'll double-check the performance.

mlazzarin · 2021-10-26T06:09:36Z

I ran some benchmarks (EDIT: on an NVIDIA GPU). I also included the main branch, so that we can see the impact of the multiqubitgpu branch on performance.

The simulation times are quite similar across the three branches.
The dry run overhead is similar between fixrocm and multiqubitgpu, but larger than that of the main branch.

qft - simulation times

nqubits	Simulation time main	Simulation time multiqubitgpu	Simulation time fixrocm
3	0.00042	0.00043	0.00042
4	0.00073	0.00073	0.00070
5	0.00104	0.00102	0.00107
6	0.00147	0.00148	0.00146
7	0.00198	0.00196	0.00195
8	0.00248	0.00254	0.00244
9	0.00318	0.00304	0.00317
10	0.00387	0.00388	0.00387
11	0.00470	0.00479	0.00476
12	0.00565	0.00549	0.00566
13	0.00643	0.00662	0.00663
14	0.00752	0.00767	0.00770
15	0.00872	0.00847	0.00861
16	0.00982	0.00986	0.01018
17	0.01162	0.01116	0.01109
18	0.01270	0.01264	0.01281
19	0.01441	0.01477	0.01498
20	0.01810	0.01758	0.01833
21	0.02530	0.02483	0.02517
22	0.03671	0.03655	0.03686
23	0.05898	0.05888	0.05853
24	0.10572	0.10547	0.10608
25	0.20290	0.20228	0.20338
26	0.40923	0.40984	0.41000
27	0.84414	0.84438	0.84496
28	1.77014	1.76874	1.76983
29	3.71794	3.71742	3.71838
30	7.85706	7.85693	7.85759

variational - simulation times

nqubits	Simulation time main	Simulation time multiqubitgpu	Simulation time fixrocm
3	0.00042	0.00043	0.00042
4	0.00073	0.00073	0.00070
5	0.00104	0.00102	0.00107
6	0.00147	0.00148	0.00146
7	0.00198	0.00196	0.00195
8	0.00248	0.00254	0.00244
9	0.00318	0.00304	0.00317
10	0.00387	0.00388	0.00387
11	0.00470	0.00479	0.00476
12	0.00565	0.00549	0.00566
13	0.00643	0.00662	0.00663
14	0.00752	0.00767	0.00770
15	0.00872	0.00847	0.00861
16	0.00982	0.00986	0.01018
17	0.01162	0.01116	0.01109
18	0.01270	0.01264	0.01281
19	0.01441	0.01477	0.01498
20	0.01810	0.01758	0.01833
21	0.02530	0.02483	0.02517
22	0.03671	0.03655	0.03686
23	0.05898	0.05888	0.05853
24	0.10572	0.10547	0.10608
25	0.20290	0.20228	0.20338
26	0.40923	0.40984	0.41000
27	0.84414	0.84438	0.84496
28	1.77014	1.76874	1.76983
29	3.71794	3.71742	3.71838
30	7.85706	7.85693	7.85759

bv - simulation times

nqubits	Simulation time main	Simulation time multiqubitgpu	Simulation time fixrocm
3	0.00038	0.00040	0.00039
4	0.00052	0.00052	0.00052
5	0.00065	0.00064	0.00065
6	0.00077	0.00078	0.00079
7	0.00089	0.00093	0.00090
8	0.00100	0.00104	0.00103
9	0.00116	0.00116	0.00117
10	0.00133	0.00133	0.00132
11	0.00150	0.00153	0.00153
12	0.00172	0.00165	0.00168
13	0.00176	0.00179	0.00181
14	0.00198	0.00197	0.00196
15	0.00210	0.00211	0.00206
16	0.00228	0.00228	0.00222
17	0.00239	0.00241	0.00243
18	0.00282	0.00278	0.00272
19	0.00353	0.00366	0.00362
20	0.00517	0.00513	0.00512
21	0.00799	0.00796	0.00804
22	0.01379	0.01379	0.01376
23	0.02577	0.02590	0.02578
24	0.05074	0.05071	0.05063
25	0.10251	0.10255	0.10256
26	0.21009	0.21012	0.21009
27	0.43246	0.43246	0.43247
28	0.89208	0.89218	0.89226
29	1.84328	1.84265	1.84303
30	3.80665	3.80711	3.80684

supremacy - simulation times

nqubits	Simulation time main	Simulation time multiqubitgpu	Simulation time fixrocm
3	0.00050	0.00050	0.00050
4	0.00059	0.00062	0.00061
5	0.00072	0.00072	0.00071
6	0.00091	0.00086	0.00087
7	0.00102	0.00103	0.00101
8	0.00110	0.00111	0.00116
9	0.00126	0.00125	0.00127
10	0.00145	0.00148	0.00148
11	0.00170	0.00172	0.00172
12	0.00184	0.00181	0.00181
13	0.00196	0.00193	0.00193
14	0.00222	0.00215	0.00215
15	0.00222	0.00226	0.00227
16	0.00242	0.00247	0.00251
17	0.00259	0.00249	0.00261
18	0.00307	0.00309	0.00312
19	0.00401	0.00413	0.00406
20	0.00567	0.00570	0.00563
21	0.00899	0.00902	0.00904
22	0.01580	0.01592	0.01578
23	0.02963	0.02968	0.02962
24	0.05829	0.05838	0.05837
25	0.11844	0.11843	0.11843
26	0.24500	0.24480	0.24487
27	0.50053	0.50056	0.50070
28	1.03131	1.03162	1.03103
29	2.13265	2.13260	2.13229
30	4.43731	4.43699	4.43743

qv - simulation times

nqubits	Simulation time main	Simulation time multiqubitgpu	Simulation time fixrocm
3	0.00055	0.00056	0.00057
4	0.00109	0.00104	0.00104
5	0.00103	0.00103	0.00105
6	0.00153	0.00155	0.00152
7	0.00154	0.00153	0.00155
8	0.00203	0.00204	0.00204
9	0.00205	0.00212	0.00210
10	0.00262	0.00264	0.00259
11	0.00266	0.00282	0.00278
12	0.00326	0.00328	0.00335
13	0.00331	0.00331	0.00323
14	0.00392	0.00385	0.00390
15	0.00386	0.00385	0.00388
16	0.00429	0.00446	0.00440
17	0.00442	0.00430	0.00437
18	0.00554	0.00560	0.00552
19	0.00673	0.00666	0.00678
20	0.00978	0.00991	0.00994
21	0.01471	0.01472	0.01484
22	0.02682	0.02685	0.02690
23	0.04789	0.04786	0.04779
24	0.09674	0.09671	0.09690
25	0.18977	0.18961	0.18981
26	0.39928	0.39922	0.39943
27	0.80149	0.80118	0.80150
28	1.71569	1.71599	1.71572
29	3.38466	3.38512	3.38455
30	7.31601	7.31566	7.31591

qft - dry run overhead

nqubits	delta main	delta multiqubitgpu	delta fixrocm
3	0.50600	1.13538	1.13443
4	0.51782	0.70583	0.71342
5	0.50966	0.71327	0.70894
6	0.51338	0.71801	0.71131
7	0.51297	0.71498	0.71341
8	0.51247	0.71702	0.70893
9	0.51243	0.70881	0.70851
10	0.51233	0.71408	0.71520
11	0.50800	0.71673	0.71505
12	0.50860	0.71547	0.71151
13	0.51156	0.71476	0.71485
14	0.51206	0.71056	0.71341
15	0.51082	0.71272	0.71272
16	0.51419	0.71710	0.71600
17	0.51712	0.71543	0.71881
18	0.51536	0.71398	0.71490
19	0.51353	0.72074	0.71850
20	0.51847	0.71789	0.72098
21	0.52013	0.71812	0.72064
22	0.51935	0.72013	0.72437
23	0.52238	0.72093	0.72061
24	0.51966	0.72555	0.72160
25	0.52140	0.72868	0.72414
26	0.51978	0.72428	0.72281
27	0.51812	0.72199	0.71927
28	0.51509	0.71866	0.71964
29	0.51091	0.70912	0.71425
30	0.48910	0.69470	0.69538

variational - dry run overhead

nqubits	delta main	delta multiqubitgpu	delta fixrocm
3	0.50932	0.70942	0.70957
4	0.51115	0.71280	0.71235
5	0.51016	0.71519	0.71482
6	0.51240	0.71522	0.70514
7	0.50688	0.70741	0.71343
8	0.50666	0.71094	0.71117
9	0.50773	0.70722	0.70982
10	0.50776	0.71402	0.70825
11	0.51665	0.70771	0.71423
12	0.51323	0.71545	0.71442
13	0.50632	0.71600	0.71012
14	0.51356	0.71467	0.71145
15	0.50701	0.71459	0.71528
16	0.51191	0.72116	0.71520
17	0.51211	0.71485	0.71230
18	0.50663	0.71494	0.71543
19	0.51929	0.71472	0.71117
20	0.51281	0.71722	0.71047
21	0.51599	0.71583	0.70926
22	0.51343	0.71671	0.71165
23	0.51521	0.70842	0.70766
24	0.50832	0.71208	0.71228
25	0.50583	0.71475	0.70992
26	0.51356	0.71521	0.71451
27	0.51062	0.70510	0.70490
28	0.50514	0.70421	0.70642
29	0.48779	0.69141	0.69917
30	0.47347	0.67801	0.67424

bv - dry run overhead

nqubits	delta main	delta multiqubitgpu	delta fixrocm
3	0.50389	0.71101	0.71201
4	0.50578	0.70876	0.70934
5	0.50564	0.71011	0.70903
6	0.50741	0.70865	0.70792
7	0.51030	0.70771	0.71134
8	0.50559	0.71036	0.70916
9	0.51019	0.70944	0.71090
10	0.51035	0.71359	0.71303
11	0.50659	0.70928	0.70955
12	0.50894	0.71110	0.71203
13	0.50907	0.70998	0.71149
14	0.50620	0.71470	0.70897
15	0.50793	0.71113	0.71075
16	0.50990	0.71100	0.70926
17	0.50641	0.71272	0.71628
18	0.50742	0.70893	0.71348
19	0.50990	0.71450	0.71516
20	0.51708	0.71440	0.71574
21	0.51125	0.71372	0.71442
22	0.50879	0.71461	0.71580
23	0.50713	0.70903	0.71162
24	0.50907	0.70802	0.70865
25	0.50843	0.71258	0.70754
26	0.50646	0.70859	0.70724
27	0.50258	0.70430	0.70739
28	0.50465	0.70688	0.70589
29	0.50231	0.70292	0.70423
30	0.49307	0.69398	0.69374

supremacy - dry run overhead

nqubits	delta main	delta multiqubitgpu	delta fixrocm
3	0.50895	0.70971	0.71043
4	0.50803	0.71247	0.71221
5	0.50867	0.71171	0.71373
6	0.50887	0.71013	0.71086
7	0.50891	0.71199	0.70991
8	0.51028	0.70930	0.70998
9	0.50841	0.71422	0.71056
10	0.50698	0.71189	0.71167
11	0.50867	0.71360	0.71045
12	0.50967	0.70944	0.71022
13	0.50931	0.71132	0.71015
14	0.50990	0.71007	0.71157
15	0.50775	0.71130	0.71036
16	0.50903	0.71232	0.71380
17	0.50891	0.70984	0.71116
18	0.50955	0.71213	0.71130
19	0.51059	0.71230	0.71120
20	0.50953	0.71041	0.71246
21	0.50877	0.71270	0.71184
22	0.50994	0.71221	0.71200
23	0.51276	0.71392	0.71194
24	0.51112	0.71274	0.71493
25	0.50926	0.71004	0.71070
26	0.50770	0.70891	0.71020
27	0.50554	0.71058	0.70449
28	0.50003	0.70185	0.70343
29	0.49402	0.69845	0.69420
30	0.47374	0.67465	0.67695

qv - dry run overhead

nqubits	delta main	delta multiqubitgpu	delta fixrocm
3	0.51344	0.71274	0.70779
4	0.51029	0.71115	0.71321
5	0.50925	0.70837	0.71013
6	0.50921	0.71082	0.71210
7	0.51268	0.71174	0.71035
8	0.51095	0.71131	0.71224
9	0.51031	0.71100	0.71399
10	0.51105	0.71263	0.71151
11	0.51093	0.71219	0.71526
12	0.51140	0.71173	0.71430
13	0.51256	0.71747	0.71260
14	0.51218	0.71464	0.71713
15	0.51327	0.71229	0.71348
16	0.51469	0.71368	0.71219
17	0.51575	0.71638	0.71581
18	0.51401	0.71165	0.71535
19	0.51752	0.71422	0.71572
20	0.51340	0.71548	0.71394
21	0.51353	0.71906	0.71586
22	0.51371	0.71465	0.71638
23	0.51654	0.71913	0.71523
24	0.51528	0.71613	0.71690
25	0.51305	0.71642	0.71747
26	0.51215	0.71330	0.71667
27	0.51107	0.71343	0.71095
28	0.50792	0.70824	0.70956
29	0.49898	0.70139	0.69964
30	0.47820	0.68516	0.68069
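
As a rough check of the summary given before the tables, the snippet below compares the branches on a few rows copied from the qft tables above (nqubits 28 to 30). The helper `max_rel_diff` is hypothetical and only computes a relative difference; it is not part of the benchmark scripts.

```python
# Rows copied from the "qft - simulation times" table (nqubits 28-30):
# nqubits -> (main, multiqubitgpu, fixrocm)
qft_sim = {
    28: (1.77014, 1.76874, 1.76983),
    29: (3.71794, 3.71742, 3.71838),
    30: (7.85706, 7.85693, 7.85759),
}
# Rows copied from the "qft - dry run overhead" table (nqubits 28-30).
qft_dry = {
    28: (0.51509, 0.71866, 0.71964),
    29: (0.51091, 0.70912, 0.71425),
    30: (0.48910, 0.69470, 0.69538),
}

MAIN, MULTIQUBIT, FIXROCM = 0, 1, 2


def max_rel_diff(table, a, b):
    """Largest |x_a - x_b| / x_b over the rows, for column indices a and b."""
    return max(abs(row[a] - row[b]) / row[b] for row in table.values())


print(f"sim time, fixrocm vs multiqubitgpu: {max_rel_diff(qft_sim, FIXROCM, MULTIQUBIT):.2%}")
print(f"dry run,  fixrocm vs multiqubitgpu: {max_rel_diff(qft_dry, FIXROCM, MULTIQUBIT):.2%}")
print(f"dry run,  multiqubitgpu vs main:    {max_rel_diff(qft_dry, MULTIQUBIT, MAIN):.2%}")
```

On these rows the two GPU branches stay within about one percent of each other, while the dry run overhead of multiqubitgpu is clearly larger than that of main.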

scarrazza · 2021-10-26T10:23:47Z

@mlazzarin thanks. All these numbers refer to the Radeon VII, correct?

mlazzarin · 2021-10-26T10:27:34Z

Sorry, I didn't mention it. These numbers refer to an NVIDIA GPU; the goal was to check whether the performance on NVIDIA changes.

scarrazza · 2021-10-26T10:36:23Z

Ok, good, so NVIDIA performance is unaffected compared to the multiqubitgpu branch.

mlazzarin · 2021-10-26T10:40:28Z

Yes, exactly.

codecov · 2021-10-26T10:54:22Z

Codecov Report

Merging #33 (590e80c) into multiqubitgpu (402fd9c) will not change coverage.
The diff coverage is n/a.

@@               Coverage Diff               @@
##           multiqubitgpu       #33   +/-   ##
===============================================
  Coverage         100.00%   100.00%           
===============================================
  Files                  9         9           
  Lines                758       758           
===============================================
  Hits                 758       758

Flag	Coverage Δ
unittests	`100.00% <ø> (ø)`

Impacted Files	Coverage Δ
src/qibojit/custom_operators/backends.py	`100.00% <ø> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 402fd9c...590e80c.

mlazzarin · 2021-10-26T12:04:55Z

@scarrazza shall we merge this?

mlazzarin added 3 commits October 23, 2021 09:34

Implement workaround for ROCm kernels

e2a83eb

Fix previous commit

9f4c059

Use thrust::complex in device functions of gates.cu.cc

967f891

mlazzarin self-assigned this Oct 25, 2021

mlazzarin mentioned this pull request Oct 25, 2021

Tests in Qibo repository fail with AMD ROCm #34

Closed

mlazzarin marked this pull request as ready for review October 25, 2021 12:27

mlazzarin requested review from scarrazza and stavros11 October 25, 2021 12:28

Merge branch 'multiqubitgpu' into fixrocm

590e80c

scarrazza merged commit 991eed8 into multiqubitgpu Oct 26, 2021

scarrazza deleted the fixrocm branch February 11, 2022 17:43
