
Bad perf for matmul of ba tensors #1667

Open
WilliamTambellini opened this issue Jun 9, 2023 · 5 comments
Labels: platform:x64 Intel64/AMD64 processors, question

WilliamTambellini commented Jun 9, 2023

Hello oneDNN team,
Is it really expected for a ba matmul to be this slow? For example:

M = 63448, K = 640, N = 2

tag	time (ms)
ab	18
ba	1790

With benchdnn:

wtambellini@lawtambe3 onednn-3.0/bin (master) $ ONEDNN_VERBOSE=1 OMP_NUM_THREADS=1 ./benchdnn --mode=P --matmul --dt=f32 --stag=ba --wtag=ba --dtag=ba 63448x640:640x2 
onednn_verbose,info,oneDNN v3.0.0 (commit 030eae4fe332eee75f10e05da4e8d7981c1a94b8)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:1
onednn_verbose,info,cpu,isa:Intel AVX2
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
...

Best
WT


dzarukin commented Jun 9, 2023

Hi @WilliamTambellini, the benchdnn command is ill-formed, which triggers a warning towards the end of the execution. To get the desired problem, the problem descriptor should be put at the very end of the command line, not before the desired tags. Thanks.

@WilliamTambellini (Author)

Thanks @dzarukin.
I've fixed the benchdnn command line but can still confirm that the ba matmul is apparently slow:

$ ONEDNN_VERBOSE=1 OMP_NUM_THREADS=1 ./benchdnn --mode=P --matmul --dt=f32 --stag=ba --wtag=ba --dtag=ba 63448x640:640x2
onednn_verbose,info,oneDNN v3.0.0 (commit 030eae4fe332eee75f10e05da4e8d7981c1a94b8)
onednn_verbose,info,cpu,runtime:OpenMP,nthr:1
onednn_verbose,info,cpu,isa:Intel AVX2
onednn_verbose,info,gpu,runtime:none
onednn_verbose,info,prim_template:operation,engine,primitive,implementation,prop_kind,memory_descriptors,attributes,auxiliary,problem_desc,exec_time
onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ba:f0,,,63448x640,268.588
onednn_verbose,exec,cpu,reorder,jit:uni,undef,src_f32::blocked:ab:f0 dst_f32::blocked:ba:f0,,,640x2,0.00195312
onednn_verbose,exec,cpu,matmul,ref:any,undef,src_f32::blocked:ba:f0 wei_f32::blocked:ba:f0 dst_f32::blocked:ba:f0,,,63448x640:640x2:63448x2,1626.54
onednn_verbose,exec,cpu,matmul,ref:any,undef,src_f32::blocked:ba:f0 wei_f32::blocked:ba:f0 dst_f32::blocked:ba:f0,,,63448x640:640x2:63448x2,1747.15

About 1700 ms for the ba matmul vs 16 ms for the ab matmul?


dzarukin commented Jun 9, 2023

That's caused by the ba format on the destination, which makes the matmul fall back to the reference implementation. The optimized version doesn't support it, so I suggest refraining from using it.
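For illustration, a sketch of the same run with only the destination tag changed to ab (same command shape as above, not timed here) should avoid the reference fallback:

$ ONEDNN_VERBOSE=1 OMP_NUM_THREADS=1 ./benchdnn --mode=P --matmul --dt=f32 --stag=ba --wtag=ba --dtag=ab 63448x640:640x2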


WilliamTambellini commented Jun 10, 2023

Thanks @dzarukin.
New results (on an Intel(R) Xeon(R) Platinum 8259CL):

src	w	dst	time (ms)
ab	ab	ab	20
ba	ba	ba	2400
ba	ba	ab	44
ba	ab	ab	45
ab	ba	ab	22

A warning that the performance severely depends on the format tags would be appreciated, for instance here:
https://oneapi-src.github.io/oneDNN/v3.0/dev_guide_matmul.html
CU


AngryLoki commented Sep 14, 2023

Hello,

Pardon my ignorance, but isn't Aᵀ×Bᵀ == (B×A)ᵀ? Therefore a ba*ba->ba problem can be reinterpreted as an ab*ab->ab problem with the arguments swapped, which does not suffer from the fallback to the reference implementation.
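For reference, the identity is easy to check with a tiny pure-Python sketch (the matrices and helper names here are just illustrative):

```python
def matmul(x, y):
    """Naive row-major matrix multiply of nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]

def transpose(x):
    """Swap rows and columns of a nested-list matrix."""
    return [list(row) for row in zip(*x)]

a = [[1, 2], [3, 4], [5, 6]]      # 3x2
b = [[7, 8, 9], [10, 11, 12]]     # 2x3

# A^T x B^T equals (B x A)^T, so a ba*ba->ba matmul can be
# computed as an ab*ab->ab matmul with the operands swapped.
assert matmul(transpose(a), transpose(b)) == transpose(matmul(b, a))
```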

@vpirogov vpirogov added the platform:x64 Intel64/AMD64 processors label Mar 29, 2024

4 participants