`(segment|grouped)_matmul` implementation using MKL BLAS #146

DamianSzwichtenberg · 2022-11-16T13:41:27Z

PyTorch is built with MKL BLAS support by default, so we can take advantage of that and use `gemm_batch` to implement `(segment|grouped)_matmul`.
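
To make the idea concrete, here is a minimal pure-Python sketch. It is not the PR's C++ kernel. It assumes pyg_lib's semantics, where `segment_matmul(inputs, ptr, other)` multiplies the rows `inputs[ptr[i]:ptr[i+1]]` by `other[i]` and `grouped_matmul(inputs, others)` multiplies `inputs[i]` by `others[i]`. Both operations reduce to a list of independent GEMMs, which is what a batched GEMM consumes. The grouping step mirrors the group layout of MKL's `cblas_?gemm_batch`, in which all problems in one group share the same shape. All helper names are hypothetical, and the Python loop stands in for the single MKL call.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating the semantics only; not pyg_lib's or MKL's API.
Matrix = List[List[float]]
Problem = Tuple[Matrix, Matrix]  # (A, B) for one independent GEMM C = A @ B


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive reference GEMM: (m x k) @ (k x n) -> (m x n)."""
    k, n = len(b), len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(k)) for j in range(n)] for row in a]


def segment_problems(inputs: Matrix, ptr: List[int], others: List[Matrix]) -> List[Problem]:
    """Segment i is made of rows inputs[ptr[i]:ptr[i+1]] and is multiplied by others[i]."""
    return [(inputs[ptr[i]:ptr[i + 1]], others[i]) for i in range(len(ptr) - 1)]


def group_by_shape(problems: List[Problem]) -> Dict[Tuple[int, int, int], List[int]]:
    """Bucket problem indices by (m, k, n). In a gemm_batch-style group API every
    problem in one group shares its shape, so group_count is the number of
    buckets and group_size[g] is the length of bucket g."""
    groups: Dict[Tuple[int, int, int], List[int]] = defaultdict(list)
    for idx, (a, b) in enumerate(problems):
        groups[(len(a), len(b), len(b[0]))].append(idx)
    return dict(groups)


def batched_matmul(problems: List[Problem]) -> List[Matrix]:
    """Stand-in for one batched GEMM call: a BLAS library would execute all
    groups itself (possibly in parallel) instead of this Python loop."""
    out: List[Matrix] = [[] for _ in problems]
    for indices in group_by_shape(problems).values():
        for idx in indices:
            a, b = problems[idx]
            out[idx] = matmul(a, b)
    return out


def segment_matmul_ref(inputs: Matrix, ptr: List[int], others: List[Matrix]) -> Matrix:
    """segment_matmul: concatenate the per-segment products row-wise."""
    blocks = batched_matmul(segment_problems(inputs, ptr, others))
    return [row for block in blocks for row in block]


def grouped_matmul_ref(inputs: List[Matrix], others: List[Matrix]) -> List[Matrix]:
    """grouped_matmul: inputs[i] @ others[i], one output matrix per pair."""
    return batched_matmul(list(zip(inputs, others)))
```

For example, with `ptr = [0, 2, 3]`, rows 0 and 1 are multiplied by `others[0]` and row 2 by `others[1]`. The two segments have different row counts, so they land in two different shape groups. Segments of equal size would share one group and could be dispatched together.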

`segment_matmul` C++ benchmarks on a 24-core machine showed that the new version is on average 3.5x faster than the previous one, with a maximum speedup of 6x. Note that the chosen shapes are quite small, so the speedup should be even larger for more complex problems.

Additionally, HeteroLinear benchmarks were performed (thanks to @puririshi98 for sharing the benchmark code):

The same benchmark, with time on a log2 scale:

We can see that the `segment_matmul` execution path starts to shine from `num_node_types=32` onward.

rusty1s

I think this looks great. Thanks for the effort!

Two questions:

Can we get a more thorough review than mine from @pyg-team/intel-team?
Can we enable it in our C++ tests to ensure that it is working correctly?

CONTRIBUTING.md

DamianSzwichtenberg · 2022-11-17T10:27:40Z

Can we get a more thorough review than mine from @pyg-team/intel-team?

Sure, I'll ping someone.

Can we enable it in our C++ tests to ensure that it is working correctly?

Do you mean to enable it in the CI? 😉

rusty1s · 2022-11-17T11:37:07Z

Do you mean to enable it in the CI? 😉

Yes.

yanbing-j · 2022-11-21T01:00:14Z

LGTM. Thanks for your hard work!
I have a question about the end-to-end (E2E) improvement: did you verify it in a real benchmark?

codecov-commenter · 2022-11-23T07:55:51Z

Codecov Report

Merging #146 (32db269) into master (2eab973) will decrease coverage by 2.01%.
The diff coverage is 81.81%.

❗ Current head 32db269 differs from pull request most recent head a6048a4. Consider uploading reports for the commit a6048a4 to get more accurate results

@@            Coverage Diff             @@
##           master     #146      +/-   ##
==========================================
- Coverage   94.01%   92.00%   -2.02%     
==========================================
  Files          20       20              
  Lines         535      588      +53     
==========================================
+ Hits          503      541      +38     
- Misses         32       47      +15

| Impacted Files | Coverage Δ |
|---|---|
| pyg_lib/csrc/ops/cpu/matmul_kernel.cpp | `83.33% <81.81%> (-16.67%)` ⬇️ |
| pyg_lib/csrc/ops/matmul.h | `50.00% <0.00%> (-50.00%)` ⬇️ |


DamianSzwichtenberg · 2022-11-23T08:07:10Z

Do you mean to enable it in the CI? 😉

Yes.

@rusty1s Just added. Please take another look. 😉

DamianSzwichtenberg · 2022-11-23T08:11:26Z

LGTM. Thanks for your hard work! I have a question about the end-to-end (E2E) improvement: did you verify it in a real benchmark?

Training and inference benchmarks, unfortunately, do not cover the segment_matmul execution path in HeteroLinear or RGCNConv.

mingfeima

Do we have test cases for it?

pyg_lib/csrc/ops/cpu/matmul_kernel.cpp

CMakeLists.txt

pyg_lib/csrc/ops/cpu/matmul_kernel.cpp

DamianSzwichtenberg · 2022-11-25T06:46:30Z

Do we have test cases for it?

Yes, here and here.

DamianSzwichtenberg · 2022-11-28T06:37:58Z

@mingfeima Please take another look. 😉


pyg_lib/csrc/ops/cpu/matmul_kernel.cpp

DamianSzwichtenberg added the `0 - Priority P0`, `feature`, `benchmark`, and `ops` labels Nov 16, 2022

DamianSzwichtenberg requested a review from rusty1s November 16, 2022 13:41

DamianSzwichtenberg self-assigned this Nov 16, 2022

rusty1s approved these changes Nov 17, 2022


CONTRIBUTING.md (outdated, resolved)

DamianSzwichtenberg force-pushed the segment-mm-using-mkl-blas branch from 64a3b43 to 2e38ea5 November 23, 2022 07:55

mingfeima requested changes Nov 24, 2022


DamianSzwichtenberg requested a review from mingfeima November 28, 2022 06:37

DamianSzwichtenberg force-pushed the segment-mm-using-mkl-blas branch from 32db269 to 13c6526 November 28, 2022 11:46

DamianSzwichtenberg and others added 9 commits November 29, 2022 06:54

Add segment_matmul benchmark

1c02852

Add possibility to build with MKL BLAS support

3f66c36

Improve (segment|grouped)_matmul performance with MKL BLAS

f737801

Update CONTRIBUTING.md

f424cdf

[pre-commit.ci] auto fixes from pre-commit.com hooks

50b39f3

for more information, see https://pre-commit.ci

Update CHANGELOG.md

db3dee3

Enable MKL BLAS in the CI test session

20adc9d

Make parallel runtime aligned with PyTorch

aae3110

Reserve place for tensors in vector

672eff6

DamianSzwichtenberg force-pushed the segment-mm-using-mkl-blas branch from 13c6526 to 6b76eb8 November 29, 2022 06:31

mingfeima approved these changes Nov 29, 2022


pyg_lib/csrc/ops/cpu/matmul_kernel.cpp (resolved)

pyg_lib/csrc/ops/cpu/matmul_kernel.cpp (resolved)

pyg_lib/csrc/ops/cpu/matmul_kernel.cpp (outdated, resolved)

pyg_lib/csrc/ops/cpu/matmul_kernel.cpp (resolved)

DamianSzwichtenberg added 2 commits November 29, 2022 08:21

Improve segment_matmul by reducing tensor creation overhead

f9a53d2

Replace std::map with phmap::flat_hash_map

a6048a4

DamianSzwichtenberg force-pushed the segment-mm-using-mkl-blas branch from 6b76eb8 to a6048a4 November 29, 2022 07:59

DamianSzwichtenberg merged commit df545db into pyg-team:master Nov 29, 2022
