Improve trans and untrans with AVX512 #117

HackToday · 2022-04-06T06:58:06Z

Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>
Co-authored-by: vesslanjin <jun.i.jin@intel.com>

HackToday · 2022-04-06T07:04:42Z

I ran a performance test comparing AVX2 and AVX512, using 4-byte elements with the total size varying from 8 to 120 in steps of 8.
The AVX512 vs. AVX2 speedup ratio ranges from 0.94x to 1.5x.
Even in the cases where AVX512 is not faster than AVX2, it keeps nearly the same performance. In summary, AVX512 can be a benefit on some modern platforms.
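For context on how a ratio such as 0.94x or 1.5x can be obtained, here is a minimal timing sketch. It is not the harness used for this PR: `kernel_fn`, `copy_kernel`, `time_kernel`, and `speedup_ratio` are hypothetical names, and the placeholder kernel stands in for the real SSE/AVX2/AVX512 routines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Hypothetical kernel signature: (input, output, number of bytes). */
typedef void (*kernel_fn)(const uint8_t *in, uint8_t *out, size_t nbytes);

/* Placeholder kernel so the sketch compiles on its own; a real
 * benchmark would pass the routines being compared instead. */
static void copy_kernel(const uint8_t *in, uint8_t *out, size_t nbytes) {
    memcpy(out, in, nbytes);
}

/* Average CPU seconds per call over `reps` repetitions, using clock(). */
static double time_kernel(kernel_fn fn, const uint8_t *in, uint8_t *out,
                          size_t nbytes, int reps) {
    clock_t start = clock();
    for (int r = 0; r < reps; r++) {
        fn(in, out, nbytes);
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC / reps;
}

/* Speedup of the candidate over the baseline: > 1 means faster. */
static double speedup_ratio(double t_baseline, double t_candidate) {
    return t_baseline / t_candidate;
}
```

Timing each variant with `time_kernel` and passing the two results to `speedup_ratio` gives one point of a speedup curve. For inputs this small, many repetitions are needed before the measurement rises above timer resolution.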

HackToday · 2022-04-06T07:06:58Z

@jrs65 and @kiyo-masui Could you help check whether enabling this feature is OK for this repo?

HackToday · 2022-04-20T01:04:56Z

@jrs65 and @kiyo-masui please take a look in case this was missed.

jrs65 · 2022-04-23T00:59:45Z

Hi @HackToday. Sorry for the belated response, it's been a busy end to the semester for myself (and Kiyo too I imagine).

Thanks for putting this together, it's definitely appreciated. Your code looks good to me, but I need to look around for an AVX512 machine to run the tests on, as I think GitHub Actions doesn't use any AVX512-capable hosts.

Also, I'm intrigued if you have any benchmarks of this. How much does AVX512 support speed things up?
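One way to tell whether a given host can run AVX512 code at all is to query the CPU at runtime. The sketch below assumes GCC or Clang on x86 and uses their `__builtin_cpu_supports` builtin; `cpu_has_avx512` is a hypothetical helper, and checking the F and BW subsets is an assumption about which AVX512 features matter here.

```c
/* Returns 1 if the running CPU reports the AVX-512 Foundation and
 * Byte/Word subsets, 0 otherwise (including non-x86 or other compilers). */
static int cpu_has_avx512(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") &&
           __builtin_cpu_supports("avx512bw");
#else
    return 0;
#endif
}
```

A test runner could call this first and skip the AVX512-specific tests when it returns 0.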

HackToday · 2022-04-24T05:50:17Z

Hi @jrs65, thanks for your reply.

On a system with AVX512 available, I tested with the PR changes and timed the following:

bshuf_trans_byte_elem_SSE
bshuf_trans_bit_byte_XXX (can be SSE, AVX, AVX512)

In the tests, the total size varies from 8 to 120 in steps of 8 (8, 16, 24, 32, etc.; the x axis of Fig. 1), with 4-byte elements.
The y axis shows the AVX2 speedup vs. SSE and the AVX512 speedup vs. SSE.

The AVX512 vs. AVX2 speedup ratio ranges from 0.94x to 1.5x; please see Fig. 1.

Fig 1

Even in the cases where AVX512 is not faster than AVX2, it keeps nearly the same performance.

Please let me know if you need more info.

HackToday · 2022-04-26T15:48:21Z

@jrs65 I have added one more improvement (the untrans part of bitshuffle); it uses AVX512 in the same way as trans. 8-byte elements also show the following improvement.

(With larger sizes the speedup ratio increases, reaching 1.5x.)
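For readers unfamiliar with the trans and untrans steps discussed in this thread, the following is a plain scalar sketch of a byte-level bit transpose and its inverse. It is not the code from this PR and not the library's implementation: the function names are hypothetical and the bit-ordering convention is an assumption chosen for illustration. The SIMD variants do the same kind of rearrangement on many bytes per instruction.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Forward transpose: for n input bytes (n a multiple of 8), write 8
 * bit-planes of n/8 bytes each. Bit k of in[i] becomes bit (i % 8)
 * of out[k * (n / 8) + i / 8]. Returns -1 if n is not a multiple of 8. */
static int trans_bit_byte_ref(const uint8_t *in, uint8_t *out, size_t n) {
    if (n % 8 != 0) return -1;
    size_t plane = n / 8;
    memset(out, 0, n);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < 8; k++) {
            uint8_t bit = (uint8_t)((in[i] >> k) & 1u);
            out[k * plane + i / 8] |= (uint8_t)(bit << (i % 8));
        }
    }
    return 0;
}

/* Inverse transpose: gathers bit k of each byte back from plane k. */
static int untrans_bit_byte_ref(const uint8_t *in, uint8_t *out, size_t n) {
    if (n % 8 != 0) return -1;
    size_t plane = n / 8;
    memset(out, 0, n);
    for (size_t i = 0; i < n; i++) {
        for (size_t k = 0; k < 8; k++) {
            uint8_t bit = (uint8_t)((in[k * plane + i / 8] >> (i % 8)) & 1u);
            out[i] |= (uint8_t)(bit << k);
        }
    }
    return 0;
}
```

A scalar reference like this is mainly useful for checking a vectorized version: both should produce identical output, and applying the inverse to the forward result should reproduce the input.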

HackToday · 2022-05-03T16:34:46Z

@jrs65 and @kiyo-masui, pinging in case anything was missed. BTW, the CI workflows seem to need approval to run.

Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>

jrs65 · 2022-05-07T16:45:32Z

Hi @HackToday

Thanks for all your efforts here, and apologies for the slow responses. I've got the code built and running on one of my own machines (the cluster we use has some AVX512 nodes), and on the machine that you gave me access to elsewhere. Everything seems to run fine, and with a nice speed boost.

I'm going to merge your code in now. I'll wait a few weeks to cut a release (mostly as I'm going on vacation) but also so I can see about merging in a few other outstanding PRs.

HackToday · 2022-05-08T01:29:21Z

Thanks @jrs65 for your time and help with the verification.

Improve trans bit elem with AVX512

b2cbc1b

Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com> Co-authored-by: vesslanjin <jun.i.jin@intel.com>

HackToday changed the title ~~Improve trans bit elem with AVX512~~ Improve trans and untrans with AVX512 Apr 26, 2022

HackToday force-pushed the master branch from f354ace to 84544f6 Compare May 7, 2022 08:40

Improve untrans with AVX512

84544f6

Signed-off-by: Wu, Kaiqiang <kaiqiang.wu@intel.com>

jrs65 merged commit fdfcd40 into kiyo-masui:master May 7, 2022

wanweiqiangintel mentioned this pull request Jan 9, 2023

[Enhancement] upgrade bit-shuffle version to support AVX512 StarRocks/starrocks#16360

Merged

16 tasks
