This repository was archived by the owner on Aug 30, 2024. It is now read-only.


@DDEle (Contributor) commented Aug 6, 2024

Type of Change: Feature

API not changed

Description

Previously, a bug in the mask load meant that correct results could only be achieved at the cost of extra overhead. This PR removes that overhead.
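The description does not spell out the mask-load fix. Going only by the branch name (`xetla-zero-passthrough`), one plausible reading is that the masked-out lanes of a tail load can now receive a zero pass-through value directly, instead of needing a separate pass to zero them. The Python sketch below illustrates that difference under this assumption only. `masked_load_then_zero` and `masked_load_zero_passthrough` are hypothetical helpers, not XeTLA or ESIMD APIs, and NaN merely stands in for an unspecified lane value.

```python
from typing import List, Sequence


def masked_load_then_zero(buf: Sequence[float], base: int, width: int) -> List[float]:
    """Hypothetical workaround pattern: load, then spend an extra pass zeroing inactive lanes."""
    mask = [base + i < len(buf) for i in range(width)]
    # Inactive lanes come back unspecified; NaN is only a placeholder for "garbage".
    loaded = [buf[base + i] if m else float("nan") for i, m in enumerate(mask)]
    # Extra select pass: overwrite every inactive lane with 0.0.
    return [v if m else 0.0 for v, m in zip(loaded, mask)]


def masked_load_zero_passthrough(buf: Sequence[float], base: int, width: int) -> List[float]:
    """Hypothetical single-pass pattern: inactive lanes receive the pass-through value 0.0."""
    return [buf[base + i] if base + i < len(buf) else 0.0 for i in range(width)]


if __name__ == "__main__":
    keys = [float(i) for i in range(33)]  # e.g. klen = 33
    tail = masked_load_zero_passthrough(keys, 32, 16)  # hypothetical tile width of 16
    print(tail)
```

With klen = 33 and a hypothetical tile width of 16, the third tile holds only one valid element, so 15 of its 16 lanes are masked off. Both helpers return the same values; the difference is that the second one does not need the extra select pass.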

Expected Behavior & Potential Risk

N/A

How has this PR been tested?

Internal IPEX CI

Performance on MTL

xetla branch

[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 9.90242
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.4184
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.3561
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.5452
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 51.7042
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 49.3453
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 49.573
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 52.3423

This PR

[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 12.8365
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.3975
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 11.3263
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen33
[kernel time]The maximum gflops(GPU_time) is 10.0362
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 56.154
[ RUN      ] XeTLA/FMHATest.kUseBiasOFF_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 58.3497
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastOFF_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 55.4583
[ RUN      ] XeTLA/FMHATest.kUseBiasON_kSeqLastON_bs1_hn32_hs128_qlen1_klen1023
[kernel time]The maximum gflops(GPU_time) is 58.2443
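To make the before/after comparison easier to read, the short Python script below transcribes the GFLOPS figures from the two logs and computes the per-configuration ratio (this PR / xetla branch). The numbers are copied verbatim from the logs; the script itself is only an illustrative helper and not part of the PR.

```python
from typing import Dict, Tuple

# (kUseBias, kSeqLast, klen) -> (xetla branch GFLOPS, this PR GFLOPS), copied from the logs.
# All runs use bs1_hn32_hs128_qlen1.
RESULTS: Dict[Tuple[str, str, int], Tuple[float, float]] = {
    ("OFF", "OFF", 33): (9.90242, 12.8365),
    ("OFF", "ON", 33): (10.4184, 10.3975),
    ("ON", "OFF", 33): (10.3561, 11.3263),
    ("ON", "ON", 33): (10.5452, 10.0362),
    ("OFF", "OFF", 1023): (51.7042, 56.154),
    ("OFF", "ON", 1023): (49.3453, 58.3497),
    ("ON", "OFF", 1023): (49.573, 55.4583),
    ("ON", "ON", 1023): (52.3423, 58.2443),
}


def speedups(results: Dict[Tuple[str, str, int], Tuple[float, float]]) -> Dict[Tuple[str, str, int], float]:
    """Return new/old ratio per configuration; a value above 1.0 means this PR is faster."""
    return {cfg: new / old for cfg, (old, new) in results.items()}


if __name__ == "__main__":
    # Group the output by klen, then by bias and seq-last settings.
    ordered = sorted(speedups(RESULTS).items(), key=lambda kv: (kv[0][2], kv[0][0], kv[0][1]))
    for (bias, seq_last, klen), ratio in ordered:
        print(f"kUseBias{bias}_kSeqLast{seq_last} klen={klen}: {ratio:.3f}x")
```

Running it shows that every klen=1023 configuration improves, while at klen=33 two configurations (kUseBiasOFF_kSeqLastON and kUseBiasON_kSeqLastON) come out slightly below 1.0.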

Dependency Change?

No

@DDEle DDEle requested review from airMeng and sunjiweiswift August 6, 2024 02:25
@DDEle (Contributor, Author) commented Aug 9, 2024

Ready to merge together with the internal IPEX PR.

@sunjiweiswift

Using the ESIMD API yields a significant performance improvement.

@airMeng airMeng merged commit 8fc6c57 into xetla Aug 9, 2024
@airMeng airMeng deleted the xetla-zero-passthrough branch August 9, 2024 07:34
