Alternative access order for the same buffer can bring big perf win #126913

shunting314 · 2024-05-22T21:24:36Z

Check this softmax kernel generated by inductor: https://gist.github.com/shunting314/16bf79d906bd2e929a62c0b2f3c02150 (call it k1)

If we reverse the access order for the second for loop from:

for roffset in range(0, rnumel, RBLOCK):

to

for roffset in range(rnumel - rnumel % RBLOCK, -RBLOCK, -RBLOCK):

we get k2 (https://gist.github.com/shunting314/e749c3766757adaed729b51d38cd3169 )
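To make the two loop headers concrete, here is a small pure-Python sketch (not Triton) that lists the block offsets each one visits. The function names are hypothetical and exist only for illustration. It also shows one detail of the reversed expression: when `rnumel` is an exact multiple of `RBLOCK`, the range starts at `rnumel` itself, so it visits one extra block. Assuming the kernel applies the usual `rindex < rnumel` mask, that block is fully masked and harmless. The `tight_reversed_offsets` variant is an assumption of mine, not something from the gists.

```python
def forward_offsets(rnumel, rblock):
    """Block offsets visited by the original loop (k1)."""
    return list(range(0, rnumel, rblock))


def reversed_offsets(rnumel, rblock):
    """Block offsets visited by the reversed loop (k2), as written in the issue."""
    return list(range(rnumel - rnumel % rblock, -rblock, -rblock))


def tight_reversed_offsets(rnumel, rblock):
    """Hypothetical variant that starts at the last block containing valid data."""
    return list(range((rnumel - 1) // rblock * rblock, -rblock, -rblock))


# rnumel not a multiple of RBLOCK: same blocks, opposite order.
print(forward_offsets(10, 4))         # [0, 4, 8]
print(reversed_offsets(10, 4))        # [8, 4, 0]

# rnumel a multiple of RBLOCK: the issue's expression also visits offset 8,
# a block that lies entirely past the end of the row.
print(reversed_offsets(8, 4))         # [8, 4, 0]
print(tight_reversed_offsets(8, 4))   # [4, 0]
```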

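k2 is 1.54x faster than k1 (5.231 ms vs. 8.067 ms). The speedup is mainly due to more cache hits: the second loop starts with the blocks the first loop read last, which are the ones most likely to still be in cache.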
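The cache effect can be illustrated with a toy model. The sketch below assumes a strict LRU cache that holds fewer blocks than one row. Real GPU caches are not strict LRU, so this only shows the direction of the effect and does not predict the measured numbers. All names and sizes here are illustrative assumptions.

```python
from collections import OrderedDict


def count_hits(accesses, capacity):
    """Count hits for a sequence of block ids under a strict LRU cache."""
    cache = OrderedDict()
    hits = 0
    for block in accesses:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


num_blocks = 8   # blocks per row (assumed)
capacity = 3     # blocks the toy cache can hold (assumed)

first_pass = list(range(num_blocks))
same_order = first_pass + first_pass            # k1: both loops go forward
flipped_order = first_pass + first_pass[::-1]   # k2: second loop goes backward

hits_same = count_hits(same_order, capacity)
hits_flipped = count_hits(flipped_order, capacity)
print(hits_same, hits_flipped)
```

In this model the forward-then-forward order evicts every block just before it is needed again, while the forward-then-backward order reuses whatever the cache still holds at the end of the first pass.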

Credit to llm.c, where I learned the idea. This is probably something we can apply generally in inductor.

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @jansel @Chillee @eellison

jansel · 2024-05-23T04:14:16Z

Interesting, maybe we should just always alternate the order for multiple reduction loops.
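A minimal sketch of what "always alternate" could mean for a kernel with several reduction loops over the same buffer: even-numbered loops walk forward and odd-numbered loops walk backward. `loop_offsets` is a hypothetical helper written for this illustration, not an inductor API.

```python
def loop_offsets(num_loops, rnumel, rblock):
    """Per-loop block offsets, flipping direction on every other loop."""
    forward = list(range(0, rnumel, rblock))
    return [forward if i % 2 == 0 else forward[::-1] for i in range(num_loops)]


for i, offsets in enumerate(loop_offsets(3, 10, 4)):
    print(i, offsets)
# 0 [0, 4, 8]
# 1 [8, 4, 0]
# 2 [0, 4, 8]
```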

shunting314 added the "oncall: pt2" and "module: inductor" labels May 22, 2024

mlazos added the "triaged" and "enhancement" labels May 23, 2024
