Alternative access order for the same buffer can bring big perf win #126913

Open
shunting314 opened this issue May 22, 2024 · 1 comment
Labels
enhancement, module: inductor, oncall: pt2, triaged

Comments

shunting314 commented May 22, 2024

Take a look at this softmax kernel generated by Inductor: https://gist.github.com/shunting314/16bf79d906bd2e929a62c0b2f3c02150 (call it k1)

If we reverse the access order for the second for loop from:

for roffset in range(0, rnumel, RBLOCK):

to

for roffset in range(rnumel - rnumel % RBLOCK, -RBLOCK, -RBLOCK):

we get k2 (https://gist.github.com/shunting314/e749c3766757adaed729b51d38cd3169 )
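To make the two loop headers concrete, here is a small pure-Python sketch (not Triton, and not taken from the gists) that lists the block offsets each header produces. The function names are made up for illustration. It also shows one detail of the reversed expression: when rnumel is an exact multiple of RBLOCK, the first offset is rnumel itself, so that iteration touches no valid element. This is harmless only if the kernel masks out-of-range indices, which I am assuming here. The third function is a hypothetical tighter variant, not something proposed in the issue.

```python
def forward_offsets(rnumel, rblock):
    # Loop header used in k1: blocks visited front to back.
    return list(range(0, rnumel, rblock))


def reversed_offsets(rnumel, rblock):
    # Loop header used in k2, copied from the issue: blocks visited back to front.
    return list(range(rnumel - rnumel % rblock, -rblock, -rblock))


def reversed_offsets_tight(rnumel, rblock):
    # Hypothetical variant: start at the last block that contains a valid
    # element, so no iteration is fully out of range.
    return list(range((rnumel - 1) // rblock * rblock, -1, -rblock))


if __name__ == "__main__":
    print(forward_offsets(10, 4))         # [0, 4, 8]
    print(reversed_offsets(10, 4))        # [8, 4, 0]
    print(reversed_offsets(8, 4))         # [8, 4, 0], offset 8 is past the end
    print(reversed_offsets_tight(8, 4))   # [4, 0]
```

For sizes that are not a multiple of RBLOCK, both reversed forms visit exactly the same blocks as the forward loop, only in the opposite order.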

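k2 is 1.54x faster than k1 (5.231 ms vs. 8.067 ms). The speedup is mainly due to more cache hits: the second loop starts with the blocks the first loop touched last, which are the ones most likely to still be in cache.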
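The cache argument can be illustrated with a toy model. The sketch below is plain Python and assumes a simple LRU cache that holds fewer blocks than one row; real GPU caches are not pure LRU, so treat the numbers as an illustration of the effect, not a prediction of the measured speedup. The names `count_hits`, `num_blocks`, and `capacity` are made up for this example.

```python
from collections import OrderedDict


def count_hits(access_order, capacity):
    """Count cache hits for a sequence of block ids under an LRU policy."""
    cache = OrderedDict()
    hits = 0
    for block in access_order:
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits


num_blocks = 8   # blocks in one row (assumed)
capacity = 4     # blocks that fit in the cache (assumed)

first_pass = list(range(num_blocks))
same_order = first_pass + first_pass         # like k1: both loops go forward
alternating = first_pass + first_pass[::-1]  # like k2: second loop reversed

if __name__ == "__main__":
    print(count_hits(same_order, capacity))   # 0
    print(count_hits(alternating, capacity))  # 4
```

In this model, repeating the same order evicts every block just before it is needed again, while reversing the second pass reuses everything that was still resident at the end of the first pass.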

Credit to llm.c, which is where I learned the idea. This is probably something we can apply more generally in Inductor.

cc @ezyang @msaroufim @bdhirsh @anijain2305 @chauhang @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @jansel @Chillee @eellison

jansel commented May 23, 2024

Interesting, maybe we should just always alternate the order for multiple reduction loops.
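As a rough sketch of what "always alternate" could look like, the pure-Python example below picks the loop direction from the loop's position: even-numbered reduction loops go forward, odd-numbered ones go backward, using the reversed range expression from the issue. `block_offsets` and `softmax_row` are hypothetical names for illustration, not Inductor APIs, and the three-loop softmax structure is an assumption, not a copy of the generated kernel.

```python
import math


def block_offsets(rnumel, rblock, loop_index):
    # Even loops run forward, odd loops run backward (assumed policy).
    if loop_index % 2 == 0:
        return range(0, rnumel, rblock)
    # Reversed expression from the issue; the first block may be empty
    # when rnumel is a multiple of rblock.
    return range(rnumel - rnumel % rblock, -rblock, -rblock)


def softmax_row(row, rblock):
    rnumel = len(row)

    # Loop 0 (forward): row maximum.
    row_max = -math.inf
    for roffset in block_offsets(rnumel, rblock, 0):
        row_max = max(row_max, max(row[roffset:roffset + rblock]))

    # Loop 1 (backward): sum of exponentials, starting with the blocks
    # loop 0 touched last.
    total = 0.0
    for roffset in block_offsets(rnumel, rblock, 1):
        total += sum(math.exp(x - row_max) for x in row[roffset:roffset + rblock])

    # Loop 2 (forward): normalized output, starting with the blocks
    # loop 1 touched last.
    out = [0.0] * rnumel
    for roffset in block_offsets(rnumel, rblock, 2):
        for i in range(roffset, min(roffset + rblock, rnumel)):
            out[i] = math.exp(row[i] - row_max) / total
    return out


if __name__ == "__main__":
    print(softmax_row([1.0, 2.0, 3.0, 4.0, 5.0], rblock=2))
```

One thing to keep in mind with this approach: reversing a reduction loop changes the floating-point accumulation order, so results can differ from the forward version in the last few bits.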

mlazos added the triaged and enhancement labels on May 23, 2024