@v0i0 v0i0 commented Sep 24, 2025

This roughly doubles perf for B=32K, H=2K. Presumably the previous version was storing a lot of data into inv_sqrt?
It might also indicate that something odd is happening with keepdim.

@meta-cla meta-cla bot added the CLA Signed label Sep 24, 2025
@v0i0 v0i0 requested review from mengluy0125 and yf225 and removed request for mengluy0125 September 24, 2025 16:53

v0i0 commented Sep 24, 2025

Perf results on H100 (columns: B, H, mode, implementation, measured value; units not stated):

32768,256,fwd,helion_orig,1642.1788056405082
32768,256,fwd,helion_new,1578.5740566290297
32768,512,fwd,helion_orig,1817.2547849940113
32768,512,fwd,helion_new,1866.2814686329166
32768,1024,fwd,helion_orig,1488.1836334821526
32768,1024,fwd,helion_new,2016.8193458851026
32768,2048,fwd,helion_orig,935.4300356703114
32768,2048,fwd,helion_new,2087.6677455652543
32768,4096,fwd,helion_orig,242.58288588525073
32768,4096,fwd,helion_new,2128.879800252783
32768,8192,fwd,helion_orig,187.45590805263205
32768,8192,fwd,helion_new,1800.378577328567
32768,16384,fwd,helion_orig,181.87879046800123
32768,16384,fwd,helion_new,232.25287573963158
32768,32768,fwd,helion_orig,188.59863561089344
32768,32768,fwd,helion_new,201.6228783951727
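
To make the comparison easier to read, the rows above can be reduced to a per-H ratio of helion_new over helion_orig. The sketch below is only an illustration: the column names (`b`, `h`, `mode`, `impl`, `value`) and the `speedups` helper are assumptions, since the output has no header, and it assumes the measured value is a throughput-style number where higher is better.

```python
import csv
import io

# Benchmark rows copied verbatim from the comment above.
RESULTS = """\
32768,256,fwd,helion_orig,1642.1788056405082
32768,256,fwd,helion_new,1578.5740566290297
32768,512,fwd,helion_orig,1817.2547849940113
32768,512,fwd,helion_new,1866.2814686329166
32768,1024,fwd,helion_orig,1488.1836334821526
32768,1024,fwd,helion_new,2016.8193458851026
32768,2048,fwd,helion_orig,935.4300356703114
32768,2048,fwd,helion_new,2087.6677455652543
32768,4096,fwd,helion_orig,242.58288588525073
32768,4096,fwd,helion_new,2128.879800252783
32768,8192,fwd,helion_orig,187.45590805263205
32768,8192,fwd,helion_new,1800.378577328567
32768,16384,fwd,helion_orig,181.87879046800123
32768,16384,fwd,helion_new,232.25287573963158
32768,32768,fwd,helion_orig,188.59863561089344
32768,32768,fwd,helion_new,201.6228783951727
"""


def speedups(text):
    """Return {H: helion_new / helion_orig} for each H in the CSV text.

    Column meanings are assumed (B, H, mode, impl, value); the original
    output does not label them.
    """
    perf = {}
    for b, h, mode, impl, value in csv.reader(io.StringIO(text)):
        perf.setdefault(int(h), {})[impl] = float(value)
    return {h: v["helion_new"] / v["helion_orig"] for h, v in perf.items()}


if __name__ == "__main__":
    for h, ratio in sorted(speedups(RESULTS).items()):
        print(f"H={h:>5}: {ratio:.2f}x")
```

Under those assumptions, the ratio at H=2048 comes out at roughly 2.2x, which matches the "roughly doubles" description. The largest gains are at H=4096 and H=8192 (roughly 8.8x and 9.6x), while H=256 is slightly below 1x.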

@v0i0 v0i0 merged commit 8734c2c into pytorch:main Sep 26, 2025
13 checks passed