leeeizhang changed the title from "[BUG] performance degradation in speed and memory without FP8 compile" to "[FP8] performance degradation in speed and memory without compile" on Aug 15, 2024
I believe this is expected: in eager mode you'd be dispatching separate kernels for the quant, the matmul and the dequant, whereas with compile you can do things like fuse the matmul and dequant into a single kernel. @vkuzo might have a longer-form answer that applies to FP8.
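To make the fusion argument concrete, here is a rough back-of-envelope model, not a measurement and not how PyTorch or torchao actually schedule kernels. It counts kernel launches, global-memory bytes moved, and temporary-tensor bytes for one FP8 linear layer `y = dequant(fp8(x) @ fp8(w))`. Everything in it is a hypothetical assumption: bf16 inputs and outputs (2 bytes), 1-byte FP8 operands, one kernel per quantization, and scale/amax computation ignored. As the comment describes, it assumes eager mode runs dequantization as its own kernel, while the compiled path folds it into the matmul epilogue. The function name `fp8_linear_cost` and the `Cost` class are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical element sizes: bf16 activations/outputs, 1-byte FP8 operands.
BF16_BYTES = 2
FP8_BYTES = 1


@dataclass
class Cost:
    kernels: int = 0      # number of kernel launches
    bytes_moved: int = 0  # global-memory bytes read + written
    temp_bytes: int = 0   # total bytes of temporary tensors materialized

    def launch(self, read: int, write: int, temp: int = 0) -> None:
        """Record one kernel launch and the traffic/temporaries it causes."""
        self.kernels += 1
        self.bytes_moved += read + write
        self.temp_bytes += temp


def fp8_linear_cost(m: int, k: int, n: int, fuse_dequant: bool) -> Cost:
    """Rough cost of y = dequant(fp8(x) @ fp8(w)) for x: (m, k), w: (k, n).

    Ignores scale/amax computation; only the matmul + dequant fusion is modeled.
    """
    c = Cost()
    # Quantize activation and weight to FP8 (one kernel each); the FP8 copies are temporaries.
    c.launch(read=m * k * BF16_BYTES, write=m * k * FP8_BYTES, temp=m * k * FP8_BYTES)
    c.launch(read=k * n * BF16_BYTES, write=k * n * FP8_BYTES, temp=k * n * FP8_BYTES)
    fp8_inputs = (m * k + k * n) * FP8_BYTES
    out_bytes = m * n * BF16_BYTES
    if fuse_dequant:
        # Matmul with the dequant scaling applied in its epilogue: y is written once.
        c.launch(read=fp8_inputs, write=out_bytes)
    else:
        # Eager: the unscaled matmul output is materialized, then read back and rescaled.
        c.launch(read=fp8_inputs, write=out_bytes, temp=out_bytes)
        c.launch(read=out_bytes, write=out_bytes)
    return c


if __name__ == "__main__":
    for fused in (False, True):
        cost = fp8_linear_cost(4096, 4096, 4096, fuse_dequant=fused)
        print(f"fused={fused}: {cost}")
```

Under these assumptions, the unfused path pays one extra launch plus a full write and read of an `m x n` bf16 intermediate. That extra intermediate accounts for both the slower step time and the higher memory footprint seen without compile.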
The FP8 FFN degrades in both speed and GPU memory usage when it is not compiled.
Trace Logs (torch.profiler)
Testbed
- PyTorch 2.5.0.dev20240814+cu121
- 2024.8.15+cu121
- CUDA 12.1 (NVIDIA L20, SM89)
Code to Reproduce the Issue