[cpu] [inductor] decompose bmm for memory bound in lowering (#124826)
Fixes #124697.

Resolves the large performance regression in GPT-FAST MOE when `coordinate_descent_tuning` is disabled. To get better performance in the memory-bound case, bmm is decomposed in lowering.

Pull Request resolved: #124826
Approved by: https://github.com/jgong5, https://github.com/jansel
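
For illustration only, here is a minimal sketch of what "decomposing" a bmm can mean: the batched matmul is rewritten as a broadcast elementwise multiply followed by a sum over the shared dimension. It uses plain Python lists rather than inductor IR, and the function names `bmm_reference` and `bmm_decomposed` are hypothetical; the actual lowering code and the condition under which it applies are not shown in this message.

```python
def bmm_reference(a, b):
    """Plain batched matmul: a is [B][M][K], b is [B][K][N]."""
    out = []
    for a_mat, b_mat in zip(a, b):
        m, k, n = len(a_mat), len(b_mat), len(b_mat[0])
        out.append([
            [sum(a_mat[i][p] * b_mat[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)
        ])
    return out


def bmm_decomposed(a, b):
    """Same result, written as broadcast multiply + reduction (hypothetical sketch)."""
    out = []
    for a_mat, b_mat in zip(a, b):
        m, k, n = len(a_mat), len(b_mat), len(b_mat[0])
        # Broadcast multiply: prod[i][p][j] = a[i][p] * b[p][j]
        prod = [
            [[a_mat[i][p] * b_mat[p][j] for j in range(n)] for p in range(k)]
            for i in range(m)
        ]
        # Reduce over the shared dimension p
        out.append([
            [sum(prod[i][p][j] for p in range(k)) for j in range(n)]
            for i in range(m)
        ])
    return out


# M == 1 (one row per batch) is the kind of shape where a matmul tends to be
# limited by memory traffic rather than compute.
a = [[[1, 2, 3]]]                   # shape [1][1][3]
b = [[[1, 0], [0, 1], [2, 2]]]      # shape [1][3][2]
result = bmm_decomposed(a, b)
```

Both functions compute the same values; the decomposed form only changes how the work is expressed, which is what can let a compiler fuse it with neighboring pointwise and reduction ops.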