[CPU] Enable 'iree-llvmcpu-reassociate-fp-reductions' by default #13822

Merged on Jul 18, 2023 (1 commit)

Conversation

hanhanW
Contributor

@hanhanW hanhanW commented May 26, 2023

This PR enables fp reduction reassociation by default. When this flag is disabled, we do not vectorize the reduction dimension at all, which results in extra unrolling of scalar instructions. It is hard for an external user to understand the implications of this flag, or to know that it has to be enabled to get reasonable performance on fp reductions.

Fixes #13706
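
For readers unfamiliar with why vectorizing an fp reduction requires reassociation, here is a minimal sketch in plain Python. It is not IREE code: the function names `sequential_sum` and `lanewise_sum` are hypothetical, and Python floats are 64-bit, although the same effect applies to fp32. A vectorized reduction keeps one partial accumulator per lane and combines them at the end, which changes the order of the additions. Because floating-point addition is not associative, the result can differ from the strict left-to-right sum, so the compiler may only do this when reassociation is allowed.

```python
def sequential_sum(values):
    """Strict left-to-right reduction, as a scalar loop would compute it."""
    acc = 0.0
    for v in values:
        acc += v
    return acc


def lanewise_sum(values, lanes=2):
    """Reduction with one partial accumulator per lane (assumed vector width).

    Element i is added to accumulator i % lanes, then the partial sums are
    combined. This reorders the additions compared to sequential_sum.
    """
    partial = [0.0] * lanes
    for i, v in enumerate(values):
        partial[i % lanes] += v
    return sequential_sum(partial)


values = [1e16, 1.0, -1e16, 1.0]

print(sequential_sum(values))         # 1.0: the first 1.0 is absorbed by 1e16
print(lanewise_sum(values, lanes=2))  # 2.0: the large terms cancel within lane 0
```

Both answers are valid floating-point results for the same inputs; they differ only because the additions were grouped differently.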

@hanhanW
Contributor Author

hanhanW commented May 26, 2023

@dcaballe we can enable the flag in this PR; it depends on #13821.

I verified that we are able to compile BertLarge with the flag on!

…e-org#13685)

@hanhanW
Contributor Author

hanhanW commented Jul 18, 2023

#14068 addresses the issue, so we are able to land this. @dcaballe I kept you as the author of the commit, FYI.

Contributor

@dcaballe dcaballe left a comment


Thanks! LGTM

@hanhanW hanhanW added labels benchmarks:x86_64 (Run default x86_64 benchmarks), benchmarks:comp-stats (Run default compilation statistics benchmarks), and benchmarks:android-cpu (Run default Android CPU benchmarks) on Jul 18, 2023
@github-actions

Abbreviated Benchmark Summary

@ commit 1ed307c9f5d59cf5885224fa126d2b68c0de052b (vs. base 2c45bc14c8ce56cb5f90ae322502f1a89c1d6753)

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 4218.825 (vs. 5087.930, 17.08%↓) | 4214.648 | 83.617 |
| MobileNetV3Small\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 891.194 (vs. 1020.261, 12.65%↓) | 891.134 | 11.370 |
| MobileNetV2\_fp32(tflite) [vmvx-generic-vmvx-vmvx][experimental-flags] local\_task(vmvx\_module)[4-thread,full-inference,system-scheduling] with zeros @ pixel-4[big-core] | 5142.895 (vs. 5502.707, 6.54%↓) | 5145.050 | 8.988 |

[Top 3 out of 4 results shown]

Improved Total Dispatch Sizes 🎉

| Benchmark Name | Total Dispatch Size (bytes) |
| --- | --- |
| EfficientNet\_int8(tflite) [riscv\_32-generic-linux\_gnu-llvm\_cpu][default-flags,compile-stats] | 131064 (vs. 279384, 53.09%↓) |
| PersonDetect\_int8(tflite) [riscv\_32-generic-linux\_gnu-llvm\_cpu][default-flags,compile-stats] | 87272 (vs. 174328, 49.94%↓) |
| EfficientNet\_int8(tflite) [riscv\_64-generic-linux\_gnu-llvm\_cpu][default-flags,compile-stats] | 87544 (vs. 144376, 39.36%↓) |

[Top 3 out of 15 results shown]

Improved Total Artifact Sizes 🎉

| Benchmark Name | Total Artifact Size (bytes) |
| --- | --- |
| PersonDetect\_int8(tflite) [riscv\_32-generic-linux\_gnu-llvm\_cpu][default-flags,compile-stats] | 347143 (vs. 434183, 20.05%↓) |
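
The percentages in the summary read as reductions relative to the base commit. A small sketch that recomputes a few of them from the reported numbers follows; the helper name `percent_reduction` is hypothetical, and the formula is an assumption about how the benchmark bot derives the figures, although it reproduces the values shown.

```python
def percent_reduction(new, base):
    """Relative decrease of `new` versus `base`, in percent."""
    return (base - new) / base * 100.0


# (new, base) pairs copied from the summary above.
mobilenet_v2_latency_ms = (4218.825, 5087.930)
efficientnet_rv32_dispatch_bytes = (131064, 279384)
persondetect_rv32_artifact_bytes = (347143, 434183)

print(round(percent_reduction(*mobilenet_v2_latency_ms), 2))           # 17.08
print(round(percent_reduction(*efficientnet_rv32_dispatch_bytes), 2))  # 53.09
print(round(percent_reduction(*persondetect_rv32_artifact_bytes), 2))  # 20.05
```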

For more information:

Source Workflow Run

@MaheshRavishankar
Contributor

Nice!

@hanhanW hanhanW merged commit 437a4e3 into iree-org:main Jul 18, 2023
@hanhanW hanhanW deleted the fp-res branch July 18, 2023 17:29
@MaheshRavishankar
Contributor

Looks like this fails post submit tests. Is that flaky?

@pzread
Contributor

pzread commented Jul 18, 2023

> Looks like this fails post submit tests. Is that flaky?

Re-ran and the build passed; this looks like the issue in #14417.

nhasabni pushed a commit to plaidml/iree that referenced this pull request Aug 24, 2023
…e-org#13822)


Fixes iree-org#13706

Co-authored-by: Diego Caballero <diegocaballero@google.com>
Development

Successfully merging this pull request may close these issues.

Failed to compile reduction + broadcast dispatch in CPU backend
4 participants