Fuse iota ops with consumers always. #14070

MaheshRavishankar · 2023-06-12T22:51:51Z

The current fusion heuristics always fuse copy-like ops with their consumers. Iota ops are also copy-like ops (indeed, if the deprecated `linalg.indexed_generic` were still around, it would still be a copy-like op).

Fixes #13745
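To illustrate why an iota is a natural candidate for unconditional fusion, here is a minimal Python sketch. It is not IREE code and does not reflect the actual heuristic implementation; the function names `iota`, `add_index_unfused`, and `add_index_fused` are hypothetical. It only shows that an iota producer carries no data of its own, so a consumer can recompute the index inline instead of reading a materialized buffer.

```python
def iota(n):
    """Producer: materializes the index sequence 0..n-1 as its own buffer."""
    return list(range(n))


def add_index_unfused(values):
    """Consumer run as a separate step: reads the materialized iota buffer."""
    indices = iota(len(values))  # intermediate buffer, written then re-read
    return [v + i for v, i in zip(values, indices)]


def add_index_fused(values):
    """Producer fused into the consumer: the index comes from the loop counter."""
    return [v + i for i, v in enumerate(values)]


if __name__ == "__main__":
    data = [10, 20, 30]
    # Both forms compute the same result; the fused one skips the extra buffer.
    assert add_index_unfused(data) == add_index_fused(data)
    print(add_index_fused(data))  # [10, 21, 32]
```

In this toy model the fused form drops one intermediate buffer and one separate pass, which is loosely analogous to the dispatch-count reduction a compiler can get from fusing such producers.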

antiagainst

Cool!

github-actions · 2023-06-13T00:39:14Z

Abbreviated Benchmark Summary

@ commit a3ef066ccf9d8065146c663739adacbfc27662bc (vs. base 96d92136809d2294afde3fc11f2eea53e7fa6a78)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MiniLML12H384Uncased(stablehlo) [x86\_64-cascadelake-linux\_gnu-llvm\_cpu][default-flags] local\_task(embedded\_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 25.075 (vs. 20.749, 20.85%↑) | 25.060 | 0.457 |
| MobileNetV3Small\_fp32(tflite) [qualcomm-adreno-vulkan\_android31-vulkan\_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with zeros @ moto-edge-x30[gpu] | 4.947 (vs. 4.694, 5.38%↑) | 5.047 | 0.382 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,mmt4d] local\_task(embedded\_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 181.504 (vs. 198.355, 8.50%↓) | 182.869 | 6.134 |
| MobileNetV3Small\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,mmt4d] local\_task(embedded\_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 7.731 (vs. 8.340, 7.30%↓) | 7.762 | 0.082 |
| PoseNet\_fp32(tflite) [armv8.2-a-generic-linux\_android29-llvm\_cpu][experimental-flags,mmt4d] local\_sync(embedded\_elf)[full-inference,default-flags] with zeros @ pixel-6-pro[big-core] | 28.315 (vs. 30.528, 7.25%↓) | 28.449 | 0.508 |

[Top 3 out of 17 results shown]

Improved Stream IR Dispatch Count (# of cmd.dispatch ops) 🎉

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| Unet2dPT(linalg) [cuda-sm\_80-linux\_gnu-cuda][default-flags,compile-stats] | 1045 (vs. 1046, 0.10%↓) |
| Unet2dPT(linalg) [nvidia-ampere-vulkan\_linux-vulkan\_spirv][experimental-flags,tensorcore,compile-stats] | 1205 (vs. 1206, 0.08%↓) |
| Unet2dPT(linalg) [nvidia-pascal-vulkan\_linux-vulkan\_spirv][experimental-flags,simt,compile-stats] | 1205 (vs. 1206, 0.08%↓) |

Fuse iota ops with consumers always.

4ffc394

MaheshRavishankar requested a review from hanhanW as a code owner June 12, 2023 22:51

MaheshRavishankar requested a review from benvanik June 12, 2023 22:52

hanhanW approved these changes Jun 12, 2023

antiagainst approved these changes Jun 12, 2023

MaheshRavishankar merged commit e0d36f9 into iree-org:main Jun 13, 2023
63 of 67 checks passed
