
Fuse iota ops with consumers always. #14070

Merged (1 commit) on Jun 13, 2023


MaheshRavishankar
Contributor

Current fusion heuristics always fuse copy-like ops with their consumers. Iota ops are also copy-like ops (indeed, if the deprecated `linalg.indexed_generic` were still around, an iota would still be a copy-like op).

Fixes #13745
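The rule above can be illustrated with a small Python sketch. Everything in it is hypothetical: `ToyGenericOp`, `INDEX_ONLY_OPS`, and the helper functions are illustrative names, not IREE's implementation, and the set of "index-only" payload ops is an assumption. The sketch treats a producer as copy-like when its payload does no arithmetic on data: a copy has an empty payload, while an iota only reads loop indices (and converts them). It also contrasts materializing an iota into a temporary buffer with recomputing the index inside the consumer loop, which is what fusing the iota into its consumer makes possible.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a linalg.generic op; NOT an MLIR or IREE API.
@dataclass
class ToyGenericOp:
    name: str
    # Names of ops in the payload region, excluding the terminating yield.
    body_ops: list = field(default_factory=list)


# Assumed set of payload ops that only produce or convert loop indices.
INDEX_ONLY_OPS = {"linalg.index", "arith.index_cast", "arith.sitofp"}


def is_copy_like(op: ToyGenericOp) -> bool:
    """A copy yields its inputs unchanged (empty payload); an iota only
    yields values derived from loop indices. Neither computes on data."""
    return all(body_op in INDEX_ONLY_OPS for body_op in op.body_ops)


def should_fuse_with_consumers(producer: ToyGenericOp) -> bool:
    # The heuristic described in this PR: copy-like producers always fuse.
    return is_copy_like(producer)


def add_iota_materialized(x):
    # Unfused: the iota result is first written to a temporary buffer.
    iota = list(range(len(x)))
    return [a + b for a, b in zip(x, iota)]


def add_iota_fused(x):
    # Fused: the consumer recomputes the index in its own loop; no temporary.
    return [a + i for i, a in enumerate(x)]


if __name__ == "__main__":
    ops = [
        ToyGenericOp("copy"),
        ToyGenericOp("iota", ["linalg.index", "arith.index_cast", "arith.sitofp"]),
        ToyGenericOp("add", ["arith.addf"]),
    ]
    for op in ops:
        print(op.name, should_fuse_with_consumers(op))
    print(add_iota_fused([1.0, 2.0, 3.0]))
```

Both `add_iota_*` functions compute the same values; the fused form simply never allocates the iota buffer, which is why treating iota as copy-like avoids materializing it in memory.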


@antiagainst antiagainst left a comment


Cool!

@MaheshRavishankar MaheshRavishankar added benchmarks:cuda Run default CUDA benchmarks benchmarks:x86_64 Run default x86_64 benchmarks benchmarks:comp-stats Run default compilation statistics benchmarks benchmarks:android-cpu Run default Android CPU benchmarks benchmarks:android-gpu Run default Android GPU benchmarks benchmarks:vulkan-nvidia Run default Vulkan benchmarks on NVIDIA GPU labels Jun 12, 2023
github-actions bot commented Jun 13, 2023

Abbreviated Benchmark Summary

@ commit a3ef066ccf9d8065146c663739adacbfc27662bc (vs. base 96d92136809d2294afde3fc11f2eea53e7fa6a78)

Regressed Latencies 🚩

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MiniLML12H384Uncased(stablehlo) [x86_64-cascadelake-linux_gnu-llvm_cpu][default-flags] local_task(embedded_elf)[8-thread,full-inference,default-flags] with zeros @ c2-standard-16[cpu] | 25.075 (vs. 20.749, 20.85%↑) | 25.060 | 0.457 |
| MobileNetV3Small_fp32(tflite) [qualcomm-adreno-vulkan_android31-vulkan_spirv][experimental-flags,fuse-padding] vulkan(none)[full-inference,default-flags] with zeros @ moto-edge-x30[gpu] | 4.947 (vs. 4.694, 5.38%↑) | 5.047 | 0.382 |

Improved Latencies 🎉

| Benchmark Name | Average Latency (ms) | Median Latency (ms) | Latency Standard Deviation (ms) |
| --- | --- | --- | --- |
| MobileBertSquad_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,mmt4d] local_task(embedded_elf)[4-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 181.504 (vs. 198.355, 8.50%↓) | 182.869 | 6.134 |
| MobileNetV3Small_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,mmt4d] local_task(embedded_elf)[1-thread,full-inference,system-scheduling] with zeros @ pixel-6-pro[big-core] | 7.731 (vs. 8.340, 7.30%↓) | 7.762 | 0.082 |
| PoseNet_fp32(tflite) [armv8.2-a-generic-linux_android29-llvm_cpu][experimental-flags,mmt4d] local_sync(embedded_elf)[full-inference,default-flags] with zeros @ pixel-6-pro[big-core] | 28.315 (vs. 30.528, 7.25%↓) | 28.449 | 0.508 |

[Top 3 out of 17 results shown]

Improved Stream IR Dispatch Count (# of cmd.dispatch ops) 🎉

| Benchmark Name | Stream IR Dispatch Count (# of cmd.dispatch ops) |
| --- | --- |
| Unet2dPT(linalg) [cuda-sm_80-linux_gnu-cuda][default-flags,compile-stats] | 1045 (vs. 1046, 0.10%↓) |
| Unet2dPT(linalg) [nvidia-ampere-vulkan_linux-vulkan_spirv][experimental-flags,tensorcore,compile-stats] | 1205 (vs. 1206, 0.08%↓) |
| Unet2dPT(linalg) [nvidia-pascal-vulkan_linux-vulkan_spirv][experimental-flags,simt,compile-stats] | 1205 (vs. 1206, 0.08%↓) |

For more information:

Source Workflow Run

@MaheshRavishankar MaheshRavishankar merged commit e0d36f9 into iree-org:main Jun 13, 2023
63 of 67 checks passed
nhasabni pushed a commit to plaidml/iree that referenced this pull request Aug 24, 2023
Linked issue: HLO iota should fuse with other ops and not (often) require materialization into memory.
3 participants