Add deferred calls to the sequencer IR and runtime #40

benvanik · 2019-10-13T19:55:24Z

A deferred call gives an explicit compile-time indication of which parts of the sequencer execution graph are good candidates for coalescing and, possibly, batching. To start, we can use heuristics to identify candidates (large conv/matmul/etc.); in the future we can add cost analysis and profile-guided annotation. The runtime can trigger fiber yielding and manage the policy used to flush pending deferred calls.
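As a rough illustration of the split described above (compile-time candidate selection, runtime flush policy), here is a minimal Python sketch. Everything in it is hypothetical: `Call`, `should_defer`, `DeferredQueue`, the FLOP threshold, and the count-based flush policy are illustrative assumptions, not IREE APIs or the design that was eventually implemented. Fiber yielding is only modeled by the return value of `submit`.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical cost threshold for the heuristic; not a value taken from IREE.
DEFER_FLOP_THRESHOLD = 1_000_000


@dataclass
class Call:
    target: str  # callee name, e.g. "matmul"
    flops: int   # rough cost estimate for the call


def should_defer(call: Call) -> bool:
    """Heuristic: only large conv/matmul-like calls are deferral candidates."""
    return call.target in {"conv", "matmul"} and call.flops >= DEFER_FLOP_THRESHOLD


@dataclass
class DeferredQueue:
    # Assumed flush policy: flush a group once it holds this many calls.
    max_pending: int = 4
    pending: Dict[str, List[Call]] = field(default_factory=dict)
    # Record of executed work as (target, number of coalesced calls).
    flushed: List[Tuple[str, int]] = field(default_factory=list)

    def submit(self, call: Call) -> bool:
        """Return True if the call was deferred (the calling fiber would yield)."""
        if not should_defer(call):
            self.flushed.append((call.target, 1))  # executed immediately
            return False
        group = self.pending.setdefault(call.target, [])
        group.append(call)
        if len(group) >= self.max_pending:
            self.flush(call.target)
        return True

    def flush(self, target: str) -> None:
        """Execute all pending calls for one target as a single coalesced unit."""
        group = self.pending.pop(target, [])
        if group:
            self.flushed.append((target, len(group)))

    def flush_all(self) -> None:
        for target in list(self.pending):
            self.flush(target)
```

A real policy would likely also flush on a timeout or when no runnable fibers remain; the count-based trigger is just the simplest one to show.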

Dynamic shapes will be required to batch effectively; however, coalescing should be possible even with fully static shapes. Ideally we would be able to loosen the static shaping of call trees to allow batching even when the input HLO is fully shaped, either by inserting dynamic dimensions or by making outer dimensions dynamic when doing so would cause no observable changes.
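To make the shape-loosening idea concrete, the following Python sketch treats a shape as a tuple in which `None` marks a dynamic dimension. The helper names `loosen_outer_dim` and `batched_shape` are hypothetical, and the rule used here (calls are batchable when all inner dimensions match) is an assumption for illustration; it does not model the "no observable changes" analysis a compiler would actually need.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]  # None marks a dynamic dimension


def loosen_outer_dim(shape: Shape) -> Shape:
    """Replace the outermost static dimension with a dynamic one."""
    if not shape:
        return shape  # rank-0 values have no outer dimension to loosen
    return (None,) + tuple(shape[1:])


def batched_shape(shapes: Sequence[Tuple[int, ...]]) -> Optional[Tuple[int, ...]]:
    """Shape of the calls concatenated along the outer dimension.

    Returns None when the calls cannot be batched under the assumed rule,
    i.e. when their inner dimensions differ or any shape is rank-0.
    """
    if not shapes or any(len(s) == 0 for s in shapes):
        return None
    inner = tuple(shapes[0][1:])
    if any(tuple(s[1:]) != inner for s in shapes):
        return None
    return (sum(s[0] for s in shapes),) + inner
```

For example, two calls with static shapes `(4, 128)` and `(2, 128)` both loosen to `(None, 128)` and could be served by one invocation of shape `(6, 128)`.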

This pull request adds support for testing e2e GEMM on a CUDA backend for F16 input and F16 accumulation.

- Functional testing of F16 GEMMs requires setting the right tolerance value to ensure correctness checks.
- Tolerance-based testing is fragile and finding tolerances is hard.
- Instead we fill the buffers with small integers centered at zero and test for equivalence.

This pull request adds the `e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda` e2e GEMM test to the CUDA backend. We now have a total of four e2e GEMM tests on the CUDA backend.

```bash
manigupta@manigupta-gpu-a100 ~/cpu_machine_workspace/repos/iree/iree_tree_1/iree-build-debug $ ctest -j96 -R e2e_matmul.*cuda
1/4 Test #41: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ... Passed 2.83 sec
2/4 Test #42: iree/tests/e2e/matmul/e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ... Passed 3.09 sec
3/4 Test #40: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulSimt_cuda_cuda ......... Passed 3.38 sec
4/4 Test #43: iree/tests/e2e/matmul/e2e_matmul_direct_f32_large_split_k_cuda_cuda ....................... Passed 4.43 sec
```
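The reason small-integer inputs allow exact comparison is that IEEE half precision represents every integer of magnitude up to 2048 exactly, so as long as all products and partial sums stay in that range, F16 accumulation introduces no rounding. The sketch below demonstrates this in pure Python, emulating F16 rounding with the standard `struct` module's `e` format. It is not the IREE test harness; the function names, the value range `[-2, 2]`, and the matrix sizes are assumptions chosen so that the partial sums stay within 2048.

```python
import random
import struct


def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fill_small_ints(rows, cols, rng, lo=-2, hi=2):
    """Fill a rows x cols matrix with small integers centered at zero."""
    return [[rng.randint(lo, hi) for _ in range(cols)] for _ in range(rows)]


def matmul_f16_accumulate(a, b):
    """Matmul that rounds every product and partial sum to F16."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                prod = to_f16(float(a[i][p]) * float(b[p][j]))
                acc = to_f16(acc + prod)
            out[i][j] = acc
    return out


def matmul_int_reference(a, b):
    """Exact integer reference result."""
    m, k, n = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
            for i in range(m)]


def check_exact(m=8, k=64, n=8, seed=0) -> bool:
    """True if the emulated F16 result equals the integer reference exactly."""
    rng = random.Random(seed)
    a = fill_small_ints(m, k, rng)
    b = fill_small_ints(k, n, rng)
    # With |values| <= 2 and k = 64, every partial sum has magnitude <= 256.
    return matmul_f16_accumulate(a, b) == matmul_int_reference(a, b)
```

With larger value ranges or a larger K the partial sums can exceed 2048, at which point rounding appears and exact comparison is no longer valid.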

Automatically created Co-authored-by: OpenXLA Dep Roller <iree-github-actions-bot@google.com>

benvanik added the enhancement ➕ New feature or request label Oct 13, 2019

benvanik added this to the Cellular Batching milestone Oct 13, 2019

benvanik self-assigned this Oct 13, 2019

benvanik added this to Ideas in Runtime Development via automation Oct 13, 2019

benvanik added this to Ideas in Compiler Development via automation Oct 13, 2019

benvanik added the compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm) label Mar 19, 2020

benvanik added the obsolete label Nov 21, 2020

benvanik closed this as completed Nov 21, 2020

Runtime Development automation moved this from Ideas to Done Nov 21, 2020

Compiler Development automation moved this from Ideas to Done Nov 21, 2020

This was referenced Dec 2, 2020

Merge main -> google #4057

Closed

Merge main -> google #4062

Merged

GMNGeoffrey mentioned this issue Mar 29, 2023

Add attention op as transform dialect op #12739

Merged

dpackwood mentioned this issue Sep 8, 2023

RaiseSpecialOps (iree-flow-raise-special-ops) causes compiler crash for some input #14933

Closed

stellaraccident pushed a commit that referenced this issue Sep 24, 2023

Update nightly dependencies (#40)

d219df6

Automatically created Co-authored-by: OpenXLA Dep Roller <iree-github-actions-bot@google.com>

gabeweisz mentioned this issue Mar 25, 2024

Failure : unimplemented: found unhandled case of expansion/collapse in aten.view #16887

Open
