
Loosen static shapes on deferred calls to allow cellular batching #43

Closed · benvanik opened this issue Oct 13, 2019 · 1 comment
Labels: compiler/dialects (Relating to the IREE compiler dialects: flow, hal, vm), enhancement ➕ (New feature or request)


@benvanik (Collaborator):

After identifying good targets for cellular batching (#41), we'll need to ensure we can actually batch. Coalescing is possible even when batching is not, and it often still provides throughput benefits, but the real wins come from increasing the arithmetic density of the GEMVs. We should be able to detect which shape dimensions we can make partial for a given deferred call body and then loosen them.
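
A minimal sketch of that dimension analysis in Python (the `loosen_dims` helper and the tuple encoding of shapes are hypothetical illustrations, not IREE APIs): given the static shapes observed at each call site of a deferred body, any dimension that varies across sites is a candidate to make partial, so a single compiled body can serve all callers.

```python
def loosen_dims(call_site_shapes):
    # Start from the first call site's static shape and mark any
    # dimension that differs at another call site as dynamic (None).
    merged = list(call_site_shapes[0])
    for shape in call_site_shapes[1:]:
        for i, dim in enumerate(shape):
            if merged[i] != dim:
                merged[i] = None  # varies across call sites -> make partial
    return tuple(merged)

# Two callers with batch sizes 1 and 4 can then share one body whose
# leading (batch) dimension is left partial:
assert loosen_dims([(1, 256), (4, 256)]) == (None, 256)
```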

@benvanik benvanik added the enhancement ➕ New feature or request label Oct 13, 2019
@benvanik benvanik added this to the Cellular Batching milestone Oct 13, 2019
@benvanik benvanik added this to Ideas in Compiler Development via automation Oct 13, 2019
@benvanik benvanik added the compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm) label Mar 19, 2020
@benvanik (Collaborator, Author):

This will likely shake out from other work related to semantically deduping executables and reducing compiled binary sizes.

manishucsd added a commit that referenced this issue Nov 9, 2022
This pull request adds support for testing e2e GEMM on the CUDA backend with F16 inputs and F16 accumulation.

- Functional testing of F16 GEMMs requires choosing the right tolerance value for the correctness checks.
- Tolerance-based testing is fragile, and finding good tolerances is hard.
- Instead, we fill the buffers with small integers centered at zero and test for exact equivalence (see the sketch after this list).
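
A brief NumPy sketch of the integer-fill idea (the `fill_small_ints` helper is illustrative, not the actual test harness): values drawn from a small range centered at zero keep every product and accumulated sum exactly representable in f16, so the result can be compared for exact equality rather than within a tolerance.

```python
import numpy as np

def fill_small_ints(shape, rng, bound=2):
    # Small integers centered at zero: products and accumulated sums
    # stay exactly representable in f16 for these problem sizes.
    return rng.integers(-bound, bound + 1, size=shape).astype(np.float16)

rng = np.random.default_rng(0)
lhs = fill_small_ints((64, 32), rng)
rhs = fill_small_ints((32, 64), rng)

# `actual` stands in for the backend-under-test's result; here it is
# copied from the reference for illustration only.
reference = (lhs.astype(np.float32) @ rhs.astype(np.float32)).astype(np.float16)
actual = reference.copy()
assert np.array_equal(reference, actual)  # exact match, no tolerance needed
```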

This pull request adds the `e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda` e2e GEMM test to the CUDA backend. We now have a total of four e2e GEMM tests on the CUDA backend.

```bash
manigupta@manigupta-gpu-a100 ~/cpu_machine_workspace/repos/iree/iree_tree_1/iree-build-debug $ ctest -j96 -R e2e_matmul.*cuda

1/4 Test #41: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    2.83 sec
2/4 Test #42: iree/tests/e2e/matmul/e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    3.09 sec
3/4 Test #40: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulSimt_cuda_cuda .........   Passed    3.38 sec
4/4 Test #43: iree/tests/e2e/matmul/e2e_matmul_direct_f32_large_split_k_cuda_cuda .......................   Passed    4.43 sec
```
stellaraccident pushed a commit that referenced this issue Sep 24, 2023
Automatically created

Co-authored-by: OpenXLA Dep Roller <iree-github-actions-bot@google.com>