
Add deferred calls to the sequencer IR and runtime #40

Closed
benvanik opened this issue Oct 13, 2019 · 0 comments
Assignees
Labels
compiler/dialects (Relating to the IREE compiler dialects (flow, hal, vm)), enhancement ➕ (New feature or request)

Comments

@benvanik (Collaborator)

A deferred call allows for an explicit compile-time indication of which parts of the sequencer execution graph are optimal for coalescing and possible batching. To start, we can use heuristics to identify candidates (large conv/matmul/etc.); in the future we can add cost analysis and profile-guided annotation. The runtime can trigger fiber yielding and manage the policy used to flush pending deferred calls.

Dynamic shapes will be required to perform batching effectively; however, coalescing should be possible even with fully static shapes. Ideally we would be able to loosen the static shaping of call trees to allow batching even when the input HLO is fully statically shaped, either by inserting dynamic dimensions or by making outer dimensions dynamic when doing so would cause no observable changes.
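To make the runtime side of this concrete, here is a minimal, purely hypothetical sketch (in Python, not IREE's actual C runtime) of a pending-call queue with a simple count-based flush policy. `DeferredCall`, `DeferredCallQueue`, and the threshold are illustrative assumptions, not the proposed API.

```python
# Purely illustrative sketch of a deferred-call flush policy; all names are
# hypothetical and do not reflect IREE's actual runtime API.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DeferredCall:
    target: str   # callee marked at compile time as a coalescing/batching candidate
    args: tuple   # operands; batching assumes the outer dimension can be made dynamic


@dataclass
class DeferredCallQueue:
    max_pending: int = 8                        # flush policy: pending-call threshold
    pending: list = field(default_factory=list)

    def enqueue(self, call: DeferredCall) -> None:
        self.pending.append(call)
        if len(self.pending) >= self.max_pending:
            self.flush()                        # a fiber yield point in the real design

    def flush(self) -> None:
        # Coalesce calls to the same target so they can be dispatched together;
        # with dynamic outer dimensions they could be batched into a single call.
        by_target = defaultdict(list)
        for call in self.pending:
            by_target[call.target].append(call.args)
        for target, arg_lists in by_target.items():
            print(f"dispatching {len(arg_lists)} deferred call(s) to {target}")
        self.pending.clear()
```

In the real design the flush policy would be richer than a fixed count (cost estimates, profile data, fiber scheduling pressure), but the queue-then-flush structure is the core idea.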

benvanik added the enhancement ➕ (New feature or request) label Oct 13, 2019
benvanik added this to the Cellular Batching milestone Oct 13, 2019
benvanik self-assigned this Oct 13, 2019
benvanik added this to Ideas in Runtime Development via automation Oct 13, 2019
benvanik added this to Ideas in Compiler Development via automation Oct 13, 2019
benvanik added the compiler/dialects (Relating to the IREE compiler dialects (flow, hal, vm)) label Mar 19, 2020
Runtime Development automation moved this from Ideas to Done Nov 21, 2020
Compiler Development automation moved this from Ideas to Done Nov 21, 2020
This was referenced Dec 2, 2020
manishucsd pushed a commit that referenced this issue Nov 9, 2022
This pull request adds support for testing e2e GEMM on the CUDA backend with F16 inputs and F16 accumulation.

- Functional testing of F16 GEMMs requires choosing the right tolerance values for the correctness checks.
- Tolerance-based testing is fragile, and finding good tolerances is hard.
- Instead, we fill the buffers with small integers centered at zero and test for exact equality.

This pull request adds the `e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda` e2e GEMM test to the CUDA backend. We now have a total of four e2e GEMM tests on the CUDA backend.
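As a rough illustration of the integer-fill approach described above (a standalone numpy sketch, not the actual IREE e2e matmul test generator; the shapes and value range are assumptions): small integers centered at zero keep every f16 product and partial sum exactly representable, so the result can be compared against an integer reference for exact equality rather than within a tolerance.

```python
# Standalone sketch of exact-equality matmul checking with small-integer fills;
# not the actual IREE e2e matmul test generator.
import numpy as np

def make_operand(shape, rng, low=-2, high=3):
    # Small integers centered at zero: every product and partial sum stays an
    # exactly representable f16 integer for moderate K, so no tolerance is needed.
    return rng.integers(low, high, size=shape).astype(np.float16)

rng = np.random.default_rng(0)
m, k, n = 64, 128, 64
lhs = make_operand((m, k), rng)
rhs = make_operand((k, n), rng)

# f16 matmul under test vs. an exact integer reference.
result = lhs @ rhs
reference = lhs.astype(np.int64) @ rhs.astype(np.int64)

assert np.array_equal(result.astype(np.int64), reference)
```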

```bash
manigupta@manigupta-gpu-a100 ~/cpu_machine_workspace/repos/iree/iree_tree_1/iree-build-debug $ ctest -j96 -R e2e_matmul.*cuda

1/4 Test #41: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    2.83 sec
2/4 Test #42: iree/tests/e2e/matmul/e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    3.09 sec
3/4 Test #40: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulSimt_cuda_cuda .........   Passed    3.38 sec
4/4 Test #43: iree/tests/e2e/matmul/e2e_matmul_direct_f32_large_split_k_cuda_cuda .......................   Passed    4.43 sec
```
stellaraccident pushed a commit that referenced this issue Sep 24, 2023
Automatically created

Co-authored-by: OpenXLA Dep Roller <iree-github-actions-bot@google.com>