
Loosen static shapes on deferred calls to allow cellular batching #43

Closed · benvanik opened this issue Oct 13, 2019 · 1 comment
Labels: compiler/dialects (Relating to the IREE compiler dialects: flow, hal, vm), enhancement ➕ (New feature or request)


@benvanik (Collaborator):

After identifying good targets for cellular batching (#41), we'll need to ensure we can actually batch. Coalescing is possible even when batching is not, and it often still provides throughput benefits, but the real wins come from increasing the arithmetic density of the GEMVs. We should be able to detect which shape dimensions we can make partial for a given deferred call body and then loosen them.
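
A minimal sketch of that dimension analysis in Python (the `loosen_dims` helper and the tuple encoding of shapes are hypothetical illustrations, not IREE APIs): given the static shapes observed at each call site of a deferred body, any dimension that varies across sites is a candidate to make partial, so a single compiled body can serve all callers.

```python
def loosen_dims(call_site_shapes):
    # Start from the first call site's static shape and mark any
    # dimension that differs at another call site as dynamic (None).
    merged = list(call_site_shapes[0])
    for shape in call_site_shapes[1:]:
        for i, dim in enumerate(shape):
            if merged[i] != dim:
                merged[i] = None  # varies across call sites -> make partial
    return tuple(merged)

# Two callers with batch sizes 1 and 4 can then share one body whose
# leading (batch) dimension is left partial:
assert loosen_dims([(1, 256), (4, 256)]) == (None, 256)
```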

@benvanik benvanik added the enhancement ➕ New feature or request label Oct 13, 2019
@benvanik benvanik added this to the Cellular Batching milestone Oct 13, 2019
@benvanik benvanik added this to Ideas in Compiler Development via automation Oct 13, 2019
@benvanik benvanik added the compiler/dialects Relating to the IREE compiler dialects (flow, hal, vm) label Mar 19, 2020
@benvanik (Collaborator, Author):

This will likely shake out from other work related to semantically deduping executables and reducing compiled binary sizes.

manishucsd added a commit that referenced this issue Nov 9, 2022
This pull request adds support for testing e2e GEMM on the CUDA backend with F16 inputs and F16 accumulation.

- Functional testing of F16 GEMMs requires choosing the right tolerance value for the correctness checks.
- Tolerance-based testing is fragile, and finding good tolerances is hard.
- Instead, we fill the buffers with small integers centered at zero and test for exact equivalence (see the sketch after this list).
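
A brief NumPy sketch of the integer-fill idea (the `fill_small_ints` helper is illustrative, not the actual test harness): values drawn from a small range centered at zero keep every product and accumulated sum exactly representable in f16, so the result can be compared for exact equality rather than within a tolerance.

```python
import numpy as np

def fill_small_ints(shape, rng, bound=2):
    # Small integers centered at zero: products and accumulated sums
    # stay exactly representable in f16 for these problem sizes.
    return rng.integers(-bound, bound + 1, size=shape).astype(np.float16)

rng = np.random.default_rng(0)
lhs = fill_small_ints((64, 32), rng)
rhs = fill_small_ints((32, 64), rng)

# `actual` stands in for the backend-under-test's result; here it is
# copied from the reference for illustration only.
reference = (lhs.astype(np.float32) @ rhs.astype(np.float32)).astype(np.float16)
actual = reference.copy()
assert np.array_equal(reference, actual)  # exact match, no tolerance needed
```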

This pull request adds the `e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda` e2e GEMM test to the CUDA backend. We now have a total of four e2e GEMM tests on the CUDA backend.

```bash
manigupta@manigupta-gpu-a100 ~/cpu_machine_workspace/repos/iree/iree_tree_1/iree-build-debug $ ctest -j96 -R e2e_matmul.*cuda

1/4 Test #41: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    2.83 sec
2/4 Test #42: iree/tests/e2e/matmul/e2e_matmul_direct_f16_gpu_large_LLVMGPUMatmulTensorCore_cuda_cuda ...   Passed    3.09 sec
3/4 Test #40: iree/tests/e2e/matmul/e2e_matmul_direct_f32_gpu_large_LLVMGPUMatmulSimt_cuda_cuda .........   Passed    3.38 sec
4/4 Test #43: iree/tests/e2e/matmul/e2e_matmul_direct_f32_large_split_k_cuda_cuda .......................   Passed    4.43 sec
```
stellaraccident pushed a commit that referenced this issue Sep 24, 2023
Automatically created

Co-authored-by: OpenXLA Dep Roller <iree-github-actions-bot@google.com>