Open
Description
Problem statement
The BLAS level-2 gemv kernel has no CuTe backend implementation in this project. The README BLAS table shows an empty CuTe column for gemv, which prevents full cross-backend coverage for matrix-vector multiply.
Without a CuTe gemv kernel:
- users cannot see how GEMV is expressed with CuTe’s layout and tiling abstractions,
- there is no CuTe performance baseline to compare against PyTorch and Triton `gemv`,
- CuTe-based higher-level modules lack a fundamental building block.
Proposed solution
Implement a CuTe-based gemv kernel matching the Python reference semantics and aligning with the project’s backend structure.
Concretely:
- Add a CuTe `gemv` kernel in the appropriate CuTe backend directory, implementing $y = \alpha A x + \beta y$.
- Use CuTe primitives to describe the matrix layout, vector access, and thread scheduling.
- Ensure API parity with other backends so callers can dispatch to CuTe `gemv` uniformly.
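To make the target semantics concrete, here is a minimal pure-Python sketch of $y = \alpha A x + \beta y$ together with one possible way to dispatch by backend name. The names `gemv_reference`, `BACKENDS`, and `gemv` are hypothetical and are not taken from the project; the actual Python reference and dispatch mechanism may differ.

```python
# Hypothetical reference semantics for gemv: y_out = alpha * (A @ x) + beta * y.
# All names here are illustrative assumptions, not the project's API.

def gemv_reference(alpha, A, x, beta, y):
    """Return alpha * A x + beta * y for a list-of-rows matrix A."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("each row of A must have len(x) entries")
    if len(A) != len(y):
        raise ValueError("A must have len(y) rows")
    out = []
    for row, yi in zip(A, y):
        dot = sum(a * xj for a, xj in zip(row, x))  # one row of A times x
        out.append(alpha * dot + beta * yi)
    return out


# Assumed dispatch table: a CuTe kernel would be registered under its own key
# with the same call signature, so callers can switch backends uniformly.
BACKENDS = {"python": gemv_reference}


def gemv(alpha, A, x, beta, y, backend="python"):
    """Dispatch to the selected backend; every backend shares one signature."""
    try:
        kernel = BACKENDS[backend]
    except KeyError:
        raise ValueError(f"unknown backend: {backend!r}") from None
    return kernel(alpha, A, x, beta, y)
```

Under this sketch, a CuTe implementation would only need to accept the same five arguments to be selectable with `backend="cute"`.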
Alternatives considered
Alternatives such as omitting CuTe gemv or relying solely on other backends would:
- reduce the educational impact of comparing CuTe to PyTorch/Triton on GEMV,
- leave the CuTe column partially empty in the README BLAS table,
- limit the ability to build CuTe-based end-to-end examples.
Implementation details
- Establish the file layout and build integration for CuTe kernels.
- Implement `gemv` using CuTe constructs for row-major/column-major layouts and tiling.
- Match numerical behaviour and broadcasting of scalars `alpha` and `beta` with the Python reference.
- Integrate with planned testing and benchmarking for `gemv`.
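The layout and tiling idea can be illustrated without CuTe itself. The sketch below models a layout as a (shape, stride) pair over a flat buffer, which is the concept CuTe layouts are built on, and walks the matrix tile by tile. It is plain Python rather than CuTe code, and `make_layout`, `offset`, `gemv_tiled`, and the tile sizes are hypothetical names and values chosen for illustration.

```python
# Plain-Python model of layout-driven, tiled gemv. Not CuTe code: it only
# mimics the (shape, stride) idea. All names are illustrative assumptions.

def make_layout(m, n, order="row"):
    """Return ((m, n), (stride_i, stride_j)) for an m x n matrix in a flat buffer."""
    if order == "row":
        return (m, n), (n, 1)  # row-major: consecutive j are adjacent
    if order == "col":
        return (m, n), (1, m)  # column-major: consecutive i are adjacent
    raise ValueError("order must be 'row' or 'col'")


def offset(layout, i, j):
    """Map logical coordinate (i, j) to a flat buffer index via the strides."""
    _, (stride_i, stride_j) = layout
    return i * stride_i + j * stride_j


def gemv_tiled(alpha, buf, layout, x, beta, y, tile_m=2, tile_n=2):
    """Compute alpha * A x + beta * y, visiting A in tile_m x tile_n tiles."""
    (m, n), _ = layout
    out = [beta * yi for yi in y]  # scale y once, then accumulate tiles
    for i0 in range(0, m, tile_m):
        for j0 in range(0, n, tile_n):
            for i in range(i0, min(i0 + tile_m, m)):  # clamp the edge tile
                acc = 0.0
                for j in range(j0, min(j0 + tile_n, n)):
                    acc += buf[offset(layout, i, j)] * x[j]
                out[i] += alpha * acc  # partial dot product from this tile
    return out


# The same logical matrix [[1, 2, 3], [4, 5, 6]] in both storage orders.
row_buf = [1, 2, 3, 4, 5, 6]
col_buf = [1, 4, 2, 5, 3, 6]
x, y = [1, 0, -1], [10, 20]
row_result = gemv_tiled(2.0, row_buf, make_layout(2, 3, "row"), x, 0.5, y)
col_result = gemv_tiled(2.0, col_buf, make_layout(2, 3, "col"), x, 0.5, y)
```

Because only the strides differ between the two calls, the kernel body is unchanged across storage orders. In an actual CuTe kernel the tile loops would presumably map to thread blocks and threads rather than sequential iteration.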
Use case
The CuTe gemv kernel will:
- demonstrate GEMV implementation in CuTe,
- enable detailed performance comparisons across backends,
- support more complex CuTe-based BLAS and Transformer kernels.
Related work
- CuTe/CUTLASS examples of GEMV/GEMM.
- Standard BLAS `gemv` implementations.
Additional context
This issue complements the gemv Python/PyTorch/Triton feature requests and helps complete the CuTe column in the README BLAS table.