
Add TensorLocation enum + fused matmul epilogues #248

Merged
chhwang merged 2 commits into main from pr2-fused-matmul-epilogues on May 26, 2026

Contributor

chhwang commented May 25, 2026

  • Add TensorLocation enum (GLOBAL/SHARED/REGISTER) to ModelTensor
  • Add CUTLASS GEMM fused epilogue: gemm_with_functor<F> template + FunctorScale, FunctorGelu, FunctorScaleExp, FunctorAdd
  • New ops: ModelOpMatmulScale, ModelOpMatmulGelu, ModelOpMma, ModelOpStore
  • matmul_scale fuses attention Q@K^T/sqrt(dk) into one kernel
  • matmul_gelu fuses FFN1 linear+GELU into one kernel
  • mma/store provide register-tagged output for future fusion
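
The fused-epilogue idea in the list above can be illustrated with a small CPU reference in plain C++. This is only a sketch, not the CUTLASS code from this PR: `ref_gemm_with_functor`, `RefScale`, and `RefGelu` are hypothetical names, the matrices are assumed to be row-major, accumulation is done in float, and the erf-based GELU is an assumption (the PR does not say which GELU form FunctorGelu uses).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference; NOT the CUTLASS implementation from this PR.

// Epilogue functor: multiply each accumulator by a constant.
struct RefScale {
    float alpha;
    float operator()(float x) const { return alpha * x; }
};

// Epilogue functor: erf-based GELU (assumed form; a tanh approximation
// is another common choice).
struct RefGelu {
    float operator()(float x) const {
        return 0.5f * x * (1.0f + std::erf(x * 0.70710678f));
    }
};

// Computes C[m][n] = f(sum_k A[m][k] * B[k][n]) for row-major A (MxK)
// and B (KxN). The functor runs on the accumulator before the single
// write to C, so no second elementwise pass over C is needed.
template <typename F>
std::vector<float> ref_gemm_with_functor(const std::vector<float>& a,
                                         const std::vector<float>& b,
                                         std::size_t M, std::size_t N,
                                         std::size_t K, F f) {
    assert(a.size() == M * K && b.size() == K * N);
    std::vector<float> c(M * N);
    for (std::size_t m = 0; m < M; ++m) {
        for (std::size_t n = 0; n < N; ++n) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < K; ++k) {
                acc += a[m * K + k] * b[k * N + n];
            }
            c[m * N + n] = f(acc);  // fused epilogue
        }
    }
    return c;
}
```

Under these assumptions, the attention scaling Q@K^T/sqrt(dk) would correspond to passing `RefScale{1.0f / std::sqrt(dk)}` with K already transposed, and the FFN1 case to passing `RefGelu{}`. Note that the sketch accumulates in float, whereas the commit notes below say the real GemmConfiguration uses ElementC as the MMA accumulator.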

chhwang added 2 commits May 25, 2026 21:37
- Add TensorLocation enum (GLOBAL/SHARED/REGISTER) to ModelTensor
- Add CUTLASS GEMM fused epilogue: gemm_with_functor<F> template +
  FunctorScale, FunctorGelu, FunctorScaleExp, FunctorAdd
- New ops: ModelOpMatmulScale, ModelOpMatmulGelu, ModelOpMma, ModelOpStore
- matmul_scale fuses attention Q@K^T/sqrt(dk) into one kernel
- matmul_gelu fuses FFN1 linear+GELU into one kernel
- mma/store provide register-tagged output for future fusion
- Add Model::matmul_gelu/matmul_scale/matmul_add/mma/store declarations
  to model.hpp (fixes the build: the definitions had no matching class declarations)
- Register new ops (MatmulScale, MatmulGelu, MatmulAdd, Mma, Store)
  in model_op.cpp factory
- Add accumulator precision comment in gemm_fused.h documenting that
  GemmConfiguration uses ElementC (half_t for fp16) as MMA accumulator
- Add data-type validation in MatmulScale/MatmulGelu impl_name
- Add stride/shape validation and BatchStride overrides in MatmulAdd
- Add defensive assertions in MatmulGelu/MatmulScale for impl_name
  substring extraction
- Document TensorLocation enum values in model_tensor.hpp
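
How a location tag might travel with a tensor through mma and store can be sketched in a few lines of C++. Only the three enum values (GLOBAL, SHARED, REGISTER) come from this PR. Everything else is a hypothetical illustration: `SketchTensor`, `sketch_mma`, `sketch_store`, and `location_name` are made-up names, the per-value comments are one reading of the names, and the sketch assumes that store takes a register-tagged input and produces a global-tagged output.

```cpp
#include <cassert>
#include <string>

// Enum values as named in the PR; the comments are an interpretation.
enum class TensorLocation {
    GLOBAL,    // assumed: tensor lives in device global memory
    SHARED,    // assumed: tensor lives in on-chip shared memory
    REGISTER   // assumed: tensor is held in registers
};

// Hypothetical stand-in for ModelTensor: just a name and a location tag.
struct SketchTensor {
    std::string name;
    TensorLocation location;
};

// Hypothetical mma: the result is tagged REGISTER so that a later op
// could consume it without a round trip through global memory.
inline SketchTensor sketch_mma(const SketchTensor& a, const SketchTensor& b) {
    return {a.name + "@" + b.name, TensorLocation::REGISTER};
}

// Hypothetical store: assumed to take a register-tagged tensor and
// produce a global-tagged one.
inline SketchTensor sketch_store(const SketchTensor& t) {
    assert(t.location == TensorLocation::REGISTER);
    return {t.name, TensorLocation::GLOBAL};
}

// Human-readable name for a location tag.
inline const char* location_name(TensorLocation loc) {
    switch (loc) {
        case TensorLocation::GLOBAL:   return "GLOBAL";
        case TensorLocation::SHARED:   return "SHARED";
        case TensorLocation::REGISTER: return "REGISTER";
    }
    return "UNKNOWN";
}
```

In this sketch the tag is plain metadata, so a scheduler could decide whether two ops can be fused by checking whether the producer's output is tagged REGISTER.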
chhwang marked this pull request as ready for review May 26, 2026 02:17
chhwang merged commit e832949 into main May 26, 2026
5 of 11 checks passed
chhwang deleted the pr2-fused-matmul-epilogues branch May 26, 2026 02:17