
CUTLASS 4.5.1



@hwu36 released this 27 May 02:31
2e60284

CuTe DSL

  • Bug fixes and improvements
    • Fixed the following issues:
      #3219
      #3218
      #3212
      #3210
      #3208
      #3201
      #3227
    • Fixed a JAX int64 stride divisibility issue
    • Fixed issues with SM120 block-scaled MMAs
      • Added the missing MXFP8MMAOP and MXF8F6F4MMAOP for SM120.
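As background for the SM120 block-scaled MMA fixes, the following is a minimal host-side C++ sketch of MX block scaling as defined in the OCP Microscaling (MX) specification. It assumes 32-element blocks that each share one UE8M0 scale (an unsigned, exponent-only byte encoding 2^(e - 127), with 0xFF reserved for NaN). The functions `ue8m0_to_float` and `mx_dequantize` are hypothetical illustrations, not CUTLASS or CuTe DSL APIs, and the sketch takes element values that have already been decoded to float.

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>

// Assumption: OCP MX block size of 32 elements per shared scale.
constexpr std::size_t kMxBlockSize = 32;

// Hypothetical helper: decode a UE8M0 scale byte.
// UE8M0 has no sign and no mantissa bits, only an 8-bit exponent with bias 127.
float ue8m0_to_float(uint8_t e) {
  if (e == 0xFF) {
    return std::numeric_limits<float>::quiet_NaN();  // all-ones encodes NaN
  }
  return std::ldexp(1.0f, static_cast<int>(e) - 127);  // 2^(e - 127)
}

// Hypothetical reference dequantization:
//   out[i] = elems[i] * 2^(scales[i / 32] - 127)
// `elems` holds element values already decoded to float (e.g. from FP8),
// and `scales` holds one UE8M0 byte per block of kMxBlockSize elements.
void mx_dequantize(const float* elems, const uint8_t* scales, float* out,
                   std::size_t n) {
  for (std::size_t i = 0; i < n; ++i) {
    out[i] = elems[i] * ue8m0_to_float(scales[i / kMxBlockSize]);
  }
}
```

Because the scale is a pure power of two, applying it only shifts the exponent, so the product is exact unless it overflows or underflows the float range.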

CUTLASS C++

  • Fix SM100 F8F6F4 SS MMA (1SM and 2SM) traits to use typed op templates.
  • Add UE8M0 initialization support, using a uniform exponent distribution, to the tensor fill utilities.
  • Add cvt.rn.bf16x2.e4m3x2 conversion instruction support to numeric_conversion.h.
  • Update example 93 with paged KV cache support for Blackwell low-latency GQA.
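For reference, the result that `cvt.rn.bf16x2.e4m3x2` produces for two packed FP8 values can be modeled on the host. This sketch assumes the OCP E4M3 encoding (1 sign bit, 4 exponent bits with bias 7, 3 mantissa bits, no infinities, S.1111.111 as NaN) and assumes the low input byte maps to the low BF16 half of the result. The helper names are hypothetical and are not part of numeric_conversion.h. Every finite E4M3 value has at most 4 significant bits and an exponent between -9 and 8, so it is exactly representable in BF16; the round-to-nearest step therefore never changes a finite value, and keeping the upper 16 bits of the float result is sufficient here.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: decode one OCP FP8 E4M3 byte to float.
float e4m3_to_float(uint8_t bits) {
  const int sign = bits >> 7;
  const int exp = (bits >> 3) & 0xF;
  const int man = bits & 0x7;
  float mag;
  if (exp == 0xF && man == 0x7) {
    mag = std::numeric_limits<float>::quiet_NaN();  // E4M3 has NaN but no Inf
  } else if (exp == 0) {
    mag = std::ldexp(static_cast<float>(man), -9);  // subnormal: (man / 8) * 2^-6
  } else {
    mag = std::ldexp(1.0f + man / 8.0f, exp - 7);   // normal: 1.man * 2^(exp - 7)
  }
  return sign ? -mag : mag;
}

// Keep the upper 16 bits of an IEEE float as BF16. This equals round-to-nearest
// only because the low 16 bits are zero for every value decoded from E4M3.
uint16_t float_to_bf16_bits_exact(float f) {
  uint32_t u;
  std::memcpy(&u, &f, sizeof u);
  return static_cast<uint16_t>(u >> 16);
}

// Hypothetical host reference for a packed e4m3x2 -> bf16x2 conversion.
// Assumption: low byte -> low 16 bits, high byte -> high 16 bits.
uint32_t e4m3x2_to_bf16x2(uint16_t packed) {
  const uint16_t lo = float_to_bf16_bits_exact(e4m3_to_float(packed & 0xFF));
  const uint16_t hi = float_to_bf16_bits_exact(e4m3_to_float(packed >> 8));
  return (static_cast<uint32_t>(hi) << 16) | lo;
}
```

For example, the bytes 0x38 and 0xB8 decode to 1.0 and -1.0, whose BF16 encodings are 0x3F80 and 0xBF80.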