tensor-layouts 0.1

@jduprat released this 17 Mar 22:51 · 201 commits to main since this release

This is the first release of the tensor-layouts library — a pure-Python implementation of NVIDIA's CuTe layout algebra. No GPU required.
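For readers new to CuTe, a layout is a function from coordinates to memory offsets, described by a shape and a stride. The sketch below illustrates that idea in plain Python for flat (non-nested) layouts only. The helper `layout_offset` is hypothetical and written for this illustration; it is not the tensor-layouts API.

```python
from math import prod


def layout_offset(shape, stride, index):
    """Map a linear index to a memory offset for a flat (non-nested) layout.

    Hypothetical helper for illustration, not part of tensor-layouts.
    The index is unpacked into a coordinate with the leftmost mode varying
    fastest (the CuTe convention), then dotted with the strides.
    """
    if not 0 <= index < prod(shape):
        raise IndexError(f"index {index} out of range for shape {shape}")
    offset = 0
    for extent, step in zip(shape, stride):
        index, coord = divmod(index, extent)  # peel off one mode at a time
        offset += coord * step
    return offset


# The same 4x2 shape with two different strides.
col_major = [layout_offset((4, 2), (1, 4), i) for i in range(8)]  # (4,2):(1,4)
row_major = [layout_offset((4, 2), (2, 1), i) for i in range(8)]  # (4,2):(2,1)

print(col_major)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(row_major)  # [0, 2, 4, 6, 1, 3, 5, 7]
```

The layout algebra operates on these shape and stride pairs, including nested ones that this sketch does not handle.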

Highlights

  • Full CuTe layout algebra — compose, complement, logical_divide, logical_product, coalesce, flatten, upcast, downcast, and more
  • Swizzle support — XOR-based bank conflict avoidance patterns with Swizzle(B, M, S)
  • MMA atom definitions for NVIDIA and AMD architectures:
    • NVIDIA: SM70 (Volta), SM75 (Turing), SM80 (Ampere), SM89 (Ada), SM90 (Hopper GMMA), SM100 (Blackwell UMMA), SM120 (Blackwell, consumer RTX 50 series)
    • AMD: CDNA1 (MI100), CDNA2 (MI200), CDNA3 (MI300), CDNA3+ (gfx950)
  • Rich visualization (pip install tensor-layouts[viz]):
    • Layout grids with thread/value coloring
    • Swizzle before/after comparison views
    • MMA atom thread-value layouts
    • Combined tiled MMA grids
    • Copy atom layouts
    • Hierarchical N-level rendering with color-coded boundaries
  • Oracle tests that cross-validate against NVIDIA's pycute and the AMD Matrix Instruction Calculator
  • Zero dependencies for the core library; only matplotlib needed for visualization
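
As a rough illustration of the XOR-based swizzle mentioned above, the sketch below follows the bit manipulation of CuTe's `Swizzle<B, M, S>` for a positive shift. The standalone `swizzle` function and the toy bank model (4 banks, bank = offset % 4) are assumptions made for this example; they are not the tensor-layouts API.

```python
def swizzle(offset, bits, base, shift):
    """XOR swizzle in the style of CuTe's Swizzle<B, M, S>.

    Hypothetical standalone function for illustration; assumes shift > 0.
    It takes `bits` bits starting at position base + shift and XORs them
    into the `bits` bits starting at position base.
    """
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


# Toy setup: a 4x8 row-major tile (offset = row * 8 + col) and 4 banks.
NUM_BANKS = 4
column0 = [row * 8 for row in range(4)]  # offsets of column 0 in each row

banks_before = [o % NUM_BANKS for o in column0]
swizzled = [swizzle(o, bits=2, base=0, shift=3) for o in column0]
banks_after = [o % NUM_BANKS for o in swizzled]

print(banks_before)  # [0, 0, 0, 0]  every row hits the same bank
print(swizzled)      # [0, 9, 18, 27]
print(banks_after)   # [0, 1, 2, 3]  accesses are spread across banks

# The swizzle only permutes offsets within the tile; none are lost.
assert sorted(swizzle(o, 2, 0, 3) for o in range(32)) == list(range(32))
```

Because the XOR source bits are left untouched, applying the same swizzle twice returns the original offset.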

Install

pip install tensor-layouts        # core
pip install tensor-layouts[viz]   # with visualization

Links

Documentation: https://github.com/facebookresearch/tensor-layouts
PyPI: https://pypi.org/project/tensor-layouts/