tensor-layouts 0.1
This is the first release of the tensor-layouts library — a pure-Python implementation of NVIDIA's CuTe layout algebra. No GPU required.
Highlights
- Full CuTe layout algebra — compose, complement, logical_divide, logical_product, coalesce, flatten, upcast, downcast, and more
- Swizzle support — XOR-based bank conflict avoidance patterns with Swizzle(B, M, S)
- MMA atom definitions for NVIDIA and AMD architectures:
  - NVIDIA: SM70 (Volta), SM75 (Turing), SM80 (Ampere), SM89 (Ada), SM90 (Hopper GMMA), SM100 (Blackwell UMMA), SM120 (Blackwell, RTX 50 series)
  - AMD: CDNA1 (MI100), CDNA2 (MI200), CDNA3 (MI300), CDNA3+ (gfx950)
- Rich visualization (pip install tensor-layouts[viz]):
  - Layout grids with thread/value coloring
  - Swizzle before/after comparison views
  - MMA atom thread-value layouts
  - Combined tiled MMA grids
  - Copy atom layouts
  - Hierarchical N-level rendering with color-coded boundaries
- Oracle tests that cross-validate against NVIDIA's pycute and AMD's matrix instruction calculator
- Zero dependencies for the core library; only matplotlib needed for visualization
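
To make the layout and swizzle bullets above concrete, here is a minimal, self-contained sketch of the two underlying ideas: a layout maps a coordinate to a linear offset through its strides, and a swizzle XORs some higher offset bits into lower ones. It deliberately does not use the tensor-layouts API. The helper names `layout_offset` and `swizzle` are hypothetical, the layout is flat (non-hierarchical), and the swizzle follows the usual CuTe reading of Swizzle(B, M, S) for non-negative S only.

```python
def layout_offset(coord, shape, stride):
    """Map a coordinate to a linear offset: the dot product of coord and stride.

    Hypothetical helper for a flat (non-nested) layout; not the library's API.
    """
    assert len(coord) == len(shape) == len(stride)
    assert all(0 <= c < s for c, s in zip(coord, shape))
    return sum(c * d for c, d in zip(coord, stride))


def swizzle(offset, bits, base, shift):
    """XOR the `bits` bits at position base+shift into the bits at position base.

    Mirrors the common reading of Swizzle(B, M, S); assumes shift >= 0.
    """
    assert shift >= 0
    mask = ((1 << bits) - 1) << (base + shift)
    return offset ^ ((offset & mask) >> shift)


# A 4x8 row-major layout: shape (4, 8), stride (8, 1).
shape, stride = (4, 8), (8, 1)
offsets = [[layout_offset((r, c), shape, stride) for c in range(8)]
           for r in range(4)]

# Swizzle(2, 0, 3): XOR the two row bits into the low column bits.
swizzled = [[swizzle(o, 2, 0, 3) for o in row] for row in offsets]

# With an assumed 8-bank memory (bank = offset % 8), walking down column 0
# hits a single bank before swizzling and four distinct banks after it.
banks_before = [row[0] % 8 for row in offsets]
banks_after = [row[0] % 8 for row in swizzled]
print(banks_before)  # [0, 0, 0, 0]
print(banks_after)   # [0, 1, 2, 3]
```

The swizzle only permutes offsets within the same 32-element range, so every element still has a unique location; only the access pattern across banks changes.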
Install
pip install tensor-layouts # core
pip install "tensor-layouts[viz]" # with visualization (quoted so shells such as zsh do not expand the brackets)
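
After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata with the standard library. This is a small sketch: the helper name `installed_version` is hypothetical, and it looks up the PyPI distribution name `tensor-layouts` rather than assuming any import name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if the package is not installed.
print(installed_version("tensor-layouts"))
```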
Links
Documentation: https://github.com/facebookresearch/tensor-layouts
PyPI: https://pypi.org/project/tensor-layouts/