tensor-layouts 0.3.1
What's Changed
New analysis helpers
- `aliasing_profile()`: analysis helper for detecting layout aliasing patterns (thanks @soumyadipsarkar, #11)
- `thread_stride_profile()`: analysis helper for inspecting per-thread stride behavior (thanks @soumyadipsarkar)
- `gap_profile()`: layout sparsity analysis helper (thanks @soumyadipsarkar)
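As a rough illustration of what aliasing and sparsity analysis look at, the sketch below enumerates every offset produced by a flat shape/stride pair. It reports offsets reached more than once (aliasing) and offsets inside the covered range that are never reached (gaps). The function `offset_profile` and its return format are hypothetical; they are not the API of `aliasing_profile()` or `gap_profile()`, and the sketch assumes non-negative strides.

```python
from collections import Counter
from itertools import product


def offset_profile(shape, stride):
    """Hypothetical helper: classify the offsets of a flat (shape, stride) pair.

    Assumes non-negative strides; this is not the library's API.
    """
    # Every coordinate maps to offset = sum(coord[i] * stride[i]).
    offsets = [
        sum(c * s for c, s in zip(coord, stride))
        for coord in product(*(range(n) for n in shape))
    ]
    hits = Counter(offsets)
    # Offsets reached by more than one coordinate are aliased.
    aliased = sorted(o for o, n in hits.items() if n > 1)
    # Offsets inside [0, max] that no coordinate reaches are gaps.
    gaps = sorted(set(range(max(offsets) + 1)) - set(hits)) if offsets else []
    return {"aliased": aliased, "gaps": gaps}


compact = offset_profile((2, 3), (3, 1))    # row-major 2x3: no aliasing, no gaps
broadcast = offset_profile((2, 3), (0, 1))  # stride 0 makes both rows share offsets
padded = offset_profile((2, 3), (4, 1))     # row pitch 4 leaves offset 3 unused
```

Brute-force enumeration is only practical for small layouts, but it makes the two failure modes easy to see side by side.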
Layout API
- `is_empty()`: helper for the unit/empty-shape layout (rank 0, size 1, multiplicative identity for composition and concatenation); distinct from a zero-sized layout like `Layout((0,), (0,))`
- `as_list()`: helper that replaces the `list(as_tuple(...))` pattern when shapes/strides need to be mutated
- `is_affine_layout()` → `is_affine()`: renamed and widened to apply to any type with a `.layout` attribute; structural check (a swizzle-free `ComposedLayout` still returns `False`, since there is no machinery to coalesce one back into a flat `Layout` with non-zero preoffset)
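The difference between the unit layout and a zero-sized layout comes down to the empty product. The sketch below uses plain shape tuples rather than the library's `Layout` type, and `layout_size` is a hypothetical helper introduced only for illustration.

```python
import math


def layout_size(shape):
    """Hypothetical helper: number of coordinates in a flat shape tuple."""
    # The product over an empty tuple is 1, the multiplicative identity.
    return math.prod(shape)


unit_shape = ()    # rank 0: the unit/empty-shape case
zero_shape = (0,)  # rank 1 with a zero extent: no coordinates at all

unit_rank, unit_size = len(unit_shape), layout_size(unit_shape)
zero_rank, zero_size = len(zero_shape), layout_size(zero_shape)

# Concatenating with the unit shape leaves the other shape unchanged.
concatenated = unit_shape + (2, 3)
```

The unit shape has rank 0 and size 1, while the zero-sized shape has rank 1 and size 0, so the two cases cannot be told apart by checking for "nothing in the shape" alone.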
Robustness
- Reverse swizzle composition: fixed, with new pycute and CuTe C++ oracle regressions covering the failure mode
- Exact MMA tile sizes: `tile_mma_grid()` now rejects `tile_mnk` values that are not exact multiples of the natural MMA atom shape, instead of silently floor-dividing and producing a smaller-than-requested grid
- Tighter layout helper validation across `layout_utils.py`
- Flat 1D tensor storage required: `Tensor` now rejects multi-dimensional storage backings, with clearer error messages and updated `docs/tensor_api.md`
- Negative layout shapes rejected at `Layout` construction
- Internal `compose()` / `divide()` asserts promoted to `TypeError`, so checks now survive `python -O`. `src/tensor_layouts/` is now assert-free in production paths.
Refactors (no functional change)
- `compose()` split into one helper per `(lhs, rhs)` case for readability
- `_draw_grid()` split into single-purpose passes (font auto-sizing, base cells, hierarchy overlays, highlight overlay, value/margin labels)
- Single-axis figure builders routed through a shared `_new_axes()` helper so matplotlib defaults can be tuned in one place
- `explain()` dispatched through a function→handler table, keyed on the callable so wrappers/aliases resolve correctly
- IPython detection cleaned up
- `tests/tensor.py` converted to flat `def test_*()` style, matching every other test module
Docs
- README: links to the algorithms / applications / GEMM example notebooks, plus a few meaningful external references
- `tests/<name>.py` mirrors `src/tensor_layouts/<name>.py` naming convention spelled out in `pyproject.toml` and `CONTRIBUTING.md`
- SM90 GMMA preamble explaining the warpgroup-level (128-thread) convention with shared-memory operands behind a hardware descriptor, distinct from the warp-level SM_70/80 atoms next to it
- Thread-Value `(T, V)` layout convention documented in `_tv_dimensions`, so callers of `bank_conflicts`, `coalescing_efficiency`, etc. can interpret mode 0 vs mode 1+
- AMD `make_mfma_atom()` parameters documented
- `CDNA_4x4x4` naming-vs-shape note clarified
- Bit-twiddling intent in `make_swizzle` clarified
- `viz_api.md` keyword fixed: `num_shades` → `num_colors` (examples would have hit `TypeError`)
- NVIDIA CuTe quickstart link fixed (thanks @soumyadipsarkar)
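For readers unfamiliar with the bit-twiddling behind swizzles, the sketch below follows the XOR scheme of CuTe's `Swizzle<B, M, S>` for non-negative `S`: take `B` bits starting at bit `M + S` and XOR them into the `B` bits starting at bit `M`. The function `swizzle_offset` is a hypothetical standalone helper, not the library's `make_swizzle`, and it assumes `shift >= bits` so the source and target bit fields do not overlap.

```python
def swizzle_offset(offset, bits, base, shift):
    """Hypothetical helper: XOR-swizzle an integer offset.

    Assumes shift >= bits >= 0 and base >= 0.
    """
    # Mask selecting `bits` source bits, starting at bit position base + shift.
    src_mask = ((1 << bits) - 1) << (base + shift)
    # Move the selected bits down by `shift` and XOR them into the target field.
    return offset ^ ((offset & src_mask) >> shift)


# bits=3, base=3, shift=3: bits 6..8 are XORed into bits 3..5.
swizzled = [swizzle_offset(i, 3, 3, 3) for i in range(512)]
round_trip = [swizzle_offset(s, 3, 3, 3) for s in swizzled]
```

Because the source bits are left untouched, applying the function twice returns the original offset, and the mapping is a permutation of the 512 offsets rather than a lossy hash.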
Other
Full Changelog: v0.3.0...v0.3.1