tensor-layouts 0.3.1

@jduprat jduprat released this 22 Apr 20:35
· 47 commits to main since this release

What's Changed

New analysis helpers

  • aliasing_profile() — analysis helper for detecting layout aliasing patterns (thanks @soumyadipsarkar, #11)
  • thread_stride_profile() — analysis helper for inspecting per-thread stride behavior (thanks @soumyadipsarkar)
  • gap_profile() — layout sparsity analysis helper (thanks @soumyadipsarkar)
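
To make the terms concrete, here is a minimal sketch of what "aliasing" and "gaps" mean for a flat layout given as a shape tuple and a stride tuple. It does not use tensor-layouts: `offsets`, `aliased_offsets` and `gap_offsets` are hypothetical names chosen for illustration, non-negative strides are assumed, and the real helpers may compute and report something richer.

```python
from collections import Counter
from itertools import product


def offsets(shape, stride):
    """Yield the memory offset of every coordinate of a flat layout."""
    for coord in product(*(range(extent) for extent in shape)):
        yield sum(c * s for c, s in zip(coord, stride))


def aliased_offsets(shape, stride):
    """Offsets reached by more than one coordinate (aliasing)."""
    counts = Counter(offsets(shape, stride))
    return sorted(off for off, n in counts.items() if n > 1)


def gap_offsets(shape, stride):
    """Offsets inside the layout's span that no coordinate maps to (sparsity).

    Assumes non-negative strides, so the span starts at offset 0.
    """
    hit = set(offsets(shape, stride))
    if not hit:  # zero-sized layout: nothing is addressed, nothing is skipped
        return []
    return [off for off in range(max(hit) + 1) if off not in hit]


# A stride-0 mode broadcasts: both values of the second mode share an offset.
print(aliased_offsets((4, 2), (1, 0)))
# A stride-2 layout skips every other element.
print(gap_offsets((4,), (2,)))
```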

Layout API

  • is_empty() — helper for the unit/empty-shape layout (rank 0, size 1, multiplicative identity for composition and concatenation); distinct from a zero-sized layout like Layout((0,), (0,))
  • as_list() — helper that replaces the list(as_tuple(...)) pattern when shapes/strides need to be mutated
  • is_affine_layout() → is_affine(): renamed and widened to apply to any type with a .layout attribute. The check is structural: a swizzle-free ComposedLayout still returns False, since there is no machinery to coalesce one back into a flat Layout with non-zero preoffset.
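
The difference between the unit/empty-shape layout and a zero-sized layout can be sketched with plain tuples. This is only an illustration of the arithmetic, not the library's implementation: `size` and `is_unit_shape` are hypothetical stand-ins, and shapes are assumed to be flat tuples of ints.

```python
from math import prod


def size(shape):
    """Number of coordinates in a flat shape; the empty product is 1."""
    return prod(shape)


def is_unit_shape(shape):
    """True only for the rank-0 shape (); a stand-in for the idea behind is_empty()."""
    return len(shape) == 0


unit = ()          # rank 0, size 1
zero_sized = (0,)  # rank 1, size 0

print(size(unit), is_unit_shape(unit))
print(size(zero_sized), is_unit_shape(zero_sized))

# Concatenating with the unit shape leaves the other shape unchanged,
# which is what makes it an identity element.
print(unit + (4, 2) == (4, 2))
```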

Robustness

  • Reverse swizzle composition — fixed, with new pycute and CuTe C++ oracle regressions covering the failure mode
  • Exact MMA tile sizes: tile_mma_grid() now rejects tile_mnk values that are not exact multiples of the natural MMA atom shape, instead of silently floor-dividing and producing a smaller-than-requested grid.
  • Tighter layout helper validation across layout_utils.py
  • Flat 1D tensor storage required: Tensor now rejects multi-dimensional storage backings, with clearer error messages and updated docs/tensor_api.md
  • Negative layout shapes rejected at Layout construction
  • Internal compose() / divide() asserts promoted to TypeError — checks now survive python -O. src/tensor_layouts/ is now assert-free in production paths.
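
The exact-multiple rule can be sketched as follows. This is a hypothetical stand-alone function, not tile_mma_grid() itself: the name `exact_tile_grid`, its parameters, and the choice of ValueError are assumptions made for illustration. It uses an explicit `raise` rather than `assert`, so the check still runs under python -O, where assert statements are stripped.

```python
def exact_tile_grid(tile_mnk, atom_mnk):
    """Return how many atoms fit along M, N and K, rejecting inexact tiles.

    Hypothetical helper: the exception type is an assumption, not the
    library's documented behavior.
    """
    if len(tile_mnk) != 3 or len(atom_mnk) != 3:
        raise ValueError("tile_mnk and atom_mnk must each have three entries")
    for name, tile, atom in zip("MNK", tile_mnk, atom_mnk):
        if tile % atom != 0:
            # Floor division here would silently shrink the grid.
            raise ValueError(
                f"tile {name}={tile} is not a multiple of atom {name}={atom}"
            )
    return tuple(tile // atom for tile, atom in zip(tile_mnk, atom_mnk))


print(exact_tile_grid((32, 16, 16), (16, 8, 16)))

try:
    exact_tile_grid((40, 16, 16), (16, 8, 16))
except ValueError as err:
    print("rejected:", err)
```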

Refactors (no functional change)

  • compose() split into one helper per (lhs, rhs) case for readability
  • _draw_grid() split into single-purpose passes (font auto-sizing, base cells, hierarchy overlays, highlight overlay, value/margin labels)
  • Single-axis figure builders routed through a shared _new_axes() helper so matplotlib defaults can be tuned in one place
  • explain() dispatched through a function→handler table, keyed on the callable so wrappers/aliases resolve correctly
  • IPython detection cleaned up
  • tests/tensor.py converted to flat def test_*() style, matching every other test module
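
A function→handler table keyed on the callable might look like the sketch below. Everything here is hypothetical: the `compose` and `divide` stubs, the `HANDLERS` table, and the use of `inspect.unwrap` to see through `functools.wraps` wrappers are assumptions about one way to get the described behavior, not the library's actual code.

```python
import functools
import inspect


def compose(a, b):  # stand-in stub, not the library function
    return ("compose", a, b)


def divide(a, b):  # stand-in stub, not the library function
    return ("divide", a, b)


# Keys are the function objects themselves, not their names.
HANDLERS = {
    compose: lambda: "explaining compose",
    divide: lambda: "explaining divide",
}


def explain(fn):
    """Look up the handler for fn, following any functools.wraps chain."""
    target = inspect.unwrap(fn)
    try:
        handler = HANDLERS[target]
    except KeyError:
        raise TypeError(f"no explanation registered for {fn!r}") from None
    return handler()


def logged(fn):
    """Example decorator; functools.wraps records fn as __wrapped__."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        return fn(*args, **kwargs)
    return wrapper


alias = compose                 # an alias is the same object, so it hits the same key
wrapped = logged(compose)       # a wrapper resolves through __wrapped__

print(explain(compose), explain(alias), explain(wrapped), sep="\n")
```

Keying on the object avoids the failure mode of name-based dispatch, where an alias or a re-exported function with a different `__name__` would miss the table.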

Docs

  • README: links to the algorithms / applications / GEMM example notebooks, plus a few meaningful external references
  • Naming convention (tests/<name>.py mirrors src/tensor_layouts/<name>.py) spelled out in pyproject.toml and CONTRIBUTING.md
  • SM90 GMMA preamble explaining the warpgroup-level (128-thread) convention with shared-memory operands behind a hardware descriptor — distinct from the warp-level SM_70/80 atoms next to it
  • Thread-Value (T, V) layout convention documented in _tv_dimensions, so callers of bank_conflicts, coalescing_efficiency, etc. can interpret mode 0 vs mode 1+
  • AMD make_mfma_atom() parameters documented
  • CDNA_4x4x4 naming-vs-shape note clarified
  • Bit-twiddling intent in make_swizzle clarified
  • viz_api.md keyword fixed: num_shades → num_colors (examples would have hit TypeError)
  • NVIDIA CuTe quickstart link fixed (thanks @soumyadipsarkar)
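
For readers unfamiliar with the bit-twiddling behind swizzles, the sketch below follows the XOR scheme of CuTe's C++ Swizzle<B, M, S> functor for a non-negative shift. It is not make_swizzle and does not claim to match its signature: `swizzle_offset` and its parameter names are hypothetical, chosen only to show the idea.

```python
def swizzle_offset(offset, bits, base, shift):
    """XOR a higher bit field of `offset` into a lower one.

    bits  : width of the bit field
    base  : number of low bits left untouched
    shift : distance between the two fields (non-negative, at least `bits`)
    """
    if bits < 0 or base < 0 or shift < bits:
        raise ValueError("require bits >= 0, base >= 0 and shift >= bits")
    bit_mask = (1 << bits) - 1
    src_mask = bit_mask << (base + shift)  # the higher field, read only
    # Move the higher field down by `shift` and XOR it into the lower field.
    return offset ^ ((offset & src_mask) >> shift)


# With bits=3, base=3, shift=3: bits 6..8 are XORed into bits 3..5.
print(swizzle_offset(64, bits=3, base=3, shift=3))
```

Because the source field is never modified, applying the function twice returns the original offset, and it permutes every aligned block of 2**(base + shift + bits) offsets.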

Other

  • File restoration after inadvertent cross-project modifications (#4, thanks @paulshen / @oshannessy)

Full Changelog: v0.3.0...v0.3.1