What's Changed

New analysis helpers

aliasing_profile() — analysis helper for detecting layout aliasing patterns (thanks @soumyadipsarkar, #11)
thread_stride_profile() — analysis helper for inspecting per-thread stride behavior (thanks @soumyadipsarkar)
gap_profile() — layout sparsity analysis helper (thanks @soumyadipsarkar)

is_empty() — helper for the unit/empty-shape layout (rank 0, size 1, multiplicative identity for composition and concatenation); distinct from a zero-sized layout like Layout((0,), (0,))
as_list() — helper that replaces the list(as_tuple(...)) pattern when shapes/strides need to be mutated
is_afine_layout() → is_affine() — renamed and widened to apply to any type with a .layout attribute. The check is structural: a swizzle-free ComposedLayout still returns False, because there is no machinery to coalesce one back into a flat Layout with non-zero preoffset
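
To make the is_empty() and as_list() semantics concrete, here is a minimal, self-contained sketch. It does not use the real tensor_layouts classes: ToyLayout, concat(), and the bodies of is_empty() and as_list() below are hypothetical stand-ins written only to illustrate the behavior described above, and the real signatures may differ.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class ToyLayout:
    """Hypothetical stand-in for a layout: parallel shape and stride tuples."""
    shape: tuple = ()
    stride: tuple = ()

    def rank(self):
        return len(self.shape)

    def size(self):
        # prod(()) == 1, so the rank-0 layout has size 1, not 0.
        return prod(self.shape)


def is_empty(layout):
    # Assumption: "empty" means rank 0 (the unit layout), not size 0.
    return layout.rank() == 0


def concat(a, b):
    # Concatenation appends modes, so the rank-0 layout is its identity.
    return ToyLayout(a.shape + b.shape, a.stride + b.stride)


def as_list(x):
    # Assumption: mirrors list(as_tuple(x)), where a bare int becomes
    # a one-element sequence. Returns a fresh, mutable list.
    return list(x) if isinstance(x, (tuple, list)) else [x]


unit = ToyLayout()             # rank 0, size 1
zero = ToyLayout((0,), (0,))   # rank 1, size 0
row = ToyLayout((4, 2), (2, 1))

shape = as_list(row.shape)     # mutable copy of the shape
shape[0] = 8                   # row.shape itself is unchanged
```

In this sketch is_empty(unit) is true while is_empty(zero) is false, even though only zero has size 0, and concat(unit, row) gives back row unchanged.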

Reverse swizzle composition — fixed, with new pycute and CuTe C++ oracle regression tests covering the failure mode
Exact MMA tile sizes — tile_mma_grid() now rejects tile_mnk values that are not exact multiples of the natural MMA atom shape, instead of silently floor-dividing and producing a smaller-than-requested grid
Tighter layout helper validation across layout_utils.py
Flat 1D tensor storage required — Tensor now rejects multi-dimensional storage backings, with clearer error messages and updated docs/tensor_api.md
Negative layout shapes rejected at Layout construction
Internal compose() / divide() asserts promoted to TypeError, so the checks now survive python -O; src/tensor_layouts/ is now assert-free in production paths
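
The exact-multiple rule and the move from assert to explicit exceptions can be sketched together. The function below is a hypothetical illustration, not the library's tile_mma_grid(): its name, its signature, and the choice of ValueError for a non-multiple tile are assumptions. The point it shows is that an explicit raise still runs under python -O, whereas an assert statement is stripped.

```python
def tile_grid(tile_mnk, atom_mnk):
    """Return the number of atoms along M, N, K, requiring exact multiples.

    Hypothetical helper for illustration only.
    """
    if len(tile_mnk) != 3 or len(atom_mnk) != 3:
        raise ValueError("tile_mnk and atom_mnk must each have 3 entries")

    grid = []
    for name, tile, atom in zip("MNK", tile_mnk, atom_mnk):
        # Explicit raise instead of `assert`: this check is kept under -O.
        if not isinstance(tile, int):
            raise TypeError(f"{name} tile size must be an int, got {tile!r}")
        # Floor division alone would turn 24 // 16 into 1 and silently
        # shrink the grid, so non-multiples are rejected up front.
        if tile <= 0 or tile % atom != 0:
            raise ValueError(
                f"{name} tile size {tile} is not a positive multiple of {atom}"
            )
        grid.append(tile // atom)
    return tuple(grid)


grid = tile_grid((32, 16, 16), (16, 8, 16))  # two atoms in M, two in N, one in K
```

With a 16x8x16 atom, a request such as (24, 8, 16) raises instead of quietly producing a one-atom grid in M.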

compose() split into one helper per (lhs, rhs) case for readability
_draw_grid() split into single-purpose passes (font auto-sizing, base cells, hierarchy overlays, highlight overlay, value/margin labels)
Single-axis figure builders routed through a shared _new_axes() helper so matplotlib defaults can be tuned in one place
explain() dispatched through a function→handler table, keyed on the callable so wrappers/aliases resolve correctly
IPython detection cleaned up
tests/tensor.py converted to flat def test_*() style, matching every other test module

README: links to the algorithms / applications / GEMM example notebooks, plus a few meaningful external references
Naming convention that tests/<name>.py mirrors src/tensor_layouts/<name>.py is now spelled out in pyproject.toml and CONTRIBUTING.md
SM90 GMMA preamble explaining the warpgroup-level (128-thread) convention with shared-memory operands behind a hardware descriptor — distinct from the warp-level SM_70/80 atoms next to it
Thread-Value (T, V) layout convention documented in _tv_dimensions, so callers of bank_conflicts, coalescing_efficiency, etc. can interpret mode 0 vs mode 1+
AMD make_mfma_atom() parameters documented
CDNA_4x4x4 naming-vs-shape note clarified
Bit-twiddling intent in make_swizzle clarified
viz_api.md keyword fixed: num_shades → num_colors (examples would have hit TypeError)
NVIDIA CuTe quickstart link fixed (thanks @soumyadipsarkar)

File restoration after inadvertent cross-project modifications (#4, thanks @paulshen / @oshannessy)

Full Changelog: v0.3.0...v0.3.1