tensor-layouts 0.3.2

@jduprat released this 15 May 18:09

What's Changed

New analysis helpers

  • permutation_parity() and is_even_permutation() — detect orientation of dense, injective layouts (thanks @neuralsorcerer, #18)
  • from_F2_matrix() — inverse constructor for to_F2_matrix() with affine + brute-force-Swizzle-extraction reconstruction; round-trip identity holds. to_F2_matrix() strengthened to accept any F2-linear ComposedLayout.
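
As an illustration of what a parity check on a dense, injective layout computes, here is a minimal pure-Python sketch. It is not the library's implementation: parity_sketch and is_even_sketch are hypothetical stand-in names, and the 0/1 return convention is an assumption (the real permutation_parity() may use a different one). The input is the list of offsets a layout produces when its domain is enumerated in order.

```python
def parity_sketch(offsets):
    """Return 0 if `offsets` is an even permutation of range(n), 1 if odd.

    Hypothetical stand-in; the 0/1 convention is an assumption.
    """
    n = len(offsets)
    if sorted(offsets) != list(range(n)):
        # Not dense and injective, so there is no permutation to classify.
        raise ValueError("offsets are not a permutation of range(n)")
    seen = [False] * n
    transpositions = 0
    for start in range(n):
        if seen[start]:
            continue
        # Walk one cycle; a cycle of length k decomposes into k - 1 swaps.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = offsets[j]
            length += 1
        transpositions += length - 1
    return transpositions % 2


def is_even_sketch(offsets):
    return parity_sketch(offsets) == 0


# Offsets i + 2*j of a 2x3 layout with strides (1, 2), enumerated with j
# varying fastest: one 4-cycle (1 -> 2 -> 4 -> 3) plus two fixed points.
example = [0, 2, 4, 1, 3, 5]
example_parity = parity_sketch(example)  # odd: a 4-cycle is 3 swaps
```

The cycle-counting approach runs in O(n) after the validity check, which is why it is used here instead of counting inversions.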

Tensor API

  • Tensor.to_list() and Tensor.copy_from() — flat copy / snapshot helpers (thanks @neuralsorcerer, #23)

Layout API

  • Layout is now purely affine. The swizzle= constructor kwarg and the Layout.swizzle attribute are removed; ComposedLayout is the canonical (and only) carrier for every swizzled / non-affine form. Code that built Layout(..., swizzle=Sw) directly should switch to compose(Sw, layout) or ComposedLayout(Sw, layout). Layout.__repr__ is now exact eval-roundtrip (Layout(shape, stride)).
  • ComposedLayout.preoffset → ComposedLayout.offset (renamed).
  • ComposedLayout's offset is now keyword-only: the positional form ComposedLayout(Sw, L, k) and the CuTe-style ordering ComposedLayout(Sw, k, L), a common porting trap, both fail at the call site.
  • Swizzle is now allowed in ComposedLayout's inner slot — the inverse-form ComposedLayout(Layout, offset, Swizzle) arising from right_inverse / left_inverse on offset-bearing swizzle-fronted ComposedLayouts. coalesce() on this form is a no-op (rank-1; no structure to merge).
  • complement() now forwards through ComposedLayout (was unsupported).
  • split_outer_swizzle() — new public structural recogniser for the canonical ComposedLayout(Sw, L, offset=0) form. Replaces the private _split_zero_offset_swizzle that tensor.py had been reaching into.
  • LayoutError(ValueError), UnsupportedComposedLayoutError(NotImplementedError), TensorStorageError(ValueError) — new exception classes for catching layout-algebra and tensor-storage errors specifically. Existing except ValueError / except NotImplementedError handlers continue to catch them.
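
The backward-compatibility claim for the new exception classes follows directly from their base classes. The sketch below redefines three local stand-ins with the same names and bases as listed above (they are not imported from the package, and the bodies are empty placeholders) to show that narrow handlers and pre-existing broad handlers both work.

```python
# Local stand-ins mirroring the bases named in the release notes.
class LayoutError(ValueError):
    pass


class UnsupportedComposedLayoutError(NotImplementedError):
    pass


class TensorStorageError(ValueError):
    pass


def handled_by(exc):
    """Report which handler catches `exc`, narrowest handler first."""
    try:
        raise exc
    except LayoutError:
        return "LayoutError"
    except UnsupportedComposedLayoutError:
        return "UnsupportedComposedLayoutError"
    except ValueError:
        return "ValueError"


def legacy_handler(exc):
    """A pre-existing handler that only knows the built-in classes."""
    try:
        raise exc
    except (ValueError, NotImplementedError):
        return True


narrow = handled_by(LayoutError("bad stride"))
storage = handled_by(TensorStorageError("buffer too small"))
legacy_ok = legacy_handler(UnsupportedComposedLayoutError("inverse form"))
```

Note that TensorStorageError is a sibling of LayoutError in this sketch, so a handler written for LayoutError does not catch it; it falls through to the ValueError branch.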

Fixes

  • Tensor offset alignment with CuTe. A Tensor over a Layout-with-embedded-swizzle previously folded the external offset into the swizzle's input domain (Sw(offset + L(coord))) while a Tensor over a ComposedLayout added the offset AFTER the layout call. The two forms thus disagreed on addresses for nonzero Tensor offset. Both forms now follow CuTe's tensor(coord) == tensor.offset + tensor.layout(coord).
  • cosize(ComposedLayout) now uses max(L(i)) + 1 enumeration over the full domain — the previous delegation to inner-or-outer mis-reported the codomain extent for five common forms and could cause buffer under-allocation. O(n) instead of O(1); cached on the instance for amortised cost.
  • cosize() on embedded-swizzle Layout for non-power-of-2 shapes now correctly accounts for the swizzle's XOR (e.g. Layout(5, 1, swizzle=Swizzle(2, 0, 2)) was reporting cosize 5, true value is 6).
  • Surjectivity checks for explicit and shifted codomains (thanks @neuralsorcerer, #21)
  • Transfer swizzle through logical_product against swizzled tiles (was silently dropping the embedded swizzle and returning a semantically wrong plain layout).
  • Tensor[(slice(None), 0), 1] (slice nested in a hierarchical coordinate tuple) now raises TypeError instead of being silently passed through to slice_and_offset.
  • Drop the typing.Self fallback that imported the undeclared typing_extensions dependency; it raised ImportError on a fresh Python 3.10 install.
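
Two of the fixes above can be checked by hand with a few lines of plain Python. The sketch below does not use the package: xor_swizzle is a hypothetical helper that assumes the CuTe-style Swizzle(bits, base, shift) semantics for a positive shift (XOR the selected upper bits, shifted down, into the lower bits), and the layouts are plain functions. It reproduces the cosize example for Layout(5, 1) with Swizzle(2, 0, 2) and shows why folding a Tensor offset into the swizzle input disagrees with adding it after the layout call.

```python
def xor_swizzle(bits, base, shift, x):
    """Assumed CuTe-style swizzle for shift > 0 (hypothetical helper)."""
    mask = ((1 << bits) - 1) << (base + shift)
    return x ^ ((x & mask) >> shift)


def plain_layout(i):
    # Shape 5, stride 1.
    return i * 1


def swizzled_layout(i):
    # Swizzle(2, 0, 2) applied to the plain layout's output.
    return xor_swizzle(2, 0, 2, plain_layout(i))


def cosize_by_enumeration(index_fn, size):
    """max(L(i)) + 1 over the full domain, O(size)."""
    return max(index_fn(i) for i in range(size)) + 1


SIZE = 5
plain_cosize = cosize_by_enumeration(plain_layout, SIZE)        # 5
swizzled_cosize = cosize_by_enumeration(swizzled_layout, SIZE)  # 6

# Tensor offset handling at coordinate 0 with offset 4:
tensor_offset = 4
folded = xor_swizzle(2, 0, 2, tensor_offset + plain_layout(0))  # Sw(offset + L(c))
added = tensor_offset + swizzled_layout(0)                      # offset + Sw(L(c))
```

Index 4 maps to 4 XOR 1 = 5, which is why the swizzled codomain extent is 6 even though only five elements are addressed. The last two lines give 5 and 4 respectively, the disagreement the first fix removes by standardising on the second form.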

Robustness

  • Reject complement / logical_product / logical_divide on the inverse-form ComposedLayout(Layout, offset, Swizzle) with NotImplementedError.
  • Reject logical_product on ComposedLayout(Layout, Swizzle, offset) (was crashing in the affine fallback with AttributeError on .stride).
  • as_affine_layout() performs an explicit is_affine() post-check; the error points callers at as_layout_expr() for the non-affine path.
  • coalescing_efficiency / segment_analysis validate warp_size > 0 (was silently producing nonsense from min(thread_count, 0)).
  • viz raises an actionable ImportError pointing at pip install tensor-layouts[viz] when matplotlib/numpy are missing, instead of surfacing a deep ModuleNotFoundError on matplotlib internals.
  • Aligned four pre-existing exception-class inconsistencies before introducing the new hierarchy: to_F2_matrix F6 rejection ValueError → NotImplementedError; slice_modes / dice_modes structure mismatch TypeError → ValueError; prefix_product / suffix_product tuple-init-on-scalar TypeError → ValueError; _validate_order_permutation 'not iterable' ValueError → TypeError.
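
The actionable ImportError for the viz extra follows a common optional-dependency pattern. A minimal sketch of that pattern is shown here; require_optional is a hypothetical helper name, and the message wording is illustrative, not the package's actual text.

```python
import importlib


def require_optional(module_name, extra):
    """Import an optional dependency or fail with an install hint.

    Hypothetical helper; only the pip command comes from the release notes.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Re-raise at the feature boundary so the user sees what to install
        # instead of a failure deep inside a third-party package.
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install tensor-layouts[{extra}]"
        ) from exc


# A module that is present imports normally.
json_module = require_optional("json", "viz")
```

Chaining with `from exc` keeps the original ModuleNotFoundError available in the traceback for debugging.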

Performance

  • cosize() results cached on each ComposedLayout instance and on swizzled Layout instances (the ComposedLayout cache uses a declarative dataclass field with init/repr/eq/hash all False, so cached and uncached layouts still compare equal and remain dict-key compatible).
  • _address_bounds has an O(1) fast path for the canonical Sw o L form (bounds = (offset, offset + cosize(layout) - 1)), replacing the O(size) per-coordinate walk in _validate_storage. Works for any Tensor offset on ComposedLayout.
  • complement(ComposedLayout) decays swizzled slices to plain Layout when the swizzle's Y/Z bits aren't both touched on the surviving subspace.
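
The caching approach described for ComposedLayout (a dataclass field excluded from init, repr, equality, and hashing) can be sketched with the standard library alone. CachedExtent below is a hypothetical toy class, not the package's ComposedLayout; note that dataclasses.field spells the equality switch `compare`, not `eq`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CachedExtent:
    """Toy 1-D layout (size must be positive) with a cached cosize."""

    size: int
    stride: int
    # Cache slot: invisible to __init__, __repr__, __eq__ and __hash__.
    _cosize: Optional[int] = field(
        default=None, init=False, repr=False, compare=False, hash=False
    )

    def cosize(self):
        if self._cosize is None:
            value = max(i * self.stride for i in range(self.size)) + 1
            # Frozen dataclasses block normal assignment, so bypass it.
            object.__setattr__(self, "_cosize", value)
        return self._cosize


warm = CachedExtent(4, 2)
cold = CachedExtent(4, 2)
warm_cosize = warm.cosize()  # fills the cache on `warm` only
```

After the call, `warm` holds a cached value and `cold` does not, yet the two still compare equal, hash identically, and print the same repr, which is the property the release notes rely on for dict-key compatibility.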

Refactors (no functional change)

  • layouts.py (4.4k LOC) split into a layouts/ package with three layered modules: core (exceptions, type predicates, tuple operations, Layout, Tile, Swizzle), expr (ComposedLayout and the LayoutExpr = Layout | ComposedLayout predicates / coercers), and algebra (compose, complement, divide, product, inverses, ...). Dependency direction is strictly one-way (core ← expr ← algebra), enforced by the import graph. No public API change — every name previously importable from tensor_layouts.layouts remains importable from the same path.
  • bank_conflicts / per_group_bank_conflicts deduped via shared _bank_conflicts_for_thread_range.
  • coalescing_efficiency / per_group_coalescing deduped via shared _coalescing_for_thread_range.
  • Layout._calculate_max_offset moved to module-level _affine_max_offset (the staticmethod never used self).
  • Internal _affine_inner_strip_swizzle rename.

CI and Tests

  • Python 3.14 added to the CI matrix and a lint job added (thanks @neuralsorcerer, #17)
  • 20 new CuTe C++ oracle entries pinning complement, coalesce, compose, right_inverse on ComposedLayout form variants F2-F8; compose_truncation_paper oracle case for paper section 3.3.3.
  • CuTe C++ oracle: CUTLASS_PATH / CUTLASS_INCLUDE_DIR env-var override for out-of-tree CUTLASS installations.
  • 32 hand-written AMD oracle C-layout per-atom tests parametrized into a single ORACLE_C_LAYOUT_CASES-driven test.
  • examples/composed.py added to make examples (was the only example not exercised by the smoke target).

Docs

  • docs/layout_api.md / docs/tensor_api.md / docs/analysis_api.md and the examples/ rewritten to reflect Layout-becomes-purely-affine: no more "Layout may also carry one canonical final swizzle" framing, single-form Layout(shape, stride) repr, compose(Sw, L) always returns ComposedLayout(Sw, L, 0).
  • New 'Constructor signature vs CuTe / pycute' subsection in docs/layout_api.md documenting the ComposedLayout(outer, inner, offset=k) ordering vs CuTe's positional ComposedLayout<A, Offset, B>.
  • permutation_parity / is_even_permutation documented in docs/analysis_api.md (thanks @neuralsorcerer)
  • Document supported / unsupported ops for the inverse-form ComposedLayout in the class docstring and docs/layout_api.md. to_F2_matrix / from_F2_matrix documented in docs/analysis_api.md.

Other

  • Revert CONTRIBUTING.md change from D101685100 (thanks @FindHao, #20)

Full Changelog: v0.3.1...v0.3.2