What's Changed
New analysis helpers
- permutation_parity() and is_even_permutation(): detect orientation of dense, injective layouts (thanks @neuralsorcerer, #18)
- from_F2_matrix(): inverse constructor for to_F2_matrix() with affine + brute-force Swizzle-extraction reconstruction; round-trip identity holds.
- to_F2_matrix() strengthened to accept any F2-linear ComposedLayout.
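As a rough illustration of what a parity check on a dense, injective layout involves, the sketch below computes the parity of a permutation given as a flat list of offsets. The function name parity_of and its return convention (0 for even, 1 for odd) are assumptions made for this example; the library's permutation_parity() may differ in signature and return value.

```python
def parity_of(perm):
    """Return 0 if perm is an even permutation of range(len(perm)), else 1.

    Hypothetical helper for illustration only; not the library's API.
    """
    n = len(perm)
    if sorted(perm) != list(range(n)):
        # A dense, injective layout hits every offset in 0..n-1 exactly once.
        raise ValueError("offsets are not a permutation of 0..n-1")

    seen = [False] * n
    parity = 0
    for start in range(n):
        if seen[start]:
            continue
        # Walk one cycle; a cycle of length k is a product of k - 1 swaps.
        length = 0
        j = start
        while not seen[j]:
            seen[j] = True
            j = perm[j]
            length += 1
        parity ^= (length - 1) & 1
    return parity


# Offsets of a 2x2 layout with shape (2, 2) and stride (2, 1), enumerated
# with the first mode varying fastest: index i -> 2 * (i % 2) + (i // 2).
row_major_2x2 = [2 * (i % 2) + (i // 2) for i in range(4)]  # [0, 2, 1, 3]
print(parity_of(row_major_2x2))  # 1: a single swap, so odd
print(parity_of([0, 1, 2, 3]))   # 0: identity is even
```

An even result means the layout can be reached from the identity ordering with an even number of swaps, which is one way to read "orientation" here.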
Tensor API
- Tensor.to_list() and Tensor.copy_from(): flat copy / snapshot helpers (thanks @neuralsorcerer, #23)
Layout API
- Layout is now purely affine. The swizzle= constructor kwarg and the Layout.swizzle attribute are removed; ComposedLayout is the canonical (and only) carrier for every swizzled / non-affine form. Code that built Layout(..., swizzle=Sw) directly should switch to compose(Sw, layout) or ComposedLayout(Sw, layout).
- Layout.__repr__ is now exact eval-roundtrip (Layout(shape, stride)).
- ComposedLayout.preoffset → ComposedLayout.offset (renamed).
- ComposedLayout's offset is now keyword-only: both ComposedLayout(Sw, L, k) and the CuTe-style ComposedLayout(Sw, k, L) porting trap now fail at the call site.
- Swizzle is now allowed in ComposedLayout's inner slot. This is the inverse-form ComposedLayout(Layout, offset, Swizzle) arising from right_inverse / left_inverse on offset-bearing, swizzle-fronted ComposedLayouts. coalesce() on this form is a no-op (rank-1; no structure to merge).
- complement() now forwards through ComposedLayout (was unsupported).
- split_outer_swizzle(): new public structural recogniser for the canonical ComposedLayout(Sw, L, offset=0) form. Replaces the private _split_zero_offset_swizzle that tensor.py had been reaching into.
- LayoutError(ValueError), UnsupportedComposedLayoutError(NotImplementedError), TensorStorageError(ValueError): new exception classes for catching layout-algebra and tensor-storage errors specifically. Existing except ValueError / except NotImplementedError handlers continue to catch them.
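The backward-compatibility claim for the new exception classes follows from ordinary Python subclassing. The sketch below re-declares three classes locally with the same names and base classes as listed above, purely as stand-ins; it does not import the library, and the helper functions classify and legacy_handler are hypothetical.

```python
# Local stand-ins mirroring the names and bases given in the release notes.
# These are NOT imported from the library; they only illustrate the hierarchy.
class LayoutError(ValueError):
    pass


class UnsupportedComposedLayoutError(NotImplementedError):
    pass


class TensorStorageError(ValueError):
    pass


def classify(exc):
    """Return the name of the first handler that catches exc."""
    try:
        raise exc
    except LayoutError:
        return "LayoutError"
    except TensorStorageError:
        return "TensorStorageError"
    except ValueError:
        return "ValueError"
    except NotImplementedError:
        return "NotImplementedError"


def legacy_handler(exc):
    """A pre-existing handler that only names the built-in classes."""
    try:
        raise exc
    except (ValueError, NotImplementedError):
        return True


print(classify(LayoutError("bad stride")))              # LayoutError
print(classify(ValueError("plain")))                    # ValueError
print(legacy_handler(TensorStorageError("too small")))  # True
print(legacy_handler(UnsupportedComposedLayoutError())) # True
```

New code can catch the specific class first, while older handlers written against ValueError or NotImplementedError keep working unchanged.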
Fixes
- Tensor offset alignment with CuTe. A Tensor over a Layout-with-embedded-swizzle previously folded the external offset into the swizzle's input domain (Sw(offset + L(coord))), while a Tensor over a ComposedLayout added the offset AFTER the layout call. The two forms thus disagreed on addresses for nonzero Tensor offset. Both forms now follow CuTe's tensor(coord) == tensor.offset + tensor.layout(coord).
- cosize(ComposedLayout) now uses max(L(i)) + 1 enumeration over the full domain. The previous delegation to inner-or-outer mis-reported the codomain extent for five common forms and could cause buffer under-allocation. O(n) instead of O(1); cached on the instance for amortised cost.
- cosize() on embedded-swizzle Layout for non-power-of-2 shapes now correctly accounts for the swizzle's XOR (e.g. Layout(5, 1, swizzle=Swizzle(2, 0, 2)) was reporting cosize 5, true value is 6).
- Surjectivity checks for explicit and shifted codomains (thanks @neuralsorcerer, #21)
- Transfer swizzle through logical_product against swizzled tiles (was silently dropping the embedded swizzle and returning a semantically wrong plain layout).
- Tensor[(slice(None), 0), 1] (slice nested in a hierarchical coordinate tuple) now raises TypeError instead of being silently passed through to slice_and_offset.
- Drop the typing.Self fallback that imported the undeclared typing_extensions; the fallback raised ImportError on a fresh 3.10 install.
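The offset and cosize fixes above can be checked by hand with a few lines of plain Python. This sketch does not use the library: the swizzle helper assumes the CuTe Swizzle<B, M, S> XOR definition for a non-negative shift, the layout is modelled as a bare function for shape 5 and stride 1, and the names swizzle and cosize_by_enumeration are made up for the example.

```python
def swizzle(bits, base, shift):
    """XOR swizzle, assuming CuTe's Swizzle<B, M, S> rule with shift >= 0."""
    mask = ((1 << bits) - 1) << (base + shift)
    return lambda x: x ^ ((x & mask) >> shift)


def cosize_by_enumeration(fn, size):
    """max(fn(i)) + 1 over the whole domain (size must be at least 1)."""
    return max(fn(i) for i in range(size)) + 1


sw = swizzle(2, 0, 2)
size = 5


def layout(i):
    return i * 1  # stand-in for a layout with shape 5, stride 1


def swizzled(i):
    return sw(layout(i))


offsets = [swizzled(i) for i in range(size)]
cosize = cosize_by_enumeration(swizzled, size)
print(offsets)  # [0, 1, 2, 3, 5]
print(cosize)   # 6, not 5: index 4 is XOR-ed up to offset 5

# The two offset conventions described in the first bullet, with offset 4.
offset = 4
folded_into_swizzle = [sw(offset + layout(i)) for i in range(size)]
added_after_layout = [offset + sw(layout(i)) for i in range(size)]
print(folded_into_swizzle)  # [5, 4, 7, 6, 10]
print(added_after_layout)   # [4, 5, 6, 7, 9]
```

The last two lists differ in every position, which is the disagreement the fix removes by settling on the second convention.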
Robustness
- Reject complement / logical_product / logical_divide on the inverse-form ComposedLayout(Layout, offset, Swizzle) with NotImplementedError.
- Reject logical_product on ComposedLayout(Layout, Swizzle, offset) (was crashing in the affine fallback with AttributeError on .stride).
- as_affine_layout() performs an explicit is_affine() post-check; the error points callers at as_layout_expr() for the non-affine path.
- coalescing_efficiency / segment_analysis validate warp_size > 0 (was silently producing nonsense from min(thread_count, 0)).
- viz raises an actionable ImportError pointing at pip install tensor-layouts[viz] when matplotlib/numpy are missing, instead of surfacing a deep ModuleNotFoundError on matplotlib internals.
- Aligned four pre-existing exception-class inconsistencies before introducing the new hierarchy: to_F2_matrix F6 rejection ValueError → NotImplementedError; slice_modes / dice_modes structure mismatch TypeError → ValueError; prefix_product / suffix_product tuple-init-on-scalar TypeError → ValueError; _validate_order_permutation 'not iterable' ValueError → TypeError.
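One common way to produce an actionable error for a missing optional dependency is to wrap the import and re-raise with an install hint. The sketch below shows that general pattern only; the helper name require_optional and the exact message wording are assumptions, not the library's actual implementation.

```python
import importlib


def require_optional(module_name, extra):
    """Import module_name, or raise an ImportError naming the extra to install.

    Hypothetical helper; the message text is illustrative.
    """
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Chain the original error so the underlying cause stays visible.
        raise ImportError(
            f"{module_name} is required for this feature; "
            f"install it with: pip install tensor-layouts[{extra}]"
        ) from exc


# A stdlib module imports normally and is returned unchanged.
json_module = require_optional("json", "viz")

# A missing module surfaces the install hint instead of a deep traceback.
try:
    require_optional("no_such_module_for_this_sketch", "viz")
except ImportError as err:
    message = str(err)
    print(message)
```

Doing the check once at the entry point keeps the failure close to the user's call rather than inside third-party internals.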
Performance
- cosize() results cached on each ComposedLayout instance and on swizzled Layout instances (the ComposedLayout cache uses a declarative dataclass field with init/repr/eq/hash all False, so cached and uncached layouts still compare equal and remain dict-key compatible).
- _address_bounds has an O(1) fast path for the canonical Sw o L form (bounds = (offset, offset + cosize(layout) - 1)), replacing the O(size) per-coordinate walk in _validate_storage. Works for any Tensor offset on ComposedLayout.
- complement(ComposedLayout) decays swizzled slices to plain Layout when the swizzle's Y/Z bits aren't both touched on the surviving subspace.
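The caching approach in the first bullet can be sketched with a frozen dataclass whose cache field is excluded from construction, repr, equality, and hashing. The class CachedExtent and its fields are hypothetical stand-ins, not the library's ComposedLayout; note that dataclasses.field spells the equality switch as compare rather than eq.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class CachedExtent:
    """Hypothetical layout-like value with a lazily cached cosize."""

    size: int    # assumed to be at least 1
    stride: int
    # Excluded from __init__, __repr__, __eq__ and __hash__.
    _cosize: Optional[int] = field(
        default=None, init=False, repr=False, compare=False, hash=False
    )

    def cosize(self) -> int:
        if self._cosize is None:
            value = max(i * self.stride for i in range(self.size)) + 1
            # Frozen dataclasses block normal assignment, so write the
            # cache slot through object.__setattr__.
            object.__setattr__(self, "_cosize", value)
        return self._cosize


warm = CachedExtent(4, 2)
cold = CachedExtent(4, 2)
print(warm.cosize())             # 7, and now cached on warm
print(warm == cold)              # True: the cache does not affect equality
print(hash(warm) == hash(cold))  # True: still usable as the same dict key
print(repr(warm))                # CachedExtent(size=4, stride=2)
```

Because the cache never participates in equality or hashing, populating it cannot change how an instance behaves in sets or as a dictionary key.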
Refactors (no functional change)
- layouts.py (4.4k LOC) split into a layouts/ package with three layered modules: core (exceptions, type predicates, tuple operations, Layout, Tile, Swizzle), expr (ComposedLayout and the LayoutExpr = Layout | ComposedLayout predicates / coercers), and algebra (compose, complement, divide, product, inverses, ...). Dependency direction is strictly one-way (core ← expr ← algebra), enforced by the import graph. No public API change: every name previously importable from tensor_layouts.layouts remains importable from the same path.
- bank_conflicts / per_group_bank_conflicts deduped via shared _bank_conflicts_for_thread_range.
- coalescing_efficiency / per_group_coalescing deduped via shared _coalescing_for_thread_range.
- Layout._calculate_max_offset moved to module-level _affine_max_offset (the staticmethod never used self).
- Internal _affine_inner → _strip_swizzle rename.
CI and Tests
- Python 3.14 added to the CI matrix and a lint job added (thanks @neuralsorcerer, #17)
- 20 new CuTe C++ oracle entries pinning complement, coalesce, compose, right_inverse on ComposedLayout form variants F2-F8; compose_truncation_paper oracle case for paper section 3.3.3.
- CuTe C++ oracle: CUTLASS_PATH / CUTLASS_INCLUDE_DIR env-var override for out-of-tree CUTLASS installations.
- 32 hand-written AMD oracle C-layout per-atom tests parametrized into a single ORACLE_C_LAYOUT_CASES-driven test.
- examples/composed.py added to make examples (was the only example not exercised by the smoke target).
Docs
- docs/layout_api.md / docs/tensor_api.md / docs/analysis_api.md and the examples/ rewritten to reflect Layout-becomes-purely-affine: no more "Layout may also carry one canonical final swizzle" framing, single-form Layout(shape, stride) repr, compose(Sw, L) always returns ComposedLayout(Sw, L, 0).
- New 'Constructor signature vs CuTe / pycute' subsection in docs/layout_api.md documenting the ComposedLayout(outer, inner, offset=k) ordering vs CuTe's positional ComposedLayout<A, Offset, B>.
- permutation_parity / is_even_permutation documented in docs/analysis_api.md (thanks @neuralsorcerer)
- Document supported / unsupported ops for the inverse-form ComposedLayout in the class docstring and docs/layout_api.md.
- to_F2_matrix / from_F2_matrix documented in docs/analysis_api.md.
Other
Full Changelog: v0.3.1...v0.3.2