tensor-layouts 0.2.1
What's Changed
Negative stride support
- Full negative stride support across Layout, Tensor, analysis, and visualization — cosize() and compose() decompose by magnitude and carry sign, matching CuTe C++; Tensor.view() preserves base offset; storage validation uses true addressed range instead of cosize alone
- Analysis functions (coalescing_efficiency, segment_analysis, per_group_coalescing, cycles, order) rebase the addressed footprint to a local origin for negative-stride layouts
- Visualization TV mapping rebases negative offsets; explicit cell_labels no longer use Python negative-index wraparound
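To illustrate why storage validation needs the true addressed range once strides can be negative, here is a minimal standalone sketch. It does not use the tensor-layouts API; `flatten`, `addressed_range`, and `rebased_offsets` are hypothetical helper names, and every extent is assumed to be at least 1.

```python
from itertools import product

def flatten(t):
    """Flatten a possibly nested tuple of ints into a flat tuple."""
    if isinstance(t, int):
        return (t,)
    return tuple(x for item in t for x in flatten(item))

def addressed_range(shape, stride):
    """Return (lowest, highest) offset reachable from base offset 0.

    Hypothetical helper: negative strides push the lower bound below 0,
    positive strides push the upper bound above 0.
    """
    lo = hi = 0
    for extent, d in zip(flatten(shape), flatten(stride)):
        span = (extent - 1) * d
        if span < 0:
            lo += span
        else:
            hi += span
    return lo, hi

def rebased_offsets(shape, stride):
    """Offsets in colexicographic order (first mode fastest), shifted so
    that the lowest addressed offset becomes 0 (a local origin)."""
    dims, strides = flatten(shape), flatten(stride)
    lo, _ = addressed_range(shape, stride)
    out = []
    for rev in product(*(range(e) for e in reversed(dims))):
        coord = rev[::-1]
        out.append(sum(c * d for c, d in zip(coord, strides)) - lo)
    return out

# A 4x2 layout whose first mode walks backwards through memory.
print(addressed_range((4, 2), (-1, 4)))   # (-3, 4)
print(rebased_offsets((4, 2), (-1, 4)))   # [3, 2, 1, 0, 7, 6, 5, 4]
```

In this example the layout touches offsets -3 through 4, so a buffer must provide 3 elements before the base offset and 4 after it. A bound measured only from offset 0 would miss the elements below the base.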
CuTe conformance fixes
- Rewrite left_inverse to handle non-contiguous (padded) layouts
- Fix compose to truncate unreachable modes before the divisibility check (§3.3.2 of arXiv:2603.02298v1)
- Fix compose and logical_divide for nested tuple tilers
- Fix zipped_divide, tiled_divide, flat_divide to preserve Layout tiler strides instead of silently degrading to shape tilers
- Canonicalize stride to 0 for unit-extent modes in logical_divide
- Treat Layout.__call__(None) as a full-slice identity, matching CuTe's slice(_, layout)
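The unit-extent canonicalization above is safe because a mode of extent 1 only ever sees coordinate 0, so its stride never contributes to an offset. A minimal sketch for flat (non-nested) shapes, using the hypothetical helpers `canonicalize_unit_modes` and `all_offsets` rather than the library's own code:

```python
from itertools import product

def canonicalize_unit_modes(shape, stride):
    """Set the stride of every extent-1 mode to 0 (flat tuples only)."""
    return tuple(0 if s == 1 else d for s, d in zip(shape, stride))

def all_offsets(shape, stride):
    """Offsets for every coordinate of a flat layout."""
    return [
        sum(c * d for c, d in zip(coord, stride))
        for coord in product(*(range(s) for s in shape))
    ]

shape, stride = (4, 1, 2), (1, 7, 4)
canonical = canonicalize_unit_modes(shape, stride)   # (1, 0, 4)

# The layout function is unchanged: the stride 7 was never observable.
same_function = all_offsets(shape, stride) == all_offsets(shape, canonical)
print(canonical, same_function)                      # (1, 0, 4) True
```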
Tensor
- Handle tensor[:] as a whole-view full slice, matching the explicit tensor[:, :] behavior
- Preserve swizzle attribute in slice_and_offset sublayout results
Analysis
- Fix explain(compose) crash on tuple tilers
- Fix explain(logical_product) to use cosize(B) for the complement bound
- Move exhaustive introspection helpers (image, is_injective, is_surjective, is_bijective, is_contiguous, functionally_equal) from layouts.py to analysis.py — keeps the core module efficient, O(size) enumeration is opt-in
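To show why these helpers cost O(size), here is a standalone sketch of what exhaustive introspection involves: every coordinate is visited once. The names `offsets`, `layout_image`, and `injective` are illustrative and are not the signatures exported by analysis.py. Flat shapes are assumed.

```python
from itertools import product

def offsets(shape, stride):
    """Enumerate all offsets in colexicographic order (first mode fastest)."""
    out = []
    for rev in product(*(range(s) for s in reversed(shape))):
        coord = rev[::-1]
        out.append(sum(c * d for c, d in zip(coord, stride)))
    return out

def layout_image(shape, stride):
    """Sorted set of distinct offsets the layout can produce."""
    return sorted(set(offsets(shape, stride)))

def injective(shape, stride):
    """True when no two coordinates map to the same offset."""
    offs = offsets(shape, stride)
    return len(set(offs)) == len(offs)

# Row-major 4x2: every coordinate gets its own offset.
print(offsets((4, 2), (2, 1)))        # [0, 2, 4, 6, 1, 3, 5, 7]
print(injective((4, 2), (2, 1)))      # True

# Broadcast along the second mode: offsets repeat.
print(layout_image((4, 2), (1, 0)))   # [0, 1, 2, 3]
print(injective((4, 2), (1, 0)))      # False
```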
Visualization
- Vertical arrangement in draw_swizzle for wide layouts — before/after grids stack top-to-bottom when columns exceed a threshold
Testing
- CuTe C++ oracle test suite — compiles regression cases directly against installed CUTLASS headers for compose, logical_divide, zipped_divide, tiled_divide, flat_divide, left_inverse, and logical_product; gracefully skips when CUTLASS or a C++ compiler is unavailable
- Paper examples test suite (arXiv:2603.02298v1) with --draw pytest option for visual output
- Fix duplicate test name shadowing draw_swizzle coverage
Robustness
- Reject free coordinates (slices, None) in Tensor.__setitem__ with a clear TypeError guiding users to the slice-then-index pattern
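One reading of the slice-then-index pattern, sketched with a toy class: take a sub-view with free coordinates first, then assign through a fully specified index. `ToyTensor` is hypothetical and is not the library's Tensor. It supports only integers, None, and full slices (`:`), and its error message is illustrative.

```python
class ToyTensor:
    """Hypothetical strided view over a flat Python list."""

    def __init__(self, data, shape, stride, offset=0):
        self.data = data
        self.shape, self.stride, self.offset = tuple(shape), tuple(stride), offset

    @staticmethod
    def _is_free(k):
        return k is None or isinstance(k, slice)

    def _normalize(self, key):
        key = key if isinstance(key, tuple) else (key,)
        if len(key) != len(self.shape):
            raise IndexError("expected one coordinate per mode")
        return key

    def __getitem__(self, key):
        key = self._normalize(key)
        for k in key:
            if isinstance(k, slice) and k != slice(None):
                raise NotImplementedError("toy supports full slices only")
        # Fixed coordinates move the base offset; free modes are kept.
        offset = self.offset + sum(
            k * d for k, d in zip(key, self.stride) if not self._is_free(k)
        )
        kept = [(s, d) for k, s, d in zip(key, self.shape, self.stride)
                if self._is_free(k)]
        if not kept:
            return self.data[offset]
        shape, stride = zip(*kept)
        return ToyTensor(self.data, shape, stride, offset)

    def __setitem__(self, key, value):
        key = self._normalize(key)
        if any(self._is_free(k) for k in key):
            raise TypeError(
                "free coordinates are not assignable; "
                "slice first, then index, e.g. t[1, :][2] = value"
            )
        self.data[self.offset + sum(k * d for k, d in zip(key, self.stride))] = value

t = ToyTensor(list(range(6)), shape=(2, 3), stride=(3, 1))
t[1, :][2] = 99          # slice to a row view, then index one element
print(t[1, 2])           # 99
```

Assigning directly through a free coordinate, as in `t[1, :] = 0`, raises TypeError in this sketch instead of guessing at broadcast semantics.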
Cleanup
- Configure Ruff with correct src/tensor_layouts/ paths, add extend-exclude = ["*.ipynb"], fix lint warnings across the codebase
Docs & build
- im2col figure and CONV→GEMM mapping clarification in applications notebook
- Document shape_div strict scalar divisibility policy, an intentional divergence from CuTe C++ ceil_div fallback
Full Changelog: v0.2.0...v0.2.1