tensor-layouts 0.2.1

@jduprat released this 09 Apr 20:35
· 97 commits to main since this release

What's Changed

Negative stride support

  • Full negative stride support across Layout, Tensor, analysis, and visualization — cosize() and compose() decompose by magnitude and carry sign, matching CuTe C++; Tensor.view() preserves base offset; storage validation uses true addressed range instead of cosize alone
  • Analysis functions (coalescing_efficiency, segment_analysis, per_group_coalescing, cycles, order) rebase the addressed footprint to a local origin for negative-stride layouts
  • Visualization TV mapping rebases negative offsets; explicit cell_labels no longer use Python negative-index wraparound
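The difference between a layout's true addressed range and its rebased footprint can be sketched in plain Python. This is an illustration on flat shape/stride tuples only; `offset_of` and `addressed_range` are hypothetical helper names and are not part of the tensor-layouts API.

```python
def offset_of(coord, stride):
    """Inner product of a flat coordinate with a flat stride tuple."""
    return sum(c * d for c, d in zip(coord, stride))


def addressed_range(shape, stride):
    """Smallest and largest offsets a flat layout can reach (all extents >= 1)."""
    lo = sum((s - 1) * d for s, d in zip(shape, stride) if d < 0)
    hi = sum((s - 1) * d for s, d in zip(shape, stride) if d > 0)
    return lo, hi


shape, stride = (4, 2), (-1, 4)
lo, hi = addressed_range(shape, stride)   # lo = -3, hi = 4
footprint = hi - lo + 1                   # 8 offsets between lo and hi inclusive

# Rebasing to a local origin shifts every offset by -lo,
# so the smallest addressed offset becomes 0.
rebased = offset_of((3, 0), stride) - lo  # coordinate (3, 0): offset -3 -> 0
```

Under this model, a storage check would require `base + lo >= 0` and `base + hi < len(storage)` for some base offset, rather than comparing a size against the storage length alone.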

CuTe conformance fixes

  • left_inverse completely rewritten to handle non-contiguous (padded) layouts
  • compose now truncates unreachable modes before the divisibility check (§3.3.2 of arXiv:2603.02298v1)
  • compose and logical_divide now handle nested tuple tilers
  • zipped_divide, tiled_divide, and flat_divide now preserve Layout tiler strides instead of silently degrading to shape tilers
  • logical_divide now canonicalizes stride to 0 for unit-extent modes
  • Layout.__call__(None) as full-slice identity, matching CuTe's slice(_, layout)
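Why a unit-extent mode can safely carry stride 0 is easy to check by hand: its coordinate is always 0, so its stride never contributes to the offset. The sketch below works on flat tuples; `canonicalize_unit_modes` is a hypothetical name used for illustration, not the library's function.

```python
from itertools import product


def canonicalize_unit_modes(shape, stride):
    """Replace the stride of every extent-1 mode with 0."""
    return tuple(0 if s == 1 else d for s, d in zip(shape, stride))


shape, stride = (4, 1, 2), (1, 7, 4)
canonical = canonicalize_unit_modes(shape, stride)  # (1, 0, 4)

# The coordinate-to-offset mapping is unchanged for every valid coordinate.
for coord in product(*(range(s) for s in shape)):
    before = sum(c * d for c, d in zip(coord, stride))
    after = sum(c * d for c, d in zip(coord, canonical))
    assert before == after
```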

Tensor

  • tensor[:] now returns a whole-view full slice, matching the explicit tensor[:, :] behavior
  • Preserve swizzle attribute in slice_and_offset sublayout results

Analysis

  • Fix explain(compose) crash on tuple tilers
  • explain(logical_product) now uses cosize(B) for the complement bound
  • Move exhaustive introspection helpers (image, is_injective, is_surjective, is_bijective, is_contiguous, functionally_equal) from layouts.py to analysis.py; this keeps the core module efficient and makes O(size) enumeration opt-in
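To see why these helpers cost O(size), consider a minimal sketch of exhaustive enumeration on flat layouts. The function names below are hypothetical stand-ins, and the library's own definitions (for example of contiguity) may differ.

```python
from itertools import product


def enumerate_offsets(shape, stride):
    """Visit every coordinate once and collect its offset: O(size) work."""
    coords = product(*(range(s) for s in shape))
    return [sum(c * d for c, d in zip(coord, stride)) for coord in coords]


def is_injective_flat(shape, stride):
    """True when no two coordinates map to the same offset."""
    offs = enumerate_offsets(shape, stride)
    return len(set(offs)) == len(offs)


def is_dense_flat(shape, stride):
    """True when the distinct offsets form 0, 1, ..., k-1 with no gaps."""
    offs = set(enumerate_offsets(shape, stride))
    return offs == set(range(len(offs)))


compact = ((4, 2), (1, 4))    # offsets 0..7, each hit once
padded = ((4, 2), (1, 8))     # gap between offsets 3 and 8
broadcast = ((4, 2), (1, 0))  # second mode aliases the first
```

Here `compact` is both injective and dense, `padded` is injective but not dense, and `broadcast` is dense but not injective.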

Visualization

  • Vertical arrangement in draw_swizzle for wide layouts — before/after grids stack top-to-bottom when columns exceed a threshold

Testing

  • CuTe C++ oracle test suite — compiles regression cases directly against installed CUTLASS headers for compose, logical_divide, zipped_divide, tiled_divide, flat_divide, left_inverse, and logical_product; gracefully skips when CUTLASS or a C++ compiler is unavailable
  • Paper examples test suite (arXiv:2603.02298v1) with --draw pytest option for visual output
  • Fix duplicate test name shadowing draw_swizzle coverage
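Two pieces of this setup can be sketched with standard tooling: registering a `--draw` flag through pytest's `pytest_addoption` hook, and probing for a C++ compiler so that oracle tests can skip cleanly. The helper names, the help text, and the compiler candidates are assumptions made for illustration; the project's actual conftest.py may look different.

```python
# conftest.py (sketch)
import shutil


def pytest_addoption(parser):
    """Register --draw; off by default so ordinary runs produce no figures."""
    parser.addoption(
        "--draw",
        action="store_true",
        default=False,
        help="render figures for the paper examples",
    )


def draw_requested(config):
    """Read the flag from a pytest config object."""
    return bool(config.getoption("--draw"))


def find_cxx_compiler(candidates=("c++", "g++", "clang++")):
    """Return the path of the first C++ compiler found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None
```

A test that needs the oracle could call `pytest.skip(...)` when `find_cxx_compiler()` returns `None`, which keeps the suite green on machines without a toolchain.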

Robustness

  • Reject free coordinates (slices, None) in Tensor.__setitem__ with a clear TypeError guiding users to the slice-then-index pattern
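The kind of guard described here can be illustrated with a toy class. `ToyTensor` is a hypothetical row-major stand-in, not the library's Tensor, and the error message is invented for the example.

```python
class ToyTensor:
    """Tiny 2D row-major container used only to illustrate the check."""

    def __init__(self, rows, cols):
        self.shape = (rows, cols)
        self.data = [0] * (rows * cols)

    def __setitem__(self, coord, value):
        coords = coord if isinstance(coord, tuple) else (coord,)
        # Free coordinates (slices or None) do not name a single element.
        if any(c is None or isinstance(c, slice) for c in coords):
            raise TypeError(
                "assignment needs a fully specified coordinate; "
                "slice first, then index the resulting view"
            )
        if len(coords) != len(self.shape):
            raise IndexError("expected one integer per mode")
        row, col = coords
        self.data[row * self.shape[1] + col] = value


t = ToyTensor(2, 3)
t[1, 2] = 5          # fine: both coordinates are integers

try:
    t[:, 0] = 1      # free coordinate, rejected up front
except TypeError as exc:
    rejected = str(exc)
```

Failing early with a TypeError avoids the ambiguity of assigning one value to a whole sub-view.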

Cleanup

  • Configure Ruff with correct src/tensor_layouts/ paths, add extend-exclude = ["*.ipynb"], fix lint warnings across the codebase

Docs & build

  • im2col figure and CONV→GEMM mapping clarification in applications notebook
  • Document shape_div strict scalar divisibility policy, an intentional divergence from CuTe C++ ceil_div fallback

Full Changelog: v0.2.0...v0.2.1