@LeiWang1999 LeiWang1999 commented Nov 24, 2024

This pull request includes several changes to the bitblas library, focusing on improving the swizzle layout functionality and adding new methods for matrix operations. The most important changes include updating the swizzle layout methods, adding new layout transformation functions, and enhancing the matrix core intrinsic emitter.

Swizzle Layout Updates:

New Layout Transformation Functions:

  • bitblas/tl/mfma_layout.py: Added new functions for shared memory to local memory layout transformations, such as thread_id_shared_access_64x8_to_16x32_layout_A and shared_16x32_to_local_64x8_layout_A.
  • bitblas/tl/mfma_macro_generator.py: Incorporated new layout transformation functions into the MatrixCoreIntrinEmitter class and updated the get_ldmatrix_index_map method to support new layouts.
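
To illustrate what a shared-to-local layout transformation of this shape does, here is a minimal sketch. The function names and the index arithmetic are assumptions made for illustration and are not copied from bitblas/tl/mfma_layout.py; the actual functions may use a different mapping. The sketch distributes a 16x32 shared-memory tile over 64 threads holding 8 elements each, and provides the inverse mapping a thread would use to find the shared coordinate for a given local slot.

```python
# Illustrative sketch only: hypothetical names and an assumed mapping,
# not the implementation in bitblas/tl/mfma_layout.py.

ROWS, COLS = 16, 32      # shape of the shared-memory tile
THREADS, LOCALS = 64, 8  # threads per wavefront, elements held per thread


def shared_to_local_sketch(i, j):
    """Map shared coordinate (i, j) in a 16x32 tile to (thread_id, local_id)."""
    thread_id = i + ROWS * (j // LOCALS)  # each thread owns 8 contiguous columns of one row
    local_id = j % LOCALS                 # position inside that 8-element chunk
    return thread_id, local_id


def local_to_shared_sketch(thread_id, local_id):
    """Inverse mapping: (thread_id, local_id) back to shared coordinate (i, j)."""
    i = thread_id % ROWS
    j = (thread_id // ROWS) * LOCALS + local_id
    return i, j


def is_bijection():
    """Check that every tile element maps to a distinct (thread_id, local_id) slot."""
    slots = {
        shared_to_local_sketch(i, j)
        for i in range(ROWS)
        for j in range(COLS)
    }
    return len(slots) == THREADS * LOCALS
```

Under these assumptions, the two functions are inverses of each other, so a round trip through both returns the original coordinate, and every one of the 512 tile elements lands in a distinct thread-local slot.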

Enhancements to Matrix Core Intrinsic Emitter:

These changes improve the flexibility and performance of the matrix operations within the bitblas library by optimizing memory layout transformations and enhancing the matrix core intrinsic emitter.

TODO Items

  • Warp with Block Primitives
  • Block Level Test Case
  • Documentation for these optimizations

@LeiWang1999 LeiWang1999 merged commit 6f9c6ed into microsoft:main Nov 27, 2024
5 of 6 checks passed