@petercad petercad commented Aug 19, 2025

This PR introduces a new architecture for Xe CuTe atoms (CUTLASS-level changes to come later).

Current status:

  • MMA atoms
  • 2D copy atoms:
    • loads (regular/VNNI/transpose)
    • stores
    • prefetch
  • 2D copy helpers:
    • low-level (make_block_2d_copy)
    • high-level (make_block_2d_copy_{A,B,C}).
  • Reorders:
    • generic
    • optimized reorder atoms for {int4, uint4, int8, uint8, e2m1, e4m3, e5m2} -> {half, bfloat16}.

Link to rendered documentation here.

Note

This branch requires a very recent IGC version — ci-comp_igc-30311 or later. This IGC has important bug fixes/improvements to inline vISA needed to properly implement the new atoms.

@petercad petercad force-pushed the petercad/rearchitecture branch from ca42282 to f5cf22c Compare August 20, 2025 14:49
@petercad petercad force-pushed the petercad/rearchitecture branch from 1389b60 to 42fa72b Compare August 26, 2025 23:07
@petercad petercad changed the title [WIP] Xe rearchitecture Xe rearchitecture Sep 4, 2025

petercad commented Sep 12, 2025

@rolandschulz I will address the merge conflicts once review is done, to avoid rebasing.

yuankuns commented Sep 15, 2025

Hi @petercad, will this PR support ST_T?

@petercad
Author

> Hi @petercad, will this PR support ST_T?

No, since block 2D store messages don't support transposition. But, since these kinds of stores are occasionally useful, it might be a good idea to introduce an emulated transpose store operation, using D32 scattered writes. If you have some specific use cases, let me know.
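
As a rough illustration of the emulated transpose store idea, here is a sketch in plain host C++ rather than Xe intrinsics or CuTe. Each inner-loop iteration stands in for one 32-bit (D32) scattered write, where the destination address is computed per element. The name transpose_store_d32, the row-major tile layout, and the dst_ld parameter are assumptions made for illustration; none of them is part of this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of this PR): writes a rows x cols tile,
// held row-major in `tile`, to `dst` in transposed form.
// `dst` is treated as a row-major cols x rows matrix with leading
// dimension `dst_ld` (in elements).
void transpose_store_d32(const std::vector<std::uint32_t>& tile,
                         std::size_t rows, std::size_t cols,
                         std::uint32_t* dst, std::size_t dst_ld) {
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      // Models one D32 scattered write: element (r, c) of the tile
      // lands at (c, r) of the destination.
      dst[c * dst_ld + r] = tile[r * cols + c];
    }
  }
}
```

On real hardware the per-element writes would presumably be spread across lanes rather than run in a serial loop, so this only shows the addressing pattern, not the performance characteristics.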

@yuankuns

> > Hi @petercad, will this PR support ST_T?
>
> No, since block 2D store messages don't support transposition. But, since these kinds of stores are occasionally useful, it might be a good idea to introduce an emulated transpose store operation, using D32 scattered writes. If you have some specific use cases, let me know.

Yes, there is a case in the SDPA backward pass where dV = Pt*dQ might be calculated as dVt = dQt*P. Storing dVt as dV requires a transposed write. For now, I just compute dV = Pt*dQ for simplicity and faster delivery.
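
For reference, the two forms are related by the transpose-of-a-product identity. Written with the same names as above (with P^T standing for Pt):

```latex
dV^{\top} = \left(P^{\top}\, dQ\right)^{\top} = dQ^{\top} \left(P^{\top}\right)^{\top} = dQ^{\top} P
```

So the product dQt*P yields the transpose of dV, which is why writing it out as dV would need a transposed store.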

@petercad petercad force-pushed the petercad/rearchitecture branch from a82c1c2 to 97d7f82 Compare September 15, 2025 23:13
@petercad petercad force-pushed the petercad/rearchitecture branch from 97d7f82 to 910288c Compare September 16, 2025 00:10
@petercad petercad force-pushed the petercad/rearchitecture branch from 910288c to be699d0 Compare September 16, 2025 01:50
@rolandschulz

Do you think the commit history is useful or do we not lose anything if we squash?

@anamikac-intel

@petercad - there are a couple of errors I am seeing while running xe_gemm:

  1. I get an error from using SignedAccType = ensure_signed_t; --> no template named 'ensure_signed_t' is defined. I checked cute/util/type_traits; it has 'make_signed_t', which worked, but not ensure_signed_t.

  2. Also, in void test_case(sycl::queue &Q, int m, int n, int k) there are some undefined identifiers such as make_shared_usm_tensor, type_str, and free_usm_tensor, which I assume are custom. I could not find them defined anywhere in the code.

@petercad petercad force-pushed the petercad/rearchitecture branch from be699d0 to 8ddf42b Compare September 16, 2025 13:45
@petercad
Author

> @petercad - there are a couple of errors I am seeing while running xe_gemm:
>
>   1. I get an error from using SignedAccType = ensure_signed_t; --> no template named 'ensure_signed_t' is defined. I checked cute/util/type_traits; it has 'make_signed_t', which worked, but not ensure_signed_t.
>   2. Also, in void test_case(sycl::queue &Q, int m, int n, int k) there are some undefined identifiers such as make_shared_usm_tensor, type_str, and free_usm_tensor, which I assume are custom. I could not find them defined anywhere in the code.

These are all in sycl_cute_common.hpp, which is pushed now.

@petercad
Author

> Do you think the commit history is useful or do we not lose anything if we squash?

I personally like the more specific commit messages when looking back through Git history (e.g. blaming) when there are logically independent parts.

@petercad petercad force-pushed the petercad/rearchitecture branch from 8ddf42b to 3c0bc23 Compare September 16, 2025 14:44
@rolandschulz rolandschulz merged commit 0f13630 into main Sep 16, 2025
2 of 3 checks passed