adityachatter

[In progress]

  • Adds FP8 support in Chunk Prefill kernel

Changes are stacked on top of the Chunk Prefill pull request: #498

Run test code as:

ninja 06_bmg_chunk_prefill_fp8_hdim128
./examples/06_bmg_flash_attention/06_bmg_chunk_prefill_fp8_hdim128

TODO:

  • Fix FP8 performance bottleneck
  • Enable FP8 accuracy check with reference

Valentine233 and others added 16 commits September 17, 2025 17:15
The goal is to make warnings raise errors during the build.

To get there, every compilation warning has to be fixed, and any
harmless, unavoidable warnings have to be suppressed.

---------

Co-authored-by: Joy, Albin <albin.joy@intel.com>
…ded" Error in Loops (intel#511)

**Problem Description:**
In SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), calling
timer.start() repeatedly in a loop (lines 307-313 of
examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp) throws:

> terminate called after throwing an instance of 'std::runtime_error'
>   what():  Event is already being recorded.
> Aborted (core dumped)

**Root Cause:**
The SYCL event manager checks whether an event is already in use (via
event.getIndex() != -1 in
tools/util/include/cutlass/util/sycl_event_manager.hpp), but the timer's
start/stop events are not reset after each measurement in
milliseconds(). The stale events cannot be reused by subsequent start()
calls, which leads to the runtime error.

**Proposed Fix:**
Update sycl_timer.hpp to reset start_ and stop_ to a default SyclEvent{}
after each measurement in milliseconds(), ensuring getIndex() returns -1
before the next start().
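
The reset-after-measurement pattern can be sketched without a SYCL toolchain. The snippet below is only an illustration: `MockEvent` and `MockTimer` are hypothetical stand-ins, not the classes in sycl_event_manager.hpp or sycl_timer.hpp, and host timestamps from `std::chrono` replace real device events. It assumes the behaviour described above, namely that an index of -1 means "not in use" and that recording an in-use event throws.

```cpp
#include <cassert>
#include <chrono>
#include <stdexcept>

// Hypothetical stand-in for the event type: index -1 means "not in use".
struct MockEvent {
  int index = -1;
  std::chrono::steady_clock::time_point stamp{};

  int getIndex() const { return index; }

  // Refuses to record twice, mirroring the error message quoted above.
  void record() {
    if (index != -1) {
      throw std::runtime_error("Event is already being recorded.");
    }
    index = 0;
    stamp = std::chrono::steady_clock::now();
  }
};

// Hypothetical timer with the start/stop/milliseconds shape described above.
class MockTimer {
 public:
  void start() { start_.record(); }
  void stop() { stop_.record(); }

  double milliseconds() {
    // Both events must have been recorded before a measurement is taken.
    assert(start_.getIndex() != -1 && stop_.getIndex() != -1);
    std::chrono::duration<double, std::milli> elapsed =
        stop_.stamp - start_.stamp;
    // The fix: reset both events so getIndex() returns -1 before the
    // next start().
    start_ = MockEvent{};
    stop_ = MockEvent{};
    return elapsed.count();
  }

 private:
  MockEvent start_;
  MockEvent stop_;
};
```

With the two reset lines in place, a loop of `start()`, `stop()`, `milliseconds()` can run any number of times. Removing them makes the second `start()` throw, which matches the failure reported in the problem description.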

---------

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
…ation in SYCL (intel#515)

1. Generates random data on the host and copies it to the device via
syclcompat::memcpy, which improves code reuse and supports a broader
range of element types.
2. Adds an optional bits parameter to control fractional precision, which
the original implementation lacked.
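
A rough host-side sketch of the two points above follows. The helper name `random_host_block`, its signature, and the meaning given to `bits` (keep that many fractional bits by rounding to a multiple of 2^-bits, with a negative value disabling rounding) are assumptions made for illustration, not the code in this PR. The device copy via syclcompat::memcpy is left out so the snippet compiles with any C++ compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: fills a host buffer with uniform random values in
// [lo, hi]. If bits >= 0, each value is rounded to a multiple of 2^-bits
// before conversion to the element type T.
template <typename T>
std::vector<T> random_host_block(std::size_t n, unsigned seed, double lo,
                                 double hi, int bits = -1) {
  assert(lo <= hi);
  std::mt19937 gen(seed);
  std::uniform_real_distribution<double> dist(lo, hi);

  std::vector<T> host(n);
  for (auto& v : host) {
    double r = dist(gen);
    if (bits >= 0) {
      const double scale = std::ldexp(1.0, bits);  // 2^bits
      r = std::round(r * scale) / scale;
    }
    // Generating in double and converting once keeps the generator
    // independent of the element type.
    v = static_cast<T>(r);
  }
  return host;
}
```

For example, `random_host_block<float>(64, 7u, -2.0, 2.0, 0)` yields only whole numbers between -2 and 2, which low-precision element types can represent exactly. The resulting buffer would then be copied to the device in a separate step.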

---------

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
This change imports `SYCLCompat` into the cutlass-sycl repo as `compat`.
Previous dependencies on `syclcompat` are changed to `compat`.
This PR also fixes some failures of `SYCLCompat` in oneAPI 2025.2.

---------

Co-authored-by: Roland Schulz <roland.schulz@intel.com>
Signed-off-by: Aditya Chatterjee <Aditya.Chatterjee@intel.com>