Fix GPU_Clock Timer Reuse Issue Causing "Event is Already Being Recorded" Error in Loops #511

ClarkChin08 · 2025-09-16T02:32:56Z

Problem Description:
In SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), calling timer.start() repeatedly in a loop (lines 307-313 of examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp) throws:

terminate called after throwing an instance of 'std::runtime_error'
what(): Event is already being recorded.
Aborted (core dumped)

Root Cause:
The SYCL event manager checks whether an event is already in use (via event.getIndex() != -1 in tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's start/stop events are not reset after each measurement in milliseconds(). The events therefore cannot be reused by a subsequent start() call, which raises the runtime error.

Proposed Fix:
Update sycl_timer.hpp to reset start_ and stop_ to a default SyclEvent{} after each measurement in milliseconds(), so that getIndex() returns -1 before the next start().
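
To make the root cause and the proposed fix concrete, here is a minimal sketch that reproduces the failure mode with mock types. Event, EventManager, Timer, and first_failing_iteration are hypothetical stand-ins invented for illustration; they are not the real SyclEvent, event manager, or timer from sycl_timer.hpp, and no real timing is performed. The only behavior modeled is the "index != -1 means in use" check described above.

```cpp
#include <cassert>
#include <stdexcept>

namespace timer_reuse_sketch {

// Hypothetical stand-in for SyclEvent: index -1 means "not being recorded".
struct Event {
  int index = -1;
};

// Hypothetical stand-in for the event manager's in-use check.
struct EventManager {
  int next_index = 0;
  void record(Event& e) {
    if (e.index != -1) {
      throw std::runtime_error("Event is already being recorded.");
    }
    e.index = next_index++;
  }
};

class Timer {
public:
  Timer(EventManager& mgr, bool reset_after_measure)
      : mgr_(mgr), reset_after_measure_(reset_after_measure) {}

  void start() { mgr_.record(start_); }
  void stop() { mgr_.record(stop_); }

  float milliseconds() {
    // Both events must have been recorded before a measurement is read.
    assert(start_.index != -1 && stop_.index != -1);
    float elapsed = 0.0f;  // placeholder value; this sketch does no real timing
    if (reset_after_measure_) {
      // The proposed fix: return both events to their default state.
      start_ = Event{};
      stop_ = Event{};
    }
    return elapsed;
  }

private:
  EventManager& mgr_;
  bool reset_after_measure_;
  Event start_;
  Event stop_;
};

// Returns the 0-based iteration at which start()/stop() throws, or -1 if none does.
inline int first_failing_iteration(int iterations, bool reset_after_measure) {
  EventManager mgr;
  Timer timer(mgr, reset_after_measure);
  for (int i = 0; i < iterations; ++i) {
    try {
      timer.start();
      timer.stop();
      timer.milliseconds();
    } catch (const std::runtime_error&) {
      return i;
    }
  }
  return -1;
}

}  // namespace timer_reuse_sketch
```

Without the reset, the second iteration's start() finds the event still marked as in use and throws; with the reset, every iteration starts from index -1.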

jiyang1011 · 2025-09-16T02:49:17Z

GPU_timer could be moved into the loop body, which avoids re-initializing _stop and _start.

ClarkChin08 · 2025-09-16T03:13:30Z

> GPU_timer could be moved into the loop body, which avoids re-initializing _stop and _start.

Moving the GPU_Clock timer inside the loop body is indeed a viable way to avoid the event reuse issue: each iteration creates fresh events with index -1, and the change is simple and local, affecting only examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp.

However, it leaves timer reuse broken in other looped scenarios, and it may add overhead by creating and destroying events (via the SYCLTimer constructor/destructor) on every iteration.

We can discuss this.
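
The loop-local alternative can be sketched as follows, again with hypothetical mock types (Event, EventManager, Timer, LoopStats, and run are illustrative names, not the real CUTLASS classes). The construction counter is there only to make the per-iteration create/destroy cost visible; it does not measure real overhead.

```cpp
#include <cassert>
#include <stdexcept>

namespace loop_local_timer_sketch {

// Hypothetical stand-in for SyclEvent: index -1 means "not being recorded".
struct Event {
  int index = -1;
};

// Hypothetical stand-in for the event manager's in-use check.
struct EventManager {
  int next_index = 0;
  void record(Event& e) {
    if (e.index != -1) {
      throw std::runtime_error("Event is already being recorded.");
    }
    e.index = next_index++;
  }
};

struct LoopStats {
  int completed = 0;           // iterations that finished without throwing
  int timers_constructed = 0;  // how many Timer objects were created
};

class Timer {
public:
  Timer(EventManager& mgr, LoopStats& stats) : mgr_(mgr) {
    ++stats.timers_constructed;
  }
  void start() { mgr_.record(start_); }
  void stop() { mgr_.record(stop_); }
  float milliseconds() const {
    assert(start_.index != -1 && stop_.index != -1);
    return 0.0f;  // placeholder value; this sketch does no real timing
  }

private:
  EventManager& mgr_;
  Event start_;
  Event stop_;
};

inline LoopStats run(int iterations) {
  EventManager mgr;
  LoopStats stats;
  for (int i = 0; i < iterations; ++i) {
    Timer timer(mgr, stats);  // fresh events with index -1 on every iteration
    timer.start();
    timer.stop();
    timer.milliseconds();
    ++stats.completed;
  }
  return stats;
}

}  // namespace loop_local_timer_sketch
```

Every iteration completes because the timer never outlives a single start/stop pair, at the price of one construction and destruction per iteration.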

rolandschulz

This isn't nice. But this is already one big hack and we need to rewrite it. So it doesn't matter too much.

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>

taozha2 · 2025-09-16T05:24:15Z

tools/util/include/cutlass/util/sycl_timer.hpp

+    start_ = SyclEvent{};
+    stop_ = SyclEvent{};


It would be better to call:
syclEventDestroy(start_);
syclEventDestroy(stop_);

Also, I think it's better to add a "reset" method and call it when you want to reuse the clock. We shouldn't add unrelated work to the milliseconds() method.
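
A rough sketch of the suggested separation, under the assumption that a reset() member would simply return both events to their default state. All types here are hypothetical mocks; the real implementation might instead call syclEventDestroy as suggested above, which is not modeled in this sketch.

```cpp
#include <cassert>
#include <stdexcept>

namespace explicit_reset_sketch {

// Hypothetical stand-in for SyclEvent: index -1 means "not being recorded".
struct Event {
  int index = -1;
};

// Hypothetical stand-in for the event manager's in-use check.
struct EventManager {
  int next_index = 0;
  void record(Event& e) {
    if (e.index != -1) {
      throw std::runtime_error("Event is already being recorded.");
    }
    e.index = next_index++;
  }
};

class Timer {
public:
  explicit Timer(EventManager& mgr) : mgr_(mgr) {}

  void start() { mgr_.record(start_); }
  void stop() { mgr_.record(stop_); }

  // Read-only: reading a measurement has no side effects on the events.
  float milliseconds() const {
    assert(start_.index != -1 && stop_.index != -1);
    return 0.0f;  // placeholder value; this sketch does no real timing
  }

  // Hypothetical reset(): the caller opts in to reuse explicitly.
  void reset() {
    start_ = Event{};
    stop_ = Event{};
  }

private:
  EventManager& mgr_;
  Event start_;
  Event stop_;
};

// Returns the 0-based iteration at which start()/stop() throws, or -1 if none does.
inline int first_failing_iteration(int iterations, bool call_reset) {
  EventManager mgr;
  Timer timer(mgr);
  for (int i = 0; i < iterations; ++i) {
    try {
      timer.start();
      timer.stop();
      timer.milliseconds();
      if (call_reset) {
        timer.reset();
      }
    } catch (const std::runtime_error&) {
      return i;
    }
  }
  return -1;
}

}  // namespace explicit_reset_sketch
```

Keeping milliseconds() const means a caller can read the same measurement more than once, and reuse only happens when reset() is called.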

ClarkChin08 · 2025-09-16T07:59:03Z

We have decided to move the GPU Clock into the loop body and relocate the #define CUTLASS_SYCL_PROFILING_ENABLED to the top of the file to ensure the macro takes effect for the SYCL timer and Event Manager.

rolandschulz · 2025-09-17T00:41:47Z

examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp

      $ export IGC_VectorAliasBBThreshold=10000
 */
-
+#define CUTLASS_SYCL_PROFILING_ENABLED


This shouldn't be here.

Sorry, I just noticed you only moved it (not added it). Why do those two examples have this define? Why is it not handled as in all the other examples, where CUTLASS_SYCL_PROFILING_ENABLED is set by CMake?

This code was implemented by the Codeplay team.
My guess is that, because the stream-k kernel execution time is short, GPU time gives a more accurate measurement (wall time cannot show the benefit of stream-k).

The other examples don't explicitly use CUTLASS_SYCL_PROFILING_ENABLED. We only turn it on when running benchmarks.

I propose removing CUTLASS_SYCL_PROFILING_ENABLED from the two examples, since it will confuse end users.

It appears to be debug code that the Codeplay team didn't clean up.
It has no effect unless it is moved to the top of the file.
Once moved to the top of the file, it causes the example to fail (the example misused the timer, which we just fixed).
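
The ordering point can be illustrated with a small self-contained sketch. DEMO_PROFILING_ENABLED is a hypothetical macro, and the two namespaces stand in for a header's #ifdef-guarded content that is processed before and after the #define; this is not the actual CUTLASS header layout.

```cpp
// Stands in for a header that the preprocessor reaches BEFORE the macro is defined.
namespace seen_before_define {
#ifdef DEMO_PROFILING_ENABLED
constexpr bool profiling_enabled = true;
#else
constexpr bool profiling_enabled = false;
#endif
}  // namespace seen_before_define

// Hypothetical macro, defined mid-file like the original #define in the examples.
#define DEMO_PROFILING_ENABLED

// Stands in for a header that the preprocessor reaches AFTER the macro is defined.
namespace seen_after_define {
#ifdef DEMO_PROFILING_ENABLED
constexpr bool profiling_enabled = true;
#else
constexpr bool profiling_enabled = false;
#endif
}  // namespace seen_after_define

// Only code processed after the #define sees the macro.
static_assert(!seen_before_define::profiling_enabled,
              "code before the #define must not see the macro");
static_assert(seen_after_define::profiling_enabled,
              "code after the #define must see the macro");
```

Setting the macro from the build system as a compile definition avoids the ordering problem, because the definition is then in effect before any header is processed.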

ClarkChin08 force-pushed the fix_timer branch from 324127c to 9c4bd8f Compare September 16, 2025 03:46

rolandschulz approved these changes Sep 16, 2025

fix timer reuse when CUTLASS_SYCL_PROFILING_ENABLED

73df662

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>

ClarkChin08 force-pushed the fix_timer branch from 9c4bd8f to 73df662 Compare September 16, 2025 03:55

taozha2 reviewed Sep 16, 2025

fix the GPU Clock timer

ff9e09f

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>

taozha2 approved these changes Sep 16, 2025

tdeng5 approved these changes Sep 16, 2025

rolandschulz reviewed Sep 17, 2025

remove the macro from examples 03 and 04

2ceb9aa

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>

rolandschulz approved these changes Sep 17, 2025

rolandschulz merged commit 5089907 into intel:main Sep 17, 2025
2 of 3 checks passed
