Skip to content

Conversation

ClarkChin08
Copy link

@ClarkChin08 ClarkChin08 commented Sep 16, 2025

Problem Description:
In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when calling timer.start() repeatedly in a loop (line 307-313) in examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws :

terminate called after throwing an instance of 'std::runtime_error'
what(): Event is already being recorded.
Aborted (core dumped)

Root Cause:
This occurs because the SYCL event manager checks if an event is already in use (via event.getIndex() != -1 in tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's start/stop events are not properly reset after each measurement in milliseconds(), this prevents event reuse in subsequent start() calls, leading to a runtime error.

Proposed Fix:
Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{} after each measurement in milliseconds(), ensuring getIndex() returns -1 before the next start().

@jiyang1011
Copy link

GPU_timer could be moved into loop body which avoid re-init _stop, _start

@ClarkChin08
Copy link
Author

ClarkChin08 commented Sep 16, 2025

GPU_timer could be moved into loop body which avoid re-init _stop, _start

Moving the GPU_Clock timer inside the loop body is indeed a viable option to avoid the event reuse issue (by creating fresh events with index=-1 each iteration) with simple and local change only affects examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp.

But it avoids the timer reused in other looped scenarios and potentially higher overhead by creating/destroying events (via SYCLTimer constructor/destructor) in each iteration.

We can talk on this.

Copy link

@rolandschulz rolandschulz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't nice. But this is already one big hack and we need to rewrite it. So it doesn't matter too much.

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
Comment on lines 81 to 82
start_ = SyclEvent{};
stop_ = SyclEvent{};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better to call:
syclEventDestroy(start_);
syclEventDestroy(stop_);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And i think it's better to add a "reset" method and call reset if you want to reuse the clock. we shouldn't add unrelated things in method “milliseconds()".

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
@ClarkChin08
Copy link
Author

ClarkChin08 commented Sep 16, 2025

We have decided to move the GPU Clock into the loop body and relocate the #define CUTLASS_SYCL_PROFILING_ENABLED to the top of the file to ensure the macro takes effect for the SYCL timer and Event Manager.

$ export IGC_VectorAliasBBThreshold=10000
*/

#define CUTLASS_SYCL_PROFILING_ENABLED

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this shouldn't be here

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry just noticed you only moved (not added). Why do those two examples have this define? Why is it not handled like in all the other examples where CUTLASS_SYCL_PROFILING_ENABLED gets set by cmake?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is implemented by Codeplay team.
My guess is, the stream-k kernel execution time is short, use GPU time can get more accurate time (use wall time cannot get the benefit of stream-k).

Other examples didn't explicitly use CUTLASS_SYCL_PROFILING_ENABLED. We only turn on it when run benchmarks.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Propose to remove CUTLASS_SYCL_PROFILING_ENABLED from the two examples, it will confuse end users.

It seems a debug code which Codeplay team didn't clean up.
It will not work if we don't move it to the top of the code.
If we moved to the top of the file, it causes the example fail. (the example mis-used timer which we just fixed).

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
@rolandschulz rolandschulz merged commit 5089907 into intel:main Sep 17, 2025
2 of 3 checks passed
leonling-lly pushed a commit to leonling-lly/cutlass-sycl that referenced this pull request Sep 19, 2025
…ded" Error in Loops (intel#511)

**Problem Description:**
In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when
calling timer.start() repeatedly in a loop (line 307-313) in
examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws :

> terminate called after throwing an instance of 'std::runtime_error'
>   what():  Event is already being recorded.
> Aborted (core dumped)

**Root Cause:**
This occurs because the SYCL event manager checks if an event is already
in use (via event.getIndex() != -1 in
tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's
start/stop events are not properly reset after each measurement in
milliseconds(), this prevents event reuse in subsequent start() calls,
leading to a runtime error.

**Proposed Fix:**
Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{}
after each measurement in milliseconds(), ensuring getIndex() returns -1
before the next start().

---------

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
sunjiweiswift pushed a commit to sunjiweiswift/cutlass-sycl that referenced this pull request Sep 22, 2025
…ded" Error in Loops (intel#511)

**Problem Description:**
In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when
calling timer.start() repeatedly in a loop (line 307-313) in
examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws :

> terminate called after throwing an instance of 'std::runtime_error'
>   what():  Event is already being recorded.
> Aborted (core dumped)

**Root Cause:**
This occurs because the SYCL event manager checks if an event is already
in use (via event.getIndex() != -1 in
tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's
start/stop events are not properly reset after each measurement in
milliseconds(), this prevents event reuse in subsequent start() calls,
leading to a runtime error.

**Proposed Fix:**
Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{}
after each measurement in milliseconds(), ensuring getIndex() returns -1
before the next start().

---------

Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants