-
Notifications
You must be signed in to change notification settings - Fork 56
Fix GPU_Clock Timer Reuse Issue Causing "Event is Already Being Recorded" Error in Loops #511
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
GPU_timer could be moved into loop body which avoid re-init _stop, _start |
Moving the GPU_Clock timer inside the loop body is indeed a viable option to avoid the event reuse issue (by creating fresh events with index=-1 each iteration) with simple and local change only affects examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp. But it avoids the timer reused in other looped scenarios and potentially higher overhead by creating/destroying events (via SYCLTimer constructor/destructor) in each iteration. We can talk on this. |
324127c
to
9c4bd8f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't nice. But this is already one big hack and we need to rewrite it. So it doesn't matter too much.
Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
9c4bd8f
to
73df662
Compare
start_ = SyclEvent{}; | ||
stop_ = SyclEvent{}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
better to call:
syclEventDestroy(start_);
syclEventDestroy(stop_);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And i think it's better to add a "reset" method and call reset if you want to reuse the clock. we shouldn't add unrelated things in method “milliseconds()".
Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
We have decided to move the GPU Clock into the loop body and relocate the #define CUTLASS_SYCL_PROFILING_ENABLED to the top of the file to ensure the macro takes effect for the SYCL timer and Event Manager. |
$ export IGC_VectorAliasBBThreshold=10000 | ||
*/ | ||
|
||
#define CUTLASS_SYCL_PROFILING_ENABLED |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this shouldn't be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry just noticed you only moved (not added). Why do those two examples have this define? Why is it not handled like in all the other examples where CUTLASS_SYCL_PROFILING_ENABLED gets set by cmake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is implemented by Codeplay team.
My guess is, the stream-k kernel execution time is short, use GPU time can get more accurate time (use wall time cannot get the benefit of stream-k).
Other examples didn't explicitly use CUTLASS_SYCL_PROFILING_ENABLED. We only turn on it when run benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Propose to remove CUTLASS_SYCL_PROFILING_ENABLED from the two examples, it will confuse end users.
It seems a debug code which Codeplay team didn't clean up.
It will not work if we don't move it to the top of the code.
If we moved to the top of the file, it causes the example fail. (the example mis-used timer which we just fixed).
Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
…ded" Error in Loops (intel#511) **Problem Description:** In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when calling timer.start() repeatedly in a loop (line 307-313) in examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws : > terminate called after throwing an instance of 'std::runtime_error' > what(): Event is already being recorded. > Aborted (core dumped) **Root Cause:** This occurs because the SYCL event manager checks if an event is already in use (via event.getIndex() != -1 in tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's start/stop events are not properly reset after each measurement in milliseconds(), this prevents event reuse in subsequent start() calls, leading to a runtime error. **Proposed Fix:** Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{} after each measurement in milliseconds(), ensuring getIndex() returns -1 before the next start(). --------- Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
…ded" Error in Loops (intel#511) **Problem Description:** In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when calling timer.start() repeatedly in a loop (line 307-313) in examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws : > terminate called after throwing an instance of 'std::runtime_error' > what(): Event is already being recorded. > Aborted (core dumped) **Root Cause:** This occurs because the SYCL event manager checks if an event is already in use (via event.getIndex() != -1 in tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's start/stop events are not properly reset after each measurement in milliseconds(), this prevents event reuse in subsequent start() calls, leading to a runtime error. **Proposed Fix:** Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{} after each measurement in milliseconds(), ensuring getIndex() returns -1 before the next start(). --------- Signed-off-by: Chen, Xi2 <xi2.chen@intel.com>
Problem Description:
In the SYCL profiling mode (CUTLASS_SYCL_PROFILING_ENABLED), when calling timer.start() repeatedly in a loop (line 307-313) in examples/03_bmg_gemm_streamk/03_bmg_gemm_streamk.cpp, the code throws :
Root Cause:
This occurs because the SYCL event manager checks if an event is already in use (via event.getIndex() != -1 in tools/util/include/cutlass/util/sycl_event_manager.hpp), and the timer's start/stop events are not properly reset after each measurement in milliseconds(), this prevents event reuse in subsequent start() calls, leading to a runtime error.
Proposed Fix:
Update sycl_timer.hpp to resets start_ and stop_ to default SyclEvent{} after each measurement in milliseconds(), ensuring getIndex() returns -1 before the next start().