SYCL: Prepare Parallel* for Graphs #6988
The SYCL build is passing.
Please briefly explain where the copy is needed in the graph implementation and why we did not run into this issue with CUDA/HIP.
The errors look somewhat like
The respective classes in
I didn't want to try avoiding the copy in the Graphs implementation (which would likely be quite difficult) and moving the lock and scratch memory allocations to |
7275246 to 5d9cf84
3996a6e to 3c8f121
Thanks for the suggestions!
std::scoped_lock<std::mutex> scratch_buffers_lock(
    instance.m_mutexScratchSpace);
What's the deal with that lock?
Are you correcting an oversight/bug?
Yes, this looks like an oversight. We need both locks here so that we correctly deal with a TeamPolicy parallel_for or a (MD)RangePolicy parallel_reduce/parallel_scan submitted to the same execution space instance. I should have pointed this out in the pull request description.
Note that this is not necessary for the Graphs implementation but is a result of consistent refactoring. I want to deal with the acquire_team_scratch_space logic in a separate pull request later on (once we allow discarding SYCL events).
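To illustrate why both locks matter when different kernel types are submitted to the same execution space instance, here is a minimal sketch in plain C++, not the Kokkos code itself. Only m_mutexScratchSpace is a name from the diff; Instance, team_scratch_mutex, the two int members, and team_reduce are hypothetical stand-ins. The sketch takes both mutexes with a single std::scoped_lock, which is one way to avoid lock-ordering problems; how the pull request acquires the second lock is not shown here.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for an execution space instance.
struct Instance {
  std::mutex m_mutexScratchSpace;  // name taken from the diff
  std::mutex team_scratch_mutex;   // hypothetical name for the second lock
  int scratch_value      = 0;      // stands in for the reduction scratch buffer
  int team_scratch_value = 0;      // stands in for the team scratch buffer
};

// A team-based reduction touches both kinds of scratch memory, so it holds
// both mutexes for as long as it uses them.
int team_reduce(Instance& instance, int value) {
  // std::scoped_lock with several mutexes locks them without deadlocking,
  // regardless of the order other callers use.
  std::scoped_lock both_locks(instance.team_scratch_mutex,
                              instance.m_mutexScratchSpace);
  instance.team_scratch_value = value;
  instance.scratch_value      = 2 * value;
  return instance.team_scratch_value + instance.scratch_value;
}
```

Holding only one of the two mutexes would leave the other buffer unprotected against a concurrently submitted kernel that uses it.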
Please update the description to mention that change.
Done.
Ignoring
@masterleinad does this merit a changelog entry for 4.4, or is this just internal implementation details for #6912 (which will get an entry)?
An entry is needed, since an associated bug is being resolved.
Added to #6914.
Prerequisite for #6912. The Graphs implementation forces us to make the parallel construct implementations copyable, which is what this pull request does. In particular, it moves the scratch memory allocations into the execute member functions so that copied objects do not use the same memory, and it moves the locks into execute since they are not copyable.
Furthermore, the parallel_reduce TeamPolicy implementation did not lock the execution space instance's m_mutexScratchSpace mutex.
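The copyability constraint can be sketched in plain C++ as follows. This is an illustration under stated assumptions, not the Kokkos implementation: Instance, LockInMember, and LockInExecute are hypothetical types, and only m_mutexScratchSpace and the execute name come from the pull request. The sketch shows that a class holding a std::scoped_lock as a member cannot be copied, while a class that takes the lock and sets up its scratch memory inside execute can.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for an execution space instance that owns the
// mutex and the scratch memory.
struct Instance {
  std::mutex m_mutexScratchSpace;
  std::vector<int> scratch;
};

// Holding the lock as a data member makes the whole class non-copyable,
// because std::scoped_lock has a deleted copy constructor.
struct LockInMember {
  explicit LockInMember(Instance& i) : lock(i.m_mutexScratchSpace) {}
  std::scoped_lock<std::mutex> lock;
};

// Copyable variant: the object stores only a pointer and a size. The lock
// is taken and the scratch memory is prepared inside execute().
struct LockInExecute {
  Instance* instance;
  std::size_t n;

  int execute() const {
    std::scoped_lock<std::mutex> scratch_buffers_lock(
        instance->m_mutexScratchSpace);
    instance->scratch.assign(n, 1);  // (re)initialize scratch for this run
    int sum = 0;
    for (int v : instance->scratch) sum += v;
    return sum;
  }
};

static_assert(!std::is_copy_constructible_v<LockInMember>);
static_assert(std::is_copy_constructible_v<LockInExecute>);
```

With this layout, a copy of a LockInExecute object made before execution does not carry its own lock or scratch state; each call to execute acquires both at the point of use.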