-
Notifications
You must be signed in to change notification settings - Fork 407
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow initializing SYCL execution space from sycl::queue and SYCL::impl_static_fence #3767
Conversation
@@ -55,6 +55,8 @@ namespace Impl { | |||
|
|||
int SYCLInternal::was_finalized = 0; | |||
|
|||
std::vector<std::optional<sycl::queue>*> SYCLInternal::all_queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just a vector<queue>
?
m_queue
is an optional<queue>
to get around vendor bugs when assigning one queue
to another. As long as we aren't inserting/emplacing/erasing from the beginning or middle of the vector
or assigning to elements in the vector
, we won't run into that issue, and it won't be a concern at all once the vendor bugs are fixed.
Also, is m_queue
going to eventually go away (I assume so but an checking here)? (Even now it could go away as it should always be equivalent to all_queues.back()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was running into the same (or what looked like the same) vendor bug and storing pointers seemed to be the only way to make it work, see db9dc39
(#3767) for the fixing commit and https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/4186/pipeline for the error when just using std::vector<sycl::queue>
.
Also, is m_queue going to eventually go away (I assume so but an checking here)? (Even now it could go away as it should always be equivalent to all_queues.back().
What do you mean? We still must have access to the queue
corresponding to the current instance so I don't see a way to eliminate it. Or am I misunderstanding?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I read lines 119 and 120, m_queue and all_queues.back() always refer to the same queue, don't they?
That's only true for the last instance that adds a queue
to the static
variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake. I didn't realize all_queues was a static member.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please confirm that std::vector<std::optional<sycl::queue>>
does not work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to capture this discussion
The last commit provides a remedy for the inaccessibility of memory across multiple |
d9ce787
to
5b72390
Compare
35feba8
to
ae70fdb
Compare
@@ -55,6 +55,8 @@ namespace Impl { | |||
|
|||
int SYCLInternal::was_finalized = 0; | |||
|
|||
std::vector<std::optional<sycl::queue>*> SYCLInternal::all_queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please confirm that std::vector<std::optional<sycl::queue>>
does not work
|
a75fab3
to
fd0faec
Compare
Retest this please. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks ok but need to protect the pushback into all_queues. Also all_queues will strictly grow forever. would maybe a set or soemthing where we can delete again from be better? Or maybe a counter on how many are empty and if more than 20% are empty the vector gets condensed and then shrunk? I know that sucks a bit because it means the impl_global_fence then also needs to use the lock.
}; | ||
m_queue.emplace(d, exception_handler); | ||
m_queue = q; | ||
all_queues.push_back(&m_queue); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we made the counter thread safe, i.e. we try to support multi threaded environments, but here we use push_back on a global? Lets protect this with a global mutex or so. Maybe have a static member function to insert into all_queues.
I guarded access to |
@@ -55,6 +55,8 @@ namespace Impl { | |||
|
|||
int SYCLInternal::was_finalized = 0; | |||
|
|||
std::vector<std::optional<sycl::queue>*> SYCLInternal::all_queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake. I didn't realize all_queues was a static member.
7fc8d01
to
4fd5d82
Compare
@nliber I believe I addressed all your comments. |
Retest this please. |
8c7376f
to
e377894
Compare
@@ -55,6 +55,8 @@ namespace Impl { | |||
|
|||
int SYCLInternal::was_finalized = 0; | |||
|
|||
std::vector<std::optional<sycl::queue>*> SYCLInternal::all_queues; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to capture this discussion
e61205b
to
0c8fb50
Compare
@dalg24 I think I addressed all your comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good
@@ -75,6 +75,12 @@ class SYCLInternal { | |||
|
|||
std::optional<sycl::queue> m_queue; | |||
|
|||
// Using std::vector<std::optional<sycl::queue>> reveals a compiler bug when | |||
// compiling for the CUDA backend. Storing pointers instead works around this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where did we report this bug? @nliber
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know that we have (where do SYCL+CUDA bugs get reported)?
Retest this please. |
Plz cleanup history |
2d4f340
to
65d270e
Compare
@dalg24 Done. |
Retest this please |
Based on #3759.
This pull request allows initializing SYCL execution space from a sycl::queue corresponding to streams for CUDA and SYCL. It also provides a valid implementation for
impl_static_fence
that simply fences all queues.I was running into some issues with the current TeamPolicy implementation but didn't bother to look too closely since the alternative implementation in #3759 doesn't show this problem and I believe that we should move forward with that one.
Also, we need to make sure that all memory we allocate using
sycl::allocate_*
references the correctsycl::queue
. Memory access across queues doesn't work with this kind of allocation. Thirs turned out to be a problem in theparallel_reduce
implementation.