Allow initializing SYCL execution space from sycl::queue and SYCL::impl_static_fence #3767

Merged
merged 6 commits into from
Feb 12, 2021

Conversation

masterleinad
Contributor

Based on #3759.

This pull request allows initializing SYCL execution space from a sycl::queue corresponding to streams for CUDA and SYCL. It also provides a valid implementation for impl_static_fence that simply fences all queues.
I was running into some issues with the current TeamPolicy implementation but didn't bother to look too closely since the alternative implementation in #3759 doesn't show this problem and I believe that we should move forward with that one.
Also, we need to make sure that all memory we allocate using sycl::allocate_* references the correct sycl::queue. Memory access across queues doesn't work with this kind of allocation. This turned out to be a problem in the parallel_reduce implementation.
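To illustrate what "fences all queues" means here, the following is a minimal sketch and not the code from this pull request. `MockQueue`, `MockInternal`, and `static_fence` are hypothetical names; `MockQueue` stands in for `sycl::queue` so the sketch compiles without a SYCL toolchain. Only `m_queue` and `all_queues` mirror names that appear in the diff.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical stand-in for sycl::queue; wait() plays the role of
// sycl::queue::wait().
struct MockQueue {
  int pending = 0;              // number of outstanding "kernels"
  void submit() { ++pending; }  // pretend to launch work
  void wait() { pending = 0; }  // block until all submitted work is done
};

// Simplified model of SYCLInternal: every instance registers the address of
// its own queue in a static list shared by all instances.
struct MockInternal {
  std::optional<MockQueue> m_queue;
  static inline std::vector<std::optional<MockQueue>*> all_queues;

  void initialize(const MockQueue& q) {
    m_queue = q;
    all_queues.push_back(&m_queue);
  }
};

// A static fence has no instance to work with, so it waits on every
// registered queue that currently holds a value.
inline void static_fence() {
  for (auto* q : MockInternal::all_queues) {
    assert(q != nullptr);
    if (q->has_value()) (*q)->wait();
  }
}
```

In this model an instance must not be moved or copied after `initialize`, because the static list stores the address of its `m_queue` member.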

@@ -55,6 +55,8 @@ namespace Impl {

int SYCLInternal::was_finalized = 0;

std::vector<std::optional<sycl::queue>*> SYCLInternal::all_queues;
Contributor

@nliber nliber Feb 1, 2021

Why not just a vector<queue>?

m_queue is an optional<queue> to get around vendor bugs when assigning one queue to another. As long as we aren't inserting/emplacing/erasing from the beginning or middle of the vector or assigning to elements in the vector, we won't run into that issue, and it won't be a concern at all once the vendor bugs are fixed.

Also, is m_queue going to eventually go away (I assume so, but am checking here)? Even now it could go away, as it should always be equivalent to all_queues.back().

Contributor Author

I was running into the same (or what looked like the same) vendor bug and storing pointers seemed to be the only way to make it work, see db9dc39 (#3767) for the fixing commit and https://cloud.cees.ornl.gov/jenkins-ci/blue/organizations/jenkins/Kokkos/detail/Kokkos/4186/pipeline for the error when just using std::vector<sycl::queue>.

Also, is m_queue going to eventually go away (I assume so, but am checking here)? Even now it could go away, as it should always be equivalent to all_queues.back().

What do you mean? We still must have access to the queue corresponding to the current instance so I don't see a way to eliminate it. Or am I misunderstanding?

Contributor Author

As I read lines 119 and 120, m_queue and all_queues.back() always refer to the same queue, don't they?

That's only true for the last instance that adds a queue to the static variable.

Contributor

My mistake. I didn't realize all_queues was a static member.

Member

Please confirm that std::vector<std::optional<sycl::queue>> does not work

Member

Comment to capture this discussion

@masterleinad
Contributor Author

The last commit provides a remedy for the inaccessibility of memory across multiple queues. We just need to make sure that all queues use the same sycl::context. Hence, I added a member function that returns the sycl::context associated with the default queue, which can be used when initializing additional queues.
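The rule described above can be modeled with a small sketch. This is a toy model under stated assumptions, not SYCL code: `MockContext`, `MockQueue`, `MockAllocation`, and all functions below are hypothetical stand-ins. The only behavior it encodes is that an allocation belongs to the context of the queue it was made with, and that a queue can only reach allocations from its own context.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical stand-in for sycl::context.
struct MockContext {
  int id;
};

// Hypothetical stand-in for sycl::queue; each queue refers to one context.
struct MockQueue {
  std::shared_ptr<MockContext> context;
};

// A USM-like allocation remembers the context it was created in.
struct MockAllocation {
  std::shared_ptr<MockContext> context;
  std::size_t bytes;
};

// Allocate "memory" through a queue; the allocation inherits its context.
MockAllocation allocate(std::size_t bytes, const MockQueue& q) {
  return {q.context, bytes};
}

// In this model, memory is reachable only from queues in the same context.
bool can_access(const MockQueue& q, const MockAllocation& a) {
  return q.context == a.context;
}

// Create a queue with a fresh context of its own.
MockQueue make_queue_with_new_context(int id) {
  return MockQueue{std::make_shared<MockContext>(MockContext{id})};
}

// Create an additional queue that reuses the default queue's context, which
// is the remedy described in the comment above.
MockQueue make_queue_sharing_context(const MockQueue& default_queue) {
  assert(default_queue.context != nullptr);
  return MockQueue{default_queue.context};
}
```

With a shared context, memory allocated through the default queue stays reachable from the additional queue; a queue built with its own context cannot reach it.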

@masterleinad masterleinad added the Blocks Promotion Overview issue for release-blocking bugs label Feb 3, 2021
@masterleinad
Contributor Author

Please confirm that std::vector<std::optional<sycl::queue>> does not work

PI CUDA ERROR:
	Value:           4
	Name:            terminate called after throwing an instance of 'cl::sycl::runtime_error'
  what():  OpenCL API failed. OpenCL API returns: -999 (Unknown OpenCL error code) -999 (Unknown OpenCL error code)
Aborted (core dumped)

@masterleinad
Contributor Author

Retest this please.

@dalg24 dalg24 requested a review from nliber February 5, 2021 14:57
Member

@crtrott crtrott left a comment

Looks OK, but we need to protect the push_back into all_queues. Also, all_queues will strictly grow forever. Would a set or something similar that we can delete from be better? Or maybe a counter of how many entries are empty, and if more than 20% are empty, the vector gets condensed and then shrunk? I know that sucks a bit because it means impl_global_fence then also needs to use the lock.

};
m_queue.emplace(d, exception_handler);
m_queue = q;
all_queues.push_back(&m_queue);
Member

So we made the counter thread-safe, i.e. we try to support multi-threaded environments, but here we use push_back on a global? Let's protect this with a global mutex or similar. Maybe have a static member function to insert into all_queues.

@masterleinad
Contributor Author

I guarded access to all_queues with a std::mutex now. Assuming that we traverse all_queues (using impl_static_fence()) much more often than pushing into it or removing elements, I left it as a std::vector.
Calling SYCLInternal::finalize now removes the respective sycl::queue from all_queues.
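A minimal sketch of this arrangement, assuming a mock queue type in place of `sycl::queue`: `QueueRegistry`, `MockQueue`, and the member function names are hypothetical and not the identifiers used in the pull request. It shows a `std::vector` of queue pointers where insertion, removal, and traversal all take the same `std::mutex`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <optional>
#include <vector>

// Hypothetical stand-in for sycl::queue.
struct MockQueue {
  bool idle = true;
  void submit() { idle = false; }  // pretend to launch work
  void wait() { idle = true; }     // pretend to wait for completion
};

// Hypothetical registry modeling the guarded static all_queues vector.
class QueueRegistry {
 public:
  // Called when an instance is initialized.
  void add(std::optional<MockQueue>* q) {
    assert(q != nullptr);
    std::lock_guard<std::mutex> lock(m_mutex);
    m_queues.push_back(q);
  }

  // Called when an instance is finalized, so the vector does not grow forever.
  void remove(std::optional<MockQueue>* q) {
    std::lock_guard<std::mutex> lock(m_mutex);
    m_queues.erase(std::remove(m_queues.begin(), m_queues.end(), q),
                   m_queues.end());
  }

  // Traversal also takes the lock, since another thread may add or remove.
  void fence_all() {
    std::lock_guard<std::mutex> lock(m_mutex);
    for (auto* q : m_queues)
      if (q->has_value()) (*q)->wait();
  }

  std::size_t size() {
    std::lock_guard<std::mutex> lock(m_mutex);
    return m_queues.size();
  }

 private:
  std::mutex m_mutex;
  std::vector<std::optional<MockQueue>*> m_queues;
};
```

Removal from a vector is linear in the number of queues, which matches the stated assumption that traversal is far more frequent than insertion or removal.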

@masterleinad
Contributor Author

@nliber I believe I addressed all your comments.

@masterleinad
Contributor Author

Retest this please.

@masterleinad masterleinad force-pushed the sycl_static_fence branch 2 times, most recently from 8c7376f to e377894 Compare February 10, 2021 13:55
@masterleinad
Contributor Author

@dalg24 I think I addressed all your comments.

Member

@dalg24 dalg24 left a comment

Looks good

@@ -75,6 +75,12 @@ class SYCLInternal {

std::optional<sycl::queue> m_queue;

// Using std::vector<std::optional<sycl::queue>> reveals a compiler bug when
// compiling for the CUDA backend. Storing pointers instead works around this.
Member

Where did we report this bug? @nliber

Contributor

I don't know that we have (where do SYCL+CUDA bugs get reported)?

@masterleinad
Contributor Author

Retest this please.

@dalg24
Member

dalg24 commented Feb 11, 2021

Plz cleanup history

@masterleinad
Contributor Author

Plz cleanup history

@dalg24 Done.

@dalg24
Member

dalg24 commented Feb 12, 2021

Retest this please

@dalg24 dalg24 merged commit efd8560 into kokkos:develop Feb 12, 2021
@masterleinad masterleinad deleted the sycl_static_fence branch February 12, 2021 13:49