Add parallel_scan for SYCL #3577
Partial review
/*TEST( TEST_CATEGORY, scan_small )
{
  using TestScanFunctor =
    TestScan< TEST_EXECSPACE, Kokkos::Impl::ThreadsExecUseScanSmall >;

  for ( int i = 0; i < 1000; ++i ) {
    TestScanFunctor( 10 );
    TestScanFunctor( 10000 );
  }
  TestScanFunctor( 1000000 );
  TestScanFunctor( 10000000 );

  TEST_EXECSPACE().fence();
}*/
@@ -47,7 +47,7 @@

 namespace Test {

-template <class Device, class WorkSpec = size_t>
+template <class Device>
The trailing template parameter seems to have been there since the unit test was added. I wouldn't necessarily have fixed it, but I cannot see a good reason not to remove it.
Force-pushed from 573c862 to e9c33f3.
The failing test says
This looks spurious.
Looks OK. I would like some comments about moving the allocations for the buffers, and a comment about the memory-usage tradeoff here. The tradeoff in kernel launches is probably not good in the end either: for a scan over 2M elements we launch something like 14 kernels, compared to 2 in CUDA.
Actually, I am OK if we just note down all my issues in a GitHub issue and go from there.
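To illustrate why a multi-level scan needs more launches as the input grows, here is a minimal serial sketch of one common blocked scheme: scan each group, recursively scan the group totals, then add the offsets back. This is an assumption for illustration only and not the implementation in this PR; `blocked_inclusive_scan`, `LaunchCounter`, and `group_size` are hypothetical names, and each loop over all groups merely stands in for one kernel launch.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: counts the kernel-like passes performed by the sketch.
struct LaunchCounter {
  int launches = 0;
};

// Inclusive prefix sum of `data`, in place, using fixed-size groups.
// Assumes group_size >= 2 (otherwise the recursion would not shrink the input).
void blocked_inclusive_scan(std::vector<long>& data, std::size_t group_size,
                            LaunchCounter& counter) {
  const std::size_t n = data.size();
  if (n == 0) return;
  const std::size_t n_groups = (n + group_size - 1) / group_size;

  // Pass 1 (one "launch"): every group scans its own chunk independently.
  std::vector<long> group_totals(n_groups);
  for (std::size_t g = 0; g < n_groups; ++g) {
    const std::size_t begin = g * group_size;
    const std::size_t end   = std::min(begin + group_size, n);
    for (std::size_t i = begin + 1; i < end; ++i) data[i] += data[i - 1];
    group_totals[g] = data[end - 1];
  }
  ++counter.launches;
  if (n_groups == 1) return;  // a single group needs no offset fix-up

  // Recursive step: scan the per-group totals (adds launches per level).
  blocked_inclusive_scan(group_totals, group_size, counter);

  // Pass 2 (one "launch"): add the total of all preceding groups to each chunk.
  for (std::size_t g = 1; g < n_groups; ++g) {
    const std::size_t begin = g * group_size;
    const std::size_t end   = std::min(begin + group_size, n);
    for (std::size_t i = begin; i < end; ++i) data[i] += group_totals[g - 1];
  }
  ++counter.launches;
}
```

In this sketch the number of passes grows with the number of levels, roughly the logarithm of the input size to base `group_size`, so a smaller group size or an unfused fix-up pass increases the count. The exact figure for the SYCL backend depends on details not shown in this thread.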
Retest this please
 public:
  template <typename PostFunctor>
  void impl_execute(PostFunctor&& post_functor) {
Why do we use a forwarding reference here?
When not forwarding, const & is actually better, because there is a corner case: if the functor has both operator()() const and operator()(), how you pass it determines which one gets called, which is really subtle.
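The corner case can be reproduced with a small standalone sketch. `DualFunctor`, `call_via_forwarding_ref`, and `call_via_const_ref` are hypothetical names introduced here for illustration and are not part of the PR:

```cpp
#include <string>

// Hypothetical functor that provides both a const and a non-const call operator.
struct DualFunctor {
  std::string operator()() const { return "const"; }
  std::string operator()() { return "non-const"; }
};

// Forwarding reference that is never forwarded: inside the body, `f` is an
// lvalue whose constness is whatever the caller happened to pass.
template <typename F>
std::string call_via_forwarding_ref(F&& f) {
  return f();
}

// const reference: `f` is always const, so the const overload is always chosen.
template <typename F>
std::string call_via_const_ref(const F& f) {
  return f();
}
```

With the forwarding reference, a non-const lvalue or a temporary selects the non-const overload, while a const object selects the const one. With `const F&`, the const overload is selected regardless of how the caller passes the functor, which is the more predictable behavior when nothing is forwarded.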
Addressed in 8389c54 (#3577).
LGTM, please clean up the history before we merge.
Force-pushed from 8389c54 to 5739f31.
@dalg24 Here you go!
Retest this please
Retest this please.
Retest this please.
Retest this please.
So far, only the simple case where we only use operator+ and no arrays is tested. Not yet tested on an actual GPU.