Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parallel_scan for SYCL #3577

Merged
merged 2 commits into from
Nov 20, 2020
Merged

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Nov 6, 2020

So far only the simple case where we only use operator+ and no arrays is tested.
Not yet tested on an actual GPU.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partial review

Comment on lines -144 to -159

/*TEST( TEST_CATEGORY, scan_small )
{
using TestScanFunctor =
TestScan< TEST_EXECSPACE, Kokkos::Impl::ThreadsExecUseScanSmall >;

for ( int i = 0; i < 1000; ++i ) {
TestScanFunctor( 10 );
TestScanFunctor( 10000 );
}
TestScanFunctor( 1000000 );
TestScanFunctor( 10000000 );

TEST_EXECSPACE().fence();
}*/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@crtrott moved this in c938e09 and it had been added in 71c69ee (already commented out with a preprocessor directive) and as far as I can tell there is no comment so I am fine with your removing it.

core/unit_test/TestScan.hpp Show resolved Hide resolved
@@ -47,7 +47,7 @@

namespace Test {

template <class Device, class WorkSpec = size_t>
template <class Device>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The trailing template parameter seems to have been there since the addition of the unit test. I wouldn't necessarily have fixed but I cannot see a good reason not to remove.

core/src/SYCL/Kokkos_SYCL_Parallel_Scan.hpp Outdated Show resolved Hide resolved
@masterleinad masterleinad marked this pull request as ready for review November 10, 2020 22:11
@masterleinad
Copy link
Contributor Author

The failing test says

/var/jenkins/workspace/Kokkos/core/perf_test/PerfTest_ExecSpacePartitioning.cpp:381: Failure
Value of: (time_end > 1.5 * time_overlap)
  Actual: false
Expected: true
[  FAILED  ] default_exec.overlap_mdrange_policy (62 ms)

This looks spurious.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks ok, I want some comments about moving the allocations for the buffers. Also a comment about the tradeoff for memory usage here. Also the tradeoff in kernel launches is probably not good in the end. For a Scan over 2M elements we are launching like 14 kernels compared to 2 in CUDA.

core/src/SYCL/Kokkos_SYCL_Parallel_Scan.hpp Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_Parallel_Scan.hpp Show resolved Hide resolved
@crtrott
Copy link
Member

crtrott commented Nov 12, 2020

Actually I am ok if we just note down all my issues in a github issue and go from there.

@crtrott
Copy link
Member

crtrott commented Nov 13, 2020

Retest this please


public:
template <typename PostFunctor>
void impl_execute(PostFunctor&& post_functor) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we use a forwarding reference here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When not forwarding, const & is actually better, because there is a corner case if the functor has both operator()() const and operator()(), how you pass it determines which one gets called, which is really subtle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 8389c54 (#3577).

nliber
nliber previously requested changes Nov 13, 2020

public:
template <typename PostFunctor>
void impl_execute(PostFunctor&& post_functor) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When not forwarding, const & is actually better, because there is a corner case if the functor has both operator()() const and operator()(), how you pass it determines which one gets called, which is really subtle.

Copy link
Member

@dalg24 dalg24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM please cleanup history before we merge

@dalg24 dalg24 dismissed nliber’s stale review November 18, 2020 16:53

Change request has been addressed

@masterleinad
Copy link
Contributor Author

@dalg24 Here you go!

@dalg24
Copy link
Member

dalg24 commented Nov 19, 2020

Retest this please

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad masterleinad added this to the Tentative 3.3 Release milestone Nov 19, 2020
@masterleinad
Copy link
Contributor Author

Retest this please.

1 similar comment
@masterleinad
Copy link
Contributor Author

Retest this please.

@dalg24 dalg24 merged commit 7b74bdc into kokkos:develop Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants