Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL MDRangePolicy parallel_reduce #3801

Merged
merged 11 commits into from
Mar 15, 2021

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented Feb 15, 2021

Depends on #3802.

@masterleinad masterleinad changed the title SYCL TeamPolicy scratch memory alternative SYCL MDRangePolicy parallel_reduce Feb 15, 2021
@masterleinad masterleinad force-pushed the sycl_mdrange_reduce branch 3 times, most recently from 11512b4 to 9f02724 Compare February 18, 2021 14:19
@masterleinad
Copy link
Contributor Author

To make independent progress, I dropped the changes to TeamPolicy such that it doesn't depend on #3783 anymore but only on #3802. This also means that I couldn't enable more incremental tests here (before dropping the changes contained in #3783 all incremental tests could be activated) but the other tests activated in this pull request should be sufficient.

@masterleinad masterleinad marked this pull request as ready for review February 18, 2021 14:22
@masterleinad
Copy link
Contributor Author

It might be possible to combine the implementation a little better to avoid code duplication but I would prefer to do that in a follow-up pull request possibly after implementing the outer parallel_reduce for TeamPolicy.

@masterleinad
Copy link
Contributor Author

Retest this please.

@masterleinad
Copy link
Contributor Author

Retest this please.

Copy link
Member

@crtrott crtrott left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be better to split the kernel but maybe not ...


const BarePolicy bare_policy = m_policy;

cgh.parallel_for(range, [=](sycl::nd_item<1> item) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may wanna do this in two different parallel for calls instead of having the if(first_run ) {} else {} inside the kernel. Effectively you are paying the full register cost for every subsequent call too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I was also thinking about splitting it but decided to rather try to keep it similar to the other SYCL parallel_reduce implementations for now.
I prefer optimizing and maybe fusing the parallel_reduce implementations some more once they are all merged.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@masterleinad masterleinad added this to the Tentative 3.4 Release milestone Mar 9, 2021
@masterleinad masterleinad added the Blocks Promotion Overview issue for release-blocking bugs label Mar 11, 2021
@crtrott crtrott merged commit 26c67eb into kokkos:develop Mar 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocks Promotion Overview issue for release-blocking bugs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants