Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SYCL: Prepare Parallel* for Graphs #6988

Merged
merged 6 commits into from
May 8, 2024

Conversation

masterleinad
Copy link
Contributor

@masterleinad masterleinad commented May 3, 2024

Prerequisite for #6912. The Graphs implementation forces us to make the parallel construct implementations to be copyable. This is what this pull request is doing. In particular,

  • we move all allocations into the execute member functions so that copied objects would not use the same memory
  • move the scratch buffer locks to execute since they are not copyable.

Furthermore, the parallel_reduce TeamPolicy implementation was missing to lock the execution space instance' s m_mutexScratchSpace mutex.

@masterleinad
Copy link
Contributor Author

The SYCL build is passing

@dalg24
Copy link
Member

dalg24 commented May 3, 2024

Please briefing explain where the copy is needed in the graph implementation and why we did not run into this issue with CUDA/HIP.
Also did you consider alternative approaches?

@masterleinad
Copy link
Contributor Author

Please briefing explain where the copy is needed in the graph implementation and why we did not run into this issue with CUDA/HIP.

The errors look somewhat like

In file included from /home/darndt/kokkos_new/core/unit_test/sycl/TestSYCL_Graph.cpp:18:
In file included from /home/darndt/kokkos_new/core/unit_test/TestGraph.hpp:18:
In file included from /home/darndt/kokkos_new/core/src/Kokkos_Graph.hpp:161:
/home/darndt/kokkos_new/core/src/impl/Kokkos_GraphNodeImpl.hpp:147:21: error: call to implicitly-deleted copy constructor of 'Kokkos::Impl::GraphNodeKernelImpl<Kokkos::Experimental::SYCL, Kokkos::RangePolicy<Kokkos::Experimental::SYCL, Kokkos::Impl::IsGraphKernelTag>, Test::CountTestFunctor<Kokkos::Experimental::SYCL>, Kokkos::ParallelForTag>'
  147 |       : base_t(ex), m_kernel((KernelDeduced &&) arg_kernel) {}
      |                     ^        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/darndt/kokkos_new/core/src/impl/Kokkos_GraphNodeImpl.hpp:243:9: note: in instantiation of function template specialization 'Kokkos::Impl::GraphNodeImpl<Kokkos::Experimental::SYCL, Kokkos::Impl::GraphNodeKernelImpl<Kokkos::Experimental::SYCL, Kokkos::RangePolicy<Kokkos::Experimental::SYCL, Kokkos::Impl::IsGraphKernelTag>, Test::CountTestFunctor<Kokkos::Experimental::SYCL>, Kokkos::ParallelForTag>, Kokkos::Experimental::TypeErasedTag>::GraphNodeImpl<Kokkos::Impl::GraphNodeKernelImpl<Kokkos::Experimental::SYCL, Kokkos::RangePolicy<Kokkos::Experimental::SYCL, Kokkos::Impl::IsGraphKernelTag>, Test::CountTestFunctor<Kokkos::Experimental::SYCL>, Kokkos::ParallelForTag>>' requested here
  243 |       : base_t(ex, _graph_node_kernel_ctor_tag{},
[...]

The respective classes in Cuda and HIP are already copyable since we are using different approaches for guarding the scratch allocations.

Also did you consider alternative approaches?

I didn't want to try avoiding the copy in the Graphs implementation (which would likely be quite difficult) and moving the lock and scratch memory allocations to execute seemed like the best solution and consistent with Cuda/HIP.

core/src/SYCL/Kokkos_SYCL_ParallelReduce_Team.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_ParallelFor_MDRange.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_ParallelFor_Range.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_ParallelFor_Team.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_ParallelFor_Team.hpp Outdated Show resolved Hide resolved
core/src/SYCL/Kokkos_SYCL_ParallelFor_Team.hpp Outdated Show resolved Hide resolved
@masterleinad
Copy link
Contributor Author

Thanks for the suggestions!

core/src/SYCL/Kokkos_SYCL_ParallelReduce_Team.hpp Outdated Show resolved Hide resolved
Comment on lines +384 to +385
std::scoped_lock<std::mutex> scratch_buffers_lock(
instance.m_mutexScratchSpace);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the deal with that lock?
Are you correcting an oversight/bug?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this looks like an oversight. We need both locks here so that we correctly deal with a TeamPolicy parallel_for or a (MD)RangePolicy parallel_reduce/parallel_scan submitted to the same execution space instance. I should have pointed this out in the pull request description.

Note that this is not necessary for the Graphs implementation but is a result of consistent refactoring. I want to deal with the acquire_team_scratch_space logic in a separate pull request later on (when allowing to discard SYCL events).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the description to mention that change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@masterleinad masterleinad changed the title SYCL: Make Parallel* copyable SYCL: Prepare Parallel* for Graphs May 7, 2024
@dalg24
Copy link
Member

dalg24 commented May 8, 2024

Ignoring

4: [  FAILED  ] 1 test, listed below:
4: [  FAILED  ] openmptarget.exec_space_thread_safety_range_scan
4: 
4:  1 FAILED TEST
 4/49 Test  #4: Kokkos_CoreUnitTest_OpenMPTarget ............***Failed  Error regular expression found in output. Regex=[  FAILED  ]177.40 sec

@dalg24 dalg24 merged commit 50a862c into kokkos:develop May 8, 2024
28 of 29 checks passed
@ndellingwood
Copy link
Contributor

@masterleinad does this merit a changelog entry for 4.4, or is this just internal implementation details for #6912 (which will get an entry)?

@dalg24
Copy link
Member

dalg24 commented May 8, 2024

Entry needed as there is a bug associated being resolved

@masterleinad
Copy link
Contributor Author

@masterleinad does this merit a changelog entry for 4.4, or is this just internal implementation details for #6912 (which will get an entry)?

Added to #6914.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants