Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove local (in-group) atomic usage from __parallel_find_or (V1) #1668

Merged

Conversation

SergeyKopienko
Copy link
Contributor

@SergeyKopienko SergeyKopienko commented Jul 5, 2024

In this PR we remove local (in-group) atomic usage from __parallel_find_or implementation:

  • __found_local now is local variable (for each item);
  • barriers doesn't required anymore;
  • we updates the state of __found_local within one group through __dpl_sycl::__reduce_over_group operation :
                    __found_local = __dpl_sycl::__reduce_over_group(__item_id.get_group(), __found_local,
                                                                    typename _BrickTag::_LocalResultsReduceOp{});

where _LocalResultsReduceOp is :

  • __dpl_sycl::__minimum (for the __parallel_find_forward_tag);
  • __dpl_sycl::__maximum (for the __parallel_find_backward_tag);
  • __dpl_sycl::__bit_or (for the __parallel_or_tag).

The new operation __dpl_sycl::__bit_or has been defined in the file sycl_defs.h :

#if _ONEDPL_SYCL2020_FUNCTIONAL_OBJECTS_PRESENT
// ...
template <typename _T = void>
using __bit_or = sycl::bit_or<_T>;
#else  // _ONEDPL_SYCL2020_FUNCTIONAL_OBJECTS_PRESENT
// ...
template <typename _T = void>
using __bit_or = sycl::ONEAPI::bit_or<_T>;
#endif // _ONEDPL_SYCL2020_FUNCTIONAL_OBJECTS_PRESENT

SergeyKopienko and others added 5 commits July 5, 2024 09:16
…view comment: remove duplicated comments

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
(cherry picked from commit cbf9e7a)
…FPGA_DEVICE macro

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…view comment: rename __save_state_to for works with atomic to __save_state_to_atomic

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…sycl::bit_or instead of __dpl_sycl::__plus

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…- using sycl::bit_or instead of __dpl_sycl::__plus" - rolled back due very small or absent perf profit

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
… FoundLocalReduceOp to _LocalResultsReduceOp

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/remove_local_atomic_from_find_or_V1 branch from d6dd3da to 32a9d32 Compare July 5, 2024 10:22
… and use __found_state in the struct __parallel_or_tag

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
…dpl_sycl::__bit_or in struct __parallel_or_tag

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
@SergeyKopienko SergeyKopienko force-pushed the dev/skopienko/remove_local_atomic_from_find_or_V1 branch from 5b735ce to 9201751 Compare July 8, 2024 09:17
@@ -139,6 +141,9 @@ using __maximum = sycl::ONEAPI::maximum<_T>;

template <typename _T>
using __minimum = sycl::ONEAPI::minimum<_T>;

template <typename _T = void>
using __bit_or = sycl::ONEAPI::bit_or<_T>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -130,6 +130,8 @@ using __maximum = sycl::maximum<_T>;
template <typename _T = void>
using __minimum = sycl::minimum<_T>;

template <typename _T = void>
using __bit_or = sycl::bit_or<_T>;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@SergeyKopienko SergeyKopienko Jul 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test evidence for complier icx 2021.1.2 (whene group operations isn't present) ): https://godbolt.org/z/PcnYT4vqe

…view comment: remove unexpected comment

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
@SergeyKopienko
Copy link
Contributor Author

@danhoeflinger, @julianmi is something that mandatory required to change in this PR?

Co-authored-by: Julian Miller <julian.miller@intel.com>
julianmi
julianmi previously approved these changes Jul 8, 2024
Copy link
Contributor

@julianmi julianmi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please wait for @danhoeflinger and green CI.

…NEDPL_FPGA_EMU instead of _ONEDPL_FPGA_DEVICE

Signed-off-by: Sergey Kopienko <sergey.kopienko@intel.com>
Copy link
Contributor

@danhoeflinger danhoeflinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with the comment explaining the restriction of workgroup size, it is OK.
I believe we can just make that restriction for all targets, but it should be examined, and we can do that in a separate PR.

@SergeyKopienko SergeyKopienko merged commit a1798c3 into main Jul 9, 2024
21 checks passed
@SergeyKopienko SergeyKopienko deleted the dev/skopienko/remove_local_atomic_from_find_or_V1 branch July 9, 2024 07:34
SergeyKopienko added a commit that referenced this pull request Jul 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants