Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SYCL] Implement sycl_ext_oneapi_memcpy2d extension #7370

Merged
merged 20 commits into from
Dec 13, 2022

Conversation

steffenlarsen
Copy link
Contributor

This commit adds an implementation of the sycl_ext_oneapi_memcpy2d extension. This includes the following:

  • Three new PI API functions; piextUSMEnqueueFill2D, piextUSMEnqueueMemset2D, and piextUSMEnqueueMemcpy2D.
  • Three new PI context queries; PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMCPY2D_SUPPORT, PI_EXT_ONEAPI_CONTEXT_INFO_USM_FILL2D_SUPPORT, and PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMSET2D_SUPPORT. Each of these return a boolean value signifying if the PI context supports each of the new API functions.
  • New handler member functions; ext_oneapi_memcpy2d, ext_oneapi_copy2d, ext_oneapi_memset2d, and ext_oneapi_fill2d. These will create new commands which will use the new PI commands if they are supported. If the new PI API functions are not supported, these member functions will launch auxiliary kernels to do the required work.
  • New queue shortcuts for each of the new handler members.

This commit adds an implementation of the sycl_ext_oneapi_memcpy2d
extension. This includes the following:

* Three new PI API functions; piextUSMEnqueueFill2D,
  piextUSMEnqueueMemset2D, and piextUSMEnqueueMemcpy2D.
* Three new PI context queries;
  PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMCPY2D_SUPPORT,
  PI_EXT_ONEAPI_CONTEXT_INFO_USM_FILL2D_SUPPORT, and
  PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMSET2D_SUPPORT. Each of these return
  a boolean value signifying if the PI context supports each of the new
  API functions.
* New handler member functions; ext_oneapi_memcpy2d, ext_oneapi_copy2d,
  ext_oneapi_memset2d, and ext_oneapi_fill2d. These will create new
  commands which will use the new PI commands if they are supported.
  If the new PI API functions are not supported, these member functions
  will launch auxiliary kernels to do the required work.
* New queue shortcuts for each of the new handler members.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen steffenlarsen requested a review from a team November 11, 2022 12:40
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
sycl/include/sycl/detail/cg.hpp Outdated Show resolved Hide resolved
PI_CONTEXT_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES = 0x10011
PI_CONTEXT_INFO_ATOMIC_MEMORY_SCOPE_CAPABILITIES = 0x10011,
// Native 2D USM memory operation support
PI_EXT_ONEAPI_CONTEXT_INFO_USM_FILL2D_SUPPORT = 0x30000,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder, why 0x30000 and not 0x20000, for example? Is it documented anywhere which values should be picked up for new capabilities?

Copy link
Contributor Author

@steffenlarsen steffenlarsen Nov 11, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we have any rules necessarily. All these values are taken from OpenCL so we normally just try and move around what values are in OpenCL when adding extension values. In theory it should be fine as long as they don't conflict with any values in the same category, but it's good to keep a buffer.

sycl/include/sycl/handler.hpp Outdated Show resolved Hide resolved
sycl/include/sycl/handler.hpp Outdated Show resolved Hide resolved
@@ -1930,6 +1930,23 @@ pi_result piextUSMEnqueueMemAdvise(pi_queue, const void *, size_t,
DIE_NO_IMPLEMENTATION;
}

pi_result piextUSMEnqueueFill2D(pi_queue, void *, size_t, size_t, const void *,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about get info for ESIMD Emulator plugin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

piContextGetInfo isn't implemented in the esimd_emulator, so I assume it takes another route.

sycl/unittests/Extensions/USMMemcpy2D.cpp Show resolved Hide resolved
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1379

@steffenlarsen
Copy link
Contributor Author

@intel/llvm-reviewers-cuda | @intel/dpcpp-esimd-reviewers | @intel/dpcpp-l0-pi-reviewers | @intel/llvm-reviewers-runtime - Friendly ping. 😄

Copy link
Contributor

@sergey-semenov sergey-semenov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor comments aside, LGTM. @intel/dpcpp-l0-pi-reviewers Could you please take a look as well?

sycl/unittests/Extensions/USMMemcpy2D.cpp Outdated Show resolved Hide resolved
sycl/unittests/Extensions/USMMemcpy2D.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@againull againull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

L0 plugin changes look good to me.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Copy link
Contributor

@smaslov-intel smaslov-intel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am temporarily blocking the merge until this is reviewed/approved for Unified Runtime

Copy link
Contributor

@kbenzie kbenzie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made oneapi-src/unified-runtime#83 to ensure we track these changes in the unified runtime spec.

/// \param num_events_in_waitlist is the number of events to wait on
/// \param events_waitlist is an array of events to wait on
/// \param event is the event that represents this operation
__SYCL_EXPORT pi_result piextUSMEnqueueFill2D(pi_queue queue, void *ptr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The similar "buffer" API use "Rect" instead of "2D". I think we should unify names if possible (especially for Unified Runtime). Tag @kbenzie

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smaslov-intel the difference with *Rect though is they support 2D or 3D where as this is only 2D.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not make sense to add 3D for USM?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3D could make sense, that would be outside of the scope of this extension though.

I'm less sure if combining the 2D and 3D entry points into a rect entry point is the right way to go though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we were to split 2D/3D then maybe we should do the same to the existing Rect API four buffers
OK, this discussion belongs to Unified Runtime issue, I guess

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
@steffenlarsen
Copy link
Contributor Author

/verify with intel/llvm-test-suite#1379

@aelovikov-intel
Copy link
Contributor

aelovikov-intel commented Dec 13, 2022

Failures:

OpenCL:

Timed Out Tests (5):
??, also failed in intel/llvm-test-suite#1405, looks flaky
SYCL :: Basic/accessor/get_host_task_access_deduction.cpp
??, also failed in http://icl-jenkins2.sc.intel.com:8080/blue/organizations/jenkins/SYCL_CI%2Fintel%2FWin%2FTest_Suite/detail/Test_Suite/2258/pipeline/288
SYCL :: Basic/stream/stream_copies_buffer_sync.cpp

#7741
SYCL :: HostInteropTask/host-task-dependency3.cpp
SYCL :: HostInteropTask/host-task-failure.cpp

??, also failed in http://icl-jenkins2.sc.intel.com:8080/blue/organizations/jenkins/SYCL_CI%2Fintel%2FWin%2FTest_Suite/detail/Test_Suite/2260/pipeline/298/
SYCL :: Regression/unoptimized_stream.cpp


Failed Tests (2):
#7742
SYCL :: KernelAndProgram/multiple-kernel-linking.cpp

??, but I'm sure I've seen it elsewhere in the last day or two.
also failed in http://icl-jenkins2.sc.intel.com:8080/blue/organizations/jenkins/SYCL_CI%2Fintel%2FWin%2FTest_Suite/detail/Test_Suite/2260/pipeline/298/
SYCL :: XPTI/kernel/basic.cpp


Level Zero:

Timed Out Tests (2):
#7741
SYCL :: HostInteropTask/host-task-dependency3.cpp
SYCL :: HostInteropTask/host-task.cpp


Failed Tests (19):
#7745
SYCL :: Basic/group_async_copy.cpp

#7744
SYCL :: DeviceLib/imf_fp16_trivial_test.cpp
SYCL :: DeviceLib/imf_fp32_test.cpp
SYCL :: DeviceLib/imf_half_type_cast.cpp

#7743
SYCL :: Reduction/reduction_big_data.cpp
SYCL :: Reduction/reduction_nd_N_vars.cpp
SYCL :: Reduction/reduction_nd_conditional.cpp
SYCL :: Reduction/reduction_nd_dw.cpp
SYCL :: Reduction/reduction_nd_ext_half.cpp
SYCL :: Reduction/reduction_nd_lambda.cpp
SYCL :: Reduction/reduction_range_1d_dw.cpp
SYCL :: Reduction/reduction_range_1d_dw_64bit.cpp
SYCL :: Reduction/reduction_range_1d_rw.cpp
SYCL :: Reduction/reduction_range_2d_dw.cpp
SYCL :: Reduction/reduction_range_2d_rw.cpp
SYCL :: Reduction/reduction_range_3d_dw.cpp
SYCL :: Reduction/reduction_range_N_vars.cpp
SYCL :: Reduction/reduction_range_lambda.cpp
SYCL :: Reduction/reduction_usm.cpp

@pvchupin pvchupin merged commit 516d411 into intel:sycl Dec 13, 2022
steffenlarsen added a commit to steffenlarsen/llvm that referenced this pull request Dec 16, 2022
intel#7370 implemented the proposed
sycl_ext_oneapi_memcpy2d extension. With that PR merged we can now move
the extension specification to supported.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
pvchupin pushed a commit that referenced this pull request Dec 21, 2022
#7370 implemented the proposed
sycl_ext_oneapi_memcpy2d extension. With that PR merged we can now move
the extension specification to supported.

Signed-off-by: Larsen, Steffen <steffen.larsen@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

10 participants