From 060edcd372074f6764cecfbf1a1b562fb14f6818 Mon Sep 17 00:00:00 2001 From: "Larsen, Steffen" Date: Fri, 10 Mar 2023 01:44:56 -0800 Subject: [PATCH] [SYCL][L0] Disable native 2D USM memcpy by default Due to a bug in memcpy 2D to and from non-USM pointers in the L0 native region memcpy make L0 report support for this operations based on a new SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D which defaults to disabled. Once the bug is fixed, this new environment variable should either be removed or enabled by default. Signed-off-by: Larsen, Steffen --- sycl/doc/EnvironmentVariables.md | 1 + sycl/plugins/level_zero/pi_level_zero.cpp | 15 +++++++++++++-- 2 files changed, 14 insertions(+), 2 deletions(-) diff --git a/sycl/doc/EnvironmentVariables.md b/sycl/doc/EnvironmentVariables.md index 3dec718db3ebc..a3148f7995bcc 100755 --- a/sycl/doc/EnvironmentVariables.md +++ b/sycl/doc/EnvironmentVariables.md @@ -259,6 +259,7 @@ variables in production code. | `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. | | `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 20. | | `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Controls if/where to make USM allocations resident at the time of allocation. If set to 0 (default) then no special residency is forced. If set to 1 then allocation (device or shared) is made resident at the device of allocation. If set to 2 then allocation (device or shared) is made resident on all devices in the context of allocation that have P2P access to the device of allocation. For host allocation, any non-0 setting forces the allocation resident on all devices in the context. | +| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. | ## Debugging variables for CUDA Plugin diff --git a/sycl/plugins/level_zero/pi_level_zero.cpp b/sycl/plugins/level_zero/pi_level_zero.cpp index 984bd342989bd..6c6f81184ba5a 100644 --- a/sycl/plugins/level_zero/pi_level_zero.cpp +++ b/sycl/plugins/level_zero/pi_level_zero.cpp @@ -97,6 +97,16 @@ static const bool IndirectAccessTrackingEnabled = [] { nullptr; }(); +// Due to a bug with 2D memory copy to and from non-USM pointers, this option is +// disabled by default. +static const bool UseMemcpy2DOperations = [] { + const char *UseMemcpy2DOperationsFlag = + std::getenv("SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D"); + if (!UseMemcpy2DOperationsFlag) + return false; + return std::stoi(UseMemcpy2DOperationsFlag) > 0; +}(); + static usm_settings::USMAllocatorConfig USMAllocatorConfigInstance; // Map from L0 to PI result. @@ -2291,8 +2301,9 @@ pi_result piContextGetInfo(pi_context Context, pi_context_info ParamName, case PI_CONTEXT_INFO_REFERENCE_COUNT: return ReturnValue(pi_uint32{Context->RefCount.load()}); case PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMCPY2D_SUPPORT: - // 2D USM memcpy is supported. - return ReturnValue(pi_bool{true}); + // 2D USM memcpy is supported unless disabled through + // SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D. + return ReturnValue(pi_bool{UseMemcpy2DOperations}); case PI_EXT_ONEAPI_CONTEXT_INFO_USM_FILL2D_SUPPORT: case PI_EXT_ONEAPI_CONTEXT_INFO_USM_MEMSET2D_SUPPORT: // 2D USM fill and memset is not supported.