[SYCL][L0] Change to create all events host-visible by default (#6961)
Signed-off-by: Sergey V Maslov <sergey.v.maslov@intel.com>
smaslov-intel committed Oct 6, 2022
1 parent 3fd0850 commit f3d245d
Showing 3 changed files with 29 additions and 37 deletions.
2 changes: 1 addition & 1 deletion sycl/doc/EnvironmentVariables.md
@@ -184,7 +184,7 @@ variables in production code.</span>
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE` | Any(\*) | This environment variable enables users to control use of copy engines for copy operations. If the value is an integer, it will allow the use of copy engines, if available in the device, in Level Zero plugin to transfer SYCL buffer or image data between the host and/or device(s) and to fill SYCL buffer or image data in device or shared memory. The value of this environment variable can also be a pair of the form "lower_index:upper_index" where the indices point to copy engines in a list of all available copy engines. The default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COMPUTE_ENGINE` | Integer | It can be set to an integer (>=0) in which case all compute commands will be submitted to the command-queue with the given index in the compute command group. If it is instead set to a negative value then all available compute engines may be used. The default value is "0" |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_D2D_COPY` (experimental) | Integer | Allows the use of copy engine, if available in the device, in Level Zero plugin for device to device copy operations. The default is 0. This option is experimental and will be removed once heuristics are added to make a decision about use of copy engine for device to device copy operations. |
- | `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 2, meaning only the last event in a batch is host-visible. |
+ | `SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS` | Any(\*) | Enable support of device-scope events whose state is not visible to the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=1 the Level Zero plugin would create all events having device-scope only and create proxy host-visible events for them when their status is needed (wait/query) on the host. If enabled mode is SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS=2 the Level Zero plugin would create all events having device-scope and add proxy host-visible event at the end of each command-list submission. The default is 0, meaning all events have host visibility. |
| `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` | Integer | When set to a positive value enables use of Level Zero immediate commandlists, which means there is no batching and all commands are immediately submitted for execution. Default is 0. Note: When immediate commandlist usage is enabled it is necessary to also set SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS to either 0 or 1. |
| `SYCL_PI_LEVEL_ZERO_USE_MULTIPLE_COMMANDLIST_BARRIERS` | Integer | When set to a positive value enables use of multiple Level Zero commandlists when submitting barriers. Default is 1. |
| `SYCL_PI_LEVEL_ZERO_USE_COPY_ENGINE_FOR_FILL` | Integer | When set to a positive value enables use of a copy engine for memory fill operations. Default is 0. |
61 changes: 28 additions & 33 deletions sycl/plugins/level_zero/pi_level_zero.cpp
@@ -234,6 +234,29 @@ static const pi_uint32 MaxNumEventsPerPool = [] {
return Result;
}();

+ // Get value of device scope events env var setting or default setting
+ static const int DeviceEventsSetting = [] {
+ const char *DeviceEventsSettingStr =
+ std::getenv("SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS");
+ if (DeviceEventsSettingStr) {
+ // Override the default if user has explicitly chosen the events scope.
+ switch (std::stoi(DeviceEventsSettingStr)) {
+ case 0:
+ return AllHostVisible;
+ case 1:
+ return OnDemandHostVisibleProxy;
+ case 2:
+ return LastCommandInBatchHostVisible;
+ default:
+ // fallthrough to default setting
+ break;
+ }
+ }
+ // This is our default setting, which is expected to be the fastest
+ // with the modern GPU drivers.
+ return AllHostVisible;
+ }();
+
// Helper function to implement zeHostSynchronize.
// The behavior is to avoid infinite wait during host sync under ZE_DEBUG.
// This allows for a much more responsive debugging of hangs.
@@ -643,7 +666,7 @@ inline static pi_result createEventAndAssociateQueue(
bool ForceHostVisible = false) {

if (!ForceHostVisible)
- ForceHostVisible = Queue->Device->eventsScope() == AllHostVisible;
+ ForceHostVisible = DeviceEventsSetting == AllHostVisible;
PI_CALL(EventCreate(Queue->Context, Queue, ForceHostVisible, Event));

(*Event)->Queue = Queue;
@@ -809,33 +832,6 @@ pi_result _pi_device::initialize(int SubSubDeviceOrdinal,
return PI_SUCCESS;
}

- // Get value of device scope events env var setting or -1 if unset
- static const int DeviceEventsSetting = [] {
- const char *DeviceEventsSettingStr =
- std::getenv("SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS");
- if (!DeviceEventsSettingStr)
- return -1;
- return std::stoi(DeviceEventsSettingStr);
- }();
-
- // Controls the scope of events.
- // If immediate commandlists are being used then use compatible event scopes.
- enum EventsScope _pi_device::eventsScope() {
- // Set default based on type of commandlists being used.
- auto Default = useImmediateCommandLists() ? OnDemandHostVisibleProxy
- : LastCommandInBatchHostVisible;
- // Override the default if user has explicitly chosen the events scope.
- switch (DeviceEventsSetting) {
- case 0:
- return AllHostVisible;
- case 1:
- return OnDemandHostVisibleProxy;
- case 2:
- return LastCommandInBatchHostVisible;
- }
- return Default;
- }
-
// Get value of immediate commandlists env var setting or -1 if unset
static const int ImmediateCommandlistsSetting = [] {
const char *ImmediateCommandlistsSettingStr =
@@ -1627,7 +1623,7 @@ pi_result _pi_queue::executeCommandList(pi_command_list_ptr_t CommandList,
// in the command list is not empty, otherwise we are going to just create
// and remove proxy event right away and dereference deleted object
// afterwards.
- if (Device->eventsScope() == LastCommandInBatchHostVisible &&
+ if (DeviceEventsSetting == LastCommandInBatchHostVisible &&
!CommandList->second.EventList.empty()) {
// If there are only internal events in the command list then we don't
// need to create host proxy event.
@@ -2007,7 +2003,7 @@ pi_result _pi_ze_event_list_t::createAndRetainPiZeEventList(
//
// Make sure that event1.wait() will wait for a host-visible
// event that is signalled before the command2 is enqueued.
- if (CurQueue->Device->eventsScope() != AllHostVisible) {
+ if (DeviceEventsSetting != AllHostVisible) {
CurQueue->executeAllOpenCommandLists();
}
}
@@ -5527,7 +5523,7 @@ _pi_event::getOrCreateHostVisibleEvent(ze_event_handle_t &ZeHostVisibleEvent) {
this->Mutex);

if (!HostVisibleEvent) {
- if (Queue->Device->eventsScope() != OnDemandHostVisibleProxy)
+ if (DeviceEventsSetting != OnDemandHostVisibleProxy)
die("getOrCreateHostVisibleEvent: missing host-visible event");

// Submit the command(s) signalling the proxy event to the queue.
@@ -5909,8 +5905,7 @@ pi_result piEventsWait(pi_uint32 NumEvents, const pi_event *EventList) {
return PI_ERROR_INVALID_EVENT;
}
for (uint32_t I = 0; I < NumEvents; I++) {
- if (EventList[I]->Queue->Device->eventsScope() ==
- OnDemandHostVisibleProxy) {
+ if (DeviceEventsSetting == OnDemandHostVisibleProxy) {
// Make sure to add all host-visible "proxy" event signals if needed.
// This ensures that all signalling commands are submitted below and
// thus proxy events can be waited without a deadlock.
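
Review note on the new `DeviceEventsSetting` initializer in `pi_level_zero.cpp`: it passes the raw environment string to `std::stoi`, which throws `std::invalid_argument` when the text is not numeric and `std::out_of_range` when the value does not fit in an `int`. An exception escaping the initializer of a static variable would typically end the process. The following is a minimal sketch, not part of the commit, of a non-throwing alternative built on `std::strtol`. The names `parseEventsScope` and `eventsScopeFromEnv` are hypothetical, and the local `EventsScope` enum is a stand-in whose numeric values are assumed for illustration only.

```cpp
#include <cerrno>
#include <cstdlib>

// Hypothetical stand-in for the plugin's EventsScope enum. The numeric
// values are an assumption made for this sketch only.
enum EventsScope {
  AllHostVisible,
  OnDemandHostVisibleProxy,
  LastCommandInBatchHostVisible
};

// Hypothetical helper (not in the commit): map the raw environment string
// to an events scope without throwing on malformed input.
inline EventsScope parseEventsScope(const char *Str) {
  // Same default as the commit: all events are host-visible.
  const EventsScope Default = AllHostVisible;
  if (!Str || *Str == '\0')
    return Default;

  errno = 0;
  char *End = nullptr;
  const long Value = std::strtol(Str, &End, 10);
  // Fall back to the default for non-numeric text, trailing characters,
  // and values that overflow long.
  if (End == Str || *End != '\0' || errno == ERANGE)
    return Default;

  switch (Value) {
  case 0:
    return AllHostVisible;
  case 1:
    return OnDemandHostVisibleProxy;
  case 2:
    return LastCommandInBatchHostVisible;
  default:
    // Unknown numeric values also use the default.
    return Default;
  }
}

// Hypothetical wrapper that reads the variable named in the commit.
inline EventsScope eventsScopeFromEnv() {
  return parseEventsScope(
      std::getenv("SYCL_PI_LEVEL_ZERO_DEVICE_SCOPE_EVENTS"));
}
```

One behavioral difference to be aware of: `std::stoi` accepts a numeric prefix followed by other characters (for example `"2abc"` parses as 2), whereas this sketch treats such input as malformed and returns the default.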
3 changes: 0 additions & 3 deletions sycl/plugins/level_zero/pi_level_zero.hpp
@@ -564,9 +564,6 @@ struct _pi_device : _pi_object {
// For some devices (e.g. PVC) immediate commandlists are preferred.
bool ImmCommandListsPreferred;

- // Return the Events scope to be used in for this device.
- enum EventsScope eventsScope();
-
// Return whether to use immediate commandlists for this device.
bool useImmediateCommandLists();
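
Review note on the behavioral change: with `_pi_device::eventsScope()` removed, the events scope no longer depends on whether a device uses immediate commandlists, and the unset case now resolves to `AllHostVisible` for every device. The sketch below restates the before and after resolution as two pure functions so the difference can be checked in isolation. The function names `scopeBefore` and `scopeAfter` are hypothetical, the local `EventsScope` enum is a stand-in with assumed numeric values, and `-1` is used here to represent an unset variable (as the removed code did).

```cpp
// Hypothetical stand-in for the plugin's EventsScope enum. The numeric
// values are an assumption made for this sketch only.
enum EventsScope {
  AllHostVisible,
  OnDemandHostVisibleProxy,
  LastCommandInBatchHostVisible
};

// Resolution as in the removed _pi_device::eventsScope(): the default
// depended on whether immediate commandlists were in use.
inline EventsScope scopeBefore(int Setting, bool UseImmediateCommandLists) {
  switch (Setting) {
  case 0:
    return AllHostVisible;
  case 1:
    return OnDemandHostVisibleProxy;
  case 2:
    return LastCommandInBatchHostVisible;
  }
  return UseImmediateCommandLists ? OnDemandHostVisibleProxy
                                  : LastCommandInBatchHostVisible;
}

// Resolution as in the new DeviceEventsSetting initializer: one
// process-wide default, independent of the device.
inline EventsScope scopeAfter(int Setting) {
  switch (Setting) {
  case 0:
    return AllHostVisible;
  case 1:
    return OnDemandHostVisibleProxy;
  case 2:
    return LastCommandInBatchHostVisible;
  }
  return AllHostVisible;
}
```

Explicit settings of 0, 1, or 2 resolve identically before and after; only the unset (or unrecognized) case changes. The new default of 0 is also one of the two values that the `SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS` documentation requires when immediate commandlists are enabled.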
