Skip to content

[UR][L0v2] Reuse L0 default context for full-platform SYCL contexts#21789

Open
mateuszpn wants to merge 7 commits intointel:syclfrom
mateuszpn:def-context
Open

[UR][L0v2] Reuse L0 default context for full-platform SYCL contexts#21789
mateuszpn wants to merge 7 commits intointel:syclfrom
mateuszpn:def-context

Conversation

@mateuszpn
Copy link
Copy Markdown
Contributor

When a SYCL context is created with all root devices of a Level Zero platform, the L0 v2 adapter now calls zeDriverGetDefaultContext instead of zeContextCreate. This ensures SYCL shares the same ze_context_handle_t with other L0 users (e.g. PyTorch), eliminating USM/IPC/residency conflicts from multiple independent contexts over the same driver.

@mateuszpn mateuszpn force-pushed the def-context branch 2 times, most recently from dcac904 to 1e8166f Compare April 16, 2026 14:17
@mateuszpn mateuszpn marked this pull request as ready for review April 16, 2026 14:20
@mateuszpn mateuszpn requested review from a team as code owners April 16, 2026 14:20
Copy link
Copy Markdown

@guangyey guangyey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment thread sycl/doc/extensions/supported/sycl_ext_oneapi_backend_level_zero.md Outdated
Comment on lines +13 to +16
// Before the fix: urContextCreate always calls zeContextCreate, so two
// full-platform contexts have different ze_context handles -> test FAILS.
// After the fix: urContextCreate reuses the driver default context, so both
// calls return the same handle -> test PASSES.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Before the fix: urContextCreate always calls zeContextCreate, so two
// full-platform contexts have different ze_context handles -> test FAILS.
// After the fix: urContextCreate reuses the driver default context, so both
// calls return the same handle -> test PASSES.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

std::cout << "PASS: platform default context matches.\n";
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might also be useful to check that the native handles match the one returned by zeDriverGetDefaultContext.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Comment on lines +210 to +227
# Detect whether the Level Zero SDK provides zeDriverGetDefaultContext.
# Older L0 loaders do not export this symbol; the guard prevents link failures.
# We only check for the typedef in ze_ddi.h to avoid a link step.
include(CheckCXXSourceCompiles)
set(_UR_L0_SAVED_CMAKE_REQUIRED_INCLUDES "${CMAKE_REQUIRED_INCLUDES}")
get_target_property(_L0_INCLUDE_DIRS LevelZeroLoader-Headers INTERFACE_INCLUDE_DIRECTORIES)
set(CMAKE_REQUIRED_INCLUDES ${_L0_INCLUDE_DIRS})
check_cxx_source_compiles(
"#include <ze_ddi.h>\nint main() { ze_pfnDriverGetDefaultContext_t fn = nullptr; (void)fn; return 0; }"
HAVE_ZE_DRIVER_GET_DEFAULT_CONTEXT
)
set(CMAKE_REQUIRED_INCLUDES "${_UR_L0_SAVED_CMAKE_REQUIRED_INCLUDES}")
unset(_UR_L0_SAVED_CMAKE_REQUIRED_INCLUDES)
if(HAVE_ZE_DRIVER_GET_DEFAULT_CONTEXT)
target_compile_definitions(ur_adapter_level_zero_v2 PRIVATE
HAVE_ZE_DRIVER_GET_DEFAULT_CONTEXT)
endif()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you encountered link failures? It shouldn't be possible. We hardcode the level-zero loader we link against. This shouldn't be required.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed

ZE2UR_CALL(zeContextCreate, (hPlatform->ZeDriver, &contextDesc, &zeContext));
bool ownZeContext = true;

if (!pProperties && isFullPlatformRootDeviceList(deviceCount, phDevices)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why check against properties being non-null?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still being checked. This looks like an unrelated change to me.

Comment on lines +77 to +149
static bool isDriverRootDevice(ze_device_handle_t zeDevice) {
ze_device_handle_t zeRootDevice = nullptr;
ze_result_t zeResult =
ZE_CALL_NOCHECK(zeDeviceGetRootDevice, (zeDevice, &zeRootDevice));

if (zeResult == ZE_RESULT_SUCCESS) {
return zeRootDevice == nullptr || zeRootDevice == zeDevice;
}

// If unsupported, keep behavior compatible with older loaders/drivers.
if (zeResult == ZE_RESULT_ERROR_UNSUPPORTED_FEATURE) {
return true;
}

return false;
}

static bool isFullPlatformRootDeviceList(uint32_t deviceCount,
const ur_device_handle_t *phDevices) {
if (deviceCount == 0 || !phDevices) {
return false;
}

ur_platform_handle_t hPlatform = phDevices[0]->Platform;

std::vector<ze_device_handle_t> requestedDevices;
requestedDevices.reserve(deviceCount);
for (uint32_t i = 0; i < deviceCount; ++i) {
if (!phDevices[i] || phDevices[i]->Platform != hPlatform ||
phDevices[i]->RootDevice) {
return false;
}

if (!isDriverRootDevice(phDevices[i]->ZeDevice)) {
return false;
}

requestedDevices.push_back(phDevices[i]->ZeDevice);
}

sortAndUnique(requestedDevices);

uint32_t zeDeviceCount = 0;
ze_result_t zeResult = ZE_CALL_NOCHECK(
zeDeviceGet, (hPlatform->ZeDriver, &zeDeviceCount, nullptr));
if (zeResult != ZE_RESULT_SUCCESS || zeDeviceCount == 0) {
return false;
}

std::vector<ze_device_handle_t> platformDevices(zeDeviceCount);
zeResult = ZE_CALL_NOCHECK(zeDeviceGet, (hPlatform->ZeDriver, &zeDeviceCount,
platformDevices.data()));
if (zeResult != ZE_RESULT_SUCCESS) {
return false;
}

platformDevices.resize(zeDeviceCount);
platformDevices.erase(std::remove_if(platformDevices.begin(),
platformDevices.end(),
[](ze_device_handle_t zeDevice) {
return !isDriverRootDevice(zeDevice);
}),
platformDevices.end());

if (platformDevices.empty()) {
return false;
}

sortAndUnique(platformDevices);

return requestedDevices == platformDevices;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates some of the logic that already exists around urDeviceGet.

std::vector<std::unique_ptr<ur_device_handle_t_>> URDevicesCache;

I think it might be better to simply compare the devices on the user-provided device list against the list of devices returned from urDeviceGet with default parameters.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@bratpiorka bratpiorka requested a review from pbalcer April 27, 2026 09:16
@mateuszpn mateuszpn force-pushed the def-context branch 2 times, most recently from 08c8f80 to 2597696 Compare April 28, 2026 14:18
Comment on lines 79 to 81
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If !phDevices[0] is null, user gave us an incorrect device array. I'd check that earlier or propagate an error to the API.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checks are moved to the top of urContextCreate

Comment on lines 85 to 94
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The checks are moved to urContextCreate, so the conditions may be removed here.

Comment on lines 226 to 227
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we should fail if zeDriverGetDefaultContext doesn't succeed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Previously, we would always create a new context; now we fall back to the same behavior if zeDriverGetDefaultContext fails, which seems reasonable.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't zeDriverGetDefaultContext fail on older drivers? I see // REQUIRES-INTEL-DRIVER: lin: 36300 requirement in the SYCL e2e tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kswiecicki there was an issue with us calling a function present in the loader, but not present in the driver. The device sync if i recall correctly. Is this the same scenario?

zeDriverGetDefaultContext fails, which seems reasonable.

Right, makes sense.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I remember correctly the device sync call was crashing for some reason instead of returning an error., but I believe that was a unique case.

ZE2UR_CALL(zeContextCreate, (hPlatform->ZeDriver, &contextDesc, &zeContext));
bool ownZeContext = true;

if (!pProperties && isFullPlatformRootDeviceList(deviceCount, phDevices)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still being checked. This looks like an unrelated change to me.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

phDevices and phContext param checks are already done in the validation layer here.
We could the following restriction to the spec though:

- $X_RESULT_ERROR_INVALID_SIZE:
    - "`deviceCount == 0`"

@pbalcer what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kswiecicki yes, we should just add a check to the validation layer.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can use UR_ASSERT(phDevices[i], UR_RESULT_ERROR_INVALID_NULL_HANDLE).
We could add something along the lines to the spec too:

- $X_RESULT_ERROR_INVALID_NULL_HANDLE
    - "If device handles in phDevices are not valid handles."

Comment on lines 226 to 227
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't zeDriverGetDefaultContext fail on older drivers? I see // REQUIRES-INTEL-DRIVER: lin: 36300 requirement in the SYCL e2e tests.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Instead of setting that, you could just create the ur_context_handle_t_ here and return UR_RESULT_SUCCESS.
Then if (!zeContext) { and bool ownZeContext = true fragments could be removed.
Also, is zeDefaultContext necessary since there's zeContext variable that can hold ze_context_handle_t type already.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants