Skip to content

Conversation

lslusarczyk
Copy link
Contributor

@lslusarczyk lslusarczyk commented Jul 2, 2025

By default, the Level Zero adapter makes all device allocations resident on all "peer" devices, even if the application doesn't use those peer devices at all. USM residency adds significant overhead.

For v2 unfied-runtime adapter we switch in this PR to not calling zeContextMakeMemoryResident on all devices in the context, just the one that was specified during allocation and those specified using ext_oneapi_enable_peer_access.

When ext_oneapi_enable_peer_access / ext_oneapi_disable_peer_access is called, we retroactively call zeContextMakeMemoryResident or zeContextEvictMemory respectively, on all the active pages in the heap. This is done by calling ctl call in UMF drafted here: oneapi-src/unified-memory-framework@e6d5043

UMF part was done in oneapi-src/unified-memory-framework#1517

ZeOffsetToImageHandleMap;

// devices which user enabled p2p access by urUsmP2P(Enable|Disable)PeerAccessExp
std::unordered_set<DeviceId> p2pDeviceIds;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not a vector?

{
std::scoped_lock<ur_shared_mutex> Lock(Platform->ContextsMutex);
auto It = std::find(Contexts.begin(), Contexts.end(), hContext);
UR_ASSERT(It != Contexts.end(), UR_RESULT_ERROR_INVALID_CONTEXT);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can this happen? should this be just a regular assert?


void cleanupPools();
void cleanupPoolsForQueue(void *hQueue);
void addResidentDevice(ur_device_handle_t hDevice, ur_device_handle_t peerDevice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is this function called from?

continue;
}

if (!p2pDevicesEnabledByUser.has_value())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this outside of the loop?

{
std::scoped_lock<ur_shared_mutex> Lock(Platform->ContextsMutex);
for (auto Context : Platform->Contexts) {
Context->addResidentDevice(commandDevice, peerDevice);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if some context does not contain commandDevice?


assert(0 = std::count_if(
std::begin(pDevices), std::end(pDevices),
[&](const auto pDevice) { return newPeerDevice->Id == pDevice->Id; }));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cuCtxEnablePeerAccess returns CUDA_ERROR_PEER_ACCESS_ALREADY_ENABLED. Could we have this error mapped to a UR error? It would help make the peer-to-peer SYCL API more consistent (See #19787.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants