Fix DLPack P2P Cross-Device Transfer (cu130) #180

Merged
pollockjj merged 1 commit into main from fix-dlpack-p2p-cu130
Mar 17, 2026

@pollockjj
Owner

Fixes #167 and #177 by adding a proper device guard and P2P registry for DLPack staging.

Changes:

  • Bumps version to 2.6.1
  • Introduces p2p_registry.py, a cache for cudaDeviceCanAccessPeer results
  • Stages cross-device transfers through the CPU inside _patch_comfy_kitchen_dlpack_device_guard when cuda:x and cuda:y lack direct PCIe/NVLink P2P access.
  • Passes strict linting constraints.
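
For readers unfamiliar with the registry idea in the second bullet, here is a minimal sketch of a peer-access cache. It is not the contents of p2p_registry.py: the class name `PeerAccessCache` and the injectable `probe` callable are hypothetical, with `probe` standing in for whatever actually calls cudaDeviceCanAccessPeer.

```python
from typing import Callable, Dict, Tuple


class PeerAccessCache:
    """Memoize a directed peer-access query per (device, peer) pair."""

    def __init__(self, probe: Callable[[int, int], bool]) -> None:
        # `probe` is a hypothetical stand-in for the real CUDA query.
        self._probe = probe
        self._cache: Dict[Tuple[int, int], bool] = {}

    def can_access_peer(self, device: int, peer: int) -> bool:
        if device == peer:
            return True  # a device can always reach its own memory
        key = (device, peer)  # directed: (0, 1) and (1, 0) are cached separately
        if key not in self._cache:
            # First query for this pair: ask the driver once, then remember it.
            self._cache[key] = self._probe(device, peer)
        return self._cache[key]
```

Injecting the probe keeps the cache testable on machines without a GPU; a real registry would presumably pass in a function that wraps the CUDA runtime call.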

Copilot AI review requested due to automatic review settings March 17, 2026 22:53
Contributor

Copilot AI left a comment

Pull request overview

Fixes cu130 cross-device DLPack transfer failures by introducing a cached CUDA P2P capability check and adding a CPU-staging fallback in the comfy_kitchen DLPack wrapper when direct peer access is unavailable.

Changes:

  • Bump project version to 2.6.1.
  • Add p2p_registry.py to cache cudaDeviceCanAccessPeer results.
  • Update the comfy_kitchen _wrap_for_dlpack patch to use a P2P-aware CPU-staging fallback for cross-device transfers.
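
The staging decision described in the last bullet can be sketched in isolation. This is an illustration under stated assumptions, not the patched wrapper itself: `Device` is a hypothetical stand-in for torch.device, `needs_cpu_staging` is an invented name, and `can_access_peer` is passed in so the logic runs without CUDA. The `_valid_cuda` check mirrors the one quoted in the __init__.py review thread.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for torch.device (type plus optional index)."""
    type: str
    index: Optional[int] = None


def needs_cpu_staging(
    src: Optional[Device],
    dst: Optional[Device],
    can_access_peer: Callable[[int, int], bool],
) -> bool:
    """Return True when a cross-GPU hand-off should bounce through host memory."""

    def _valid_cuda(d: Optional[Device]) -> bool:
        return d is not None and d.type == "cuda" and d.index is not None

    if not (_valid_cuda(src) and _valid_cuda(dst)):
        return False  # not a GPU-to-GPU case; leave the tensor alone
    if src.index == dst.index:
        return False  # same device, nothing to transfer
    # Different GPUs: stage only when direct peer access is unavailable.
    return not can_access_peer(src.index, dst.index)
```

When this returns True in a PyTorch setting, the staged copy would presumably be something along the lines of `tensor.to("cpu").to(exec_device)` before the tensor is handed to the DLPack wrapper; the exact calls used in the PR are not shown on this page.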

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| pyproject.toml | Version bump to 2.6.1 for the cu130 DLPack transfer fix release. |
| p2p_registry.py | New cached P2P-access registry using cudaDeviceCanAccessPeer via ctypes. |
| __init__.py | Patches comfy_kitchen DLPack wrapping to stage through CPU when P2P is unavailable. |

Comment thread __init__.py
Comment on lines +434 to +436
    return wrap_for_dlpack(staged_tensor, *args[1:], **kwargs)
else:
    return wrap_for_dlpack(staged_tensor, **kwargs)
Comment thread p2p_registry.py
Comment on lines +16 to +21
def _get_libcudart():
    """Load libcudart.so once and cache the handle."""
    global _libcudart
    if _libcudart is None:
        _libcudart = ctypes.CDLL("libcudart.so")
    return _libcudart
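
The excerpt above loads the runtime under a single fixed name. As a sketch of one more tolerant approach (not something the PR is known to do), a loader can try several candidate names and report failure instead of raising. The helper names `load_first_available` and `cudart_candidates` are hypothetical, and the versioned file names in the list are illustrative assumptions about how an installation might name the library.

```python
import ctypes
import ctypes.util
from typing import Iterable, List, Optional


def load_first_available(candidates: Iterable[str]) -> Optional[ctypes.CDLL]:
    """Return a handle for the first loadable library name, or None."""
    for name in candidates:
        try:
            return ctypes.CDLL(name)
        except OSError:
            continue  # not present under this name; try the next one
    return None


def cudart_candidates() -> List[str]:
    """Build a candidate list; the versioned names are assumptions."""
    names: List[str] = []
    found = ctypes.util.find_library("cudart")  # returns None if nothing is found
    if found:
        names.append(found)
    names += ["libcudart.so", "libcudart.so.13", "libcudart.so.12"]
    return names
```

A caller could then treat a `None` result from `load_first_available(cudart_candidates())` as "P2P status unknown" and fall back to CPU staging, which is the conservative choice.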
Comment thread p2p_registry.py
Comment on lines +36 to +41
def _raw_can_access_peer(device_a: int, device_b: int) -> bool:
    """Call cudaDeviceCanAccessPeer via ctypes. Returns True if P2P is available."""
    lib = _get_libcudart()
    can_access = ctypes.c_int(0)
    result = lib.cudaDeviceCanAccessPeer(ctypes.byref(can_access), device_a, device_b)
    if result != 0:
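
The excerpt ends at the error check, so how the PR handles a non-zero status is not visible here. The sketch below shows one way to make the same call with explicit ctypes signatures. The function name `query_peer_access` and the choice to treat any error as "no peer access" are assumptions for illustration; the C signature (`int*`, `int`, `int`, returning a `cudaError_t` where 0 means success) is that of the CUDA runtime API.

```python
import ctypes


def query_peer_access(lib, device: int, peer: int) -> bool:
    """Ask the CUDA runtime whether `device` can directly access `peer`."""
    fn = lib.cudaDeviceCanAccessPeer
    # Declare the C signature so ctypes converts arguments predictably.
    fn.argtypes = [ctypes.POINTER(ctypes.c_int), ctypes.c_int, ctypes.c_int]
    fn.restype = ctypes.c_int  # cudaError_t; 0 is cudaSuccess
    flag = ctypes.c_int(0)
    status = fn(ctypes.pointer(flag), device, peer)
    if status != 0:
        # Assumption: any CUDA error is treated as "no peer access",
        # which makes the caller fall back to CPU staging.
        return False
    return bool(flag.value)
```

Passing `lib` as a parameter rather than reading a module global keeps the function usable with a stub object in tests.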
Comment thread __init__.py
    return d is not None and d.type == "cuda" and d.index is not None

if _valid_cuda(tensor_device) and _valid_cuda(exec_device):
    if tensor_device.index != exec_device.index and not p2p_registry.can_access_peer(tensor_device.index, exec_device.index):
@pollockjj pollockjj merged commit 8c4034f into main Mar 17, 2026
4 checks passed
@pollockjj pollockjj deleted the fix-dlpack-p2p-cu130 branch March 17, 2026 22:58