Fix DLPack P2P Cross-Device Transfer (cu130) by pollockjj · Pull Request #180 · pollockjj/ComfyUI-MultiGPU

pollockjj · 2026-03-17T22:53:10Z

Fixes #167 and #177 by adding a proper device guard and P2P registry for DLPack staging.

Changes:

Bumps version to 2.6.1
Introduces p2p_registry.py cache for cudaDeviceCanAccessPeer
Adds CPU staging inside _patch_comfy_kitchen_dlpack_device_guard for cuda:x to cuda:y transfers that lack direct PCIe/NVLink P2P access.
Passes strict linting constraints.

Copilot

Pull request overview

Fixes cu130 cross-device DLPack transfer failures by introducing a cached CUDA P2P capability check and adding a CPU-staging fallback in the comfy_kitchen DLPack wrapper when direct peer access is unavailable.

Changes:

Bump project version to 2.6.1.
Add p2p_registry.py to cache cudaDeviceCanAccessPeer results.
Update the comfy_kitchen _wrap_for_dlpack patch to use a P2P-aware CPU-staging fallback for cross-device transfers.
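
To make the caching idea concrete, here is a minimal sketch of a pairwise peer-access cache. It is not the code in p2p_registry.py: the class name `PeerAccessCache` and the injected `probe` callable are hypothetical, chosen so the example runs without CUDA. It assumes that a failing probe should be treated as "no P2P access", so the caller falls back to CPU staging.

```python
from typing import Callable, Dict, Tuple


class PeerAccessCache:
    """Memoize a pairwise peer-access probe (hypothetical helper for illustration)."""

    def __init__(self, probe: Callable[[int, int], bool]) -> None:
        # `probe` stands in for a real query such as cudaDeviceCanAccessPeer.
        self._probe = probe
        self._cache: Dict[Tuple[int, int], bool] = {}

    def can_access_peer(self, device_a: int, device_b: int) -> bool:
        # A device can always reach its own memory, so no probe is needed.
        if device_a == device_b:
            return True
        # Peer access is directional, so the key is the ordered pair.
        key = (device_a, device_b)
        if key not in self._cache:
            try:
                self._cache[key] = bool(self._probe(device_a, device_b))
            except Exception:
                # Assumption: any probe failure means "no direct access".
                self._cache[key] = False
        return self._cache[key]
```

Because the result for a given ordered pair does not change while the process runs, each pair is probed at most once, which keeps the check cheap on the per-tensor DLPack path.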

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `pyproject.toml` | Version bump to `2.6.1` for the cu130 DLPack transfer fix release. |
| `p2p_registry.py` | New cached P2P-access registry using `cudaDeviceCanAccessPeer` via ctypes. |
| `__init__.py` | Patches comfy_kitchen DLPack wrapping to stage through CPU when P2P is unavailable. |


+                    return wrap_for_dlpack(staged_tensor, *args[1:], **kwargs)
+                else:
+                    return wrap_for_dlpack(staged_tensor, **kwargs)


+def _get_libcudart():
+    """Load libcudart.so once and cache the handle."""
+    global _libcudart
+    if _libcudart is None:
+        _libcudart = ctypes.CDLL("libcudart.so")
+    return _libcudart
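
The loader above depends on a bare `libcudart.so` name being resolvable. A possible hardening, sketched here as an assumption rather than as what the PR does, is to try several candidate names and return `None` when none loads, so the caller can fall back to CPU staging instead of raising. The function name `load_first_available` and the versioned sonames in the default list are illustrative guesses.

```python
import ctypes
from typing import Optional, Sequence

# Assumption: candidate names to try, in order. Adjust for the target platform.
_DEFAULT_NAMES = ("libcudart.so", "libcudart.so.13", "libcudart.so.12")


def load_first_available(names: Sequence[str] = _DEFAULT_NAMES) -> Optional[ctypes.CDLL]:
    """Return a handle for the first loadable library, or None if none loads."""
    for name in names:
        try:
            return ctypes.CDLL(name)
        except OSError:
            # Library not found or not loadable under this name; try the next one.
            continue
    return None
```

A caller that receives `None` can report P2P as unavailable, which keeps the staging path safe on systems where the runtime library is packaged differently.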


+    def _raw_can_access_peer(device_a: int, device_b: int) -> bool:
+        """Call cudaDeviceCanAccessPeer via ctypes. Returns True if P2P is available."""
+        lib = _get_libcudart()
+        can_access = ctypes.c_int(0)
+        result = lib.cudaDeviceCanAccessPeer(ctypes.byref(can_access), device_a, device_b)
+        if result != 0:
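
The hunk above is cut off at the status check, so how the PR handles a non-zero result is not shown here. The sketch below is one conservative reading: any status other than 0 (`cudaSuccess`) is treated as "no P2P access". The function name `raw_can_access_peer` and the explicit `lib` parameter are assumptions, added so the logic can be exercised with a stand-in object instead of a real CUDA runtime.

```python
import ctypes


def raw_can_access_peer(lib, device_a: int, device_b: int) -> bool:
    """Query cudaDeviceCanAccessPeer on `lib`; treat any error as 'no access'."""
    can_access = ctypes.c_int(0)
    status = lib.cudaDeviceCanAccessPeer(
        ctypes.byref(can_access),  # out-parameter: set to 1 if access is possible
        ctypes.c_int(device_a),    # device that would perform the access
        ctypes.c_int(device_b),    # peer device whose memory would be accessed
    )
    if status != 0:
        # Assumption: a failed query is reported as no direct access.
        return False
    return can_access.value == 1
```

Returning `False` on failure errs toward CPU staging, which costs extra copies but avoids attempting a direct transfer whose support could not be confirmed.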


+            return d is not None and d.type == "cuda" and d.index is not None
+
+        if _valid_cuda(tensor_device) and _valid_cuda(exec_device):
+            if tensor_device.index != exec_device.index and not p2p_registry.can_access_peer(tensor_device.index, exec_device.index):
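
The condition in this hunk can be read as a small decision function. The sketch below restates it without PyTorch: `Device` is a hypothetical stand-in for `torch.device`, and `can_access_peer` is passed in rather than taken from `p2p_registry`. It is an illustration of the check, not the patched function itself.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Device:
    """Hypothetical stand-in for torch.device (type plus optional index)."""
    type: str
    index: Optional[int] = None


def needs_cpu_staging(
    tensor_device: Optional[Device],
    exec_device: Optional[Device],
    can_access_peer: Callable[[int, int], bool],
) -> bool:
    """Return True when a cross-GPU transfer should be staged through the CPU."""

    def _valid_cuda(d: Optional[Device]) -> bool:
        return d is not None and d.type == "cuda" and d.index is not None

    # Only CUDA-to-CUDA transfers with known indices are candidates.
    if not (_valid_cuda(tensor_device) and _valid_cuda(exec_device)):
        return False
    # Same device: nothing to stage.
    if tensor_device.index == exec_device.index:
        return False
    # Different devices: stage only when direct peer access is unavailable.
    return not can_access_peer(tensor_device.index, exec_device.index)
```

When this returns `True`, the fallback described in the PR copies the tensor to the CPU before handing it to the DLPack wrapper.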


Fix DLPack CPU staging constraint fallback for issue #167/#177

b526e11

Copilot AI review requested due to automatic review settings March 17, 2026 22:53

Copilot started reviewing on behalf of pollockjj March 17, 2026 22:53

pollockjj linked an issue Mar 17, 2026 that may be closed by this pull request

cudaErrorIllegalAddress during cross-device DLPack transfer (RTX 3090 Ti + RTX 5070) #177

Closed

Copilot AI reviewed Mar 17, 2026


pollockjj merged commit 8c4034f into main Mar 17, 2026
4 checks passed

pollockjj deleted the fix-dlpack-p2p-cu130 branch March 17, 2026 22:58

pollockjj merged 1 commit into main from fix-dlpack-p2p-cu130