74 fix context transport primitives for the gpu #198
Merged
lukemartinlogan
merged 38 commits into
main
from
74-fix-context-transport-primitives-for-the-gpu
Feb 13, 2026
Created simplified GPU-specific version of MakeCopyFuture that works correctly on GPU and allows CPU deserialization from FutureShm.

Key changes:
- Added MakeCopyFutureGpu() in ipc_manager.h (GPU-only function)
- Made Future constructors GPU-compatible with HSHM_CROSS_FUN
- Use __threadfence() for GPU memory fencing
- Fixed UniqueId operators to be GPU-compatible
- Test validates: GPU NewTask → MakeCopyFutureGpu → CPU deserialize

The function mirrors the pattern from passing serialization tests, using task->SerializeIn(archive) directly for reliable GPU execution.

Test results: 100% pass rate on GPU IPC buffer allocation tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implemented GPU kernel task creation and serialization using MakeCopyFutureGpu.

Changes:
- GPU kernel now creates tasks using NewTask on GPU
- Uses MakeCopyFutureGpu to serialize tasks for future processing
- Added error diagnostic for MakeCopyFutureGpu failures (-14)
- All GPU submission tests now pass (100% success rate)

Test flow:
1. GPU kernel initializes with CHIMAERA_GPU_INIT
2. Creates task with NewTask<GpuSubmitTask>
3. Serializes with MakeCopyFutureGpu
4. Returns success (result == 1)

Test results: 4/4 tests passing (gpu_init, cpu_submission, multiple_executions, kernel_task_submission)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demonstrates that vkWaitSemaphores efficiently sleeps a CPU thread (~0ms CPU time over ~5s wall-clock) instead of busy-polling, validating it as a GPU→CPU notification primitive for the ring buffer architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GPU device code cannot signal a CPU thread to wake — all Vulkan/CUDA semaphore signaling is stream-ordered and fires only after a kernel completes, making the approach unsuitable for persistent GPU kernels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Designate one worker (N-2) as the GPU worker that polls GPU lanes, while regular workers no longer receive GPU lane assignments. Refactor ProcessNewTasks to accept a TaskLane* parameter and extract per-task logic into ProcessNewTask. The GPU worker forwards dequeued tasks to scheduler workers via round-robin in RuntimeMapTask. GPU workers never sleep to ensure continuous polling. Also remove GpuTaskQueue alias in favor of TaskQueue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
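The round-robin forwarding described in this commit can be pictured with a small sketch. It is illustrative only: `SchedWorker`, `RoundRobinForwarder`, and the integer task ids are hypothetical stand-ins, not the types or the logic actually used in RuntimeMapTask.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a scheduler worker. The real worker type is not
// shown on this page.
struct SchedWorker {
  std::vector<int> inbox;  // ids of tasks forwarded by the GPU worker
};

// Hypothetical forwarder: hands each dequeued task to the next scheduler
// worker in turn, wrapping around at the end of the list.
class RoundRobinForwarder {
 public:
  explicit RoundRobinForwarder(std::vector<SchedWorker>& workers)
      : workers_(workers) {
    assert(!workers_.empty());  // at least one target worker is required
  }

  // Forwards one task and returns the index of the worker that received it.
  std::size_t Forward(int task_id) {
    std::size_t idx = next_;
    workers_[idx].inbox.push_back(task_id);
    next_ = (next_ + 1) % workers_.size();
    return idx;
  }

 private:
  std::vector<SchedWorker>& workers_;
  std::size_t next_ = 0;  // index of the worker that gets the next task
};
```

With three workers, four consecutive calls to `Forward` land on workers 0, 1, 2, and then 0 again, so the polling worker never executes the tasks it dequeues.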
Replace raw ZeroMQ calls with lightbeam PUSH/PULL transport for client task submission (TCP/IPC modes). Add inline bulk data serialization to LocalSaveTaskArchive/LocalLoadTaskArchive so TCP/IPC transport can transfer actual data bytes instead of ShmPtr addresses. Add real bdev task round-trip tests (Create, AllocateBlocks, Write+Read) to all transport mode tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove inline_bulk_ flag from LocalSaveTaskArchive. Instead, bulk() checks whether the ShmPtr's alloc_id_ is null to decide if data must be inlined (private memory) or if the ShmPtr itself suffices (shared memory). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace LocalSaveTaskArchive/LocalLoadTaskArchive with SaveTaskArchive/LoadTaskArchive in SendZmq, ClientRecv, ClientSend, RecvZmqClientThread, and Recv. This eliminates the manual wire protocol and uses lightbeam's multi-frame bulk transfer (2 copies: ZMQ send + recv) instead of inlining bulk data into the serialized stream (4-5 copies). Also add ContinueBlockedTasks(true) after epoll_wait in SuspendMe() so periodic tasks like ClientRecv/Send execute immediately on wake. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add ShmClient/ShmServer that transfer data through a shared copy_space buffer with atomic flag synchronization, eliminating kernel crossings for same-node IPC. Bulks with non-null alloc_id skip the data copy and pass only the ShmPtr; the receiver sets ptr_ to nullptr for the caller to resolve. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
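A minimal sketch of the copy-space handshake, assuming a single sender, a single receiver, and one `full` flag guarding a fixed-size buffer. `CopySpace`, `TrySend`, `TryRecv`, `RoundTrip`, and the 8-byte buffer size are all hypothetical. The actual ShmClient/ShmServer layout is not shown on this page, and the real buffer would live in shared memory rather than on the stack.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstring>
#include <string>

constexpr std::size_t kCopySpaceSize = 8;  // deliberately tiny to force chunking

// Hypothetical copy-space layout: one flag, one length, one data buffer.
struct CopySpace {
  std::atomic<bool> full{false};  // true: receiver owns data; false: sender owns it
  std::size_t len = 0;            // number of valid bytes in data
  char data[kCopySpaceSize];
};

// Sender side: copies the next chunk if the buffer is free.
// Returns the number of bytes copied, or 0 if the receiver has not drained yet.
std::size_t TrySend(CopySpace& cs, const char* src, std::size_t remaining) {
  if (remaining == 0 || cs.full.load(std::memory_order_acquire)) return 0;
  std::size_t n = std::min(kCopySpaceSize, remaining);
  std::memcpy(cs.data, src, n);
  cs.len = n;
  // Release: publishes data and len before the flag becomes visible.
  cs.full.store(true, std::memory_order_release);
  return n;
}

// Receiver side: drains one chunk if the buffer has been filled.
bool TryRecv(CopySpace& cs, std::string& out) {
  if (!cs.full.load(std::memory_order_acquire)) return false;
  out.append(cs.data, cs.len);
  // Release: hands the buffer back to the sender.
  cs.full.store(false, std::memory_order_release);
  return true;
}

// Drives both sides in one thread for demonstration. In a real deployment the
// two loops would run in separate processes mapping the same memory.
std::string RoundTrip(const std::string& msg) {
  CopySpace cs;
  std::string received;
  std::size_t sent = 0;
  while (received.size() < msg.size()) {
    sent += TrySend(cs, msg.data() + sent, msg.size() - sent);
    TryRecv(cs, received);
  }
  return received;
}
```

Because both sides only read and write user-space memory, no system call is needed per message, which is the property the commit message calls "eliminating kernel crossings".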
…nsport-primitives-for-the-gpu
Fix LoadTaskArchive::bulk() to use ptr.IsNull() instead of ptr.alloc_id_.IsNull() when checking for caller-provided buffers. MallocAllocator uses null alloc_id_ for all allocations, so the old check always took the zero-copy path, causing read data to never reach the caller's buffer over TCP. Split bdev_file_explicit_backend test into three per-mode variants (SHM, TCP, IPC) that each run as separate processes. Update docker-compose to only start the runtime on node1, with run_tests.sh driving test execution via docker exec. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
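The difference between the two checks can be illustrated with a simplified model. Everything here is hypothetical: `AllocId`, `BufPtr`, and the two `ChoosePath*` functions only mimic the decision described above, and `BufPtr::IsNull()` is assumed to mean "no buffer was supplied". This is not the project's ShmPtr or LoadTaskArchive code.

```cpp
#include <cassert>

// Hypothetical allocator id: negative means "null" (no shared allocator).
struct AllocId {
  int id = -1;
  bool IsNull() const { return id < 0; }
};

// Hypothetical pointer wrapper pairing an allocator id with an address.
struct BufPtr {
  AllocId alloc_id_;
  char* ptr_ = nullptr;
  // Assumption: the pointer is null only when no buffer was supplied.
  bool IsNull() const { return ptr_ == nullptr; }
};

enum class BulkPath { kZeroCopy, kCopyIntoCallerBuffer };

// Old check: keys on the allocator id. A malloc-style buffer has a null id,
// so it is misclassified as "no caller buffer".
BulkPath ChoosePathOld(const BufPtr& p) {
  return p.alloc_id_.IsNull() ? BulkPath::kZeroCopy
                              : BulkPath::kCopyIntoCallerBuffer;
}

// Fixed check: keys on whether the caller supplied a buffer at all.
BulkPath ChoosePathFixed(const BufPtr& p) {
  return p.IsNull() ? BulkPath::kZeroCopy : BulkPath::kCopyIntoCallerBuffer;
}
```

For a caller-provided buffer from a malloc-style allocator (null id, non-null address), `ChoosePathOld` selects the zero-copy path while `ChoosePathFixed` selects the copy path, which matches the symptom of read data never reaching the caller's buffer over TCP.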
The CTE run_tests.sh unconditionally overwrote IOWARP_CORE_ROOT with the devcontainer-internal path (/workspace), but Docker volume mounts need the host path. Respect the existing IOWARP_CORE_ROOT set by the devcontainer (matching the bdev test pattern). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add local_sched.h and local_sched.cc that were missing from git (scheduler_factory.cc includes local_sched.h)
- Add #include <algorithm> to shm_transport.h for std::min with initializer_list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
No description provided.