74 fix context transport primitives for the gpu #198

lukemartinlogan · 2026-02-10T18:28:38Z

No description provided.

Created simplified GPU-specific version of MakeCopyFuture that works correctly on GPU and allows CPU deserialization from FutureShm.

Key changes:
- Added MakeCopyFutureGpu() in ipc_manager.h (GPU-only function)
- Made Future constructors GPU-compatible with HSHM_CROSS_FUN
- Use __threadfence() for GPU memory fencing
- Fixed UniqueId operators to be GPU-compatible
- Test validates: GPU NewTask → MakeCopyFutureGpu → CPU deserialize

The function mirrors the pattern from passing serialization tests, using task->SerializeIn(archive) directly for reliable GPU execution.

Test results: 100% pass rate on GPU IPC buffer allocation tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Implemented GPU kernel task creation and serialization using MakeCopyFutureGpu.

Changes:
- GPU kernel now creates tasks using NewTask on GPU
- Uses MakeCopyFutureGpu to serialize tasks for future processing
- Added error diagnostic for MakeCopyFutureGpu failures (-14)
- All GPU submission tests now pass (100% success rate)

Test flow:
1. GPU kernel initializes with CHIMAERA_GPU_INIT
2. Creates task with NewTask<GpuSubmitTask>
3. Serializes with MakeCopyFutureGpu
4. Returns success (result == 1)

Test results: 4/4 tests passing (gpu_init, cpu_submission, multiple_executions, kernel_task_submission)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Demonstrates that vkWaitSemaphores efficiently sleeps a CPU thread (~0ms CPU time over ~5s wall-clock) instead of busy-polling, validating it as a GPU→CPU notification primitive for the ring buffer architecture. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

GPU device code cannot signal a CPU thread to wake — all Vulkan/CUDA semaphore signaling is stream-ordered and fires only after a kernel completes, making the approach unsuitable for persistent GPU kernels. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Designate one worker (N-2) as the GPU worker that polls GPU lanes, while regular workers no longer receive GPU lane assignments. Refactor ProcessNewTasks to accept a TaskLane* parameter and extract per-task logic into ProcessNewTask. The GPU worker forwards dequeued tasks to scheduler workers via round-robin in RuntimeMapTask. GPU workers never sleep to ensure continuous polling. Also remove GpuTaskQueue alias in favor of TaskQueue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
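The forwarding step described in this commit can be pictured with a small sketch. Everything here (`Task`, `SchedWorker`, `RoundRobinForwarder`, `GpuWorkerIndex`) is a hypothetical stand-in, not the runtime's actual types, and reading "N-2" as an index into N workers is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a task handle; the real runtime type is not shown.
struct Task {
  int id;
};

// Hypothetical scheduler worker that only records the task ids it receives.
struct SchedWorker {
  std::vector<int> received;
  void Enqueue(const Task &t) { received.push_back(t.id); }
};

// Assumption: with n workers, the GPU worker is the one at index n - 2.
inline std::size_t GpuWorkerIndex(std::size_t n) {
  assert(n >= 2);
  return n - 2;
}

// Hands each task dequeued from a GPU lane to the next scheduler worker in turn.
class RoundRobinForwarder {
 public:
  explicit RoundRobinForwarder(std::vector<SchedWorker> &workers)
      : workers_(workers) {
    assert(!workers_.empty());
  }

  void Forward(const Task &t) {
    workers_[next_].Enqueue(t);
    next_ = (next_ + 1) % workers_.size();  // wrap around after the last worker
  }

 private:
  std::vector<SchedWorker> &workers_;
  std::size_t next_ = 0;
};
```

With three scheduler workers and five forwarded tasks, the first two workers receive two tasks each and the third receives one, so no single worker absorbs all GPU-originated work.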

Replace raw ZeroMQ calls with lightbeam PUSH/PULL transport for client task submission (TCP/IPC modes). Add inline bulk data serialization to LocalSaveTaskArchive/LocalLoadTaskArchive so TCP/IPC transport can transfer actual data bytes instead of ShmPtr addresses. Add real bdev task round-trip tests (Create, AllocateBlocks, Write+Read) to all transport mode tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Remove inline_bulk_ flag from LocalSaveTaskArchive. Instead, bulk() checks whether the ShmPtr's alloc_id_ is null to decide if data must be inlined (private memory) or if the ShmPtr itself suffices (shared memory). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
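A minimal sketch of that decision, assuming a simplified pointer type: `SketchShmPtr`, `BulkKind`, `SaveBulk`, and the one-byte tag layout are all hypothetical and only illustrate "null allocator ID means inline the bytes, otherwise send the reference".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified pointer: alloc_id == 0 models a null allocator ID
// (private memory); any other value models a shared-memory allocation.
struct SketchShmPtr {
  std::uint32_t alloc_id;
  std::uint64_t offset;
  bool AllocIdIsNull() const { return alloc_id == 0; }
};

// Hypothetical one-byte tag written before each bulk.
enum class BulkKind : std::uint8_t { kInline = 0, kShmRef = 1 };

// Appends one bulk to `out`. Private data is copied in full; shared data is
// described only by its (alloc_id, offset) pair.
inline BulkKind SaveBulk(std::vector<std::uint8_t> &out, const SketchShmPtr &ptr,
                         const std::uint8_t *data, std::size_t size) {
  assert(data != nullptr || size == 0);
  if (ptr.AllocIdIsNull()) {
    out.push_back(static_cast<std::uint8_t>(BulkKind::kInline));
    out.insert(out.end(), data, data + size);
    return BulkKind::kInline;
  }
  out.push_back(static_cast<std::uint8_t>(BulkKind::kShmRef));
  const auto *id = reinterpret_cast<const std::uint8_t *>(&ptr.alloc_id);
  const auto *off = reinterpret_cast<const std::uint8_t *>(&ptr.offset);
  out.insert(out.end(), id, id + sizeof(ptr.alloc_id));
  out.insert(out.end(), off, off + sizeof(ptr.offset));
  return BulkKind::kShmRef;
}
```

In this sketch a 16-byte private buffer costs 17 bytes on the wire, while a shared buffer of any size costs 13, which is the reason for sending only the reference when the receiver can resolve it.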

Replace LocalSaveTaskArchive/LocalLoadTaskArchive with SaveTaskArchive/ LoadTaskArchive in SendZmq, ClientRecv, ClientSend, RecvZmqClientThread, and Recv. This eliminates the manual wire protocol and uses lightbeam's multi-frame bulk transfer (2 copies: ZMQ send + recv) instead of inlining bulk data into the serialized stream (4-5 copies). Also add ContinueBlockedTasks(true) after epoll_wait in SuspendMe() so periodic tasks like ClientRecv/Send execute immediately on wake. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add ShmClient/ShmServer that transfer data through a shared copy_space buffer with atomic flag synchronization, eliminating kernel crossings for same-node IPC. Bulks with non-null alloc_id skip the data copy and pass only the ShmPtr; the receiver sets ptr_ to nullptr for the caller to resolve. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
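The atomic-flag handoff can be sketched as a single-slot buffer. `CopySpace`, `TrySend`, and `TryRecv` are hypothetical names, the slot size is arbitrary, and a real copy space would live in a shared-memory mapping rather than in an ordinary object; this is an illustration of the synchronization pattern, not the ShmClient/ShmServer code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical single-slot copy space shared by one sender and one receiver.
struct CopySpace {
  static constexpr std::size_t kCapacity = 64;
  std::atomic<bool> ready{false};  // false: sender owns the slot; true: receiver owns it
  std::size_t size = 0;
  char data[kCapacity];
};

// Sender side: returns false if the receiver has not drained the slot yet.
inline bool TrySend(CopySpace &cs, const char *src, std::size_t n) {
  assert(n <= CopySpace::kCapacity);
  if (cs.ready.load(std::memory_order_acquire)) return false;
  std::memcpy(cs.data, src, n);
  cs.size = n;
  // Release store: the payload writes above become visible before the flag flips.
  cs.ready.store(true, std::memory_order_release);
  return true;
}

// Receiver side: returns false if nothing has been published yet.
inline bool TryRecv(CopySpace &cs, std::string &out) {
  if (!cs.ready.load(std::memory_order_acquire)) return false;
  out.assign(cs.data, cs.size);
  // Hand the slot back to the sender only after the copy-out is complete.
  cs.ready.store(false, std::memory_order_release);
  return true;
}
```

Both sides would poll these calls in a loop. Neither call enters the kernel, which is the property the commit message highlights for same-node IPC.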

Fix LoadTaskArchive::bulk() to use ptr.IsNull() instead of ptr.alloc_id_.IsNull() when checking for caller-provided buffers. MallocAllocator uses null alloc_id_ for all allocations, so the old check always took the zero-copy path, causing read data to never reach the caller's buffer over TCP.

Split bdev_file_explicit_backend test into three per-mode variants (SHM, TCP, IPC) that each run as separate processes.

Update docker-compose to only start the runtime on node1, with run_tests.sh driving test execution via docker exec.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
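The LoadTaskArchive::bulk() bug can be reproduced in miniature. `SketchPtr`, `LoadBulkOld`, and `LoadBulkFixed` are hypothetical simplifications, and the "zero-copy path" is reduced to a stand-in that simply leaves the caller's buffer untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified pointer. alloc_id == 0 models a null allocator ID,
// which (per the commit message) MallocAllocator uses for every allocation.
struct SketchPtr {
  std::uint32_t alloc_id = 0;
  char *ptr = nullptr;
  bool IsNull() const { return ptr == nullptr; }
  bool AllocIdIsNull() const { return alloc_id == 0; }
};

// Old check: treats "null allocator ID" as "no caller-provided buffer".
// Returns true only if the bytes were copied into the caller's buffer.
inline bool LoadBulkOld(SketchPtr &dst, const char *wire, std::size_t n) {
  if (dst.AllocIdIsNull()) {
    (void)wire;
    (void)n;
    return false;  // stand-in for the zero-copy path: caller buffer untouched
  }
  std::memcpy(dst.ptr, wire, n);
  return true;
}

// Fixed check: only a null pointer means there is no caller-provided buffer.
inline bool LoadBulkFixed(SketchPtr &dst, const char *wire, std::size_t n) {
  if (dst.IsNull()) return false;  // stand-in for the zero-copy path
  assert(wire != nullptr);
  std::memcpy(dst.ptr, wire, n);
  return true;
}
```

A malloc-style pointer has a real buffer but a null allocator ID, so the old check skips the copy and the fixed check performs it.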

The CTE run_tests.sh unconditionally overwrote IOWARP_CORE_ROOT with the devcontainer-internal path (/workspace), but Docker volume mounts need the host path. Respect the existing IOWARP_CORE_ROOT set by the devcontainer (matching the bdev test pattern). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add local_sched.h and local_sched.cc that were missing from git (scheduler_factory.cc includes local_sched.h)
- Add #include <algorithm> to shm_transport.h for std::min with initializer_list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
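For context on the second item: the `std::min` overload that takes a braced list is declared in `<algorithm>`, so code that relies on a transitive include may build on one standard library and fail on another. The helper below is a hypothetical illustration, not the code in shm_transport.h.

```cpp
#include <algorithm>  // declares std::min(std::initializer_list<T>)
#include <cassert>
#include <cstddef>

// Hypothetical helper: the largest chunk that respects all three limits.
inline std::size_t ChunkSize(std::size_t remaining, std::size_t capacity,
                             std::size_t max_chunk) {
  assert(capacity > 0);
  return std::min({remaining, capacity, max_chunk});
}
```

All three arguments must share one type, because the initializer-list overload deduces a single `T`.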

lukemartinlogan and others added 19 commits February 8, 2026 01:19

GPU transfers

f618f82

IpcManager on GPU clients

bf354b5

IPC allocation tests pass

9b83012

add_cuda_executable

057486f

More ring buffer tests

35dcb1f

Submission of futures to queues seems to behave if queues allocated properly

b89169d

Remove ai-prompts

8fc2371

Use AernaAllocator

3ac703c

CHI_IPC on GPU

e4c6a32

Allow ZMQ transport

06a083a

memfd + symlink instead

e39e7be

Allow different transport modes

063d6c1

lukemartinlogan linked an issue Feb 10, 2026 that may be closed by this pull request

Fix context transport primitives for the GPU #74

Closed

lukemartinlogan and others added 10 commits February 10, 2026 18:41

Fix compile errors on debug

585a909

More transports

2dd917d

Shm transfers work more consistently and follow lightbeam

d3b40d0

Remove debug

a1bd59d

Functions again

73d9940

Use mmap instead of malloc to improve bdev performance and reduce page faults

625ddda

Re-address resource utilization

f5f92ee

Fix BULK_EXPOSE

20582f2

lukemartinlogan and others added 7 commits February 12, 2026 19:20

Fixed crash due to use after free in future

16fe482

Use consume instead of owner

23ad918

Merge branch 'main' of github.com:iowarp/core into 74-fix-context-transport-primitives-for-the-gpu

42ceaac

More than one transport is working seemingly

9dad4d4

Removed AI prompts folder

826e0d3

lukemartinlogan marked this pull request as ready for review February 12, 2026 23:16

lukemartinlogan and others added 2 commits February 12, 2026 23:34

Compiler

601a82a

lukemartinlogan merged commit 790067c into main Feb 13, 2026
2 checks passed
