Skip to content

v0.4.0: GPU Infrastructure Generalization & Python Bindings

Choose a tag to compare

@mivertowski mivertowski released this 25 Jan 21:23
· 133 commits to main since this release

Highlights

This release extracts ~7,000+ lines of proven GPU infrastructure from RustGraph into RingKernel, making these capabilities available to all RingKernel users.

New: Python Bindings (ringkernel-python)

PyO3-based Python wrapper with full async/await support:

import ringkernel
import asyncio

async def main():
    runtime = await ringkernel.RingKernel.create(backend="cpu")
    kernel = await runtime.launch("processor", ringkernel.LaunchOptions())
    await kernel.terminate()
    await runtime.shutdown()

asyncio.run(main())

Features:

  • Async/await with sync fallbacks
  • HLC timestamps and K2K messaging
  • CUDA device enumeration and GPU memory pool management
  • Benchmark framework with regression detection
  • Hybrid CPU/GPU dispatcher with adaptive thresholds
  • Resource guard for memory limit enforcement
  • Type stubs for IDE support

New: PTX Compilation Cache

Disk-based PTX caching for faster kernel loading with SHA-256 content hashing and compute capability awareness.

New: GPU Stratified Memory Pool

Size-stratified GPU VRAM pool with 6 size classes (256B-256KB), O(1) allocation from free lists.

New: Multi-Stream Execution Manager

Multi-stream CUDA execution for compute/transfer overlap with event-based synchronization.

New: Benchmark Framework

Comprehensive benchmarking with regression detection, baseline comparison, and multiple report formats (Markdown, JSON, LaTeX).

New: Hybrid CPU-GPU Dispatcher

Intelligent workload routing with adaptive threshold learning between CPU and GPU execution.

New: Resource Guard

Memory limit enforcement with safety margins and RAII reservation patterns.

New: Kernel Mode Selector

Intelligent kernel launch configuration based on workload profile and GPU architecture.


See CHANGELOG.md for full details.