v0.4.0: GPU Infrastructure Generalization & Python Bindings
Highlights
This release extracts more than 7,000 lines of proven GPU infrastructure from RustGraph into RingKernel, making these capabilities available to all RingKernel users.
New: Python Bindings (ringkernel-python)
PyO3-based Python wrapper with full async/await support:
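```python
import ringkernel
import asyncio


async def main():
    # Create a runtime on the CPU backend.
    runtime = await ringkernel.RingKernel.create(backend="cpu")
    # Launch a kernel named "processor" with default launch options.
    kernel = await runtime.launch("processor", ringkernel.LaunchOptions())
    # Tear down in reverse order: the kernel first, then the runtime.
    await kernel.terminate()
    await runtime.shutdown()


asyncio.run(main())
```

Features: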
- Async/await with sync fallbacks
- HLC (hybrid logical clock) timestamps and K2K (kernel-to-kernel) messaging
- CUDA device enumeration and GPU memory pool management
- Benchmark framework with regression detection
- Hybrid CPU/GPU dispatcher with adaptive thresholds
- Resource guard for memory limit enforcement
- Type stubs for IDE support
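For readers new to HLC timestamps: a hybrid logical clock pairs a physical time component with a logical counter, so timestamps stay monotonic and respect causality even when wall clocks stall or drift. The sketch below implements the standard HLC send and receive rules in plain Python. It is not the ringkernel API; the class name `HybridLogicalClock` and the injectable `now_ms` clock are illustrative assumptions.

```python
import time
from typing import Callable, Optional, Tuple

Timestamp = Tuple[int, int]  # (physical milliseconds, logical counter)


class HybridLogicalClock:
    """Hypothetical HLC sketch (not the ringkernel API)."""

    def __init__(self, now_ms: Optional[Callable[[], int]] = None) -> None:
        # Injectable clock so tests can control "wall time".
        self.now_ms = now_ms or (lambda: time.time_ns() // 1_000_000)
        self.physical = 0
        self.logical = 0

    def tick(self) -> Timestamp:
        """Timestamp a local or send event."""
        wall = self.now_ms()
        if wall > self.physical:
            self.physical, self.logical = wall, 0
        else:
            # Wall clock stalled or went backwards: advance the logical counter instead.
            self.logical += 1
        return (self.physical, self.logical)

    def receive(self, remote: Timestamp) -> Timestamp:
        """Merge the timestamp carried by an incoming message."""
        remote_physical, remote_logical = remote
        new_physical = max(self.physical, remote_physical, self.now_ms())
        if new_physical == self.physical == remote_physical:
            self.logical = max(self.logical, remote_logical) + 1
        elif new_physical == self.physical:
            self.logical += 1
        elif new_physical == remote_physical:
            self.logical = remote_logical + 1
        else:
            self.logical = 0
        self.physical = new_physical
        return (self.physical, self.logical)
```

Timestamps compare lexicographically as tuples, so two events in the same physical millisecond are still ordered by their logical counters.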
New: PTX Compilation Cache
Disk-based PTX caching for faster kernel loading, keyed by SHA-256 content hashes and aware of the target compute capability.
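A content-addressed cache of this kind can be sketched as follows, assuming the cache key combines a SHA-256 hash of the kernel source with the target compute capability, and that each entry is one `.ptx` file on disk. The class `PtxDiskCache` and the injected `compile_fn` (standing in for an NVRTC or nvcc call) are hypothetical, not RingKernel's actual cache.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable, Tuple

CompileFn = Callable[[str, Tuple[int, int]], str]


class PtxDiskCache:
    """Hypothetical disk-backed PTX cache; not the RingKernel implementation."""

    def __init__(self, cache_dir: Path) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def cache_key(source: str, compute_capability: Tuple[int, int]) -> str:
        major, minor = compute_capability
        digest = hashlib.sha256(source.encode("utf-8"))
        # A separator plus the target architecture keeps sm_80 and sm_90 entries apart.
        digest.update(f"\0sm_{major}{minor}".encode("ascii"))
        return digest.hexdigest()

    def get_or_compile(self, source: str, compute_capability: Tuple[int, int],
                       compile_fn: CompileFn) -> str:
        path = self.cache_dir / f"{self.cache_key(source, compute_capability)}.ptx"
        if path.exists():
            return path.read_text(encoding="utf-8")  # cache hit: skip compilation
        ptx = compile_fn(source, compute_capability)  # cache miss: compile once
        # Write to a temporary file, then rename, so readers never see a partial entry.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(ptx, encoding="utf-8")
        tmp.replace(path)
        return ptx


if __name__ == "__main__":
    def fake_compile(src, cc):
        return f"// PTX for sm_{cc[0]}{cc[1]}\n"

    with tempfile.TemporaryDirectory() as d:
        cache = PtxDiskCache(Path(d))
        print(cache.get_or_compile("__global__ void k() {}", (8, 0), fake_compile))
```

Hashing the content rather than the file name means an edited kernel automatically misses the cache, while an unchanged kernel on the same architecture is never recompiled.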
New: GPU Stratified Memory Pool
Size-stratified GPU VRAM pool with six size classes (256 B to 256 KB) and O(1) allocation from free lists.
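The stratification idea can be illustrated with a small simulation. This sketch assumes the six classes grow by a factor of 4 (256 B, 1 KB, 4 KB, 16 KB, 64 KB, 256 KB), which matches the stated range but is not confirmed by these notes, and it uses integer offsets into a simulated arena in place of real device pointers. `StratifiedPool` and `SIZE_CLASSES` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed size classes: 256 B growing by 4x up to 256 KB (six classes).
SIZE_CLASSES = [256 * 4**k for k in range(6)]


class StratifiedPool:
    """Hypothetical sketch: offsets into a simulated VRAM arena stand in for device pointers."""

    def __init__(self, arena_bytes: int) -> None:
        self.arena_bytes = arena_bytes
        self.bump = 0  # next never-used offset in the arena
        self.free_lists: Dict[int, List[int]] = {c: [] for c in SIZE_CLASSES}

    @staticmethod
    def size_class(nbytes: int) -> int:
        # At most six comparisons, so class lookup is constant time.
        for c in SIZE_CLASSES:
            if nbytes <= c:
                return c
        raise ValueError(f"{nbytes} B exceeds the largest class ({SIZE_CLASSES[-1]} B)")

    def alloc(self, nbytes: int) -> Tuple[int, int]:
        cls = self.size_class(nbytes)
        free = self.free_lists[cls]
        if free:
            return free.pop(), cls  # O(1): reuse a previously freed block
        if self.bump + cls > self.arena_bytes:
            raise MemoryError("arena exhausted")
        offset = self.bump
        self.bump += cls  # O(1): carve a fresh block from the arena
        return offset, cls

    def free(self, offset: int, cls: int) -> None:
        self.free_lists[cls].append(offset)  # O(1): push back onto its class list
```

Rounding each request up to its class wastes some memory per block, but it keeps both allocation and release constant time and avoids fragmentation within a class.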
New: Multi-Stream Execution Manager
Multi-stream CUDA execution for compute/transfer overlap with event-based synchronization.
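The overlap pattern can be modeled on the CPU to show how event-based synchronization orders work across streams. In this sketch a single-worker thread pool stands in for a CUDA stream (work runs in FIFO order), and a `threading.Event` stands in for a CUDA event: the copy stream records an event after each chunk's transfer, and the compute stream waits on that event before processing the chunk. The `Stream` class and `pipelined_sum_of_squares` are hypothetical; because of Python's GIL this models ordering only, not real GPU concurrency.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Sequence


class Stream:
    """Hypothetical CPU stand-in for a CUDA stream: work runs in FIFO order on one thread."""

    def __init__(self, name: str) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix=name)

    def enqueue(self, fn, *args) -> None:
        self._executor.submit(fn, *args)

    def record_event(self) -> threading.Event:
        # The event is set only after all work enqueued earlier on this stream finishes.
        event = threading.Event()
        self._executor.submit(event.set)
        return event

    def wait_event(self, event: threading.Event) -> None:
        # Work enqueued after this call does not start until the event is set.
        self._executor.submit(event.wait)

    def synchronize(self) -> None:
        self._executor.submit(lambda: None).result()

    def close(self) -> None:
        self._executor.shutdown(wait=True)


def pipelined_sum_of_squares(chunks: Sequence[Sequence[int]]) -> int:
    copy_stream, compute_stream = Stream("copy"), Stream("compute")
    device: List[Optional[List[int]]] = [None] * len(chunks)  # simulated device buffers
    partials: List[int] = []

    def upload(i: int, chunk: Sequence[int]) -> None:
        device[i] = list(chunk)  # stands in for a host-to-device copy

    def compute(i: int) -> None:
        partials.append(sum(x * x for x in device[i]))

    try:
        for i, chunk in enumerate(chunks):
            copy_stream.enqueue(upload, i, chunk)
            ready = copy_stream.record_event()
            # Compute on chunk i waits only for its own copy, so chunk i+1 can upload meanwhile.
            compute_stream.wait_event(ready)
            compute_stream.enqueue(compute, i)
        compute_stream.synchronize()
    finally:
        copy_stream.close()
        compute_stream.close()
    return sum(partials)
```

On a real GPU the same structure maps onto recording an event on the transfer stream and making the compute stream wait on it, which lets the copy engine and the SMs work on different chunks at the same time.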
New: Benchmark Framework
Benchmarking with regression detection, baseline comparison, and reports in multiple formats (Markdown, JSON, LaTeX).
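Baseline comparison with regression detection can be sketched as below, assuming each benchmark is summarized by the median of its samples and flagged when it slows down by more than a percentage threshold. The function names, the 5% default, and the report layout are hypothetical, and only the Markdown and JSON formats are shown.

```python
import json
import statistics
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class Comparison:
    name: str
    baseline_ms: float
    current_ms: float
    change_pct: float
    regressed: bool


def compare(baseline: Dict[str, List[float]], current: Dict[str, List[float]],
            threshold_pct: float = 5.0) -> List[Comparison]:
    """Flag benchmarks whose median time grew by more than threshold_pct."""
    rows = []
    for name, samples in current.items():
        if name not in baseline:
            continue  # new benchmark: nothing to compare against yet
        base = statistics.median(baseline[name])
        cur = statistics.median(samples)
        change = (cur - base) / base * 100.0
        rows.append(Comparison(name, base, cur, change, change > threshold_pct))
    return rows


def to_markdown(rows: List[Comparison]) -> str:
    lines = ["| Benchmark | Baseline (ms) | Current (ms) | Change | Status |",
             "|---|---|---|---|---|"]
    for r in rows:
        status = "REGRESSION" if r.regressed else "ok"
        lines.append(f"| {r.name} | {r.baseline_ms:.2f} | {r.current_ms:.2f} "
                     f"| {r.change_pct:+.1f}% | {status} |")
    return "\n".join(lines)


def to_json(rows: List[Comparison]) -> str:
    return json.dumps([asdict(r) for r in rows], indent=2)
```

Using the median rather than the mean keeps a single noisy run from triggering a false regression.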
New: Hybrid CPU-GPU Dispatcher
Workload routing between CPU and GPU execution, with dispatch thresholds learned adaptively.
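One way adaptive threshold learning can work is sketched below. It assumes a linear cost model (CPU time proportional to item count; GPU time equal to a known, fixed launch overhead plus a per-item cost), smooths observed per-item costs with an exponential moving average, and moves the threshold to the crossover point where both backends cost the same. `HybridDispatcher`, its parameters, and the cost model are all hypothetical.

```python
import math
from typing import Optional


class HybridDispatcher:
    """Hypothetical sketch: learns per-item costs and routes work to the cheaper backend."""

    def __init__(self, gpu_launch_overhead_us: float,
                 initial_threshold: float = 10_000, alpha: float = 0.25) -> None:
        self.gpu_launch_overhead_us = gpu_launch_overhead_us  # assumed known and fixed
        self.threshold = initial_threshold  # item count at or above which the GPU is used
        self.alpha = alpha  # EMA smoothing factor
        self.cpu_us_per_item: Optional[float] = None
        self.gpu_us_per_item: Optional[float] = None

    def choose(self, n_items: int) -> str:
        return "gpu" if n_items >= self.threshold else "cpu"

    def _ema(self, old: Optional[float], sample: float) -> float:
        return sample if old is None else (1 - self.alpha) * old + self.alpha * sample

    def record(self, backend: str, n_items: int, elapsed_us: float) -> None:
        """Feed back a measured run (n_items > 0) and re-derive the threshold."""
        if backend == "cpu":
            self.cpu_us_per_item = self._ema(self.cpu_us_per_item, elapsed_us / n_items)
        else:
            per_item = max(elapsed_us - self.gpu_launch_overhead_us, 0.0) / n_items
            self.gpu_us_per_item = self._ema(self.gpu_us_per_item, per_item)
        self._update_threshold()

    def _update_threshold(self) -> None:
        if self.cpu_us_per_item is None or self.gpu_us_per_item is None:
            return  # need observations from both backends first
        saving = self.cpu_us_per_item - self.gpu_us_per_item
        # Crossover n where cpu * n == overhead + gpu * n; never offload if the GPU is not cheaper per item.
        self.threshold = math.ceil(self.gpu_launch_overhead_us / saving) if saving > 0 else math.inf
```

The EMA lets the threshold track slow changes (thermal throttling, contention) without overreacting to a single outlier measurement.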
New: Resource Guard
Memory limit enforcement with safety margins and RAII reservation patterns.
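The reservation pattern can be sketched in Python, where a context manager plays the role that a value's `Drop` plays in Rust RAII: the reserved bytes are returned automatically when the scope ends, even on an exception. The names `ResourceGuard` and `Reservation`, and expressing the safety margin as a fraction of the limit, are assumptions for illustration.

```python
import threading


class ResourceGuard:
    """Hypothetical sketch: enforces a memory budget minus a safety margin."""

    def __init__(self, limit_bytes: int, safety_margin: float = 0.1) -> None:
        # Keep a fraction of the limit in reserve for allocations the guard cannot see.
        self.usable = int(limit_bytes * (1.0 - safety_margin))
        self.reserved = 0
        self._lock = threading.Lock()

    def reserve(self, nbytes: int) -> "Reservation":
        with self._lock:
            if self.reserved + nbytes > self.usable:
                raise MemoryError(f"reserving {nbytes} B would exceed the {self.usable} B budget")
            self.reserved += nbytes
        return Reservation(self, nbytes)

    def _release(self, nbytes: int) -> None:
        with self._lock:
            self.reserved -= nbytes


class Reservation:
    """Releases its bytes when the `with` block exits, mirroring RAII drop semantics."""

    def __init__(self, guard: ResourceGuard, nbytes: int) -> None:
        self.guard, self.nbytes, self._released = guard, nbytes, False

    def release(self) -> None:
        if not self._released:  # idempotent: releasing twice is harmless
            self._released = True
            self.guard._release(self.nbytes)

    def __enter__(self) -> "Reservation":
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        self.release()
        return False  # do not swallow exceptions
```

Checking the budget before the allocation happens turns an out-of-memory failure deep inside a kernel launch into an early, recoverable error.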
New: Kernel Mode Selector
Automatic kernel launch configuration based on the workload profile and the GPU architecture.
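A selector of this kind maps a workload description and device properties to a launch configuration. The sketch below is purely illustrative: the two modes (`"persistent"` for long-lived streaming work, `"one_shot"` for a single pass over a fixed input), the 256/128 block sizes split at compute capability 7.0, and the two-blocks-per-SM rule are all hypothetical heuristics, not RingKernel's actual policy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class WorkloadProfile:
    elements: int
    streaming: bool  # True for long-lived, message-driven work (assumption)


@dataclass(frozen=True)
class GpuArch:
    compute_capability: Tuple[int, int]
    sm_count: int
    max_threads_per_block: int = 1024


@dataclass(frozen=True)
class LaunchConfig:
    mode: str
    block_size: int
    grid_size: int


def select_launch(profile: WorkloadProfile, arch: GpuArch) -> LaunchConfig:
    # Hypothetical heuristic: 256 threads per block on Volta and newer, 128 on older parts.
    block = 256 if arch.compute_capability >= (7, 0) else 128
    block = min(block, arch.max_threads_per_block)
    if profile.streaming:
        # Persistent mode: a fixed grid sized to the SMs that loops over incoming work.
        return LaunchConfig("persistent", block, arch.sm_count * 2)
    # One-shot mode: one thread per element, rounded up to whole blocks.
    grid = max(1, -(-profile.elements // block))  # ceiling division
    return LaunchConfig("one_shot", block, grid)
```

Sizing a persistent grid to the SM count avoids oversubscribing the device with blocks that never exit, while one-shot launches simply cover the input.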
See CHANGELOG.md for full details.