
Releases: Industrial-Algebra/Borsalino

Borsalino v0.2.1

11 Jun 20:35
54b84b0

Hotfix release — removes accidentally published early-research content.
No other changes from v0.2.0.

Borsalino v0.2.0

New Features

  • Async dispatch: dispatch_async() returns a Pulse handle for non-blocking GPU execution, backed by a VkFence on Vulkan and an MTLCommandBuffer on Metal. Dropping a Pulse performs an implicit join.
  • Persistent buffers: create_device_buffer() keeps data on the GPU across dispatches, in VRAM on discrete GPUs and zero-copy on unified-memory systems.
  • GPU timestamps: gpu.timestamp() for profiling. On Vulkan it is implemented with a vkCmdWriteTimestamp query pool.
  • 2D/3D tiled dispatch: WGSL shared memory and barriers, used for tiled matrix multiplication.
  • Candle integration: a pattern for custom element-wise GPU kernels that complement ML frameworks.
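
The async-dispatch contract (start the work, return a handle immediately, block on drop if the handle was never joined) has the same RAII shape as a wrapped Rust JoinHandle. The following is a CPU-only sketch under that assumption: a thread stands in for the GPU queue where the real backends would wait on a VkFence or MTLCommandBuffer, and the signatures of `Pulse` and `dispatch_async` shown here are hypothetical, not Borsalino's actual API.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for Borsalino's `Pulse`: a handle to in-flight work.
/// A real backend would wait on a VkFence or MTLCommandBuffer; here a CPU
/// thread plays the role of the GPU queue.
pub struct Pulse<T: Send + 'static> {
    handle: Option<JoinHandle<T>>,
}

impl<T: Send + 'static> Pulse<T> {
    /// Explicitly wait for completion and take the result.
    pub fn join(mut self) -> T {
        self.handle
            .take()
            .expect("pulse already joined")
            .join()
            .expect("dispatched work panicked")
    }
}

impl<T: Send + 'static> Drop for Pulse<T> {
    /// Implicit join: dropping an unjoined Pulse blocks until the work finishes.
    fn drop(&mut self) {
        if let Some(handle) = self.handle.take() {
            let _ = handle.join();
        }
    }
}

/// Hypothetical `dispatch_async`: start `work` and return without waiting.
pub fn dispatch_async<T, F>(work: F) -> Pulse<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    Pulse { handle: Some(thread::spawn(work)) }
}
```

Unlike a bare std::thread::JoinHandle, which detaches when dropped, this wrapper cannot leave its scope with work still in flight. The caller is free until it calls `join()` or the handle goes out of scope.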

Benchmarks

| Platform | Tiled matmul 8192 | Batched SAXPY 1M | Per-dispatch |
|---|---|---|---|
| GB10 (RTX Spark) | 1,403 GFLOPS | 372 GFLOPS | 0.4 µs |
| RTX 5080 | 523 GFLOPS | 477 GFLOPS | 0.5 µs |
| M3 Pro | 186 GFLOPS | 42 GFLOPS | 142 µs |
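
The tiled-matmul feature and the matmul column rest on two ideas that a CPU sketch can show: staging tiles of each operand in fast shared storage before accumulating, and counting 2·N³ FLOPs for an N x N multiply. This is a minimal Rust analogue, assuming square row-major f32 matrices and a 16 x 16 tile. `tiled_matmul`, `matmul_gflops`, and `TILE` are hypothetical names, and this is not the WGSL kernel Borsalino ships.

```rust
/// Tile edge (assumed); a WGSL kernel would typically match its workgroup size.
const TILE: usize = 16;

/// Row-major C = A * B for n x n matrices, computed tile by tile.
/// Each (ti, tj) output tile plays the role of one GPU workgroup; `a_tile` and
/// `b_tile` play the role of `var<workgroup>` shared memory, and the point
/// between staging and accumulating is where a WGSL `workgroupBarrier()` would sit.
pub fn tiled_matmul(a: &[f32], b: &[f32], n: usize) -> Vec<f32> {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    let mut c = vec![0.0f32; n * n];
    let mut a_tile = [[0.0f32; TILE]; TILE];
    let mut b_tile = [[0.0f32; TILE]; TILE];
    for ti in (0..n).step_by(TILE) {
        for tj in (0..n).step_by(TILE) {
            for tk in (0..n).step_by(TILE) {
                // Stage one tile of A and one of B, zero-padding past the edges.
                for i in 0..TILE {
                    for j in 0..TILE {
                        let (row, k) = (ti + i, tk + j);
                        a_tile[i][j] = if row < n && k < n { a[row * n + k] } else { 0.0 };
                        let (k2, col) = (tk + i, tj + j);
                        b_tile[i][j] = if k2 < n && col < n { b[k2 * n + col] } else { 0.0 };
                    }
                }
                // (barrier) Accumulate partial dot products from the staged tiles.
                for i in 0..TILE {
                    for j in 0..TILE {
                        let (row, col) = (ti + i, tj + j);
                        if row >= n || col >= n {
                            continue;
                        }
                        let mut acc = 0.0f32;
                        for k in 0..TILE {
                            acc += a_tile[i][k] * b_tile[k][j];
                        }
                        c[row * n + col] += acc;
                    }
                }
            }
        }
    }
    c
}

/// n^3 multiply-adds, conventionally counted as 2 * n^3 FLOPs.
pub fn matmul_gflops(n: usize, seconds: f64) -> f64 {
    2.0 * (n as f64).powi(3) / seconds / 1e9
}
```

If 8192 denotes N for an N x N multiply (an assumption), 2·8192³ is about 1.1 × 10¹² FLOPs, so 1,403 GFLOPS corresponds to roughly 0.78 s per multiply.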

Breaking Changes

None. All additions are backward-compatible trait methods with default implementations.
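
The backward-compatibility claim rests on a standard Rust property: adding a trait method that has a default body does not force existing implementors to change. Here is a minimal sketch, with a hypothetical `Backend` trait standing in for GpuBackend and a hypothetical `timestamp` method standing in for the v0.2.0 additions. These are not the crate's real signatures.

```rust
/// Hypothetical, trimmed-down backend trait in the spirit of `GpuBackend`.
pub trait Backend {
    /// Original required method: run a kernel and block until it finishes.
    fn dispatch(&self, kernel: &str, n: usize) -> Result<(), String>;

    /// Method added later. Because it ships with a default body, existing
    /// implementors still compile unchanged; capable backends override it.
    fn timestamp(&self) -> Option<u64> {
        None
    }
}

/// A backend written against the old trait: it never mentions `timestamp`.
pub struct LegacyBackend;

impl Backend for LegacyBackend {
    fn dispatch(&self, _kernel: &str, _n: usize) -> Result<(), String> {
        Ok(())
    }
}

/// A newer backend that overrides the added method.
pub struct ProfilingBackend {
    pub ticks: u64,
}

impl Backend for ProfilingBackend {
    fn dispatch(&self, _kernel: &str, _n: usize) -> Result<(), String> {
        Ok(())
    }

    fn timestamp(&self) -> Option<u64> {
        Some(self.ticks)
    }
}
```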

Full details: CHANGELOG.md

Borsalino v0.1.0

04 Jun 00:50

Thin GPU compute abstraction for the Industrial Algebra ecosystem.
One trait, two backends (Metal + Vulkan), zero ceremony.
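
"One trait, two backends" describes the usual Rust shape of a trait object with one implementation per platform. The sketch below assumes that shape. `Backend`, `MetalBackend`, `VulkanBackend`, and `default_backend` are hypothetical names, the SAXPY body is a CPU stand-in for a shared WGSL kernel, and choosing Metal on Apple targets and Vulkan elsewhere is an assumed policy.

```rust
/// Illustrative backend trait in the spirit of Borsalino's `GpuBackend`.
/// Method names and signatures are hypothetical, not the crate's real API.
pub trait Backend {
    fn name(&self) -> &'static str;
    /// y[i] = a * x[i] + y[i]; a real backend would run a WGSL kernel here.
    fn saxpy(&self, a: f32, x: &[f32], y: &mut [f32]);
}

/// CPU stand-in for the kernel both backends would build from one WGSL source.
fn saxpy_reference(a: f32, x: &[f32], y: &mut [f32]) {
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi += a * xi;
    }
}

pub struct MetalBackend;
pub struct VulkanBackend;

impl Backend for MetalBackend {
    fn name(&self) -> &'static str {
        "metal"
    }
    fn saxpy(&self, a: f32, x: &[f32], y: &mut [f32]) {
        saxpy_reference(a, x, y);
    }
}

impl Backend for VulkanBackend {
    fn name(&self) -> &'static str {
        "vulkan"
    }
    fn saxpy(&self, a: f32, x: &[f32], y: &mut [f32]) {
        saxpy_reference(a, x, y);
    }
}

/// Callers see only `dyn Backend`; the concrete backend is picked by target OS
/// (assumed policy: Metal on Apple platforms, Vulkan everywhere else).
pub fn default_backend() -> Box<dyn Backend> {
    if cfg!(any(target_os = "macos", target_os = "ios")) {
        Box::new(MetalBackend)
    } else {
        Box::new(VulkanBackend)
    }
}
```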

Highlights

  • Vulkan backend: full GpuBackend trait implementation via ash 0.38 FFI, with WGSL-to-SPIR-V translation via naga.
  • WGSL shader language: write shaders once and run them on both Metal and Vulkan.
  • Device-local memory: auto-detects VRAM vs. unified memory, giving a 15x improvement on RTX 5080.
  • Batched dispatch: dispatch_many() amortises per-dispatch overhead, reaching 0.5 µs/dispatch and 577 GFLOPS on RTX 5080.
  • Metal on Apple Silicon M3: seven root causes resolved.
  • Dual license: AGPL-3.0 or a commercial license.
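
Batched dispatch pays the fixed submission cost once for many kernels instead of once per kernel. The sketch below models that with a hypothetical `Queue` that counts submissions; `Queue`, `Kernel`, and the overhead figure are illustrative assumptions, and only the `dispatch_many` name comes from the release notes.

```rust
/// A kernel is modelled as a closure over a data slice (CPU stand-in).
pub type Kernel = Box<dyn Fn(&mut [f32])>;

/// Hypothetical queue that counts submissions. Each submission stands in for
/// the fixed cost (recording, queue submit, fence wait) that batching amortises.
pub struct Queue {
    pub submissions: u32,
    pub fixed_overhead_us: f64,
}

impl Queue {
    pub fn new(fixed_overhead_us: f64) -> Self {
        Queue { submissions: 0, fixed_overhead_us }
    }

    /// One kernel, one submission.
    pub fn dispatch(&mut self, k: &Kernel, data: &mut [f32]) {
        self.submissions += 1;
        k(data);
    }

    /// Many kernels recorded into a single submission.
    pub fn dispatch_many(&mut self, ks: &[Kernel], data: &mut [f32]) {
        self.submissions += 1;
        for k in ks {
            k(&mut *data);
        }
    }

    /// Fixed overhead charged to each kernel when `n` kernels share one submission.
    pub fn amortised_overhead_us(&self, n: usize) -> f64 {
        self.fixed_overhead_us / n as f64
    }
}
```

Both paths compute the same result; only the number of submissions, and therefore the fixed cost per kernel, differs.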

Benchmarks

| Platform | SAXPY 16M | Batched SAXPY 1M | Per-dispatch |
|---|---|---|---|
| RTX 5080 | 92 GFLOPS | 577 GFLOPS | 0.5 µs |
| GB10 | 49 GFLOPS | 208 GFLOPS | 1.0 µs |
| M3 Pro | 30 GFLOPS | | 136 µs |
| AMD iGPU | 17 GFLOPS | | 45 µs |
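
The SAXPY figures are easier to compare with memory bandwidth once the accounting is explicit: y[i] = a·x[i] + y[i] is 2 FLOPs per element and, for f32 data with no cache reuse, 12 bytes of traffic (read x, read y, write y). The helper names below are hypothetical, and the 12-byte figure is an assumption about the benchmark's memory traffic.

```rust
/// SAXPY: y[i] = a * x[i] + y[i], one multiply and one add per element.
pub fn saxpy(a: f32, x: &[f32], y: &mut [f32]) {
    assert_eq!(x.len(), y.len());
    for (yi, &xi) in y.iter_mut().zip(x) {
        *yi = a * xi + *yi;
    }
}

/// Throughput for `n` elements in `seconds`, at 2 FLOPs per element.
pub fn saxpy_gflops(n: usize, seconds: f64) -> f64 {
    2.0 * n as f64 / seconds / 1e9
}

/// Effective bandwidth (GB/s) implied by a SAXPY GFLOPS figure, assuming
/// 12 bytes moved per 2 FLOPs (f32 x read, y read, y write; no cache reuse).
pub fn implied_bandwidth_gbs(gflops: f64) -> f64 {
    gflops * 12.0 / 2.0
}
```

Under that assumption, 92 GFLOPS on SAXPY 16M corresponds to about 552 GB/s of effective memory traffic.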

Full details: CHANGELOG.md