Releases: ReaNAiveD/inferaived

v0.1.0 — Initial release

07 Jun 14:07

Pre-release

First public release of inferaived, an LLM inference engine written in Rust on top of wgpu.

Status: early / experimental. APIs will change. Expect rough edges.

Highlights

  • GPU-resident inference on wgpu 29 — runs on Vulkan, Metal, DX12, and browser WebGPU; no CUDA or external runtime required.
  • Supported models: Qwen3.5 and MiniCPM5 (including MiniCPM5's parallel hybrid layer stack). Weights load directly from Hugging Face safetensors.
  • Custom WGSL kernels: matmul, RoPE, RMSNorm, masked attention, mamba scan, delta rule, and samplers.
  • GPU KV cache with a continuous decode loop and an argmax (greedy) sampler.
  • Runnable examples: generate, chat_qwen35, chat_minicpm5, parallel_minicpm5, bench_decode.
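
For readers unfamiliar with two of the operations named above, here is a minimal CPU reference sketch of RMSNorm and argmax (greedy) sampling in plain Rust. It is not the inferaived API and not the project's WGSL kernels: the function names `rms_norm` and `argmax` are hypothetical, the RMSNorm shown is the common textbook formulation (a given model may use a variant), and the tie-breaking and NaN handling are assumptions made for illustration.

```rust
/// CPU reference for the common RMSNorm formulation:
/// y_i = x_i / sqrt(mean(x^2) + eps) * weight_i.
/// Hypothetical helper, not part of the inferaived crate.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len(), "input and weight must match in length");
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let inv_rms = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * inv_rms * w).collect()
}

/// Greedy sampling: index of the largest logit.
/// Assumptions: NaN entries are skipped, ties resolve to the first index,
/// and an empty (or all-NaN) slice yields None.
fn argmax(logits: &[f32]) -> Option<usize> {
    let mut best: Option<(usize, f32)> = None;
    for (i, &l) in logits.iter().enumerate() {
        if l.is_nan() {
            continue;
        }
        match best {
            // Keep the current best unless the new logit is strictly larger.
            Some((_, b)) if l <= b => {}
            _ => best = Some((i, l)),
        }
    }
    best.map(|(i, _)| i)
}

fn main() {
    // Normalize a toy hidden vector with unit weights.
    let hidden = [3.0_f32, 4.0];
    let normed = rms_norm(&hidden, &[1.0, 1.0], 1e-6);
    println!("normed = {:?}", normed);

    // Pick the next token id from toy logits.
    let logits = [0.1_f32, 2.5, -1.0, 2.5];
    println!("next token = {:?}", argmax(&logits));
}
```

A reference like this is mainly useful for checking GPU kernel output against a known-good CPU result on small inputs.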

Install

[dependencies]
inferaived = "0.1"

Or:

cargo add inferaived

License

MIT