Releases: gittensor-ai-lab/sparkinfer
v0.1.0 — Qwen3-MoE on Blackwell (prebuilt binaries)
First release of the consolidated sparkinfer monorepo (kernels + MoE + runtime + benchmarks).
Runs Qwen3-30B-A3B Q4_K_M end-to-end on NVIDIA Blackwell. Verified on an RTX 5090 (sm_120, CUDA 13): builds clean, ctest 5/5, compute-sanitizer 0 errors, 163.88 tok/s decode at 21.4 GB, and 100% top-1 token agreement with llama.cpp (KL ≈ 0.14 nats) — accuracy preserved.
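For readers unfamiliar with the two accuracy figures, the block below gives the usual textbook definitions. This is a sketch under assumptions: the release does not state the KL direction, the averaging, or the evaluation set, so the symbols here ($p_t$ for the llama.cpp next-token distribution at position $t$, $q_t$ for the sparkinfer one, $T$ scored positions, vocabulary $V$) are illustrative and may differ from what accuracy.sh computes.

```latex
% Assumed definitions; direction and averaging are not confirmed by the release notes.
\[
  \text{top-1 agreement}
  = \frac{1}{T}\sum_{t=1}^{T}
    \mathbf{1}\!\left[\arg\max_{v \in V} p_t(v) = \arg\max_{v \in V} q_t(v)\right]
\]
\[
  \overline{D_{\mathrm{KL}}}
  = \frac{1}{T}\sum_{t=1}^{T}\sum_{v \in V} p_t(v)\,\ln\frac{p_t(v)}{q_t(v)}
  \qquad\text{(natural log, so the unit is nats)}
\]
```

Under these definitions the two numbers measure different things: top-1 agreement only checks that both runtimes pick the same most likely token, while KL also reflects how the remaining probability mass is distributed.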
Prebuilt binaries (attached)
sparkinfer-v0.1.0-linux-x86_64-cuda13-sm120.tar.gz — qwen3_gguf_bench / _generate / _score + bundled libs.
- Built for sm_120 (RTX 5090, RTX PRO 6000), CUDA 13.0, glibc 2.39 (Ubuntu 24.04).
- bench/scripts/bench.sh and accuracy.sh fetch and use these automatically, and fall back to a source build if the prebuilt is incompatible (different arch like sm_121, older driver/CUDA, older glibc).
- Manual use: tar xzf … && ./sparkinfer-bin/run qwen3_gguf_bench model.gguf 128.
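To check by hand whether the prebuilt is likely to work on a given machine, the arch and glibc conditions listed above can be sketched in a few lines of shell. This is not the logic in bench/scripts/bench.sh: the function names `version_ge` and `prebuilt_compatible` are hypothetical, the driver/CUDA condition is left out, and the sketch assumes GNU `sort -V` and a driver recent enough for `nvidia-smi` to report `compute_cap`.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; NOT the check used by bench/scripts/bench.sh.

# version_ge A B: succeeds if dotted version A >= B (relies on GNU `sort -V`).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# prebuilt_compatible CAP GLIBC: succeeds only for compute capability 12.0
# (sm_120) and glibc >= 2.39, the values this release was built against.
# The driver/CUDA version condition is deliberately omitted here.
prebuilt_compatible() {
  [ "$1" = "12.0" ] && version_ge "$2" "2.39"
}

# Example of gathering the inputs on a real machine (commented out so the
# sketch can be sourced on hosts without a GPU):
#   cap="$(nvidia-smi --query-gpu=compute_cap --format=csv,noheader | head -n1)"
#   glibc="$(getconf GNU_LIBC_VERSION | awk '{print $2}')"
#   if prebuilt_compatible "$cap" "$glibc"; then
#     echo "prebuilt should work"
#   else
#     echo "expect a source build"
#   fi
```

The arch test is an exact match rather than a minimum because the release notes name sm_121 as an incompatible arch, while glibc is treated as a lower bound.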
Reproduce
bench/scripts/bench.sh --download (decode tok/s) · --compare (vs llama.cpp) · accuracy.sh --download (token-match / KL / PPL). Needs CUDA 12.8+.