Releases: gittensor-ai-lab/sparkinfer

v0.1.0 — Qwen3-MoE on Blackwell (prebuilt binaries)

22 Jun 21:49

First release of the consolidated sparkinfer monorepo (kernels + MoE + runtime + benchmarks).

Runs Qwen3-30B-A3B Q4_K_M end-to-end on NVIDIA Blackwell. Verified on an RTX 5090 (sm_120, CUDA 13): clean build, ctest 5/5 passing, compute-sanitizer reporting 0 errors, 163.88 tok/s decode at 21.4 GB, and 100% top-1 token agreement with llama.cpp (KL ≈ 0.14 nats), so accuracy is preserved.
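For readers unfamiliar with the two accuracy figures, here is a minimal sketch of how top-1 token agreement and a mean KL divergence in nats can be computed from per-position logits of a reference and a test engine. This is an illustration only: the function names, the KL direction (reference against test), and the averaging over positions are assumptions, and the release does not state how accuracy.sh computes its numbers.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]

def kl_nats(ref_logits, test_logits):
    """KL(ref || test) in nats for one token position (direction is an assumption)."""
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(test_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))

def compare(ref_positions, test_positions):
    """Return (top-1 agreement fraction, mean KL in nats) over all positions."""
    if len(ref_positions) != len(test_positions) or not ref_positions:
        raise ValueError("need two non-empty logit lists of equal length")
    matches = 0
    total_kl = 0.0
    for ref, test in zip(ref_positions, test_positions):
        # Top-1 agreement: both engines pick the same argmax token.
        if ref.index(max(ref)) == test.index(max(test)):
            matches += 1
        total_kl += kl_nats(ref, test)
    n = len(ref_positions)
    return matches / n, total_kl / n
```

Because the natural logarithm is used throughout, the divergence comes out in nats. Note that top-1 agreement can be 100% while KL is nonzero: the argmax token matches even though the rest of the distribution differs slightly.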

Prebuilt binaries (attached)

sparkinfer-v0.1.0-linux-x86_64-cuda13-sm120.tar.gz: qwen3_gguf_bench / _generate / _score + bundled libs.

  • Built for sm_120 (RTX 5090, RTX PRO 6000), CUDA 13.0, glibc 2.39 (Ubuntu 24.04).
  • bench/scripts/bench.sh and accuracy.sh fetch and use these automatically, and fall back to a source build if the prebuilt is incompatible (different arch like sm_121, older driver/CUDA, older glibc).
  • Manual use: tar xzf … && ./sparkinfer-bin/run qwen3_gguf_bench model.gguf 128.
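The compatibility conditions listed above can be sketched as a small check. This is a hypothetical illustration, not the logic in bench.sh: the helper names are invented here, an exact compute-capability match is assumed, and the driver/CUDA check is left out. The `nvidia-smi --query-gpu=compute_cap` query requires a reasonably recent driver.

```python
import platform
import subprocess

# Targets taken from the release notes for the prebuilt tarball.
PREBUILT_ARCH = (12, 0)    # sm_120
PREBUILT_GLIBC = (2, 39)   # Ubuntu 24.04

def parse_version(text):
    """Turn '12.0' or '2.39' into a (major, minor) tuple of ints."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)

def prebuilt_compatible(compute_cap, glibc):
    """True if the GPU arch matches exactly and glibc is new enough.

    Assumption: any unparsable value (e.g. a non-glibc system) counts
    as incompatible, which would trigger the source-build fallback.
    """
    try:
        return (parse_version(compute_cap) == PREBUILT_ARCH
                and parse_version(glibc) >= PREBUILT_GLIBC)
    except ValueError:
        return False

def detect():
    """Query the first GPU's compute capability and the local glibc version."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    compute_cap = out.splitlines()[0]
    glibc = platform.libc_ver()[1]  # e.g. '2.39'; empty string if not glibc
    return compute_cap, glibc
```

On a target machine, `prebuilt_compatible(*detect())` would indicate whether the attached tarball is worth trying before falling back to a source build.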

Reproduce

bench/scripts/bench.sh --download (decode tok/s) · bench.sh --compare (vs llama.cpp) · accuracy.sh --download (token-match / KL / PPL). Needs CUDA 12.8+.