docs(bench): nemotron GPU numbers on the GB10 by localai-bot · Pull Request #14 · mudler/parakeet.cpp

localai-bot · 2026-06-06T13:26:53Z

Summary

Adds the GPU benchmark for nemotron-3.5-asr-streaming-0.6b to the BENCHMARK.md nemotron section, measured on the NVIDIA GB10 (Grace-Blackwell, sm_121, CUDA 13).

| Engine | RTFx | Speedup vs NeMo | Agreement WER |
| --- | --- | --- | --- |
| NeMo (PyTorch GPU) | 91.8 | 1.00x | reference |
| parakeet.cpp f32 | 106.5 | 1.16x | 0.0000% |
| parakeet.cpp q8_0 | 119.8 | 1.30x | 0.0000% |
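
For readers who want to re-derive the speedup column, here is a minimal sketch. It assumes the column is simply each engine's RTFx divided by the NeMo RTFx; the function name `speedup_vs_baseline` is illustrative and not part of the repo. Because the published RTFx values are already rounded, the recomputed ratio can differ slightly in the last digit (q8_0 comes out near 1.305).

```python
# RTFx values copied from the table in this PR.
RTFX = {
    "NeMo (PyTorch GPU)": 91.8,
    "parakeet.cpp f32": 106.5,
    "parakeet.cpp q8_0": 119.8,
}


def speedup_vs_baseline(rtfx, baseline="NeMo (PyTorch GPU)"):
    """Return each engine's RTFx divided by the baseline engine's RTFx.

    Assumption: this is how the 'Speedup vs NeMo' column was derived.
    """
    base = rtfx[baseline]
    return {engine: value / base for engine, value in rtfx.items()}


speedups = speedup_vs_baseline(RTFX)
for engine, ratio in speedups.items():
    print(f"{engine}: {ratio:.3f}x")
```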

Uses the same clip and 7-pass median methodology as the existing CPU table, with both engines running on the GPU; the transcripts are byte-identical (WER 0). The margin is smaller than the CPU result (2.40-2.52x) because nemotron is RNN-T and NeMo's CUDA-graph greedy decode is already fast on GPU; the big GPU wins in this repo are the TDT/hybrid models.
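
The two measurements behind the table can be sketched roughly as follows. This is not the benchmark script from the repo. It assumes RTFx follows the usual definition (audio duration divided by wall-clock processing time) and that agreement WER is word-level edit distance against the NeMo transcript. The helper names `median_rtfx` and `word_error_rate` are hypothetical.

```python
import statistics


def median_rtfx(audio_seconds, pass_seconds):
    """Median RTFx over several timed passes of the same clip.

    Assumption: RTFx = audio duration / processing time, per pass.
    """
    return statistics.median(audio_seconds / t for t in pass_seconds)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,                           # deletion
                cur[j - 1] + 1,                        # insertion
                prev[j - 1] + (ref_word != hyp_word),  # substitution or match
            ))
        prev = cur
    if not ref:
        return float(bool(hyp))
    return prev[-1] / len(ref)
```

Byte-identical transcripts necessarily give a WER of exactly 0, which is why the agreement column reads 0.0000% for both parakeet.cpp rows.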

Notable: NeMo now runs natively on the GB10 via torch 2.11 + cu128, so this no longer needs the nvcr NeMo container.

🤖 Generated with Claude Code

parakeet.cpp vs NeMo on the NVIDIA GB10, same clip and methodology as the CPU table: NeMo (PyTorch GPU) RTFx 91.8, parakeet.cpp f32 106.5 (1.16x), q8_0 119.8 (1.30x), transcripts byte-identical (WER 0). The margin is smaller than on CPU because nemotron is RNN-T and NeMo's CUDA-graph greedy decode is fast there. NeMo now runs natively on the GB10 via torch 2.11 plus cu128 (no nvcr container).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

mudler merged commit 96c3177 into master Jun 6, 2026
7 of 8 checks passed

mudler merged 1 commit into master from docs/nemotron-gpu-benchmark

2 participants
