docs(bench): nemotron GPU numbers on the GB10 #14

Merged
mudler merged 1 commit into master from docs/nemotron-gpu-benchmark
Jun 6, 2026

@localai-bot
Collaborator

Summary

Adds the GPU benchmark for nemotron-3.5-asr-streaming-0.6b to the BENCHMARK.md nemotron section, measured on the NVIDIA GB10 (Grace-Blackwell, sm_121, CUDA 13).

| Engine | RTFx | Speedup vs NeMo | Agreement WER |
| --- | --- | --- | --- |
| NeMo (PyTorch GPU) | 91.8 | 1.00x | reference |
| parakeet.cpp f32 | 106.5 | 1.16x | 0.0000% |
| parakeet.cpp q8_0 | 119.8 | 1.30x | 0.0000% |
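For readers who want to sanity-check the two derived columns, here is a minimal sketch of how a speedup ratio and an agreement WER are typically computed. It is not taken from this repo's benchmark scripts: the function names `speedup` and `word_error_rate` are hypothetical, and the WER shown is the usual word-level edit distance divided by the reference word count, with the NeMo transcript assumed to be the reference. Ratios recomputed from the rounded RTFx values above can differ from the table in the last digit, since the table was presumably built from unrounded measurements.

```python
def speedup(rtfx: float, baseline_rtfx: float) -> float:
    """Ratio of an engine's RTFx to the baseline engine's RTFx."""
    return rtfx / baseline_rtfx


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count.

    Assumption: whitespace tokenization with no text normalization, which is
    enough to confirm that byte-identical transcripts give a WER of 0.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the ref words seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # ref word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    nemo_rtfx = 91.8  # baseline RTFx from the table
    for name, rtfx in [("parakeet.cpp f32", 106.5), ("parakeet.cpp q8_0", 119.8)]:
        print(f"{name}: {speedup(rtfx, nemo_rtfx):.3f}x vs NeMo")
```

Identical transcripts always yield a WER of exactly 0, which is what the 0.0000% entries report.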

The measurement uses the same clip and 7-pass median methodology as the existing CPU table, with both engines running on the GPU; the transcripts are byte-identical (WER 0). The margin is smaller than the CPU result (2.40-2.52x) because nemotron is an RNN-T model and NeMo's CUDA-graph greedy decode is already fast on GPU; the big GPU wins in this repo come from the TDT/hybrid models.
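The 7-pass median can be sketched as below. This is an illustration under stated assumptions, not the repo's actual harness: RTFx is taken to mean audio duration divided by wall-clock processing time, and `median_rtfx`, `rtfx_from_timings`, and the `transcribe` callable are hypothetical names.

```python
import statistics
import time
from typing import Callable, Sequence


def rtfx_from_timings(audio_seconds: float, elapsed_seconds: Sequence[float]) -> float:
    """Median RTFx over a set of per-pass wall-clock timings.

    Assumption: RTFx = audio duration / processing time, so higher is faster.
    """
    return statistics.median(audio_seconds / t for t in elapsed_seconds)


def median_rtfx(
    transcribe: Callable[[], object],
    audio_seconds: float,
    passes: int = 7,
) -> float:
    """Time `transcribe` for `passes` runs and return the median RTFx.

    `transcribe` is a hypothetical zero-argument callable that runs one full
    transcription of the benchmark clip. For a GPU engine it should block
    until the device has finished, otherwise the timer stops too early.
    """
    timings = []
    for _ in range(passes):
        start = time.perf_counter()
        transcribe()
        timings.append(time.perf_counter() - start)
    return rtfx_from_timings(audio_seconds, timings)
```

With an odd number of passes, the median of the per-pass RTFx values equals the audio duration divided by the median elapsed time, so either convention gives the same figure. The median also keeps a single slow pass (for example a cold start) from skewing the reported number.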

Notable: NeMo now runs natively on the GB10 via torch 2.11 + cu128, so this no longer needs the nvcr NeMo container.

🤖 Generated with Claude Code

parakeet.cpp vs NeMo on the NVIDIA GB10, same clip and methodology as the CPU
table: NeMo (PyTorch GPU) RTFx 91.8, parakeet.cpp f32 106.5 (1.16x), q8_0 119.8
(1.30x), transcripts byte-identical (WER 0). The margin is smaller than on CPU
because nemotron is RNN-T and NeMo's CUDA-graph greedy decode is fast there.
NeMo now runs natively on the GB10 via torch 2.11 plus cu128 (no nvcr container).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@mudler mudler merged commit 96c3177 into master Jun 6, 2026
7 of 8 checks passed