server: expose speculative decoding counters in Prometheus metrics by boxcee · Pull Request #23328 · ggml-org/llama.cpp

boxcee · 2026-05-19T08:26:42Z

Adds two new counters to the /metrics Prometheus endpoint:

llamacpp:spec_tokens_drafted_total
llamacpp:spec_tokens_accepted_total

Divide accepted by drafted to get the acceptance rate.
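As a rough illustration of that division, the sketch below computes the ratio from two counter readings. The helper names are hypothetical and not part of llama.cpp. The second function assumes a dashboard would usually want the rate over a scrape window rather than over the whole server lifetime, and that a counter going backwards means the server restarted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not in llama.cpp): accepted / drafted as a fraction in [0, 1].
// Returns 0.0 when nothing has been drafted yet, to avoid dividing by zero.
double spec_acceptance_rate(uint64_t drafted_total, uint64_t accepted_total) {
    if (drafted_total == 0) {
        return 0.0;
    }
    return static_cast<double>(accepted_total) / static_cast<double>(drafted_total);
}

// Hypothetical helper: acceptance rate between two samples of each counter.
// If either counter went backwards (assumed to mean a restart), fall back to
// the ratio of the newest values instead of producing a wrapped delta.
double spec_acceptance_rate_window(uint64_t drafted_prev,  uint64_t drafted_now,
                                   uint64_t accepted_prev, uint64_t accepted_now) {
    if (drafted_now < drafted_prev || accepted_now < accepted_prev) {
        return spec_acceptance_rate(drafted_now, accepted_now);
    }
    return spec_acceptance_rate(drafted_now - drafted_prev,
                                accepted_now - accepted_prev);
}
```

Because both values are cumulative counters, the windowed form is closer to what a monitoring tool would plot over time; the lifetime ratio smooths out recent changes.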

The counters reuse n_draft_total / n_draft_accepted introduced in #22673. Those fields are already tracked per slot during speculative decoding. No new tracking logic, just plumbing those values to the metrics handler via server_metrics::on_prediction().
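The plumbing described above might look roughly like the following. This is a simplified sketch, not the code from the PR: slot_sketch and metrics_sketch are stand-ins for the real slot and server_metrics types, the field types and the two accumulator names are assumptions, and it assumes on_prediction() runs once per finished prediction with that request's per-slot totals.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for a server slot, reduced to the two fields named in the description.
// The int32_t type is an assumption.
struct slot_sketch {
    int32_t n_draft_total    = 0; // tokens proposed by the draft model
    int32_t n_draft_accepted = 0; // drafted tokens that the target model accepted
};

// Stand-in for server_metrics; the accumulator names are hypothetical.
struct metrics_sketch {
    uint64_t n_spec_tokens_drafted_total  = 0;
    uint64_t n_spec_tokens_accepted_total = 0;

    // Adds one slot's draft statistics to the server-wide totals.
    void on_prediction(const slot_sketch & slot) {
        // Guard against negative values before widening to unsigned.
        assert(slot.n_draft_total >= 0 && slot.n_draft_accepted >= 0);
        n_spec_tokens_drafted_total  += static_cast<uint64_t>(slot.n_draft_total);
        n_spec_tokens_accepted_total += static_cast<uint64_t>(slot.n_draft_accepted);
    }
};
```

The totals only ever grow, which is what lets them be exported as Prometheus counters rather than gauges.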

Without this, the only way to watch acceptance rate is to grep the server logs. Grafana and similar tools can now track it directly.

Changes:

server_metrics: two new uint64_t fields, incremented in on_prediction()
server_task_result_metrics: two new fields to carry the values to the HTTP handler
Prometheus handler: two new counter entries in all_metrics_def
README: metrics table updated
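For the handler side of the list above, here is a minimal sketch of how counter entries can be rendered in the Prometheus text exposition format (a HELP line, a TYPE line, then the sample). The counter_def struct and render_counters function are hypothetical; the actual layout of all_metrics_def in the server is not shown in this PR description and may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one counter entry.
struct counter_def {
    std::string name;  // metric name without the "llamacpp:" prefix
    std::string help;  // one-line description for the HELP comment
    uint64_t    value; // current cumulative value
};

// Renders each entry as three lines of Prometheus text exposition format.
std::string render_counters(const std::vector<counter_def> & defs) {
    std::ostringstream out;
    for (const auto & d : defs) {
        out << "# HELP llamacpp:" << d.name << " " << d.help << "\n"
            << "# TYPE llamacpp:" << d.name << " counter\n"
            << "llamacpp:" << d.name << " " << d.value << "\n";
    }
    return out.str();
}
```

Passing entries named spec_tokens_drafted_total and spec_tokens_accepted_total would yield the two metric names listed at the top of the description.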

AI disclosure: I used Claude Code to help locate the relevant code paths and scaffold the initial implementation. I've reviewed every line and can explain it.

ggml-gh-bot · 2026-05-19T08:31:50Z

Hi @boxcee, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.

Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

Adds two new counters to the /metrics endpoint:

llamacpp:spec_tokens_drafted_total
llamacpp:spec_tokens_accepted_total

Accumulated in server_metrics::on_prediction() from the per-slot n_draft_total and n_draft_accepted fields. Divide accepted by drafted to get acceptance rate.

boxcee force-pushed the feat/prometheus-spec-metrics branch from e3f5d53 to d735725 on May 19, 2026 08:47

boxcee force-pushed the feat/prometheus-spec-metrics branch from d735725 to 122fb87 on May 19, 2026 08:59

github-actions bot added the examples and server labels on May 19, 2026

boxcee wants to merge 1 commit into ggml-org:master from boxcee:feat/prometheus-spec-metrics
