spec: add backend sampling support for eagle3 by ruixiang63 · Pull Request #24655 · ggml-org/llama.cpp

ruixiang63 · 2026-06-15T14:21:24Z

Overview

Follow-up to #23287 that adds backend sampling support for eagle3.

Performance results on SpeedBench

eagle3 baseline

python tools/server/bench/speed-bench/speed_bench.py --url localhost:8080 --bench qualitative --category all --osl 512 --concurrency 1 --limit 5 --output eagle3-qwen3-baseline.json

Summary (elapsed=353.27s)
category       samples  avg_prompt_t/s  avg_pred_t/s  avg_latency  accept_rate
-------------  -------  --------------  ------------  -----------  -----------
coding         5        1804.50         96.32         6.580s       0.4810     
humanities     5        446.18          82.61         7.500s       0.3669     
math           5        46.57           82.13         6.273s       0.3605     
qa             5        605.86          92.40         4.808s       0.4444     
rag            5        4787.48         97.18         6.307s       0.5010     
reasoning      5        201.42          82.05         6.298s       0.3574     
stem           5        47.51           82.29         6.259s       0.3605     
writing        5        4234.16         91.77         6.931s       0.4415     
multilingual   5        2644.99         95.54         5.463s       0.4741     
summarization  5        628.52          87.96         3.667s       0.4104     
roleplay       5        2782.64         89.17         10.565s      0.4230     
overall        55       1657.26         89.04         6.423s       0.4185
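As a reading aid for the summary above: the overall avg_pred_t/s value is consistent with an unweighted mean of the eleven category rows (each has 5 samples), while the overall accept_rate is not a plain mean of the per-category rates. This is inferred from the printed numbers only, not from the speed_bench.py source, so the sketch below is a sanity check rather than a description of how the script aggregates.

```python
from statistics import mean

# Values copied from the baseline summary table above.
baseline_pred_tps = {
    "coding": 96.32, "humanities": 82.61, "math": 82.13, "qa": 92.40,
    "rag": 97.18, "reasoning": 82.05, "stem": 82.29, "writing": 91.77,
    "multilingual": 95.54, "summarization": 87.96, "roleplay": 89.17,
}
baseline_accept_rate = {
    "coding": 0.4810, "humanities": 0.3669, "math": 0.3605, "qa": 0.4444,
    "rag": 0.5010, "reasoning": 0.3574, "stem": 0.3605, "writing": 0.4415,
    "multilingual": 0.4741, "summarization": 0.4104, "roleplay": 0.4230,
}

# Unweighted mean across categories (assumption: equal sample counts per row).
overall_pred_tps = round(mean(baseline_pred_tps.values()), 2)
overall_accept_mean = round(mean(baseline_accept_rate.values()), 4)

print(f"mean avg_pred_t/s = {overall_pred_tps}")   # table reports 89.04
print(f"mean accept_rate  = {overall_accept_mean}")  # table reports 0.4185
```

The accept_rate mismatch suggests the overall figure is aggregated differently (for example over all drafted tokens), but the page does not say how.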

eagle3 with backend sampling

python tools/server/bench/speed-bench/speed_bench.py --url localhost:8080 --bench qualitative --category all --osl 512 --concurrency 1 --limit 5 --output eagle3-qwen3-backend-sampling.json

Summary (elapsed=351.79s)
category       samples  avg_prompt_t/s  avg_pred_t/s  avg_latency  accept_rate
-------------  -------  --------------  ------------  -----------  -----------
coding         5        1817.48         97.18         6.577s       0.4810     
humanities     5        455.30          83.50         7.414s       0.3669     
math           5        46.17           83.14         6.196s       0.3605     
qa             5        622.08          93.79         4.736s       0.4444     
rag            5        3754.37         94.52         6.738s       0.5010     
reasoning      5        202.27          82.78         6.243s       0.3574     
stem           5        47.31           83.19         6.191s       0.3605     
writing        5        4273.63         93.15         6.835s       0.4415     
multilingual   5        2613.75         96.83         5.393s       0.4741     
summarization  5        642.39          89.23         3.602s       0.4104     
roleplay       5        2700.37         90.38         10.431s      0.4230     
overall        55       1561.37         89.79         6.396s       0.4185

comparison

python tools/server/bench/speed-bench/speed_bench_compare.py --baseline eagle3-qwen3-baseline.json --speculative eagle3-qwen3-backend-sampling.json

Comparison: baseline=eagle3-qwen3-baseline.json speculative=eagle3-qwen3-backend-sampling.json
category       base_avg_pred_t/s  spec_avg_pred_t/s  decode_speedup  base_avg_latency  spec_avg_latency  latency_speedup  accept_rate
-------------  -----------------  -----------------  --------------  ----------------  ----------------  ---------------  -----------
coding         96.32              95.56              0.99x           6.580s            6.668s            0.99x            0.4810     
humanities     82.61              83.96              1.02x           7.500s            7.381s            1.02x            0.3669     
math           82.13              83.57              1.02x           6.273s            6.163s            1.02x            0.3605     
qa             92.40              93.98              1.02x           4.808s            4.725s            1.02x            0.4444     
rag            97.18              98.35              1.01x           6.307s            6.329s            1.00x            0.5010     
reasoning      82.05              83.00              1.01x           6.298s            6.225s            1.01x            0.3574     
stem           82.29              83.26              1.01x           6.259s            6.187s            1.01x            0.3605     
writing        91.77              92.99              1.01x           6.931s            6.843s            1.01x            0.4415     
multilingual   95.54              96.79              1.01x           5.463s            5.431s            1.01x            0.4741     
summarization  87.96              89.06              1.01x           3.667s            3.612s            1.02x            0.4104     
roleplay       89.17              90.50              1.01x           10.565s           10.430s           1.01x            0.4230     
overall        89.04              90.09              1.01x           6.423s            6.363s            1.01x            0.4185
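The speedup columns can be reproduced from the other columns of the comparison table. The ratios below are inferred from the printed rows (decode speedup as spec over base throughput, latency speedup as base over spec latency); the function names are hypothetical and this is not taken from speed_bench_compare.py itself.

```python
# Rows copied from the comparison table above.
rows = {
    "coding":  {"base_tps": 96.32, "spec_tps": 95.56, "base_lat": 6.580, "spec_lat": 6.668},
    "rag":     {"base_tps": 97.18, "spec_tps": 98.35, "base_lat": 6.307, "spec_lat": 6.329},
    "overall": {"base_tps": 89.04, "spec_tps": 90.09, "base_lat": 6.423, "spec_lat": 6.363},
}


def decode_speedup(base_tps: float, spec_tps: float) -> float:
    """Higher throughput is better, so the ratio is spec / base."""
    return spec_tps / base_tps


def latency_speedup(base_lat: float, spec_lat: float) -> float:
    """Lower latency is better, so the ratio is base / spec."""
    return base_lat / spec_lat


for name, r in rows.items():
    d = decode_speedup(r["base_tps"], r["spec_tps"])
    l = latency_speedup(r["base_lat"], r["spec_lat"])
    print(f"{name:8s} decode_speedup={d:.2f}x latency_speedup={l:.2f}x")
```

With these definitions a value above 1.00x means the backend-sampling run was faster on that metric, which matches how the table reports coding (0.99x) against the other categories.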

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

spec: add backend sampling support for eagle3

a38dc1a

ruixiang63 requested a review from a team as a code owner June 15, 2026 14:21

ggerganov self-assigned this Jun 15, 2026

ggerganov approved these changes Jun 16, 2026

ggerganov added the merge ready label Jun 16, 2026

ggerganov merged commit a182490 into ggml-org:master Jun 16, 2026
25 checks passed

ruixiang63 deleted the eagle3-backend-sampling branch June 16, 2026 13:29

papamoose pushed a commit to papamoose/llama.cpp that referenced this pull request Jun 27, 2026

spec: add backend sampling support for eagle3 (ggml-org#24655)

c57d186

ruixiang63 mentioned this pull request Jun 30, 2026

spec: add backend sampling for DFlash #25180

Draft
