fix: reranking models limited to 512 tokens in llama.cpp backend #6344
Description
I discovered a bug in how the llama.cpp gRPC backend is implemented in LocalAI. The `batch` parameter is exposed in the LocalAI model YAML file, but it only changes the `n_batch` parameter in llama.cpp. When running a reranking model, llama.cpp sometimes complains that the prompt input is larger than the maximum physical batch size. That limit defaults to 512 tokens and is controlled by `n_ubatch`, not `n_batch`. I fixed it by making the `batch` value from the model YAML file set both the `n_batch` and `n_ubatch` parameters in llama.cpp.
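A minimal sketch of the idea, assuming the backend fills in a `llama_context_params` struct before creating the context. The helper name `make_ctx_params` and its arguments are placeholders for illustration, not LocalAI's actual identifiers; only the `n_batch`/`n_ubatch` fields come from the llama.cpp API itself.

```cpp
#include <cstdint>
#include "llama.h"

// Hypothetical helper: let the YAML `batch` value drive both the logical
// (n_batch) and physical (n_ubatch) batch sizes, instead of leaving
// n_ubatch at its 512-token default.
static llama_context_params make_ctx_params(uint32_t n_ctx, uint32_t batch) {
    llama_context_params p = llama_context_default_params();
    p.n_ctx    = n_ctx;   // context size from the model config
    p.n_batch  = batch;   // logical batch size (already configurable before)
    p.n_ubatch = batch;   // physical batch size (previously stuck at 512)
    return p;
}
```

With this, a reranking prompt longer than 512 tokens no longer trips the "input is larger than the maximum physical batch size" check, as long as the configured `batch` value covers the prompt length.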
Notes for Reviewers
Signed commits