
Conversation

jongames (Contributor)

Description

I discovered a bug in how llama.cpp's gRPC backend is implemented in LocalAI. The batch parameter exposed in the LocalAI model YAML file only sets the n_batch parameter in llama.cpp. When running a reranking model, llama.cpp sometimes complains that the prompt input is larger than the maximum physical batch size. That limit defaults to 512 tokens and is controlled by n_ubatch, not n_batch. I fixed this by making the batch value from the model YAML file set both the n_batch and n_ubatch parameters in llama.cpp.
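For reference, here is a minimal sketch of what the fix amounts to in terms of llama.cpp's public API. The actual change lives in LocalAI's C++ gRPC backend and its option plumbing may differ; `n_batch` and `n_ubatch` are the fields on `llama_context_params` in llama.cpp, and the `yaml_batch` helper below is hypothetical, standing in for the batch value read from the model YAML.

```cpp
#include "llama.h"

// Sketch only: apply the YAML `batch` value to both the logical and the
// physical batch size when building the llama.cpp context parameters.
static llama_context_params make_ctx_params(uint32_t yaml_batch /* hypothetical: value from the model YAML */) {
    llama_context_params ctx_params = llama_context_default_params();

    if (yaml_batch > 0) {
        ctx_params.n_batch  = yaml_batch; // logical batch size (previously the only field set)
        ctx_params.n_ubatch = yaml_batch; // physical batch size; defaults to 512 and caps how
                                          // large a prompt a reranking model can ingest at once
    }
    return ctx_params;
}
```

With only `n_batch` set, the physical limit stays at the 512-token default, which is why long reranking prompts were rejected.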

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.


netlify bot commented Sep 25, 2025

Deploy Preview for localai ready!

Name Link
🔨 Latest commit 6ce55a9
🔍 Latest deploy log https://app.netlify.com/projects/localai/deploys/68d5c1e1a474e800089eedcb
😎 Deploy Preview https://deploy-preview-6344--localai.netlify.app

…ackend

Signed-off-by: JonGames <18472148+jongames@users.noreply.github.com>
@mudler (Owner) left a comment


Good catch, thanks!

@mudler mudler enabled auto-merge (squash) September 25, 2025 22:28
@mudler mudler merged commit f2b9452 into mudler:master Sep 25, 2025
35 checks passed
@mudler mudler added the bug Something isn't working label Oct 3, 2025