
Conversation


@mudler mudler commented Nov 5, 2025

Description

Fixes: #7115
Fixes: #6117

This PR aims at two things:

  • Make the llama.cpp backend respect the use_tokenizer_template setting, which is already part of the model's YAML config. This instructs LocalAI to delegate chat templating to llama.cpp, keeping inline templates available as an option but no longer strictly required (see the request example sketched after this list).
    This allows, for instance, a YAML config as minimal as:

    backend: llama-cpp
    context_size: 8192
    f16: true
    mmap: true
    name: qwen3-0.6b
    parameters:
      model: Qwen3-0.6B.Q4_K_M.gguf

    Internally, this is automatically rendered as:

    backend: llama-cpp
    context_size: 8192
    f16: true
    mmap: true
    name: qwen3-0.6b
    parameters:
      model: Qwen3-0.6B.Q4_K_M.gguf
    
    template:
      # Enable chat templating from llama.cpp
      use_tokenizer_template: true
    function:
      grammar:
        # Disable LocalAI's engine for grammar rendering
        disable: true
  • Moves several settings that were previously passed via environment variables into per-model options. This allows everything to be configured in the model's YAML file and avoids generic environment variables that apply to all loaded models (a parsing sketch follows this list).

    • use_jinja / jinja: Enable Jinja2 template processing
    • context_shift: Enable dynamic context window adjustment
    • cache_ram: Set KV cache RAM limit (in MiB)
    • parallel / n_parallel: Enable parallel request processing with continuous batching
    • grpc_servers / rpc_servers: Configure distributed inference across multiple workers
      Example:
      name: llama-model
      backend: llama
      parameters:
        model: model.gguf
      options:
        - use_jinja:true
        - context_shift:true
        - cache_ram:4096
        - parallel:2
        - grpc_servers:localhost:50051,localhost:50052
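
For the first point, here is a minimal usage sketch (not part of this PR) that calls the qwen3-0.6b model above through LocalAI's OpenAI-compatible chat endpoint. The localhost:8080 address is an assumption and should match your deployment; with use_tokenizer_template enabled, llama.cpp applies the chat template embedded in the GGUF to the messages, so no inline template is needed.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Build an OpenAI-style chat completion request for the model above.
	payload, err := json.Marshal(map[string]any{
		"model": "qwen3-0.6b",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello!"},
		},
	})
	if err != nil {
		panic(err)
	}

	// With use_tokenizer_template enabled, llama.cpp applies the GGUF's own
	// chat template to these messages; no inline LocalAI template is needed.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out)
}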

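For the second point, a purely illustrative sketch of how colon-separated option strings like the ones above could be split into key/value pairs; parseOptions is a hypothetical helper, and the actual parsing code inside LocalAI's backend may differ.

package main

import (
	"fmt"
	"strings"
)

// parseOptions splits entries like "cache_ram:4096" into key/value pairs.
func parseOptions(opts []string) map[string]string {
	parsed := map[string]string{}
	for _, o := range opts {
		// Split on the first ':' only, so values such as
		// "localhost:50051,localhost:50052" keep their internal colons.
		key, value, found := strings.Cut(o, ":")
		if !found {
			continue // skip malformed entries without a ':'
		}
		parsed[key] = value
	}
	return parsed
}

func main() {
	opts := []string{
		"use_jinja:true",
		"context_shift:true",
		"cache_ram:4096",
		"parallel:2",
		"grpc_servers:localhost:50051,localhost:50052",
	}
	fmt.Println(parseOptions(opts))
}
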

netlify bot commented Nov 5, 2025

Deploy Preview for localai ready!

🔨 Latest commit: 234c2ae
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/690e331f93e3aa000820921d
😎 Deploy Preview: https://deploy-preview-7120--localai.netlify.app

@mudler mudler force-pushed the feat/llama-cpp-options branch from 8c47ffa to e66ea6e on November 6, 2025 08:23
@mudler mudler changed the title from "Feat/llama cpp options" to "feat(llama.cpp): consolidate options and respect tokenizer template when enabled" on Nov 6, 2025
@mudler mudler force-pushed the feat/llama-cpp-options branch from e66ea6e to 607fd99 on November 6, 2025 09:55
resultData := []struct {
	Text string `json:"text"`
}{}
json.Unmarshal(data, &resultData)

Check warning

Code scanning / gosec: Errors unhandled
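
One way the flagged call could check the error instead of discarding it, shown as a self-contained sketch; the names here (decodeTexts, data) and the error wrapping are illustrative assumptions, not the PR's actual change.

package main

import (
	"encoding/json"
	"fmt"
)

// decodeTexts unmarshals the payload and propagates the error instead of
// ignoring it, which is what the gosec "Errors unhandled" finding asks for.
func decodeTexts(data []byte) ([]string, error) {
	resultData := []struct {
		Text string `json:"text"`
	}{}
	if err := json.Unmarshal(data, &resultData); err != nil {
		return nil, fmt.Errorf("unmarshalling result: %w", err)
	}
	texts := make([]string, 0, len(resultData))
	for _, r := range resultData {
		texts = append(texts, r.Text)
	}
	return texts, nil
}

func main() {
	texts, err := decodeTexts([]byte(`[{"text":"hello"}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(texts)
}
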
@mudler mudler force-pushed the feat/llama-cpp-options branch from 5e8e57b to ffe819c on November 6, 2025 21:56
@mudler mudler added the enhancement New feature or request label Nov 7, 2025
mudler added 13 commits November 7, 2025 18:48
This allows to configure everything in the YAML file of the model rather
than have global configurations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ating system to process messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/llama-cpp-options branch 2 times, most recently from 290063e to 9249f4f on November 7, 2025 17:56
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/llama-cpp-options branch from 9249f4f to 234c2ae on November 7, 2025 17:57
@mudler mudler merged commit 02cc8cb into master Nov 7, 2025
38 checks passed
@mudler mudler deleted the feat/llama-cpp-options branch November 7, 2025 20:23

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Enhance llama.cpp Backend to Consume Chat Templates from Upstream
jinja_templates not working

2 participants