
Conversation


@mudler mudler commented Nov 5, 2025

Description

Fixes: #7115
Fixes: #6117

This PR aims at two things:

  • Make the llama.cpp backend respect the use_tokenizer_template setting, which is already part of the model's YAML config. This instructs LocalAI to delegate chat templating to llama.cpp, keeping inline templates available as an option but no longer strictly required (see the request example sketched after this list).
    This allows, for instance, a YAML config as minimal as:

    backend: llama-cpp
    context_size: 8192
    f16: true
    mmap: true
    name: qwen3-0.6b
    parameters:
      model: Qwen3-0.6B.Q4_K_M.gguf

    Internally, this is automatically rendered as:

    backend: llama-cpp
    context_size: 8192
    f16: true
    mmap: true
    name: qwen3-0.6b
    parameters:
      model: Qwen3-0.6B.Q4_K_M.gguf
    
    template:
      # Enable chat templating from llama.cpp
      use_tokenizer_template: true
    function:
      grammar:
        # Disable LocalAI's engine for grammar rendering
        disable: true
  • Moves several settings that were previously passed via environment variables into per-model options. This allows everything to be configured in the model's YAML file and avoids generic environment variables that apply to all loaded models (a parsing sketch follows this list).

    • use_jinja / jinja: Enable Jinja2 template processing
    • context_shift: Enable dynamic context window adjustment
    • cache_ram: Set KV cache RAM limit (in MiB)
    • parallel / n_parallel: Enable parallel request processing with continuous batching
    • grpc_servers / rpc_servers: Configure distributed inference across multiple workers
      Example:
      name: llama-model
      backend: llama
      parameters:
        model: model.gguf
      options:
        - use_jinja:true
        - context_shift:true
        - cache_ram:4096
        - parallel:2
        - grpc_servers:localhost:50051,localhost:50052
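
For the first point, here is a minimal usage sketch (not part of this PR) that calls the qwen3-0.6b model above through LocalAI's OpenAI-compatible chat endpoint. The localhost:8080 address is an assumption and should match your deployment; with use_tokenizer_template enabled, llama.cpp applies the chat template embedded in the GGUF to the messages, so no inline template is needed.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

func main() {
	// Build an OpenAI-style chat completion request for the model above.
	payload, err := json.Marshal(map[string]any{
		"model": "qwen3-0.6b",
		"messages": []map[string]string{
			{"role": "user", "content": "Hello!"},
		},
	})
	if err != nil {
		panic(err)
	}

	// With use_tokenizer_template enabled, llama.cpp applies the GGUF's own
	// chat template to these messages; no inline LocalAI template is needed.
	resp, err := http.Post("http://localhost:8080/v1/chat/completions",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out map[string]any
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	fmt.Println(out)
}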

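For the second point, a purely illustrative sketch of how colon-separated option strings like the ones above could be split into key/value pairs; parseOptions is a hypothetical helper, and the actual parsing code inside LocalAI's backend may differ.

package main

import (
	"fmt"
	"strings"
)

// parseOptions splits entries like "cache_ram:4096" into key/value pairs.
func parseOptions(opts []string) map[string]string {
	parsed := map[string]string{}
	for _, o := range opts {
		// Split on the first ':' only, so values such as
		// "localhost:50051,localhost:50052" keep their internal colons.
		key, value, found := strings.Cut(o, ":")
		if !found {
			continue // skip malformed entries without a ':'
		}
		parsed[key] = value
	}
	return parsed
}

func main() {
	opts := []string{
		"use_jinja:true",
		"context_shift:true",
		"cache_ram:4096",
		"parallel:2",
		"grpc_servers:localhost:50051,localhost:50052",
	}
	fmt.Println(parseOptions(opts))
}
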

netlify bot commented Nov 5, 2025

Deploy Preview for localai ready!

🔨 Latest commit: 234c2ae
🔍 Latest deploy log: https://app.netlify.com/projects/localai/deploys/690e331f93e3aa000820921d
😎 Deploy Preview: https://deploy-preview-7120--localai.netlify.app

@mudler mudler force-pushed the feat/llama-cpp-options branch from 8c47ffa to e66ea6e on November 6, 2025 08:23
@mudler mudler changed the title from "Feat/llama cpp options" to "feat(llama.cpp): consolidate options and respect tokenizer template when enabled" on Nov 6, 2025
@mudler mudler force-pushed the feat/llama-cpp-options branch from e66ea6e to 607fd99 on November 6, 2025 09:55
resultData := []struct {
	Text string `json:"text"`
}{}
json.Unmarshal(data, &resultData)

Check warning

Code scanning / gosec: Errors unhandled
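
One way the flagged call could check the error instead of discarding it, shown as a self-contained sketch; the names here (decodeTexts, data) and the error wrapping are illustrative assumptions, not the PR's actual change.

package main

import (
	"encoding/json"
	"fmt"
)

// decodeTexts unmarshals the payload and propagates the error instead of
// ignoring it, which is what the gosec "Errors unhandled" finding asks for.
func decodeTexts(data []byte) ([]string, error) {
	resultData := []struct {
		Text string `json:"text"`
	}{}
	if err := json.Unmarshal(data, &resultData); err != nil {
		return nil, fmt.Errorf("unmarshalling result: %w", err)
	}
	texts := make([]string, 0, len(resultData))
	for _, r := range resultData {
		texts = append(texts, r.Text)
	}
	return texts, nil
}

func main() {
	texts, err := decodeTexts([]byte(`[{"text":"hello"}]`))
	if err != nil {
		panic(err)
	}
	fmt.Println(texts)
}
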
@mudler mudler force-pushed the feat/llama-cpp-options branch from 5e8e57b to ffe819c on November 6, 2025 21:56
@mudler mudler added the enhancement New feature or request label Nov 7, 2025
mudler added 13 commits November 7, 2025 18:48
This allows to configure everything in the YAML file of the model rather
than have global configurations

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
…ating system to process messages

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/llama-cpp-options branch 2 times, most recently from 290063e to 9249f4f on November 7, 2025 17:56
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
@mudler mudler force-pushed the feat/llama-cpp-options branch from 9249f4f to 234c2ae on November 7, 2025 17:57
@mudler mudler merged commit 02cc8cb into master Nov 7, 2025
38 checks passed
@mudler mudler deleted the feat/llama-cpp-options branch November 7, 2025 20:23

Labels

enhancement New feature or request


Development

Successfully merging this pull request may close these issues.

Enhance llama.cpp Backend to Consume Chat Templates from Upstream
jinja_templates not working

2 participants