Enhance llama.cpp Backend to Consume Chat Templates from Upstream #7115

@localai-bot

Feature Request: Enable llama.cpp Backend to Use Upstream Chat Templates

Currently, when using the llama.cpp backend in LocalAI, chat templates must be defined inline within LocalAI's configuration or model metadata. This leads to:

  • Duplication of template definitions already present in the upstream llama.cpp project
  • Increased maintenance burden for new models
  • Inconsistency between LocalAI's templates and those in llama.cpp

Proposal

Enhance the llama.cpp backend in LocalAI to automatically detect and use the native chat template from the upstream llama.cpp project when importing a .gguf model, similar to how it currently works with the vLLM backend.

Specifically:

  1. Auto-Detection of Templates:

    • When a .gguf file is loaded, check for the presence of a chat_template field in the model's metadata (GGUF stores it under the key tokenizer.chat_template)
    • If present, let llama.cpp apply that template natively rather than requiring a matching definition on the LocalAI side
    • If not present, fall back to an inline template defined in LocalAI (maintaining backward compatibility)
  2. Support Both Modes:

    • Allow users to either:
      • Rely on the backend's native template (recommended for standard models like llama-3, mistral, phi3)
      • Define a custom template inline (for edge cases or non-standard models)
  3. Standard Template Registry:

    • Use the official llama.cpp template mapping (e.g., llama-3, mistral, codellama, qwen) as the source of truth
    • Keep this mapping up to date with upstream releases
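
The selection order described above can be sketched roughly as follows. This is a minimal illustration, not LocalAI's actual code: `resolve_chat_template`, `ResolvedTemplate`, and the `force_inline` switch are hypothetical names introduced here, and the metadata is assumed to be already parsed into a dictionary. Only the key `tokenizer.chat_template` is the real GGUF metadata key.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Real GGUF metadata key that holds the model's chat template.
GGUF_CHAT_TEMPLATE_KEY = "tokenizer.chat_template"


@dataclass(frozen=True)
class ResolvedTemplate:
    """Hypothetical result type: where the template came from, and its text."""
    source: str    # "gguf" (embedded in the model) or "inline" (LocalAI config)
    template: str


def resolve_chat_template(
    metadata: Mapping[str, Any],
    inline_template: Optional[str] = None,
    force_inline: bool = False,
) -> Optional[ResolvedTemplate]:
    """Pick a chat template using the precedence proposed in this issue.

    Assumed order (an interpretation of the proposal, not confirmed behavior):
      1. An inline template, if the user explicitly forces it (custom mode).
      2. The template embedded in the GGUF metadata (native mode).
      3. The inline template as a backward-compatible fallback.
    Returns None when no template is available from either source.
    """
    embedded = metadata.get(GGUF_CHAT_TEMPLATE_KEY)
    has_embedded = isinstance(embedded, str) and embedded.strip() != ""

    if force_inline and inline_template:
        return ResolvedTemplate("inline", inline_template)
    if has_embedded:
        return ResolvedTemplate("gguf", embedded)
    if inline_template:
        return ResolvedTemplate("inline", inline_template)
    return None
```

Keeping the decision in one small function makes the fallback explicit: existing configurations that define an inline template keep working for models without an embedded one, while `force_inline` covers the edge cases where the embedded template is wrong or missing pieces.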

Benefits

Example Workflow

  1. User imports meta-llama/Llama-3-8b-instruct from Hugging Face
  2. LocalAI detects the .gguf file and checks the model metadata
  3. The metadata contains a chat_template entry (tokenizer.chat_template) carrying the llama-3 prompt format
  4. LocalAI leaves prompt formatting to llama.cpp, which applies the native llama-3 chat template
  5. Model loads successfully with correct prompt formatting
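
Step 2 of the workflow (checking the model metadata) could look something like the sketch below. It is a deliberately small reader for the GGUF header, assuming a little-endian file of format version 2 or later. The function names are hypothetical; a real implementation would more likely reuse llama.cpp's own loader or the `gguf` Python package from the llama.cpp repository.

```python
import io
import struct
from typing import Any, BinaryIO, Dict

# GGUF metadata value type ids mapped to struct formats (little-endian assumed).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str) -> Any:
    """Read and unpack exactly one scalar described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int) -> Any:
    """Decode one metadata value; arrays recurse on their element type."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def read_gguf_metadata(f: BinaryIO) -> Dict[str, Any]:
    """Return the key/value metadata section of a GGUF file as a dict."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        # Version 1 used 32-bit counts and lengths; not handled in this sketch.
        raise ValueError("unsupported GGUF version")
    _read(f, "<Q")                 # tensor count, not needed here
    kv_count = _read(f, "<Q")
    metadata: Dict[str, Any] = {}
    for _ in range(kv_count):
        key = _read_string(f)
        vtype = _read(f, "<I")
        metadata[key] = _read_value(f, vtype)
    return metadata
```

With a model on disk, the check from step 3 reduces to opening the file in binary mode, calling `read_gguf_metadata(f)`, and looking up `"tokenizer.chat_template"` in the result. A missing key is the signal to fall back to an inline template.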

This enhancement would bring the llama.cpp backend in line with how the vLLM backend already handles chat templates and reduce friction in the model import and usage workflow.
