Feature Request: Enable llama.cpp Backend to Use Upstream Chat Templates
Currently, when using the llama.cpp backend in LocalAI, chat templates must be defined inline within LocalAI's configuration or model metadata. This leads to:
- Duplication of template definitions already present in the upstream `llama.cpp` project
- Increased maintenance burden for new models
- Inconsistency between LocalAI's templates and those in `llama.cpp`
Proposal
Enhance the llama.cpp backend in LocalAI to automatically detect and use the native chat template from the upstream llama.cpp project when importing a .gguf model, similar to how it currently works with the vLLM backend.
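For illustration, a model configuration might then look something like the sketch below. The `use_tokenizer_template` flag is borrowed from LocalAI's existing vLLM backend option; whether the llama.cpp backend would reuse that exact field name (and whether the backend is named `llama-cpp` in a given install) is an open design question, so treat this as a hypothetical sketch rather than a working config:

```yaml
# Hypothetical sketch: rely on the chat template embedded in the GGUF /
# maintained upstream in llama.cpp, instead of defining one inline.
name: llama-3-8b-instruct
backend: llama-cpp            # backend name is illustrative
parameters:
  model: Llama-3-8b-instruct.Q4_K_M.gguf
use_tokenizer_template: true  # mirrors the existing vLLM backend option
```

Today, the same model would instead need an inline `template.chat` definition in this file.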
Specifically:
- **Auto-Detection of Templates:**
  - When a `.gguf` file is loaded, check for the presence of a `chat_template` field in the model's metadata (see the sketch after this list)
  - If present, use it directly from the upstream `llama.cpp` project's template registry
  - If not present, fall back to an inline template defined in LocalAI (maintaining backward compatibility)
- **Support Both Modes:** Allow users to either:
  - Rely on the backend's native template (recommended for standard models like `llama-3`, `mistral`, `phi3`)
  - Define a custom template inline (for edge cases or non-standard models)
- **Standard Template Registry:**
  - Use the official `llama.cpp` template mapping (e.g., `llama-3`, `mistral`, `codellama`, `qwen`) as the source of truth
  - Keep this mapping up to date with upstream releases
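To make the auto-detection step concrete, here is a minimal, self-contained sketch (in Python, for illustration only; LocalAI itself is written in Go) that pulls the standard `tokenizer.chat_template` metadata key out of a `.gguf` file by walking the GGUF v2/v3 header directly. The `read_chat_template` helper is hypothetical, not an existing LocalAI or llama.cpp API:

```python
import struct
import sys

# Fixed-size GGUF scalar value types -> size in bytes (type ids per the GGUF spec)
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
STRING, ARRAY = 8, 9

def _read_string(f) -> bytes:
    # GGUF string: uint64 length followed by UTF-8 bytes
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length)

def _skip_value(f, vtype: int) -> None:
    # Advance past a metadata value we don't care about
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    elif vtype == STRING:
        _read_string(f)
    elif vtype == ARRAY:
        elem_type, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def read_chat_template(path: str) -> str | None:
    """Return the chat template embedded in a GGUF file, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, _tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
        if version < 2:
            raise ValueError("this sketch assumes GGUF v2+ (uint64 counts)")
        for _ in range(kv_count):
            key = _read_string(f).decode("utf-8")
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "tokenizer.chat_template" and vtype == STRING:
                return _read_string(f).decode("utf-8")
            _skip_value(f, vtype)
    return None

if __name__ == "__main__":
    template = read_chat_template(sys.argv[1])
    print(template if template is not None else "no embedded chat template")
```

In practice the backend would map the detected template (or its family) onto the upstream registry; the point of the sketch is that the metadata needed for auto-detection is already in the file, so no inline definition is required for standard models.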
Benefits
- Reduced duplication and maintenance overhead
- Improved consistency with upstream `llama.cpp` standards
- Simpler and more reliable model imports
- Better compatibility with models from Hugging Face and other sources
- Enables seamless integration with the proposed flexible import system ("Enhance Model Import Flexibility: Support Backend & Quantization Selection Across Sources (HF, Ollama, Files, OCI)", #7114)
Example Workflow
- User imports `meta-llama/Llama-3-8b-instruct` from Hugging Face
- LocalAI detects the `.gguf` file and checks the model metadata
- The `chat_template` field is set to `llama-3`
- LocalAI automatically applies the native `llama-3` chat template from the `llama.cpp` upstream (a resolution sketch follows below)
- Model loads successfully with correct prompt formatting
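The middle steps amount to a simple resolution rule: prefer the native/embedded template, then fall back to the inline one. A minimal sketch, reusing the hypothetical `read_chat_template` helper from the earlier example:

```python
def resolve_chat_template(gguf_path: str, inline_template: str | None = None) -> tuple[str, str]:
    """Pick the chat template for a model, preferring GGUF metadata.

    Returns (template, source), where source is "native" or "inline".
    """
    native = read_chat_template(gguf_path)  # hypothetical helper from the sketch above
    if native is not None:
        return native, "native"              # upstream/embedded template wins
    if inline_template is not None:
        return inline_template, "inline"     # backward-compatible fallback
    raise ValueError(f"{gguf_path}: no chat template in metadata and none defined inline")
```

With this rule, existing configurations that define an inline template keep working unchanged, while standard models such as `Llama-3-8b-instruct` pick up the correct prompt formatting automatically.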
This enhancement would align LocalAI's behavior with industry standards and reduce friction in the model import and usage workflow.