Feature Request: Enable llama.cpp Backend to Use Upstream Chat Templates
Currently, when using the llama.cpp backend in LocalAI, chat templates must be defined inline within LocalAI's configuration or model metadata. This leads to:
- Duplication of template definitions already present in the upstream `llama.cpp` project
- Increased maintenance burden for new models
- Inconsistency between LocalAI's templates and those in `llama.cpp`
Proposal
Enhance the llama.cpp backend in LocalAI to automatically detect and use the native chat template from the upstream llama.cpp project when importing a .gguf model, similar to how it currently works with the vLLM backend.
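For illustration, a model configuration might then look something like the sketch below. The `use_tokenizer_template` flag is borrowed from LocalAI's existing vLLM backend option; whether the llama.cpp backend would reuse that exact field name (and whether the backend is named `llama-cpp` in a given install) is an open design question, so treat this as a hypothetical sketch rather than a working config:

```yaml
# Hypothetical sketch: rely on the chat template embedded in the GGUF /
# maintained upstream in llama.cpp, instead of defining one inline.
name: llama-3-8b-instruct
backend: llama-cpp            # backend name is illustrative
parameters:
  model: Llama-3-8b-instruct.Q4_K_M.gguf
use_tokenizer_template: true  # mirrors the existing vLLM backend option
```

Today, the same model would instead need an inline `template.chat` definition in this file.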
Specifically:
- **Auto-Detection of Templates:**
  - When a `.gguf` file is loaded, check for the presence of a `chat_template` field in the model's metadata (see the sketch after this list)
  - If present, use it directly from the upstream `llama.cpp` project's template registry
  - If not present, fall back to an inline template defined in LocalAI (maintaining backward compatibility)
- **Support Both Modes:** Allow users to either:
  - Rely on the backend's native template (recommended for standard models like `llama-3`, `mistral`, `phi3`)
  - Define a custom template inline (for edge cases or non-standard models)
- **Standard Template Registry:**
  - Use the official `llama.cpp` template mapping (e.g., `llama-3`, `mistral`, `codellama`, `qwen`) as the source of truth
  - Keep this mapping up to date with upstream releases
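To make the auto-detection step concrete, here is a minimal, self-contained sketch (in Python, for illustration only; LocalAI itself is written in Go) that pulls the standard `tokenizer.chat_template` metadata key out of a `.gguf` file by walking the GGUF v2/v3 header directly. The `read_chat_template` helper is hypothetical, not an existing LocalAI or llama.cpp API:

```python
import struct
import sys

# Fixed-size GGUF scalar value types -> size in bytes (type ids per the GGUF spec)
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
STRING, ARRAY = 8, 9

def _read_string(f) -> bytes:
    # GGUF string: uint64 length followed by UTF-8 bytes
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length)

def _skip_value(f, vtype: int) -> None:
    # Advance past a metadata value we don't care about
    if vtype in SCALAR_SIZES:
        f.seek(SCALAR_SIZES[vtype], 1)
    elif vtype == STRING:
        _read_string(f)
    elif vtype == ARRAY:
        elem_type, count = struct.unpack("<IQ", f.read(12))
        for _ in range(count):
            _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def read_chat_template(path: str) -> str | None:
    """Return the chat template embedded in a GGUF file, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version, _tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
        if version < 2:
            raise ValueError("this sketch assumes GGUF v2+ (uint64 counts)")
        for _ in range(kv_count):
            key = _read_string(f).decode("utf-8")
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "tokenizer.chat_template" and vtype == STRING:
                return _read_string(f).decode("utf-8")
            _skip_value(f, vtype)
    return None

if __name__ == "__main__":
    template = read_chat_template(sys.argv[1])
    print(template if template is not None else "no embedded chat template")
```

In practice the backend would map the detected template (or its family) onto the upstream registry; the point of the sketch is that the metadata needed for auto-detection is already in the file, so no inline definition is required for standard models.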
Benefits
- Reduced duplication and maintenance overhead
- Improved consistency with upstream `llama.cpp` standards
- Simpler and more reliable model imports
- Better compatibility with models from Hugging Face and other sources
- Enables seamless integration with the proposed flexible import system ("Enhance Model Import Flexibility: Support Backend & Quantization Selection Across Sources (HF, Ollama, Files, OCI)", #7114)
Example Workflow
- User imports `meta-llama/Llama-3-8b-instruct` from Hugging Face
- LocalAI detects the `.gguf` file and checks the model metadata
- The `chat_template` field is set to `llama-3`
- LocalAI automatically applies the native `llama-3` chat template from the `llama.cpp` upstream (a resolution sketch follows below)
- Model loads successfully with correct prompt formatting
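The middle steps amount to a simple resolution rule: prefer the native/embedded template, then fall back to the inline one. A minimal sketch, reusing the hypothetical `read_chat_template` helper from the earlier example:

```python
def resolve_chat_template(gguf_path: str, inline_template: str | None = None) -> tuple[str, str]:
    """Pick the chat template for a model, preferring GGUF metadata.

    Returns (template, source), where source is "native" or "inline".
    """
    native = read_chat_template(gguf_path)  # hypothetical helper from the sketch above
    if native is not None:
        return native, "native"              # upstream/embedded template wins
    if inline_template is not None:
        return inline_template, "inline"     # backward-compatible fallback
    raise ValueError(f"{gguf_path}: no chat template in metadata and none defined inline")
```

With this rule, existing configurations that define an inline template keep working unchanged, while standard models such as `Llama-3-8b-instruct` pick up the correct prompt formatting automatically.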
This enhancement would align LocalAI's behavior with industry standards and reduce friction in the model import and usage workflow.