Conversation

@ngxson ngxson (Collaborator) commented Nov 26, 2025

On the server, we want to enable jinja by default to allow tool calling and default system prompts. More and more models require this, so I think it's finally time to make it the default.

However, we don't want to enable this for other examples (like llama-cli or llama-run), because these examples cannot yet handle rolling back tokens. This can happen when the chat template modifies past tokens, for example by deleting the reasoning content from the formatted chat.

This PR also updates the auto-generated docs via the llama-gen-docs command.
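For context, jinja formatting is what lets llama-server handle OpenAI-style tool calling through its /v1/chat/completions endpoint. Below is a minimal sketch of such a request, assuming a llama-server running locally on the default port 8080; the model name and the get_weather tool are illustrative placeholders, not part of this PR:

```python
# Minimal sketch of a tool-calling request against llama-server's
# OpenAI-compatible endpoint. Assumes a locally running llama-server
# (default port 8080) with a model whose chat template supports tools;
# the tool definition and model name below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-no-key-required")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

resp = client.chat.completions.create(
    model="local-model",  # the loaded model is served regardless of this name
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
)

# With jinja enabled, the server formats the tools into the model's chat
# template and parses any tool call out of the reply.
print(resp.choices[0].message.tool_calls)
```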

@ngxson ngxson requested a review from ggerganov as a code owner November 26, 2025 16:15
@CISC CISC (Collaborator) commented Nov 26, 2025

> However, we don't want to enable this for other examples (like llama-cli or llama-run), because these examples cannot yet handle rolling back tokens. This can happen when the chat template modifies past tokens, for example by deleting the reasoning content from the formatted chat.

This is now supported as of #16603, so we should probably make jinja the default there as well.
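For readers landing here: "rolling back tokens" means that when the chat template re-renders the conversation and earlier content changes (e.g. a reasoning block is stripped), the cached prefix no longer matches the new prompt, so the runtime must drop everything past the last common token and re-evaluate from there. A rough conceptual sketch in plain Python, not the actual llama.cpp implementation:

```python
# Conceptual illustration of "rolling back tokens": compare the tokens already
# evaluated into the context/KV cache with the newly re-templated prompt, keep
# the longest common prefix, and re-evaluate only from the first position that
# differs. Plain-Python sketch over token id lists, not llama.cpp code.

def common_prefix_len(cached: list[int], new: list[int]) -> int:
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

cached_tokens = [1, 15, 42, 42, 99, 7]   # what was previously evaluated
new_prompt    = [1, 15, 42, 8, 3]        # template changed earlier content
                                         # (e.g. reasoning removed)

keep = common_prefix_len(cached_tokens, new_prompt)

# A runtime that supports rollback drops cache entries in positions
# [keep, len(cached_tokens)) and evaluates new_prompt[keep:]. One that can
# only append (older llama-cli/llama-run behavior) cannot handle this case,
# which is why jinja stays off by default there for now.
print(f"keep {keep} cached tokens, re-evaluate {len(new_prompt) - keep} new ones")
```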

@ngxson ngxson merged commit e509411 into ggml-org:master Nov 27, 2025
75 of 76 checks passed
@rankaiyx commented:

"disable jinja template for chat (default: enabled)"

Is there a mistake here?

@rankaiyx commented:

Hmm, that's a bit ambiguous.

Labels: examples, python, server
