Description
🚀 The feature, motivation and pitch
Context
Many LLMs/VLMs today rely on structured prompting via a chat_template.jinja file defined in their HuggingFace repository. These templates are essential for formatting multi-turn conversations and for matching the formatting the model saw during pretraining/fine-tuning.
Currently, ExecuTorch does not support chat_template.jinja at runtime, which limits the ability to run chat-style LLMs out of the box, i.e., without host-side preprocessing or manual prompt formatting.
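For reference, chat templates are small Jinja programs that turn a list of role/content messages into the model's expected prompt string. A simplified, ChatML-style sketch is shown below (illustrative only; real templates vary per model and are often more involved):

```jinja
{%- for message in messages -%}
<|im_start|>{{ message['role'] }}
{{ message['content'] }}<|im_end|>
{%- endfor -%}
{%- if add_generation_prompt -%}
<|im_start|>assistant
{%- endif -%}
```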
Motivation
- Consistency: ensures the prompt formatting used during calibration/inference on HuggingFace is preserved at runtime.
- Portability: avoids duplicating chat-template logic in the runtime.
- Usability: enables developers to pass structured chat messages (e.g., role/content pairs for system, user, and assistant turns) directly to the runtime without manual formatting.
Details
I’m exploring whether we could integrate Jinja2Cpp as a third-party dependency in ExecuTorch to support rendering chat_template.jinja at runtime. This would enable structured prompting for chat-style LLMs directly on-device, without host-side preprocessing or manual prompt formatting.
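To make the desired behavior concrete, here is a minimal stdlib-only Python sketch of the role/content → prompt transformation the runtime would expose. The function name `render_chat` and the ChatML-style markers are illustrative assumptions; the actual proposal is to render the model's own chat_template.jinja via Jinja2Cpp rather than hard-code any format:

```python
# Sketch of on-device chat-template rendering (assumption: ChatML-style
# markers; a real runtime would evaluate the model's chat_template.jinja
# with Jinja2Cpp instead of hard-coding the format as done here).

def render_chat(messages, add_generation_prompt=True):
    """Format a list of {'role', 'content'} messages into one prompt string."""
    parts = []
    for msg in messages:
        parts.append(f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
]
prompt = render_chat(messages)
```

With Jinja2Cpp integrated, the hard-coded formatting above would instead be produced by loading the template text and rendering it against the message list, keeping runtime output identical to HuggingFace's `apply_chat_template`.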
References
- Jinja2Cpp GitHub: https://github.com/jinja2cpp/Jinja2Cpp
- HuggingFace Chat Templates: https://huggingface.co/docs/transformers/main/chat_templating_multimodal
Alternatives
No response
Additional context
No response
RFC (Optional)
No response