Apply Gemma 4 IT chat template in inference.py and C++ runner #19614
mergennachin wants to merge 1 commit into
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/19614

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
Pull request overview
Applies the Gemma 4 IT chat template to user prompts by default in both the Python inference.py and C++ runner, with an opt-out flag for pre-formatted input, to avoid degenerate output from the instruction-tuned model.
Changes:
- Add `apply_chat_template` helper and `--raw-prompt` flag in `inference.py`.
- Add `--raw_prompt` flag and template-wrapping logic in `main.cpp` (BOS prepended separately).
- Document the auto-wrapping and opt-out flags in the README.
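For readers who want to see what the default wrapping amounts to, here is a minimal Python sketch of the behaviour described above. It is not the code from `inference.py`: the `GEMMA4_IT_TEMPLATE` constant and the `build_parser` / `resolve_prompt` helpers are hypothetical, and the sketch assumes the template quoted in the final commit message is concatenated with no whitespace beyond its explicit `\n` escapes. BOS is deliberately left out of the string, matching the note that it is prepended separately.

```python
import argparse

# Hypothetical constant: the Gemma 4 IT single-turn template as quoted in the
# PR's final commit message. Assumption: no whitespace besides the explicit
# "\n" escapes, and no <bos> (BOS is prepended separately by the runner).
GEMMA4_IT_TEMPLATE = (
    "<|turn>user\n{prompt}<turn|>\n"
    "<|turn>model\n"
    "<|channel>thought\n<channel|>"
)


def apply_chat_template(prompt: str) -> str:
    """Wrap a plain user prompt in the (assumed) Gemma 4 IT template."""
    # str.format inserts the prompt literally, so braces inside the
    # user's text are not re-interpreted.
    return GEMMA4_IT_TEMPLATE.format(prompt=prompt)


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI mirroring the --prompt / --raw-prompt flags."""
    parser = argparse.ArgumentParser(description="Gemma 4 IT prompt wrapping sketch")
    parser.add_argument("--prompt", required=True, help="User prompt text")
    parser.add_argument(
        "--raw-prompt",
        action="store_true",
        help="Pass --prompt through unchanged (input is already formatted)",
    )
    return parser


def resolve_prompt(args: argparse.Namespace) -> str:
    # Wrap by default; --raw-prompt opts out for pre-formatted input.
    return args.prompt if args.raw_prompt else apply_chat_template(args.prompt)
```

With these definitions, `resolve_prompt(build_parser().parse_args(["--prompt", "Hi"]))` yields the wrapped string, while adding `"--raw-prompt"` returns `"Hi"` untouched. argparse maps `--raw-prompt` to the attribute `raw_prompt`; the C++ runner spells its flag `--raw_prompt`.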
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| examples/models/gemma4_31b/inference.py | Defines chat template, wraps prompt by default, adds --raw-prompt flag |
| examples/models/gemma4_31b/main.cpp | Adds --raw_prompt flag and wraps prompt text with the IT chat template |
| examples/models/gemma4_31b/README.md | Documents auto-wrapping behavior and skip flags for both runners |
@claude Review this PR
Claude finished @mergennachin's task in 1m 31s.

Review of PR #19614
Overall, this is a clean, well-scoped change. Two issues are worth addressing before merge: 1.
Gemma 4 31B-IT is instruction-tuned and produces degenerate output
without the chat template wrapping. Auto-wrap --prompt with the IT
template (<bos><|turn>user\n{prompt}<turn|>\n<|turn>model\n
<|channel>thought\n<channel|>) by default; --raw-prompt / --raw_prompt
skips wrapping for pre-formatted input.
Branch updated from 78ee61f to 5d5c26e.
Gemma 4 31B-IT is instruction-tuned and produces degenerate output
without the chat template wrapping. Auto-wrap --prompt with the IT
template (<|turn>user\n{prompt}<turn|>\n<|turn>model\n
<|channel>thought\n<channel|>) by default; --raw-prompt / --raw_prompt
skips wrapping for pre-formatted input.
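The two commit messages differ only in whether the template string starts with `<bos>`: the final version drops it, consistent with the overview's note that the C++ runner prepends BOS separately. Keeping `<bos>` in the text as well could yield a doubled BOS token if the tokenizer maps the literal `<bos>` string to the BOS id. The sketch below shows one way to guarantee exactly one BOS at the token level. `prepend_bos` and `BOS_ID` are hypothetical illustrations, not code from this PR, and the id value is a placeholder rather than Gemma 4's real BOS id.

```python
from typing import List

# Placeholder BOS id for illustration only; the real value comes from the
# model's tokenizer, not from this sketch.
BOS_ID = 2


def prepend_bos(token_ids: List[int], bos_id: int = BOS_ID) -> List[int]:
    """Return token_ids with exactly one leading BOS token."""
    # If the encoded prompt already starts with BOS (e.g. because the text
    # contained a literal "<bos>"), don't add a second one.
    if token_ids and token_ids[0] == bos_id:
        return list(token_ids)
    return [bos_id] + list(token_ids)
```

Dropping `<bos>` from the template string, as the final commit does, keeps the text-level template and the token-level BOS insertion from overlapping in the first place; a guard like this only matters when callers pass pre-formatted input that may already include it.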