documents not being applied in apply_chat_template #33421
Comments
I've been trying to pinpoint where Even if it did, I'm unsure how @Rocketknight1, I saw you implemented this in #30621—thanks for the great work, must have been a heck of a headache! |
Hi @selkordy @A-Duss, the cause of this problem is simply that However, one model that does support it is Command-R and Command-R+, using the |
@Rocketknight1 Thanks for the clarification! I think it might be helpful to use Command-R as the model in the example within the documentation then, while noting that not all models support this feature. I’m happy to assist with this if you’re short on time. |
@A-Duss sure! If you want to open a PR to update the chat template docs and tag me, that'd be great. However, we'd prefer to avoid
I see it now: when I look at the tokenizer_config, the Jinja template never references documents. I have a better understanding of how the library works now. Thank you @A-Duss and @Rocketknight1
Noted, I'm working on it. I will open a PR once it's looking decent.
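For anyone who wants to run the same check without reading the template by eye, here is a rough sketch. It is only a heuristic, not something the library provides: the helper name `template_mentions_documents` is made up for illustration, and it simply looks for the name `documents` inside Jinja `{{ ... }}` and `{% ... %}` tags.

```python
import re


def template_mentions_documents(chat_template: str) -> bool:
    """Heuristic: True if `documents` appears as a word inside a Jinja tag.

    Hypothetical helper, not part of transformers. It can be fooled, e.g. by
    a string literal containing "documents" inside a tag.
    """
    # Only inspect the contents of {{ ... }} and {% ... %} blocks, so plain
    # prompt text that happens to contain the word is not counted.
    tags = re.findall(r"\{\{(.*?)\}\}|\{%(.*?)%\}", chat_template, flags=re.DOTALL)
    return any(re.search(r"\bdocuments\b", expr or stmt) for expr, stmt in tags)


# A Zephyr-style template (simplified for illustration) only loops over messages.
zephyr_like = (
    "{% for message in messages %}"
    "{{ '<|' + message['role'] + '|>' + message['content'] + eos_token }}"
    "{% endfor %}"
)
# A made-up template that does consume the documents argument.
rag_like = "{% for doc in documents %}{{ doc['title'] }}{% endfor %}"

print(template_mentions_documents(zephyr_like))
print(template_mentions_documents(rag_like))
```

With a real tokenizer, the string to inspect would be `tokenizer.chat_template`, assuming that attribute holds a single template string for the model in question.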
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored. |
System Info
transformers version: 4.44.2
Who can help?
@ArthurZucker
Reproduction
I'm trying to pass documents to the chat template as described in the chat_templating article, but they seem to be ignored: passing documents has no effect on the rendered output.
https://huggingface.co/docs/transformers/en/chat_templating
```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceH4/zephyr-7b-beta")

chat1 = [
    {"role": "user", "content": "Which is bigger, the moon or the sun?"},
    {"role": "assistant", "content": "The sun."}
]
chat2 = [
    {"role": "user", "content": "Which is bigger, a virus or a bacterium?"},
    {"role": "assistant", "content": "A bacterium."}
]

document1 = {
    "title": "The Moon: Our Age-Old Foe",
    "contents": "Man has always dreamed of destroying the moon. In this essay, I shall..."
}
document2 = {
    "title": "The Sun: Our Age-Old Friend",
    "contents": "Although often underappreciated, the sun provides several notable benefits..."
}

model_input = tokenizer.apply_chat_template(
    [chat1, chat2],
    tokenize=False,
    add_generation_prompt=False,
    documents=[document1, document2],
)
print(model_input)
```
model_input does not include the documents, so the model never sees them:
['<|user|>\nWhich is bigger, the moon or the sun?\n<|assistant|>\nThe sun.\n', '<|user|>\nWhich is bigger, a virus or a bacterium?\n<|assistant|>\nA bacterium.\n']
Expected behavior
I expect the rendered chat template to include the documents.
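Until a given template handles documents itself, one possible workaround is to fold the documents into the conversation before templating. The sketch below is not from the library or the docs: `prepend_documents` is a hypothetical helper, the wording of the system message is arbitrary, and it assumes the model's template accepts a `system` role.

```python
def prepend_documents(chat, documents):
    """Return a copy of `chat` with the documents rendered into a leading system message.

    Hypothetical helper for illustration; assumes each document is a dict
    with "title" and "contents" keys, as in the reproduction above.
    """
    rendered = "\n\n".join(
        f"Title: {doc['title']}\n{doc['contents']}" for doc in documents
    )
    system_message = {
        "role": "system",
        "content": "Use these documents when answering:\n\n" + rendered,
    }
    # Build a new list so the caller's chat is left untouched.
    return [system_message] + list(chat)


example_chat = [{"role": "user", "content": "Which is bigger, the moon or the sun?"}]
example_docs = [
    {"title": "The Moon: Our Age-Old Foe", "contents": "Man has always dreamed of destroying the moon."}
]
augmented_chat = prepend_documents(example_chat, example_docs)
```

The augmented chat can then be passed to `apply_chat_template` as usual, without the `documents` argument.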