
Conversation

@zucchini-nlp (Member):

What does this PR do?

Fixes #33459. The chat template stopped being saved after the last PR, which added LLaVa-OneVision, because there I fixed a small bug where the chat template was being saved in both places: in its own file and in the processor config.

In this PR we never serialize the chat template in the dict, so it is not saved there, and when saving we check the attributes on self. Yet I am not 100% sure if it is OK to not add the template to the dict. Or should we pop the template when serializing to JSON, in to_json_string maybe? WDYT?
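For reference, a minimal sketch of that approach (illustrative names only, not the exact diff; the real ProcessorMixin handles far more): to_dict never includes the chat template, and save_pretrained checks the attribute on self to write chat_template.json separately.

```python
import json
import os


class ProcessorSketch:
    """Illustrative stand-in for ProcessorMixin; attribute handling is simplified."""

    def __init__(self, chat_template=None, **kwargs):
        self.chat_template = chat_template
        for key, value in kwargs.items():
            setattr(self, key, value)

    def to_dict(self):
        # the chat template lives in its own file, so it is never serialized here
        return {key: value for key, value in vars(self).items() if key != "chat_template"}

    def save_pretrained(self, save_directory):
        os.makedirs(save_directory, exist_ok=True)
        # check the attribute on self, since to_dict() no longer carries it
        if self.chat_template is not None:
            with open(os.path.join(save_directory, "chat_template.json"), "w") as f:
                json.dump({"chat_template": self.chat_template}, f, indent=2)
        with open(os.path.join(save_directory, "processor_config.json"), "w") as f:
            json.dump(self.to_dict(), f, indent=2)
```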

@zucchini-nlp changed the title from "Chat template" to "Chat template: save and load correctly for processors" on Sep 13, 2024
@HuggingFaceDocBuilderDev:

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@amyeroberts (Contributor) left a comment:

Thanks for fixing!

Comment on lines 54 to 58
```python
for key, value in self.prepare_processor_dict().items():
    # chat templates are popped from dict
    if key != "chat_template":
        self.assertEqual(obj[key], value)
        self.assertEqual(getattr(processor, key, None), value)
```
@amyeroberts (Contributor):
Let's also assert the chat template is popped from the dict.

Suggested change:

```diff
-for key, value in self.prepare_processor_dict().items():
-    # chat templates are popped from dict
-    if key != "chat_template":
-        self.assertEqual(obj[key], value)
-        self.assertEqual(getattr(processor, key, None), value)
+for key, value in self.prepare_processor_dict().items():
+    # chat templates are popped from dict
+    self.assertFalse(key == "chat_template")
+    self.assertEqual(obj[key], value)
+    self.assertEqual(getattr(processor, key, None), value)
```

@zucchini-nlp (Member, Author):

Oops, no, the test fails because the processor_dict for LLaVa has a chat_template key, and we use it in other tests, e.g. to init and save the processor. This test is the same as the general one, except that chat templates cannot pass the self.assertEqual(obj[key], value) check.

So we just want to test all other processor kwargs except the chat template, which is tested separately. By other kwargs, I mean the ones that will be added soon.
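For context, the override under discussion looks roughly like this (a sketch: the Jinja template string is a placeholder, not the real LLaVa template, and the import mirrors how model test files pull in the shared mixin):

```python
import unittest

from ...test_processing_common import ProcessorTesterMixin  # as in the model test files


class LlavaProcessorTest(ProcessorTesterMixin, unittest.TestCase):
    def prepare_processor_dict(self):
        # only a chat template for now; other kwargs will be added later,
        # and the generic equality loop skips this key on purpose
        return {"chat_template": "{% for message in messages %}{{ message['content'] }}{% endfor %}"}
```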

@amyeroberts (Contributor):

> except that chat templates cannot pass the self.assertEqual(obj[key], value) check

I see, I missed what was happening originally in the test. Isn't self.prepare_processor_dict().items() a bit redundant, given that we force self.prepare_processor_dict() to have only one key, "chat_template", so all of this logic is skipped?

@zucchini-nlp (Member, Author):

Yes, the same way almost all processors skip this test. For VLMs this test will become available when we enforce the new processing logic for input expansion with image tokens. Until then, we can override it to prevent failing tests, instead of unittest.skip.

@amyeroberts (Contributor):

Hmmmm, I don't think overriding to make it look like tests are passing is a great idea. Skipping is far better, as it's easier to spot and track.

Part of the issue here is that this new behaviour still isn't being tested then, as we want to make sure that chat_template isn't in the processor_dict when saving out.

@zucchini-nlp (Member, Author):

Ah, we can add one more assert in test_chat_template_is_saved to check the content of processor_dict.

OK, I'll skip it then, with a comment explaining why and noting that we need to stop skipping it at some point.
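Something like this, then (a hedged sketch: names follow the discussion, the checkpoint is only an example, and the merged test may differ):

```python
import json
import os
import tempfile
import unittest

from transformers import LlavaProcessor


class LlavaChatTemplateSaveTest(unittest.TestCase):
    @unittest.skip(
        "Processor dict only carries a chat template for now; stop skipping once the "
        "new image-token expansion logic adds more kwargs worth testing."
    )
    def test_processor_to_json_string(self):
        pass

    def test_chat_template_is_saved(self):
        # example checkpoint; any LLaVa checkpoint that ships a chat template works
        processor = LlavaProcessor.from_pretrained("llava-hf/llava-1.5-7b-hf")
        with tempfile.TemporaryDirectory() as tmpdir:
            processor.save_pretrained(tmpdir)
            # the template must land in its own file...
            self.assertTrue(os.path.isfile(os.path.join(tmpdir, "chat_template.json")))
            # ...and must not leak into the serialized processor dict
            with open(os.path.join(tmpdir, "processor_config.json")) as f:
                processor_dict = json.load(f)
            self.assertNotIn("chat_template", processor_dict)
```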

@amyeroberts (Contributor):

Perfect :)

zucchini-nlp and others added 3 commits September 16, 2024 13:36
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@andimarafioti mentioned this pull request on Sep 17, 2024
@amyeroberts self-requested a review on Sep 17, 2024, 10:53
@amyeroberts (Contributor) left a comment:

Thanks for updating!

@zucchini-nlp merged commit db72894 into huggingface:main on Sep 18, 2024. 20 checks passed.
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request on Dec 5, 2024, with the squashed commit message:

* fix
* add tests
* fix tests
* Update tests/models/llava/test_processor_llava.py
  Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
* fix
* fix tests
* update tests

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
@tungnt55:

Would you consider making the same changes for the ImageProcessingMixin class as well? Here, to be exact:

```python
def save_pretrained(self, save_directory: Union[str, os.PathLike], push_to_hub: bool = False, **kwargs):
```

Qwen-VL models use processors inheriting from this class, and it doesn't save the chat template into chat_template.json.

@zucchini-nlp (Member, Author):

ImageProcessingMixin should not have a chat template; chat templates should instead be serialized together with the processing class. In that case, it's better to ping the model authors so they can correct the config files.
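For reference, a hedged usage sketch of that recommendation (the checkpoint name is only an example; whether chat_template.json appears depends on the checkpoint actually shipping a template):

```python
from transformers import AutoProcessor

# Go through the composite processor, not the image processor alone;
# per the comment above, the chat template is serialized at this level.
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")
processor.save_pretrained("./saved-processor")
# ./saved-processor should now contain chat_template.json next to the
# processor config, assuming the checkpoint ships a chat template.
```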

@tungnt55:

> ImageProcessingMixin should not have a chat template; chat templates should instead be serialized together with the processing class. In that case, it's better to ping the model authors so they can correct the config files.

Thanks for the clarification, I'll follow up with the Qwen authors for the fix.


Successfully merging this pull request may close this issue: chat_template.json is not saved when using LlavaProcessor.save_pretrained()
