Fix saving multiple tokenizers for custom processors #42630

Merged
yonigozlan merged 1 commit into huggingface:main from yonigozlan:fix-saving-multiple-tokenizers-for-custom-processors
Dec 5, 2025
@yonigozlan
Member

Fixes #41816.
Instead of hardcoding a set of tokenizer attributes that shouldn't be saved as part of preprocessor_config, this removes every attribute whose name contains "tokenizer" from preprocessor_config, which allows more flexibility when creating a custom processor with multiple tokenizers.
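
To illustrate the idea, here is a minimal sketch in plain Python. It is not the actual diff: the function names, the contents of `HARDCODED_TOKENIZER_KEYS`, and the example dictionary are all hypothetical, and the real change lives in the processor serialization code in transformers.

```python
# Hypothetical stand-in for the old approach: a fixed set of attribute names.
HARDCODED_TOKENIZER_KEYS = {"tokenizer", "qformer_tokenizer"}


def strip_hardcoded(config):
    """Old-style filter: drops only the attribute names listed in advance."""
    return {k: v for k, v in config.items() if k not in HARDCODED_TOKENIZER_KEYS}


def strip_any_tokenizer(config):
    """New-style filter: drops every attribute whose name contains 'tokenizer'."""
    return {k: v for k, v in config.items() if "tokenizer" not in k}


# Hypothetical attributes of a custom processor with two tokenizers.
config = {
    "processor_class": "MyCustomProcessor",
    "tokenizer": "<tokenizer object>",
    "decoder_tokenizer": "<second tokenizer object>",
    "image_size": 224,
}

old_result = strip_hardcoded(config)
new_result = strip_any_tokenizer(config)

# The hardcoded set misses the custom name; the substring check does not.
print("decoder_tokenizer" in old_result)  # True
print("decoder_tokenizer" in new_result)  # False
```

The trade-off of the substring check is that any non-tokenizer attribute with "tokenizer" in its name would also be dropped from the saved config.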

Member

@zucchini-nlp zucchini-nlp left a comment

Nice! BTW, is this only for tokenizers, and is support for several image_processors etc. not yet fully implemented?

@yonigozlan
Member Author

> Nice! BTW, is this only for tokenizers, and is support for several image_processors etc. not yet fully implemented?

Yes, this is just for tokenizers! I'm working on another PR for image_processors and the other sub-processors, but I think it's best to keep the two PRs separate, since that change might cause some issues.

@yonigozlan yonigozlan merged commit e5aad21 into huggingface:main Dec 5, 2025
23 checks passed
sarathc-cerebras pushed a commit to sarathc-cerebras/transformers that referenced this pull request Dec 7, 2025
fix saving multiple tokenizers for custom processors
leaderofARS pushed a commit to leaderofARS/transformers that referenced this pull request Dec 9, 2025
fix saving multiple tokenizers for custom processors
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
fix saving multiple tokenizers for custom processors