Allow per-version configurations #14344

Merged
LysandreJik merged 3 commits into master from per-version-configuration on Nov 15, 2021


LysandreJik
Member

Similarly to #12713, this allows per-version configurations. This is necessary for LayoutXLM, which until now used the XLMRobertaTokenizer defined in its configuration, but which should now use the LayoutXLMTokenizer.

Updating the configuration would mean breaking all previous versions of transformers that were using LayoutXLM. Not updating this parameter means that LayoutXLM will never benefit from LayoutXLMTokenizer through the AutoTokenizer API.

Resolves #14275

This implements tests similar to the tokenizer ones, but instead of using bert-base-cased, it uses the model actually at issue (microsoft/layoutxlm-base). This model should continue using the XLMRobertaTokenizer until a new minor version is released, as the configuration I uploaded is named config.4.13.0.json: https://huggingface.co/microsoft/layoutxlm-base/blob/main/config.4.13.0.json
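To illustrate the idea, here is a minimal sketch of how a versioned file such as config.4.13.0.json could be chosen over config.json depending on the installed library version. It is not the code from this PR: `select_config_file` and `_parse_version` are hypothetical names, and the sketch assumes plain numeric versions (no `.dev0` or `rc` suffixes).

```python
import re
from typing import List, Tuple

# Assumed naming scheme for versioned files, e.g. "config.4.13.0.json".
_VERSIONED_CONFIG = re.compile(r"^config\.(\d+(?:\.\d+)*)\.json$")


def _parse_version(text: str) -> Tuple[int, ...]:
    """Turn "4.13.0" into (4, 13, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def select_config_file(available_files: List[str], library_version: str) -> str:
    """Pick the configuration file to load (hypothetical helper).

    Returns the newest versioned file whose version is not newer than
    `library_version`, falling back to the plain "config.json".
    """
    current = _parse_version(library_version)
    best_name = "config.json"  # what older library versions keep loading
    best_version: Tuple[int, ...] = ()
    for name in available_files:
        match = _VERSIONED_CONFIG.match(name)
        if match is None:
            continue  # not a versioned configuration file
        file_version = _parse_version(match.group(1))
        # Keep the newest candidate that the installed library supports.
        if best_version < file_version <= current:
            best_name = name
            best_version = file_version
    return best_name


files = ["config.json", "config.4.13.0.json"]
print(select_config_file(files, "4.12.5"))
print(select_config_file(files, "4.13.0"))
```

Under these assumptions, a library older than 4.13.0 keeps reading config.json (and so keeps the XLMRobertaTokenizer), while 4.13.0 and later pick up config.4.13.0.json.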

Collaborator

@sgugger sgugger left a comment


Nice addition and very nice tests!
At some point we will need to brainstorm whether there is a better way to deal with those situations, but since that is for the longer term, having this fix in the meantime is important.

tests/test_configuration_common.py (outdated, resolved)
tests/test_configuration_common.py (outdated, resolved)
LysandreJik and others added 2 commits November 15, 2021 16:37
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
@LysandreJik LysandreJik merged commit 1cc453d into master Nov 15, 2021
@LysandreJik LysandreJik deleted the per-version-configuration branch November 15, 2021 21:38
Albertobegue pushed a commit to Albertobegue/transformers that referenced this pull request Jan 27, 2022
* Allow per-version configurations

* Update tests/test_configuration_common.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

* Update tests/test_configuration_common.py

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>

Co-authored-by: Sylvain Gugger <35901082+sgugger@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

LayoutXLM tokenizer issues after last update
2 participants