
Conversation

Contributor

@PhilipMay PhilipMay commented Aug 6, 2020

This adds the option to pass parameters to the tokenizer that is created in the Pooling __init__ constructor. This makes it possible, for example, to use fast tokenizers or to change the tokenizer's casing behavior.

This option was already implemented in other places:

https://github.com/UKPLab/sentence-transformers/blob/c6f8c542378dc979a55355bd562c8f1401e62f14/sentence_transformers/models/BERT.py#L14-L28

... but somehow forgotten here.
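For context, a minimal sketch of how such a tokenizer_args option is typically used (class and parameter names here follow the linked BERT.py and huggingface tokenizer conventions; treat the exact signature as an assumption, not part of this PR):

```python
# Sketch only: assumes a sentence-transformers model class that forwards
# tokenizer_args to the underlying huggingface tokenizer, as the linked
# BERT.py does.
from sentence_transformers import models

word_embedding_model = models.BERT(
    'bert-base-uncased',
    tokenizer_args={'do_lower_case': True},  # forwarded to the tokenizer's from_pretrained()
)
```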

Collaborator

nreimers commented Aug 6, 2020

Hi,
I remember that adding tokenizer_args did not work for the Transformers model / AutoTokenizer.from_pretrained() when I added the class. This was with Huggingface Transformers version 2.x; it threw errors when you tried to pass tokenizer_args to AutoTokenizer.

I will check whether this now works with Huggingface Transformers version 3 and then add it to the repository.

Thanks for raising this point (and I hope it works with the latest transformers lib).
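For reference, the forwarding pattern under discussion is roughly the following (a sketch; whether a given keyword such as use_fast is accepted depends on the transformers version):

```python
# Sketch of forwarding a tokenizer_args dict to AutoTokenizer.from_pretrained.
# With transformers 2.x this reportedly raised errors; with version 3 it works.
from transformers import AutoTokenizer

tokenizer_args = {'use_fast': True}
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased', **tokenizer_args)
```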

@PhilipMay
Contributor Author

@nreimers When I add a test that proves it works, would you consider merging it?

@nreimers nreimers merged commit 8823bdc into huggingface:master Aug 6, 2020
Collaborator

nreimers commented Aug 6, 2020

Just tested it; it works with Huggingface Transformers 3.

Thanks for this PR.

@PhilipMay
Contributor Author

Awesome! :-)
