🚀 Feature request
Make tokenizers created with the tokenizers library compatible with the Inference API or Auto classes
Motivation
I have trained a model on a specific domain by framing a sequence-generation problem as language modeling: predicting the next token in the sequence. The tokenizer associated with the model I used (Transformer-XL) was not compatible with my domain because my tokens contain whitespace, so I created my own tokenizer using the WordLevelTrainer class from the tokenizers library (a sketch of this setup follows below). Now that I have a complete working solution, I would like to use this tokenizer and model with the Hugging Face Inference API, but it does not work because the API requires the tokenizer associated with the model. Making the transformers models compatible with tokenizers created through the tokenizers library could enable all kinds of use cases outside of NLP.
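For context, here is a minimal sketch of how such a tokenizer can be built. The delimiter, special tokens, and corpus filename are illustrative assumptions, not the exact setup used:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.trainers import WordLevelTrainer
from tokenizers.pre_tokenizers import Split

# WordLevel model with an explicit unknown token
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))

# Tokens in this domain contain whitespace, so split on a custom
# delimiter (";" here is a hypothetical choice) instead of whitespace.
tokenizer.pre_tokenizer = Split(pattern=";", behavior="removed")

# Special tokens are assumptions; adjust to the actual vocabulary.
trainer = WordLevelTrainer(special_tokens=["[UNK]", "<eos>"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)  # hypothetical corpus file
tokenizer.save("tokenizer.json")
```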
Your contribution
Is it possible to hack the saved config of a tokenizer created through the tokenizers library so that it works directly with the Auto classes? If so, I can document the approach for other users.
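One possible route, sketched below under the assumption that the `tokenizer.json` and special tokens match the example above: transformers exposes `PreTrainedTokenizerFast`, which can wrap a serialized tokenizers-library tokenizer directly, without hand-editing any config:

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast

# Wrap the serialized tokenizers-library tokenizer in a fast
# transformers tokenizer. The special-token names mirror the
# sketch above and are assumptions, not verified defaults.
fast_tokenizer = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="[UNK]",
    eos_token="<eos>",
)

# save_pretrained writes tokenizer_config.json and tokenizer.json
# into the model directory, which is what AutoTokenizer looks for.
fast_tokenizer.save_pretrained("my-model")

reloaded = AutoTokenizer.from_pretrained("my-model")
```

This makes `AutoTokenizer` loading work locally; whether the hosted Inference API then accepts the result is the open question this issue raises.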