
tokenizer.Tokenizer compatibility with Inference API or Auto* classes #10340

@gstranger

🚀 Feature request

Make tokenizers created with the tokenizers library compatible with the Inference API and the Auto* classes.

Motivation

I have trained a model on a specific domain by framing a sequence generation problem as a language modeling problem: predict the next token in the sequence. The tokenizer associated with the model I used (TransformerXL) was not compatible with my domain because my tokens contain whitespace, so I created my own with the WordLevelTrainer class in the tokenizers library. Now that I have a complete working solution, I would like to use this tokenizer and model with the Hugging Face Inference API; however, that does not work because the API requires the tokenizer associated with the model architecture. Making the transformers models compatible with tokenizers built directly with the tokenizers library could enable all kinds of use cases outside of NLP.
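For context, here is a minimal sketch of how such a tokenizer can be trained. The corpus path, delimiter, and special tokens are placeholders, not my exact setup. Because the tokens contain whitespace, the usual Whitespace pre-tokenizer cannot be used; this sketch assumes one token per line and splits on newlines instead:

```python
from tokenizers import Tokenizer
from tokenizers.models import WordLevel
from tokenizers.pre_tokenizers import Split
from tokenizers.trainers import WordLevelTrainer

# WordLevel model: each distinct token maps to one vocabulary entry.
tokenizer = Tokenizer(WordLevel(unk_token="[UNK]"))

# Tokens contain internal whitespace, so split the training corpus on
# newlines (one token per line) rather than on whitespace.
tokenizer.pre_tokenizer = Split("\n", behavior="removed")

trainer = WordLevelTrainer(special_tokens=["[UNK]", "[PAD]"])
tokenizer.train(["corpus.txt"], trainer)  # corpus.txt is a placeholder path

tokenizer.save("tokenizer.json")
```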

Your contribution

Is it possible to hack the saved config of a tokenizer created through the tokenizers library so that it works directly with the Auto* classes? If so, I can document the approach for other users.
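As a starting point, this is the direction I have in mind, sketched under the assumption of a recent transformers version (one that records the tokenizer class in tokenizer_config.json): wrap the saved tokenizer.json in PreTrainedTokenizerFast and let save_pretrained write the config the Auto* machinery reads. The directory name and special tokens are placeholders.

```python
from transformers import AutoTokenizer, PreTrainedTokenizerFast

# Wrap the raw tokenizers-library tokenizer in the generic fast wrapper.
# "tokenizer.json" is the file saved in the sketch above.
wrapped = PreTrainedTokenizerFast(
    tokenizer_file="tokenizer.json",
    unk_token="[UNK]",
    pad_token="[PAD]",
)

# save_pretrained writes tokenizer_config.json next to tokenizer.json;
# in recent transformers versions this records the tokenizer class,
# which is what AutoTokenizer needs in order to reload it.
wrapped.save_pretrained("my-model-dir")  # placeholder directory

# If this works, the round trip through the Auto* classes should succeed.
reloaded = AutoTokenizer.from_pretrained("my-model-dir")
print(reloaded.tokenize("a token with spaces"))
```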
