Special tokens not removed by ByteLevelBPETokenizer decoder #186
The following code snippet doesn't behave as I expected (compare to e.g. BertWordPieceTokenizer):
I would have expected all the special tokens to be stripped by the decoder?
Ah, my bad - I need to add these manually:
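To illustrate the point in isolation, here is a rough sketch of why registration matters: a decoder can only skip tokens it has been told are special. `ToyTokenizer`, its vocabulary, and its methods are hypothetical stand-ins written for this example. They are not the tokenizers library's implementation, and the real method names and signatures may differ.

```python
class ToyTokenizer:
    """Hypothetical stand-in for illustration only, NOT the tokenizers API."""

    def __init__(self, vocab):
        # Map each token id to its string form.
        self.id_to_token = dict(enumerate(vocab))
        # Nothing is treated as special until it is registered explicitly.
        self.special_tokens = set()

    def add_special_tokens(self, tokens):
        # Register tokens that decode() is allowed to strip.
        self.special_tokens.update(tokens)

    def decode(self, ids, skip_special_tokens=True):
        tokens = [self.id_to_token[i] for i in ids]
        if skip_special_tokens:
            # Only registered tokens are removed; look-alikes stay in place.
            tokens = [t for t in tokens if t not in self.special_tokens]
        return " ".join(tokens)


vocab = ["<s>", "</s>", "hello", "world"]  # assumed toy vocabulary
ids = [0, 2, 3, 1]

tok = ToyTokenizer(vocab)
before = tok.decode(ids)  # markers survive: nothing is registered yet

tok.add_special_tokens(["<s>", "</s>"])
after = tok.decode(ids)   # markers are stripped once registered

print(repr(before))
print(repr(after))
```

In this toy model, `<s>` and `</s>` are ordinary vocabulary entries until they are registered, which matches the behaviour described above: the decoder leaves them in the output until they are added manually.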
Are there any plans to provide some kind of wrapper around common parameters like this, as