2024.2.0.0
What's Changed
- Add support for left padding in Wordpiece, BPE and tiktoken-based tokenizers
- Enhanced handling of special tokens
- Add support for padding to a particular length
- New option to add or not add special tokens during the tokenization
- Support Punctuation Pretokenizer
- Enchanse tokenizer postprocessing parser for better model coverage
- Add StringToHashBucketFast Tensorflow Translator
- Optimize EqualStr and VocabEncoder Operations
- Add Benchmarking Script
Full Changelog: 2024.1.0.2...2024.2.0.0