2024.2.0.0

@apaniukov apaniukov released this 17 Jun 13:19
· 2 commits to releases/2024/2 since this release
c615ec5

What's Changed

  • Add support for left padding in WordPiece, BPE and tiktoken-based tokenizers
  • Enhance handling of special tokens
  • Add support for padding to a particular length
  • Add an option to include or omit special tokens during tokenization
  • Support Punctuation Pretokenizer
  • Enhance the tokenizer postprocessing parser for better model coverage
  • Add StringToHashBucketFast TensorFlow Translator
  • Optimize EqualStr and VocabEncoder Operations
  • Add Benchmarking Script
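
For readers unfamiliar with the padding options listed above, here is a minimal pure-Python sketch of what left versus right padding to a particular length means. It does not use the library's API: the function name `pad_to_length` and its parameters (`max_length`, `pad_id`, `side`) are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def pad_to_length(
    ids: List[int],
    max_length: int,
    pad_id: int = 0,      # hypothetical pad token id, for illustration only
    side: str = "right",  # "left" or "right"
) -> Tuple[List[int], List[int]]:
    """Pad a token id sequence to max_length and build an attention mask.

    Illustrative sketch only; not the library's implementation.
    """
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(ids) > max_length:
        raise ValueError("sequence is longer than max_length")

    n_pad = max_length - len(ids)
    padding = [pad_id] * n_pad
    real_mask = [1] * len(ids)   # 1 marks real tokens
    pad_mask = [0] * n_pad       # 0 marks padding positions

    if side == "left":
        return padding + ids, pad_mask + real_mask
    return ids + padding, real_mask + pad_mask


left_ids, left_mask = pad_to_length([5, 6, 7], max_length=5, side="left")
right_ids, right_mask = pad_to_length([5, 6, 7], max_length=5, side="right")
print(left_ids, left_mask)
print(right_ids, right_mask)
```

Left padding keeps the real tokens at the end of the sequence, which is commonly preferred for batched text generation with decoder-only models, since new tokens are appended directly after the last real token.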

Full Changelog: 2024.1.0.2...2024.2.0.0