
Python v0.11.0

@n1t0 released this 24 Dec 09:15

Fixed

  • [#585]: The Conda version should now work on old CentOS
  • [#844]: Fix the interaction between is_pretokenized and trim_offsets (see the sketch after this list)
  • [#851]: Fix documentation links
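
As a hedged illustration of the #844 fix, the sketch below encodes an already pre-tokenized input through a tokenizer whose post-processor trims offsets. The tiny WordLevel vocabulary and the combination of components are assumptions for demonstration only, not taken from the release notes.

```python
from tokenizers import Tokenizer, models, processors

# Minimal sketch (assumed setup): a tiny WordLevel tokenizer with a
# post-processor that trims offsets.
vocab = {"<unk>": 0, "Hello": 1, "world": 2}
tokenizer = Tokenizer(models.WordLevel(vocab, unk_token="<unk>"))
tokenizer.post_processor = processors.ByteLevel(trim_offsets=True)

# Encoding a pre-split input; combining is_pretokenized=True with
# trim_offsets=True is the interaction addressed by #844.
encoding = tokenizer.encode(["Hello", "world"], is_pretokenized=True)
print(encoding.tokens)   # ['Hello', 'world']
print(encoding.offsets)  # per-word character offsets
```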

Added

  • [#657]: Add SplitDelimiterBehavior customization to the Punctuation constructor (see the sketch after this list)
  • [#845]: Documentation for Decoders.
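
As a hedged sketch of the #657 change, the snippet below passes a split behavior to the Punctuation pre-tokenizer. The behavior strings shown are assumptions based on the SplitDelimiterBehavior variants and are not quoted from the release notes.

```python
from tokenizers import pre_tokenizers

# Sketch: Punctuation now takes a SplitDelimiterBehavior. "isolated" is the
# historical default; "merged_with_previous" keeps punctuation attached to
# the preceding word (behavior names assumed from the underlying enum).
isolated = pre_tokenizers.Punctuation(behavior="isolated")
merged = pre_tokenizers.Punctuation(behavior="merged_with_previous")

print(isolated.pre_tokenize_str("Hello, world."))
print(merged.pre_tokenize_str("Hello, world."))
```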

Changed

  • [#850]: Added a feature gate that allows disabling the http features
  • [#718]: Fix WordLevel tokenizer determinism during training
  • [#762]: Add a way to specify the unknown token in SentencePieceUnigramTokenizer
  • [#770]: Improved documentation for UnigramTrainer
  • [#780]: Add Tokenizer.from_pretrained to load tokenizers from the Hugging Face Hub (see the sketch after this list)
  • [#793]: Save a pretty-printed JSON file by default when saving a tokenizer
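
For the new loading and saving behavior (#780, #793), the sketch below is illustrative rather than prescriptive: the model identifier is an assumption, and from_pretrained needs network access to the Hugging Face Hub.

```python
from tokenizers import Tokenizer

# Load a tokenizer directly from the Hugging Face Hub
# (identifier assumed; requires network access).
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

# Saving now writes human-readable, indented JSON by default;
# pass pretty=False to keep the previous compact output.
tokenizer.save("tokenizer.json")
tokenizer.save("tokenizer.compact.json", pretty=False)
```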