v2.3.1: Alpha support for Nepali, updated Armenian and Japanese language data and bug fixes
@adrianeboyd released this

New features and improvements

  • NEW: Add alpha support for Nepali.
  • Refactor Japanese tokenizer and include additional custom tokenizer features.
  • Update Armenian language data.
  • Include the spaCy git commit in package and model meta for reference.
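
As a rough illustration of the new Nepali support, the sketch below creates a blank pipeline and tokenizes a short phrase. It assumes spaCy v2.3.1 or later is installed and that Nepali is registered under the ISO code `ne`; the sample text is only an illustration and is not taken from the release.

```python
import spacy

# Assumption: Nepali is available under the language code "ne"
# (alpha support, so only tokenization and basic language data).
nlp = spacy.blank("ne")

# Illustrative sample text, not part of the release notes.
text = "नेपाली भाषा"
doc = nlp(text)

# Collect the token texts produced by the rule-based tokenizer.
tokens = [token.text for token in doc]
print(doc.lang_, tokens)
```

A blank pipeline has no statistical components, so this only exercises the tokenizer and the language defaults.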

🔴 Bug fixes

  • Fix issue #5620: Skip vocab in component config overrides.
  • Fix issue #5634: Fix polarity of Token.is_oov and Lexeme.is_oov.
  • Fix issue #5643: Add strings and ENT_KB_ID to Doc serialization.
  • Fix issue #5648: Disregard special tag _SP in check for new tag map.
  • Fix issue #5658: Move lemmatizer is_base_form to language settings.

👥 Contributors

Thanks to @myavrum, @mahnerak, @rameshhpathak, @hiroshi-matsuda-rit, @PluieElectrique, @hertelm and @alvaroabascar for the pull requests and contributions.