blingfire PyPI package v0.1.3
Four tokenization algorithms are now supported: pattern-based, WordPiece, unigram LM, and BPE. Added a space normalization API, a few more popular models, and unigram LM tokenization models trained on ~84 uniformly represented languages from the WikiMatrix set. Also includes bug fixes and parity fixes.
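For readers unfamiliar with WordPiece, one of the four algorithms listed above, here is a minimal pure-Python sketch of greedy longest-match-first splitting. It is an illustration only and is not how blingfire implements tokenization; the function name `wordpiece_tokenize`, the `##` continuation prefix, the `[UNK]` token, and the toy vocabulary are all assumptions made for this example.

```python
def wordpiece_tokenize(word, vocab, unk="[UNK]", prefix="##"):
    """Split one word into vocabulary pieces, longest match first.

    Hypothetical helper for illustration; not part of the blingfire API.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is in the vocabulary.
        while end > start:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the continuation prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece fits at this position: the whole word is unknown.
            return [unk]
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary, made up for this example.
toy_vocab = {"token", "##ization", "##ize", "un", "##known"}

print(wordpiece_tokenize("tokenization", toy_vocab))
print(wordpiece_tokenize("unknown", toy_vocab))
print(wordpiece_tokenize("xyz", toy_vocab))
```

A real model ships a trained vocabulary with tens of thousands of entries, and the input text is split into words before this per-word step is applied.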