
Release v3.0.1

himkt released this on 27 Sep at 17:13 (commit 145f4e0)
  • #41 Add whitespace tokenizer
In [1]: from tiny_tokenizer import WordTokenizer
In [2]: tk = WordTokenizer("whitespace")
In [3]: tk.tokenize("わたし は 猫")
Out[3]: [わたし, は, 猫]
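The example above relies on the input already being split by spaces. Here is a minimal sketch in plain Python, not using tiny_tokenizer, that shows what whitespace tokenization amounts to. The helper `whitespace_tokenize` is hypothetical and is not part of the library's API. It splits on any run of whitespace via `str.split()`, which may differ from how the `whitespace` tokenizer added in #41 handles tabs or full-width spaces. It also returns plain strings, whereas the session output above appears to show token objects (note the unquoted elements).

```python
# Minimal sketch of whitespace tokenization, independent of tiny_tokenizer.
# `whitespace_tokenize` is a hypothetical helper, not the library's API.
from typing import List


def whitespace_tokenize(text: str) -> List[str]:
    """Split text into tokens on runs of whitespace.

    str.split() with no separator splits on any run of Unicode whitespace
    (including the ideographic space U+3000) and drops empty strings.
    """
    return text.split()


print(whitespace_tokenize("わたし は 猫"))  # ['わたし', 'は', '猫']
print(whitespace_tokenize("私は猫"))        # ['私は猫'] (no spaces, one token)
```

Because no dictionary or model is involved, the tokenizer can only recover boundaries that are already marked with spaces. Unsegmented Japanese such as 私は猫 comes back as a single token.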