Skip to content
C++ Python
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.github/workflows Try setting version var Jan 7, 2020
fugashi Fix mecabrc path Jan 10, 2020
.gitignore
LICENSE
MANIFEST.in Add MANIFEST.in Oct 14, 2019
README.md Add pypi badge Dec 29, 2019
fugashi.png Add fugashi illustration Dec 3, 2019
fugashi_util.py Try to fix Windows Jan 7, 2020
requirements.txt Initial commit Oct 14, 2019
setup.cfg Add some Trove classifiers Jan 4, 2020
setup.py

README.md

Current PyPI packages

fugashi

Fugashi by Irasutoya

Fugashi is a Cython wrapper for MeCab.

See the blog post for background on why Fugashi exists and some of the design decisions.

Any reasonable version of MeCab should work, but it's recommended you install from source.

Usage

from fugashi import Tagger

tagger = Tagger('-Owakati')
text = "麩菓子(ふがし)は、麩を主材料とした日本の菓子。"
tagger.parse(text)
# => '麩 菓子 ( ふ が し ) は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
for word in tagger.parseToNodeList(text):
    print(word, word.feature.lemma, word.pos, sep='\t')
    # "feature" is the Unidic feature data as a named tuple

Dictionary Use

Fugashi is written with the assumption you'll use Unidic to process Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary besides Unidic you can use the GenericTagger like this:

from fugashi import GenericTagger
tagger = GenericTagger()

# parse can be used as normal
tagger.parse('something')
# features from the dictionary can be accessed by field numbers
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature[0])

You can also create a dictionary wrapper to get feature information as a named tuple.

from fugashi import GenericTagger, create_feature_wrapper
CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
tagger = GenericTagger(wrapper=CustomFeatures)
for word in tagger.parseToNodeList(text):
    print(word.surface, word.feature.alpha)

Alternatives

If you have a problem with Fugashi feel free to open an issue. However, there are some cases where it might be better to use a different library.

  • If you want to use MeCab but don't have a C compiler, use natto-py.
  • If you don't want to deal with installing MeCab at all, try SudachiPy.

Note that these are both slower than Fugashi according to a benchmark I wrote.

You can’t perform that action at this time.