Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP
STAtistical Kana Kanji conversion engine; predictive input method; spelling correction; morphological analysis.
C++ Python C
branch: master
Failed to load latest commit information.
python import python version of stakk from nokuno repository
test debug for link duplication problem
trie
.gitignore refactoring
API update documents
Makefile implement unknown character noder in lattice
README update README
README.ja Merge branch 'master' of git@github.com:nokuno/stakk
TODO update
common.h
connection.h
converter.cc update
converter.h
server.h update indent with google c++ style guide
speller.cc refactoring and debug
speller.h
stakk.cc
stakk.h
stakk_server.cc update
stakk_server.h
trie.cc update
trie.h
trie_server.cc
trie_server.h
uconverter.cc add
uconverter.h
utable.h
utf8.h refactoring
util.h
utrie.cc refactoring
utrie.h

README

StaKK: Statistical Kana Kanji conversion engine

StaKK is Japanese Language Processer with following features.

* Current Features
Kana-Kanji converter
Predictive Input or Query Suggestion
Spelling Correction
Morphological Analyzer
HTTP API Server
Raw Trie Operations

Currently, StaKK uses dictionary of Mozc, OSS version of Google Japanese IME.

* Reverse Mode
StaKK implements two mode: normal mode and reverse mode.

Normal Mode: Input Reading or Kana, Output Word or Kanji.
Reverse Mode:   Input Word or Kanji, Output Reading or POS tag etc.

These two methods can be applied for the purpos in the following table.

| Method        | Normal Mode           | Reverse Mode          |
| Convert       | Kana-Kanji Conversion | Morphological Analyze |
| Predict       | Predictive Input      | Query Suggestion      |
| Spell Correct | Correct with Reading  | Correct with Surface  |

* Usage
Common Options:
    command [-r] [-d dictionary] [-c conneciton]
        -d dictionary:  mozc format dictionary file path. default: data/dictinoary.txt
        -c connection:  mozc format connection file path. default: data/connection.txt
        -i id.def:  mozc format class id definition file path. default: data/id.def
                This option is not enable for stakk_test command.
        -r:     "Reverse" option for morphological analyzer.
                If reverse option is set, input word surface instead of reading.

Command Specific Options:
    converter_test [-i id.def] [-o mecab|wakati|yomi]
        Execute Kana-Kanji Conversion or Morphological Analyze on Command Line.
        -i id.def:  Mozc format class id definition file path. default: data/id.def
        -o output:  Output format: one of "mecab", "wakati", "yomi". default: mecab

    stakk_test [-m predict|spell] [-t threshold]
        Execute Prediction or Spelling Correction on Command Line.
        -m mode:        Retrieval mode: one of "predict", "spell". default: spell
        -t threshold:   Spelling correction threhsold of edit distance. default: 1

    stakk_server [-i id.def] [-p port]
        Start HTTP API Server.
        -p port:    Port number of HTTP API server.

HTTP API Server Path:
    General Format
        /apiname/method/query[options]
        
        apiname:    Any name is acceptable. recommend: api
        method:     One of "convert", "predict", "spell".
        query:      Input Query of Japanese String.

    Convert Method
        /apiname/convert/query/format

        format:     One of "wakati", "mecab", "yomi". default: wakati
    
    Predict Method
        /apiname/predict/query/number

        number:     Maximum Display Number of Candidates. default: 50

    Spell Method
        /apiname/predict/query/threshold/number

        threshold:  Threshold of Edit Distance. default: 1
                    DON'T set this larger than 2, which will down the server.
        number:     Maximum Display Number of Candidates. default: 50

    

* Developer Environment
CentOS 5.5
gcc 4.1.2

* Version
v1.0 2010/11/23 first release.

* Examples
# Compilation
$ make

# Kana-Kanji Conversion
$ ./converter_test -o wakati
loading dictionary
loading connection
loading id definition
input query:
わたしのなまえはなかのです。
私 の 名前 は 中野 です 。

# Predictive Input
$ ./stakk_test -m predict
loading dictionary
loading connection
input query:
あり    (Input)
ありがとう      ありがとう
ありがとう      有難う
ありがと        ありがと
ありあんつかさいかいじょうほけん        アリアンツ火災海上保険
ありえってぃ    アリエッティ
etc.

# Spelling Correction
$ ./stakk_test -m spell
loading dictionary
loading connection
input query:
れみおめろん    (Input)
れみおろめん    レミオロメン
あみおだろん    アミオダロン
ぐーgる    (Input)
ぐーぐる        グーグル
ぐーる  グール
etc.

# Morphological Analyzer
$ ./converter_test -r -o mecab
loading dictionary
loading connection
loading id definition
input query:
東京都に住む
東京都  とうきょうと    名詞,固有名詞,地域,一般,*,*,都名        名詞,接尾,地域,*,*,*,*
に      に      助詞,格助詞,一般,*,*,*,に       助詞,格助詞,一般,*,*,*,に
住む    すむ    動詞,自立,*,*,五段動詞,基本形,5 動詞,自立,*,*,五段動詞,基本形,5
EOS

# HTTP API Server
$ ./stakk_server -p 50000 &
loading dictionary
loading connection
loading id definition
server ready
$ curl "http://localhost:50000/api/convert/わたしのなまえはなかのです。"
私 の 名前 は 中野 です 。
$ curl "http://localhost:50000/api/predict/きょうの"
きょうのうんせい        今日の運勢
きょうのてんき  今日の天気
きょうのばんぐみ        今日の番組
きょうのひとこと        今日の一言
etc.
$ curl "http://localhost:50000/api/spell/れみおめろん"
れみおろめん    レミオロメン
$ curl "http://localhost:50000/api/spell/れみおめろん/2"
れみおろめん    レミオロメン
あみおだろん    アミオダロン

# HTTP API Server with Reverse Mode
$ ./stakk_server -rp 50001 &
loading dictionary
loading connection
loading id definition
server ready
$ curl "http://localhost:50001/api/convert/東京都に住む/mecab"
東京都  とうきょうと    名詞,固有名詞,地域,一般,*,*,都名        名詞,接尾,地域,*,*,*,*
に      に      助詞,格助詞,一般,*,*,*,に       助詞,格助詞,一般,*,*,*,に
住む    すむ    動詞,自立,*,*,五段動詞,基本形,5 動詞,自立,*,*,五段動詞,基本形,5
EOS
$ curl "http://localhost:50001/api/spell/テソション"
てんしょん      テンション
ていしょん      テイション
てーしょん      テーション

Something went wrong with that request. Please try again.