at2pt: Annotated Texts to Plain Texts


What's this

  • This tool converts text annotated by NLP tools (KNP, MeCab, CaboCha) into plain text
  • For example, you can train word2vec models on the tokenized output
    • To get tokenized output, use option -m 1 (TOKENIZED) or -m 2 (TOKENIZEDwPRED)

Usage

Usage of ./at2pt:
  -h, --help    Show this help message
  -i, --input=  Input file name. - or no designation means STDIN (-)
  -o, --output= Output file name. - or no designation means STDOUT (-)
  -m, --mode=   Mode {0:PLAIN, 1:TOKENIZED, 2:TOKENIZEDwPRED} (0)
  -s, --style=  Input file style {KNP, MeCab, CaboCha} (KNP)

Run at2pt -h to see all available options

Install

Download a release from the GitHub releases page, or run go get github.com/shirayu/at2pt/cmd/at2pt

Requirements

  • Input texts should be encoded in UTF-8
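
Annotated Japanese corpora are sometimes stored in legacy encodings such as Shift_JIS or EUC-JP, so it can help to check the encoding before conversion. The sketch below is one way to do that with Go's standard unicode/utf8 package. The helper name isUTF8 is hypothetical and not part of at2pt.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isUTF8 is a hypothetical helper (not part of at2pt) that reports
// whether data consists entirely of valid UTF-8 sequences.
func isUTF8(data []byte) bool {
	return utf8.Valid(data)
}

func main() {
	fmt.Println(isUTF8([]byte("日本語")))     // prints "true"
	fmt.Println(isUTF8([]byte{0x93, 0xfa})) // Shift_JIS bytes for "日"; prints "false"
}
```

Files that fail this check would need to be converted to UTF-8 first, for example with iconv or nkf.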

Licence

  • GNU General Public License version 3
  • (C) Yuta Hayashibe 2014