Korean morphological analyzer (한국어 형태소 분석기)

A PyTorch implementation of a Korean morphological analyzer.

Dependency

PyTorch >= 1.1
torchtext
See requirements.txt for the complete list.

pip install -r requirements.txt

Getting Started

Step 1: Prepare the data

The sample data can be found in the data/ directory. Each entry consists of an eojeol (a space-delimited word unit) and its pairs of morpheme and POS tag.
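
As a rough illustration of this pairing, the sketch below parses one line into an eojeol and its (morpheme, POS tag) pairs. The line layout (a tab between the eojeol and its analysis, `+` between morphemes, `/` before each tag) is an assumption borrowed from common Sejong-style corpora, not a description of the files in data/, and `parse_line` is a hypothetical helper.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Split one line into an eojeol and its (morpheme, POS tag) pairs.

    Assumed (hypothetical) layout: <eojeol> TAB <morph>/<TAG>+<morph>/<TAG>...
    """
    eojeol, analysis = line.rstrip("\n").split("\t", 1)
    pairs = []
    for unit in analysis.split("+"):
        # rpartition splits on the last "/", so a "/" inside the morpheme survives
        morpheme, _, tag = unit.rpartition("/")
        pairs.append((morpheme, tag))
    return eojeol, pairs


example = "학교에\t학교/NNG+에/JKB"
print(parse_line(example))
```

A literal `+` morpheme would need extra handling under this assumed layout; check the actual sample files before relying on it.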

Step 2: Train the model

python train.py

This loads the config file (config/kma.yaml) and trains the model it defines: a bidirectional encoder built from a 3-layer LSTM with 100 hidden units, a pointer-generator network, and a CRF tagger. The detailed parameters can be found in the config/ directory.
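
A CRF tagger usually selects its output sequence with Viterbi decoding. The following is a minimal pure-Python sketch of that decoding step, not the code used in this repository: the function name `viterbi` and the toy scores are assumptions made for illustration, and start/end transition scores are left out for brevity.

```python
from typing import List, Sequence


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring sequence of tag indices.

    emissions[t][j]   : score of tag j at position t
    transitions[i][j] : score of moving from tag i to tag j
    """
    num_tags = len(transitions)
    score = list(emissions[0])  # best score of any path ending in each tag
    backptr = []                # backptr[t][j] = best previous tag for tag j
    for emit in emissions[1:]:
        prev_best, new_score = [], []
        for j in range(num_tags):
            best_i = max(range(num_tags),
                         key=lambda i: score[i] + transitions[i][j])
            prev_best.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backptr.append(prev_best)
        score = new_score
    # Start from the best final tag and follow the back-pointers.
    tag = max(range(num_tags), key=lambda j: score[j])
    path = [tag]
    for prev_best in reversed(backptr):
        tag = prev_best[tag]
        path.append(tag)
    path.reverse()
    return path


# Toy example with two tags: switching tags is penalized by -3.
emissions = [[2.0, 0.0], [0.0, 1.0], [1.5, 0.0]]
transitions = [[0.0, -3.0], [-3.0, 0.0]]
print(viterbi(emissions, transitions))
```

With the switching penalty, the decoder keeps tag 0 throughout even though position 1 on its own prefers tag 1, which is the effect a CRF layer adds over per-position classification.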

Step 3: Tagging

python tagging.py --input_file text_file --output output_file

The trained model can then be used to tag new data. The script reads sentences from text_file line by line, tags each one, and saves the tagged output to output_file.
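
The line-by-line flow can be sketched as follows. This is an illustration under stated assumptions rather than the contents of tagging.py: `tag_stream` and `dummy_tagger` are hypothetical names, the placeholder tagger stands in for the trained model, and skipping blank lines is an assumed behavior.

```python
import io
from typing import Callable, TextIO


def tag_stream(src: TextIO, dst: TextIO,
               tag_sentence: Callable[[str], str]) -> int:
    """Tag each non-empty line of src and write one result per line to dst."""
    count = 0
    for line in src:
        sentence = line.strip()
        if not sentence:
            continue  # assumed behavior: blank lines are skipped
        dst.write(tag_sentence(sentence) + "\n")
        count += 1
    return count


def dummy_tagger(sentence: str) -> str:
    """Placeholder: a real run would call the trained model here."""
    return " ".join(f"{token}/UNK" for token in sentence.split())


# In-memory streams stand in for the input and output files.
src = io.StringIO("나는 학교에 간다\n\n안녕하세요\n")
dst = io.StringIO()
n = tag_stream(src, dst, dummy_tagger)
print(n)
print(dst.getvalue(), end="")
```

Passing the tagger in as a callable keeps the file handling separate from the model, so the same loop works with files opened via `open(..., encoding="utf-8")`.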

Pretrained model

Pretrained models are available for download.

Citation

@inproceedings{song-park-2019-korean,
  title="{K}orean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model",
  author="Song, Hyun-Je and Park, Seong-Bae",
  booktitle="Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
  pages="1436--1441",
  year="2019"
}

Acknowledgement

The implementation is heavily inspired by IBM's seq2seq and OpenNMT-py.
