songhyunje/kma

Korean morphological analyzer (한국어 형태소 분석기)

A PyTorch implementation of a Korean morphological analyzer.

Dependency

  • PyTorch >= 1.1
  • torchtext
  • Other packages are listed in requirements.txt; install them with:
pip install -r requirements.txt

Getting Started

Step 1: Prepare the data

Sample data can be found in the data/ directory. Each entry consists of an eojeol (a space-delimited Korean word unit) and its pairs of morpheme and POS tag.
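
As a rough illustration of the data described above, the sketch below parses one eojeol and its morpheme/POS pairs. The line layout used here (eojeol, a tab, then `morpheme/TAG` items joined by `+`) is an assumption made for illustration only and may differ from the actual files in data/; the function name `parse_line` is hypothetical as well.

```python
from typing import List, Tuple


def parse_line(line: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Parse one line of an ASSUMED format: 'eojeol<TAB>morph/TAG+morph/TAG'.

    This layout is hypothetical; check the files in data/ for the real one.
    """
    eojeol, analysis = line.rstrip("\n").split("\t")
    pairs = []
    for item in analysis.split("+"):
        # Split on the last '/' so a morpheme containing '/' stays intact.
        morpheme, tag = item.rsplit("/", 1)
        pairs.append((morpheme, tag))
    return eojeol, pairs


if __name__ == "__main__":
    eojeol, pairs = parse_line("나는\t나/NP+는/JX")
    print(eojeol, pairs)
```

Splitting on `+` is deliberately naive: a morpheme that is itself `+` would need an escaping rule, which this sketch does not attempt.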

Step 2: Train the model

python train.py 

This loads the config file (config/kma.yaml) and trains the model it defines: a bidirectional encoder built from a 3-layer LSTM with 100 hidden units, a pointer-generator network, and a CRF tagger. The detailed parameters can be found in the config/ directory.
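
To make the encoder shape concrete, here is a minimal sketch of a 3-layer bidirectional LSTM encoder with 100 hidden units in PyTorch. It is not the repository's code: the class name `Encoder`, the embedding size, and the vocabulary size are assumptions, and it assumes the 100 hidden units apply per direction. The pointer-generator network and CRF tagger are omitted.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Hypothetical encoder sketch: embedding + 3-layer bidirectional LSTM."""

    def __init__(self, vocab_size: int, emb_dim: int = 100,
                 hidden_size: int = 100, num_layers: int = 3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_size, num_layers=num_layers,
                            bidirectional=True, batch_first=True)

    def forward(self, token_ids: torch.Tensor):
        embedded = self.embedding(token_ids)       # (batch, seq, emb_dim)
        outputs, (h_n, c_n) = self.lstm(embedded)  # outputs: (batch, seq, 2 * hidden)
        return outputs, (h_n, c_n)


encoder = Encoder(vocab_size=50)           # vocabulary size chosen arbitrarily
batch = torch.randint(0, 50, (2, 7))       # 2 sequences of 7 token ids each
outputs, (h_n, c_n) = encoder(batch)
print(outputs.shape)  # torch.Size([2, 7, 200]): forward and backward states concatenated
print(h_n.shape)      # torch.Size([6, 2, 100]): 3 layers x 2 directions
```

Because the encoder is bidirectional, each position's output has 200 features, so any decoder or tagger stacked on top has to expect twice the hidden size.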

Step 3: Tagging

python tagging.py --input_file text_file --output output_file

Once a model is available, you can use it to tag new data. The script reads the input file one sentence per line, tags each sentence, and saves the tagged output to output_file.
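
The line-by-line flow described above can be sketched as follows. This is not the code in tagging.py: the names `tag_lines`, `tag_file`, and `dummy_tagger` are hypothetical, and `dummy_tagger` is a stand-in that would be replaced by a call into the trained model.

```python
from typing import Callable, Iterable, Iterator


def tag_lines(lines: Iterable[str], tagger: Callable[[str], str]) -> Iterator[str]:
    """Tag each input line independently, yielding one output line per input line."""
    for line in lines:
        sentence = line.strip()
        if not sentence:
            # Keep blank lines so input and output stay aligned line by line.
            yield ""
            continue
        yield tagger(sentence)


def dummy_tagger(sentence: str) -> str:
    """Placeholder for the real model: labels every token with UNK."""
    return " ".join(f"{token}/UNK" for token in sentence.split())


def tag_file(input_path: str, output_path: str, tagger: Callable[[str], str]) -> None:
    """Read sentences from input_path and write tagged lines to output_path."""
    with open(input_path, encoding="utf-8") as fin, \
            open(output_path, "w", encoding="utf-8") as fout:
        for tagged in tag_lines(fin, tagger):
            fout.write(tagged + "\n")


if __name__ == "__main__":
    for result in tag_lines(["나는 학교에 간다\n"], dummy_tagger):
        print(result)
```

Streaming the file rather than loading it whole keeps memory use flat for large inputs; a real implementation would likely batch sentences before calling the model.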

Pretrained model

  • Pretrained models are available for download.

Citation

@inproceedings{song-park-2019-korean,
  title="{K}orean Morphological Analysis with Tied Sequence-to-Sequence Multi-Task Model",
  author="Song, Hyun-Je and Park, Seong-Bae",
  booktitle="Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
  pages="1436--1441",
  year="2019"
}

Acknowledgement

The implementation is heavily inspired by IBM's seq2seq and OpenNMT-py.
