Automatic Korean word spacing with R
haven-jeon Merge pull request #4 from mrchypark/master
change .onAttech to set_env
Latest commit 8e3d191 Oct 15, 2018


R package for automatic Korean word spacing.

A Python version is also available.

License: GPL v3


Word spacing is an important part of preprocessing for Korean text analysis, and accurate spacing greatly affects the accuracy of subsequent analysis steps. KoSpacing provides fairly accurate automatic word spacing and performs especially well on online text from social networking services (SNS) and SMS messages.

For example, "아버지가방에들어가신다." can be spaced in either of the following ways:

  1. "아버지가 방에 들어가신다." means "My father enters the room."
  2. "아버지 가방에 들어가신다." means "My father goes into the bag."

By common sense, the first reading is the correct one.
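One common way to see why this is hard is to treat spacing as labeling each character with whether a space follows it. The sketch below is only an illustration of that framing: the helper `spacing_labels()` is hypothetical and not part of KoSpacing, and it is not a claim about how KoSpacing's model works internally. Both readings reduce to the same 12 characters and differ only in the labels on "지" and "가".

```r
# Hypothetical helper: split a spaced sentence into its characters (spaces
# removed) and a 0/1 label per character, where 1 means a space follows it.
spacing_labels <- function(sentence) {
  cps <- utf8ToInt(sentence)              # Unicode code points, locale-independent
  is_space <- cps == utf8ToInt(" ")
  # Shift by one: element i is TRUE when code point i + 1 is a space.
  next_is_space <- c(is_space[-1], FALSE)[!is_space]
  list(chars = intToUtf8(cps[!is_space]),
       labels = as.integer(next_is_space))
}

reading1 <- spacing_labels("아버지가 방에 들어가신다.")
reading2 <- spacing_labels("아버지 가방에 들어가신다.")

# Both readings share the same unspaced input ...
identical(reading1$chars, reading2$chars)
# ... and disagree only at the 3rd and 4th characters ("지" and "가").
which(reading1$labels != reading2$labels)
```

Running it prints `TRUE` for the first check and `3 4` for the positions where the two readings disagree.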

KoSpacing is based on a deep learning model trained on a large corpus (more than 100 million news articles from Chan-Yub Park).


Test Set                                 Accuracy
Sejong (colloquial style) Corpus (1M)    97.1%
OOOO (literary style) Corpus (3M)        94.3%
  • Accuracy = (number of correctly spaced characters) / (number of characters in the test data).
    • Performance might improve further if compound words are normalized.
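The accuracy definition above can be computed with a small helper. This is a sketch under an assumption the README does not spell out: a character counts as correctly spaced when the space-or-no-space decision after it matches the reference. The function names `space_after()` and `spacing_accuracy()` are hypothetical and not part of KoSpacing.

```r
# Hypothetical helper: characters (as code points, spaces removed) and a
# logical label per character telling whether a space follows it.
space_after <- function(sentence) {
  cps <- utf8ToInt(sentence)
  is_space <- cps == 32L                  # 32 is the code point of " "
  list(chars = cps[!is_space],
       labels = c(is_space[-1], FALSE)[!is_space])
}

# Accuracy = correctly spaced characters / all characters.
spacing_accuracy <- function(reference, predicted) {
  ref <- space_after(reference)
  pred <- space_after(predicted)
  # Both strings must contain the same characters once spaces are removed.
  stopifnot(identical(ref$chars, pred$chars))
  mean(ref$labels == pred$labels)
}

acc <- spacing_accuracy("아버지가 방에 들어가신다.", "아버지 가방에 들어가신다.")
acc  # 10 of 12 characters agree, about 0.833
```

Using the two readings from the example earlier, only the decisions after "지" and "가" differ, so 10 of the 12 characters are counted as correctly spaced.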


You need to install a conda binary; please install the Python 3.6 (or later) version.

To install from GitHub, use
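The install and usage commands are not shown on this page. A minimal sketch, assuming the package is installed with `devtools::install_github()` from the `haven-jeon/KoSpacing` repository (inferred from the repository owner shown at the top) and that it exports a `spacing()` function that takes an unspaced string and returns the spaced text. The `set_env()` call is also an assumption, based only on the latest commit message mentioning `set_env`. The call is guarded so the snippet runs even when KoSpacing is not installed.

```r
# Install once (needs network access). The repository path is an assumption.
# install.packages("devtools")
# devtools::install_github("haven-jeon/KoSpacing")

# Unspaced input: the sentence printed below with all spaces removed.
input <- "김형호영화시장분석가는'1987'의네이버영화정보네티즌10점평에서언급된단어들을지난해12월27일부터올해1월10일까지통계프로그램R과KoNLP패키지로텍스트마이닝하여분석했다."

if (requireNamespace("KoSpacing", quietly = TRUE)) {
  library(KoSpacing)
  set_env()               # assumed environment setup (see lead-in)
  print(spacing(input))   # assumed to return the spaced sentence
}
```

The `[1]` line that follows appears to be the printed result of such a call.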



[1] "김형호 영화시장 분석가는 '1987'의 네이버 영화 정보 네티즌 10점 평에서 언급된 단어들을 지난해 12월 27일부터 올해 1월 10일까지 통계 프로그램 R과 KoNLP 패키지로 텍스트마이닝하여 분석했다."

Model Architecture


Citation

@misc{KoSpacing,
author = {Heewon Jeon},
title = {KoSpacing: Automatic Korean word spacing},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{}}
}