# Korean Learner Parts of Speech Tagger
When you begin learning Korean, it can be hard to tell which parts of a word are grammatical particles and which parts are substantive root words that you can look up in a dictionary.

This project aims to make it easier for Korean language learners to decode written Korean sentences by marking up particles and other grammatical elements. It is not a translation tool as such, but an aid to sentence decoding.

In [143]:
from konlpy.tag import Okt
from konlpy.utils import pprint
from pandas import DataFrame

In [144]:
# Use 'Open Korean Text'
okt = Okt()

In [146]:
# text = '김장 행사에 참여한 아동·청소년 봉사자들은 자신들이 먹을 김치를 직접 만들면서 한편으로 다른 소외계층을 위한 나눔 활동을 한다는 사실에 뿌듯함을 느꼈다.'
text = '나는 케이크를 먹고있다'  # "I am eating cake"

In [148]:
# Disable normalisation and stemming so that the tokens match the original string, which we want to mark up with information
pos = okt.pos(text, norm=False, stem=False)
pprint(pos)

[('나', 'Noun'), ('는', 'Josa'), ('케이크', 'Noun'), ('를', 'Josa'), ('먹고있다', 'Verb')]
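
The tagged output above is already enough to separate particles from the words a learner might look up. The following is a minimal sketch of that split, assuming the `(token, tag)` pairs shown above are copied in by hand so that it runs without KoNLPy; the variable names `particles` and `content_words` are illustrative only.

```python
# Tagged tokens copied from the output above (no KoNLPy needed to run this).
pos = [('나', 'Noun'), ('는', 'Josa'), ('케이크', 'Noun'), ('를', 'Josa'), ('먹고있다', 'Verb')]

# Okt labels particles as 'Josa'. Everything else is treated here as a
# candidate for dictionary lookup (an assumption made for this sketch).
particles = [token for token, tag in pos if tag == 'Josa']
content_words = [token for token, tag in pos if tag != 'Josa']

print(particles)
print(content_words)
```

Conjugated forms such as 먹고있다 would still need stemming before a dictionary lookup, which this sketch does not attempt.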


In [149]:
words_df = DataFrame(pos, columns=['Korean', 'POS'])
words_df.drop(words_df[words_df.POS == "Punctuation"].index, inplace=True)
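# Rename Okt's 'Josa' tag to 'Particle'. Assign the result back to the column:
# a chained in-place replace on words_df.POS triggers a warning in recent pandas versions.
words_df['POS'] = words_df['POS'].replace({'Josa': 'Particle'})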
words_df

| | Korean | POS |
|---|---|---|
| 0 | 나 | Noun |
| 1 | 는 | Particle |
| 2 | 케이크 | Noun |
| 3 | 를 | Particle |
| 4 | 먹고있다 | Verb |


In [152]:
# Suppress the naipy warning described below. Note that this also suppresses all other warnings.
# 'UserWarning: 샘플키로 요청합니다' ("requesting with the sample key") means you should get a proper developer token from developers.naver.com, but requests still work with the sample key.
import warnings
warnings.filterwarnings('ignore')

from naipy import sync
from naipy.model import N2mtNaipy

naipy = sync.Translation()

# Get translations for all tokens
# words_df['English'] = words_df['Korean'].apply(naipy.translation, args=('en',)).apply(getattr, args=('translatedText',))

# Only translate non-particle tokens, since Naver's translations of bare particles are not useful
def translate_token(row):
    """Return the English translation of a token, or '' for particles and empty results."""
    if row.POS == 'Particle':
        return ''
    translated = naipy.translation(row.Korean, 'en').translatedText
    return '' if translated is None else translated

words_df['English'] = words_df.apply(translate_token, axis=1)

words_df

| | Korean | POS | English |
|---|---|---|---|
| 0 | 나 | Noun | I |
| 1 | 는 | Particle | |
| 2 | 케이크 | Noun | Cake |
| 3 | 를 | Particle | |
| 4 | 먹고있다 | Verb | I'm eating. |


Now that we have the basic information we need, we want to use it to mark up a visualisation of the original sentence.
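
One possible way to produce such markup is to wrap each token in an HTML `<span>` whose class carries its part of speech, while keeping the spacing of the original sentence. The sketch below is an assumption about how this could be done, not the project's final design: the function name `mark_up` and the `pos-*` class names are hypothetical, and the token list is copied from the table above so that the example runs on its own.

```python
from html import escape


def mark_up(text, pos):
    """Wrap each tagged token of `text` in a <span>, keeping the original spacing."""
    pieces = []
    cursor = 0
    for token, tag in pos:
        start = text.find(token, cursor)
        if start == -1:
            # Token not found verbatim (e.g. the tagger altered it); skip it.
            continue
        # Copy through whatever sits between the previous token and this one.
        pieces.append(escape(text[cursor:start]))
        pieces.append(f'<span class="pos-{tag.lower()}">{escape(token)}</span>')
        cursor = start + len(token)
    # Keep any trailing text after the last token.
    pieces.append(escape(text[cursor:]))
    return ''.join(pieces)


text = '나는 케이크를 먹고있다'
pos = [('나', 'Noun'), ('는', 'Particle'), ('케이크', 'Noun'),
       ('를', 'Particle'), ('먹고있다', 'Verb')]

html = mark_up(text, pos)
print(html)
```

Because the tokens are located in the original string rather than joined back together, this approach relies on tagging with `norm=False, stem=False`, so that every token appears verbatim in the text. A stylesheet could then colour `.pos-particle` differently from the other classes.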

### Acknowledgements
This project builds on a blog post by Niamh Kingsley that I came across while researching my idea. It makes a good start on the journey I wanted to take.
https://towardsdatascience.com/how-i-used-python-code-to-improve-my-korean-2f3ae09a9773

Eunjeong L. Park, Sungzoon Cho. “[KoNLPy: Korean natural language processing in Python](http://dmlab.snu.ac.kr/~lucypark/docs/2014-10-10-hclt.pdf)”, Proceedings of the 26th Annual Conference on Human & Cognitive Language Technology, Chuncheon, Korea, Oct 2014.