Korean grapheme-to-phone conversion in Python
File          Latest commit message   Commit date
LICENSE.md    Create LICENSE.md       Feb 27, 2017
README.md     Update README.md        Jan 27, 2020
g2p.py        Update g2p.py           Feb 1, 2019
rulebook.txt  Update rulebook.txt     Feb 1, 2019
testset.txt   no message              Feb 22, 2017


KoG2P

Given an input of a series of Korean graphemes/letters (i.e. Hangul), KoG2P outputs the corresponding pronunciations.

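KoG2P is a Python-based G2P package that generates a pronunciation sequence from a Korean character string.
You can use it from the terminal by passing the string you want to convert as an argument.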

How to use?

In a terminal, pass your input to the script as a quoted argument:

$ python g2p.py '박물관'

This prints the pronunciation /방물관/ written in KoG2P symbols:

p0 aa ng mm uu ll k0 wa nf

NB. Your input does not need to be a lemma or even a legitimate sequence of Korean; for any sequence in Hangul, the system produces an output based on the phonological rules of Korean.
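To make the example above concrete, here is a minimal sketch of how a Hangul syllable splits into onset, vowel, and coda, and how the pronounced form /방물관/ lines up with the printed symbols. It is not taken from g2p.py: the decomposition uses the standard Unicode arithmetic for precomposed Hangul syllables, the function names `decompose` and `symbolize` are hypothetical, and the small lookup tables cover only the letters in this one example (inferred by aligning /방물관/ with the output shown above). The phonological rules that turn 박물관 into /방물관/ are deliberately not implemented here.

```python
# Jamo in Unicode order for precomposed Hangul syllables (U+AC00..U+D7A3).
ONSETS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 vowels
CODAS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")  # 28 finals

def decompose(syllable):
    """Split one precomposed Hangul syllable into (onset, vowel, coda)."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError("not a precomposed Hangul syllable: %r" % syllable)
    onset, rest = divmod(index, 21 * 28)
    vowel, coda = divmod(rest, 28)
    return ONSETS[onset], VOWELS[vowel], CODAS[coda]

# Hypothetical lookup tables limited to the letters of the README example.
EXAMPLE_ONSET = {"ㅂ": "p0", "ㅁ": "mm", "ㄱ": "k0"}
EXAMPLE_VOWEL = {"ㅏ": "aa", "ㅜ": "uu", "ㅘ": "wa"}
EXAMPLE_CODA = {"ㅇ": "ng", "ㄹ": "ll", "ㄴ": "nf"}

def symbolize(pronounced):
    """Map an already-pronounced form to symbols; no phonological rules applied."""
    symbols = []
    for syllable in pronounced:
        onset, vowel, coda = decompose(syllable)
        symbols.append(EXAMPLE_ONSET[onset])
        symbols.append(EXAMPLE_VOWEL[vowel])
        if coda:
            symbols.append(EXAMPLE_CODA[coda])
    return " ".join(symbols)

print(decompose("박"))        # the written first syllable, coda ㄱ
print(symbolize("방물관"))    # the pronounced form from the example
```

Note that the written coda of 박 differs from the coda in the pronounced form 방; bridging that gap is the job of the rule-based conversion, which this sketch leaves out.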

Requirement

  • Python 2.7 or 3.x

Symbol table

The table below maps Hangul letters to the symbols used in KoG2P output.

C/V        Position     Hangul  KoG2P
consonant  onset        ㅂ      p0
consonant  onset        ㅍ      ph
consonant  onset        ㅃ      pp
consonant  onset        ㄷ      t0
consonant  onset        ㅌ      th
consonant  onset        ㄸ      tt
consonant  onset        ㄱ      k0
consonant  onset        ㅋ      kh
consonant  onset        ㄲ      kk
consonant  onset        ㅅ      s0
consonant  onset        ㅆ      ss
consonant  onset        ㅎ      h0
consonant  onset        ㅈ      c0
consonant  onset        ㅊ      ch
consonant  onset        ㅉ      cc
consonant  onset        ㅁ      mm
consonant  onset        ㄴ      nn
consonant  onset        ㄹ      rr
consonant  coda         ㅂ      pf
consonant  coda         ㅍ      ph
consonant  coda         ㄷ      tf
consonant  coda         ㅌ      th
consonant  coda         ㄱ      kf
consonant  coda         ㅋ      kh
consonant  coda         ㄲ      kk
consonant  coda         ㅅ      s0
consonant  coda         ㅆ      ss
consonant  coda         ㅎ      h0
consonant  coda         ㅈ      c0
consonant  coda         ㅊ      ch
consonant  coda         ㅁ      mf
consonant  coda         ㄴ      nf
consonant  coda         ㅇ      ng
consonant  coda         ㄹ      ll
consonant  coda         ㄱㅅ    ks
consonant  coda         ㄴㅈ    nc
consonant  coda         ㄴㅎ    nh
consonant  coda         ㄹㄱ    lk
consonant  coda         ㄹㅁ    lm
consonant  coda         ㄹㅂ    lb
consonant  coda         ㄹㅅ    ls
consonant  coda         ㄹㅌ    lt
consonant  coda         ㄹㅍ    lp
consonant  coda         ㄹㅎ    lh
consonant  coda         ㅂㅅ    ps
vowel      monophthong  ㅣ      ii
vowel      monophthong  ㅔ      ee
vowel      monophthong  ㅐ      qq
vowel      monophthong  ㅏ      aa
vowel      monophthong  ㅡ      xx
vowel      monophthong  ㅓ      vv
vowel      monophthong  ㅜ      uu
vowel      monophthong  ㅗ      oo
vowel      diphthong    ㅖ      ye
vowel      diphthong    ㅒ      yq
vowel      diphthong    ㅑ      ya
vowel      diphthong    ㅕ      yv
vowel      diphthong    ㅠ      yu
vowel      diphthong    ㅛ      yo
vowel      diphthong    ㅟ      wi
vowel      diphthong    ㅚ      wo
vowel      diphthong    ㅙ      wq
vowel      diphthong    ㅞ      we
vowel      diphthong    ㅘ      wa
vowel      diphthong    ㅝ      wv
vowel      diphthong    ㅢ      xi
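If you post-process KoG2P output in your own code, the symbol table can be used to tell vowels from consonants. The following is a minimal sketch, not part of g2p.py: the helper names `classify` and `count_syllables` are hypothetical, the vowel set is copied from the table above, and the syllable count assumes each Korean syllable contains exactly one vowel symbol.

```python
# Vowel symbols copied from the symbol table (monophthongs, then diphthongs).
VOWEL_SYMBOLS = {
    "ii", "ee", "qq", "aa", "xx", "vv", "uu", "oo",
    "ye", "yq", "ya", "yv", "yu", "yo",
    "wi", "wo", "wq", "we", "wa", "wv", "xi",
}

def classify(output):
    """Label each space-separated symbol as 'vowel' or 'consonant'."""
    return [
        (symbol, "vowel" if symbol in VOWEL_SYMBOLS else "consonant")
        for symbol in output.split()
    ]

def count_syllables(output):
    """Count syllables as the number of vowel symbols in the output."""
    return sum(1 for symbol in output.split() if symbol in VOWEL_SYMBOLS)

example = "p0 aa ng mm uu ll k0 wa nf"  # output for '박물관' shown earlier
for symbol, kind in classify(example):
    print(symbol, kind)
print("syllables:", count_syllables(example))
```

Symbols such as `ph`, `th`, or `s0` appear in both the onset and coda rows of the table, so this sketch does not try to assign consonants to positions; it only separates vowels from consonants.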

NB. IPA symbols for Korean phones can be found on the following page: IPA for Korean.

Reference

Please cite the following if using this code:

@misc{cho2017kog2p,
  title = {Korean Grapheme-to-Phoneme Analyzer (KoG2P)},
  author = {Yejin Cho},
  year = {2017},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/scarletcho/KoG2P}}
}

Thank you for your citations!

  • Yoon Seok Hong, Kyung Seo Ki, and Gahgene Gweon. 2018. Automatic Miscue Detection Using RNN Based Models with Data Augmentation. In Proc. Interspeech 2018. 1646-1650. [pdf]

  • Younggun Lee and Taesu Kim. 2018. Learning pronunciation from a foreign language in speech synthesis network. arXiv preprint. arXiv:1811.09364. [pdf]
