Skip to content

A Clojure library for deconstructing Korean unicode syllable characters into alphabet characters

License

Notifications You must be signed in to change notification settings

sooheon/hangul-utils

Repository files navigation

hangul-utils

A Clojure library for manipulating Korean characters and alphabets.

The project is hosted on Clojars:

Clojars Project

Usage

Hangul alphabet letters called jamo (자모) are grouped into syllabic blocks, forming a single unicode char. So when dealing with Korean text at the alphabet level, some way of deconstructing the syllabic blocks into a list of jamo, and then reconstructing them, is useful.

This library represents a deconstructed Korean syllable as a vector of letters (or jamo).

(deconstruct \안)
;; => [\ㅇ \ㅏ \ㄴ]

(deconstruct-str "안녕하세요! ㅎㅎ")
;; => [[\ㅇ \ㅏ \ㄴ] [\ㄴ \ㅕ \ㅇ] [\ㅎ \ㅏ] [\ㅅ \ㅔ] [\ㅇ \ㅛ] [\!] [\space] [\ㅎ] [\ㅎ]]

(construct [\ㅎ \ㅏ])
;; => \하

(construct-str [[\ㅋ \ㅡ \ㄹ] [\ㄹ \ㅗ] [\ㅈ \ㅕ] [\space] [\ㅈ \ㅐ] [\ㅁ \ㅣ \ㅆ] [\ㄴ \ㅔ] [\ㅇ \ㅛ]])
;; => "클로져 재밌네요"

You can also transform strings end-to-end. Since it is common to encounter Korean which is full of non-standard spellings and text-emoticons, the library is lenient about unrecognized syllables and preserves text such as 'ㅎㅎ *^.^*' in round trip transformations.

(alphabetize "오늘부터..!")
;; => "ㅇㅗㄴㅡㄹㅂㅜㅌㅓ..!"

(syllabize "ㄹㅣㅊㅣㅎㅣㅋㅣㄴㅣㅁ ㄱㅗㅁㅏㅂㅅㅡㅂㄴㅣㄷㅏ")
;; => "리치히키님 고맙습니다"

(alphabetize "아ㅏㅏㅏㅏㅏ이고")
;; => "ㅇㅏㅏㅏㅏㅏㅏㅇㅣㄱㅗ"

Be aware that alphabetize is just a simple helper function to avoid having to call (apply str ...) too often. Since the jamo are represented in plain Clojure vectors, you can always do operations such as (flatten (deconstruct-str "가나다라마")) to get '(\ㄱ \ㅏ \ㄴ \ㅏ \ㄷ \ㅏ \ㄹ \ㅏ \ㅁ \ㅏ).

Deploy

https://github.com/juxt/pack.alpha#skinny-jar

Thanks

Thanks to kaniblu for the Python hangul-utils library, which inspired this.

License

Copyright © 2017 Sooheon Kim

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.

About

A Clojure library for deconstructing Korean unicode syllable characters into alphabet characters

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published