Codepoint-based strings

Oji - Character boundary recognizer

My uncle is letters. Literally. --- ENJOE Toh, "This is a pen"

  • The English word "uncle" is "oji" in Japanese, and "letter" (as in a written character) is "moji".

Motivation

https://gist.github.com/t-sin/b46c9171d7d184687812f7bc03f96050

Usage

CL-USER> (setf moji (oji:load-bytes (babel:string-to-octets
                                      "これはペンです"
                                      :encoding :utf-8)
                                    :utf-8))
#S(MOJIRETSU :ENCODING :UTF-8 :BYTES (...) :MOJIS (...))
CL-USER> (oji:encoding moji)
:UTF-8
CL-USER> (oji:read-char moji)
#\HIRAGANA_LETTER_KO  ;; not #\LATIN_SMALL_LETTER_A_WITH_TILDE, which is what the first byte (#xE3) alone would give if read as Latin-1
CL-USER> (oji:boundary moji)
((0 . 2) (3 . 5) (6 . 8) (9 . 11) ...)
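
The boundary list above pairs the first and last byte index of each character. How oji computes it is not shown in this README; the following is only a minimal sketch, in plain Common Lisp, of how such ranges could be derived for UTF-8 by inspecting each lead byte. The names `utf8-sequence-length` and `utf8-boundaries` are hypothetical and are not part of oji's API, and the sketch does not validate continuation bytes.

```lisp
;; Hypothetical helpers for illustration; NOT part of oji.

(defun utf8-sequence-length (lead-byte)
  "Return the byte length of the UTF-8 sequence introduced by LEAD-BYTE."
  (cond ((< lead-byte #x80) 1)                ; 0xxxxxxx: ASCII
        ((= (logand lead-byte #xE0) #xC0) 2)  ; 110xxxxx
        ((= (logand lead-byte #xF0) #xE0) 3)  ; 1110xxxx
        ((= (logand lead-byte #xF8) #xF0) 4)  ; 11110xxx
        (t (error "Invalid UTF-8 lead byte: #x~2,'0X" lead-byte))))

(defun utf8-boundaries (octets)
  "Return a list of (START . END) inclusive byte ranges, one per character."
  (loop with index = 0
        while (< index (length octets))
        collect (let ((len (utf8-sequence-length (aref octets index))))
                  ;; Refuse a sequence that runs past the end of the input.
                  (when (> (+ index len) (length octets))
                    (error "Truncated UTF-8 sequence at byte ~D" index))
                  ;; Record the range, then advance to the next lead byte.
                  (prog1 (cons index (+ index len -1))
                    (incf index len)))))

;; "これ" encoded as UTF-8: two 3-byte characters.
(utf8-boundaries #(#xE3 #x81 #x93 #xE3 #x82 #x8C))
;; => ((0 . 2) (3 . 5))
```

Each hiragana character occupies three bytes in UTF-8, which is why the ranges in the transcript advance in steps of three.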

Installation

Author

Copyright

Copyright (c) 2017 Shinichi TANAKA (shinichi.tanaka@gmail.com)

License

Licensed under the Lisp GNU Lesser General Public License.