"convert" returns result as dictionary. There are keys: 'orig', 'kana', 'hira', 'hepburn', 'kunrei', 'passport'
Example:
import pykakasi
kks = pykakasi.kakasi()
text = 'かな漢字'
result = kks.convert(text)
for item in result:
    print("{}: kana '{}', hiragana: '{}', romaji: '{}'".format(item['orig'], item['kana'], item['hira'], item['hepburn']))
Output:
かな: kana 'カナ', hiragana: 'かな', romaji: 'kana'
漢字: kana 'カンジ', hiragana: 'かんじ', romaji: 'kanji'
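Because each element of the list covers one segment, a reading for the whole text can be built by concatenating one key across all elements. A minimal sketch, assuming the items look like the documented output above; `KEYS` and `joined` are hypothetical names, not part of pykakasi, and the sample omits 'kunrei' and 'passport' because their values are not shown above.

```python
# Keys documented for each item returned by convert().
KEYS = ("orig", "kana", "hira", "hepburn", "kunrei", "passport")


def joined(items, key):
    """Concatenate one field across all segments returned by convert().

    `joined` is a hypothetical helper, not part of pykakasi.
    """
    if key not in KEYS:
        raise KeyError(f"unknown key: {key!r}")
    return "".join(item[key] for item in items)


# Sample items copied from the documented output for 'かな漢字'
# ('kunrei' and 'passport' omitted because their values are not shown).
sample = [
    {"orig": "かな", "kana": "カナ", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "kana": "カンジ", "hira": "かんじ", "hepburn": "kanji"},
]

print(joined(sample, "hira"))     # かなかんじ
print(joined(sample, "hepburn"))  # kanakanji
```

With the real library the call would be `joined(kks.convert(text), "hira")`.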
Warning
The old v1.2 API (the wakati class and the setMode(), getConverter() and do() functions) will be deprecated when v3.0 is released. Please consider using the convert() method instead.
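When porting code from the old API, the output-mode letter and roman table passed to setMode() roughly correspond to the keys of a convert() item. A minimal sketch under that assumption: `OUTPUT_TO_KEY`, `TABLE_TO_KEY` and `legacy_key` are hypothetical names, not part of pykakasi, and the sketch ignores the "s" and "C" switches.

```python
# Hypothetical mapping from old setMode() output values to convert() item keys.
OUTPUT_TO_KEY = {"H": "hira", "K": "kana"}
# Old "r" (roman table) values mapped to convert() item keys.
TABLE_TO_KEY = {"Hepburn": "hepburn", "Kunrei": "kunrei", "Passport": "passport"}


def legacy_key(output_mode, table="Hepburn"):
    """Return the convert() key that best matches an old setMode() output value.

    `output_mode` is "H", "K" or "a"; `table` is only consulted for "a".
    """
    if output_mode == "a":
        return TABLE_TO_KEY[table]
    return OUTPUT_TO_KEY[output_mode]


# setMode("J", "a") together with setMode("r", "Kunrei") -> read item["kunrei"]
print(legacy_key("a", "Kunrei"))  # kunrei
print(legacy_key("H"))            # hira
```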
These switch letters are derived from the original Kakasi. The following options are currently supported:
| Option | Description | Values | Note |
|---|---|---|---|
| K | Katakana conversion | a, H, None | rōmaji, Hiragana or no conversion |
| H | Hiragana conversion | a, K, None | rōmaji, Katakana or no conversion |
| J | Kanji conversion | a, H, K, None | rōmaji, Hiragana, Katakana or no conversion |
| a | Roman conversion | E, None | JIS ROMAN or no conversion |
| E | JIS ROMAN conversion | a, None | ASCII roman or no conversion |
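The table can be encoded directly as a lookup of allowed values per option. A minimal sketch transcribed from the table only; `ALLOWED` and `check_mode` are hypothetical names, pykakasi's own validation may differ, and because the table does not list the furigana value "aF" used in the furigana example, this sketch rejects it.

```python
# Allowed values per option, transcribed from the table above (None = no conversion).
ALLOWED = {
    "K": {"a", "H", None},
    "H": {"a", "K", None},
    "J": {"a", "H", "K", None},
    "a": {"E", None},
    "E": {"a", None},
}


def check_mode(option, value):
    """Raise ValueError if (option, value) is not listed in the table.

    `check_mode` is a hypothetical helper, not part of pykakasi.
    """
    if option not in ALLOWED:
        raise ValueError(f"unknown option: {option!r}")
    if value not in ALLOWED[option]:
        allowed = sorted(v for v in ALLOWED[option] if v is not None)
        raise ValueError(f"{option!r} accepts {allowed} or None, got {value!r}")


check_mode("J", "K")  # Kanji to Katakana: listed, so no exception
```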
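Each letter stands for a character set:

| Letter | Character set | Defined in |
|---|---|---|
| a | ASCII | |
| j | JIS Roman | JIS X 0201 |
| g | graphic | |
| k | kana | JIS X 0201 |
| E | kigou (symbols) | JIS X 0208 |
| K | katakana | JIS X 0208 |
| H | hiragana | JIS X 0208 |
| J | kanji | JIS X 0208 |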
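To see which of these sets a given character falls into, a rough classifier can use Unicode blocks. A minimal sketch, assuming Unicode blocks are an acceptable stand-in for the JIS X 0201 / JIS X 0208 definitions (they are not identical); `charset_letter` is a hypothetical helper, and only a, H, K and J are detected.

```python
def charset_letter(ch):
    """Map a single character to a set letter from the table by Unicode block.

    Approximation (assumption): Unicode blocks stand in for the JIS
    definitions, and only a, H, K and J are detected.
    """
    cp = ord(ch)
    if cp < 0x80:
        return "a"  # ASCII
    if 0x3041 <= cp <= 0x309F:
        return "H"  # Hiragana block
    if 0x30A0 <= cp <= 0x30FF:
        return "K"  # Katakana block (includes the long-vowel mark ー)
    if 0x4E00 <= cp <= 0x9FFF:
        return "J"  # CJK Unified Ideographs (common kanji)
    return None     # anything else: symbols, full-width forms, etc.


print([charset_letter(c) for c in "かな漢字カナabc"])
# ['H', 'H', 'J', 'J', 'K', 'K', 'a', 'a', 'a']
```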
How to Install:
pip install pykakasi
When the library is built, the setup script builds the dictionary database and generates the pickled db files. The library cannot run without these dictionary files.
Sample source code:
from pykakasi import kakasi, wakati
text = u"かな漢字交じり文"
kks = kakasi()
kks.setMode("H", "a")  # Hiragana to ascii, default: no conversion
kks.setMode("K", "a")  # Katakana to ascii, default: no conversion
kks.setMode("J", "a")  # Japanese to ascii, default: no conversion
kks.setMode("r", "Hepburn")  # default: use Hepburn Roman table
kks.setMode("s", True)  # add space, default: no separator
kks.setMode("C", True)  # capitalize, default: no capitalize
conv = kks.getConverter()
result = conv.do(text)
print(result)
wkt = wakati()
conv = wkt.getConverter()
result = conv.do(text)
print(result)
For output, the mode values "H", "K" and "a" mean "Hiragana", "Katakana" and "Alphabet" respectively. For input, "J" means "Japanese", a mixture of Kanji, Katakana and Hiragana, while "H" and "K" mean "Hiragana" and "Katakana". Mode "r" switches the Roman table and accepts "Hepburn", "Kunrei" or "Passport". Finally, "s" is the separator switch, "C" is the capitalize switch and "S" is the separator string option.
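The "r" switch matters because Hepburn and Kunrei spell several syllables differently. A minimal sketch with a small, illustrative subset of standard Hepburn/Kunrei spellings; `HEPBURN_KUNREI`, `TABLE_INDEX` and `romanize_syllable` are hypothetical names, this is not pykakasi's actual conversion table, and Passport is left out.

```python
# A few syllables where the Hepburn and Kunrei tables differ (standard values).
# Illustrative subset only, not pykakasi's actual conversion table.
HEPBURN_KUNREI = {
    "し": ("shi", "si"),
    "ち": ("chi", "ti"),
    "つ": ("tsu", "tu"),
    "ふ": ("fu", "hu"),
    "じ": ("ji", "zi"),
    "しゃ": ("sha", "sya"),
    "ちゃ": ("cha", "tya"),
    "じゃ": ("ja", "zya"),
}
# Column of each table in the tuples above; Passport is not covered.
TABLE_INDEX = {"Hepburn": 0, "Kunrei": 1}


def romanize_syllable(kana, table="Hepburn"):
    """Look up one syllable; raises KeyError for kana or tables not covered."""
    return HEPBURN_KUNREI[kana][TABLE_INDEX[table]]


print([romanize_syllable(k, "Kunrei") for k in ("し", "つ", "ふ")])  # ['si', 'tu', 'hu']
```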
Transliterate Japanese text to rōmaji:
>>> import pykakasi
>>>
>>> text = u"かな漢字交じり文"
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("H","a") # Hiragana to ascii, default: no conversion
>>> kakasi.setMode("K","a") # Katakana to ascii, default: no conversion
>>> kakasi.setMode("J","a") # Japanese to ascii, default: no conversion
>>> kakasi.setMode("r","Hepburn") # default: use Hepburn Roman table
>>> kakasi.setMode("s", True) # add space, default: no separator
>>> kakasi.setMode("C", True) # capitalize, default: no capitalize
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
kana Kanji Majiri Bun
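With convert(), a rough equivalent of the "s" and "C" switches is to join the 'hepburn' fields and capitalize them yourself. A minimal sketch under that assumption; `romanize` is a hypothetical helper and the sample items are the documented output for 'かな漢字'. It capitalizes every segment, whereas the old output above leaves the hiragana segment 'kana' in lower case, so the results are not guaranteed to match.

```python
def romanize(items, sep=" ", capitalize=False):
    """Join the 'hepburn' field of convert()-style items.

    `sep` mimics setMode("s", True) and `capitalize` mimics setMode("C", True);
    both are approximations, not pykakasi behaviour.
    """
    words = [item["hepburn"] for item in items]
    if capitalize:
        # Upper-case only the first letter and keep the rest unchanged.
        words = [w[:1].upper() + w[1:] for w in words]
    return sep.join(words)


sample = [{"orig": "かな", "hepburn": "kana"}, {"orig": "漢字", "hepburn": "kanji"}]
print(romanize(sample, capitalize=True))  # Kana Kanji
```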
Tokenize Japanese text (split by word boundaries), equivalent to kakasi's wakati gaki option:
>>> wakati = pykakasi.wakati()
>>> conv = wakati.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな 漢字 交じり 文
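Downstream code often needs to know where each token sits in the original string. A minimal sketch, assuming the wakati output is the input's tokens joined by single spaces as in the example above and that the input itself contains no spaces; `token_spans` is a hypothetical helper.

```python
def token_spans(segmented, sep=" "):
    """Map each token of a wakati-style string to (token, start, end) offsets
    in the unsegmented text. Assumes tokens never contain `sep` itself.
    """
    spans = []
    pos = 0
    for token in segmented.split(sep):
        if not token:
            continue  # skip empty pieces from doubled separators
        spans.append((token, pos, pos + len(token)))
        pos += len(token)
    return spans


print(token_spans("かな 漢字 交じり 文"))
# [('かな', 0, 2), ('漢字', 2, 4), ('交じり', 4, 7), ('文', 7, 8)]
```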
Add furigana (pronunciation aid) in rōmaji to text:
>>> kakasi = pykakasi.kakasi()
>>> kakasi.setMode("J","aF") # Japanese to furigana
>>> kakasi.setMode("H","aF") # Hiragana to furigana
>>> conv = kakasi.getConverter()
>>> result = conv.do(text)
>>> print(result)
かな[kana] 漢字[Kanji] 交じり[Majiri] 文[Bun]
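A similar "orig[reading]" layout can be produced from convert() items. A minimal sketch, assuming items shaped like the documented convert() output; `furigana` is a hypothetical helper, and since 'hepburn' is lower case in that output, it prints "漢字[kanji]" rather than the capitalized "漢字[Kanji]" shown above.

```python
def furigana(items, key="hepburn", sep=" "):
    """Render convert()-style items as 'orig[reading]' pairs.

    A hypothetical approximation of setMode(..., "aF"); every segment is
    annotated, even when its reading equals the original text.
    """
    return sep.join(f"{item['orig']}[{item[key]}]" for item in items)


sample = [
    {"orig": "かな", "hira": "かな", "hepburn": "kana"},
    {"orig": "漢字", "hira": "かんじ", "hepburn": "kanji"},
]
print(furigana(sample))              # かな[kana] 漢字[kanji]
print(furigana(sample, key="hira"))  # かな[かな] 漢字[かんじ]
```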
Input mode values: "J" (Japanese: kanji, hiragana and katakana), "H" (hiragana), "K" (katakana).
Output mode values: "H" (hiragana), "K" (katakana), "a" (alphabet / rōmaji), "aF" (furigana in rōmaji).
There are other setMode switches which control output: