This set contains several command-line tools intended to help those studying kanji and the Japanese language, along with some Delphi libraries/units for parsing and writing file formats commonly encountered when working with Japanese text.
AnkiList
These tools generate tab-separated files suitable for importing into Anki or updating some fields of your Anki deck.
- AnkiKanjiList - converts raw kanji list to tab-separated list with ons/kuns/meanings (uses KANJIDIC compatible dictionary).
- AnkiWordList - converts word/expression list to tab-separated list of words and translations (uses EDICT/CEDICT compatible dictionary).
- AnkiExampleList - converts word/expression list to tab-separated list of words and example sentences (uses Tanaka corpus compatible corpus).
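As a rough illustration of the kind of conversion AnkiKanjiList performs, the sketch below turns one KANJIDIC-style line into a tab-separated row. It is written in Python rather than Delphi and is not the tool's actual implementation. The sample line is abbreviated, the function name `kanjidic_line_to_row` is hypothetical, and the rules used (katakana fields are on readings, hiragana fields are kun readings, `{...}` fields are meanings, a `T1`-style marker starts name readings, which are skipped) are assumptions based on the classic KANJIDIC text layout.

```python
import re

# Abbreviated, illustrative line in the KANJIDIC text layout (many index fields omitted).
SAMPLE_LINE = "亜 3021 U4e9c B7 C1 G8 S7 Yya4 Wa ア つ.ぐ T1 や {Asia} {rank next} {come after} {-ous}"


def is_katakana(ch):
    return "\u30a0" <= ch <= "\u30ff"


def is_hiragana(ch):
    return "\u3040" <= ch <= "\u309f"


def kanjidic_line_to_row(line):
    """Return 'kanji<TAB>ons<TAB>kuns<TAB>meanings' for one KANJIDIC-style line."""
    # Meanings are the fields wrapped in curly braces.
    meanings = re.findall(r"\{([^}]*)\}", line)
    # Everything before the first brace is a space-separated list of fields.
    fields = line.split("{", 1)[0].split()
    kanji, ons, kuns = fields[0], [], []
    for field in fields[1:]:
        # Assumption: "T1", "T2", ... start name/radical readings; stop there.
        if re.fullmatch(r"T\d+", field):
            break
        # Kun readings may carry a leading "-" (suffix form), so look past it.
        first = field.lstrip("-")[:1]
        if first and is_katakana(first):
            ons.append(field)
        elif first and is_hiragana(first):
            kuns.append(field)
    return "\t".join([kanji, ", ".join(ons), ", ".join(kuns), ", ".join(meanings)])


print(kanjidic_line_to_row(SAMPLE_LINE))
```

Each output row has four tab-separated columns (kanji, on readings, kun readings, meanings), which is the general shape a tab-separated import into Anki expects: one note per line, one field per column.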
Warodai Convertor
Converts Warodai (a big classical JP->RU dictionary) into EDICT2 and JMDict-style dictionaries, which you can use in any program that supports these formats.
The conversion is still far from perfect, but it works, and the resulting dictionary is usable, with more than 100 000 entries.
How to use:
- Download Warodai in TXT format
- Download WarodaiConv
- Run WarodaiConv
Or download the converted dictionary:
- EDICT2 format (recommended)
Miscellaneous
These tools can be used on their own or as examples of working with their underlying libraries.
- KanjiStats: lists kanji by frequency in a given text.
- kanjistats_4Gb: kanji sorted by frequency, as they appeared in 21000 Japanese books.
- KanjiList: manipulates kanji lists (trim/merge/intersect/etc).
- AozoraTxt: strips Aozora-Ruby from the text or gives some statistical info about it.
- MiscTxt: gives some common statistical info about a text (number of kana and kanji, character and line counts).
- YarxiKanjiInfo: uses the Yarxi database parser to extract kanji information.
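The idea behind a frequency listing such as the one KanjiStats produces can be sketched in a few lines. This is a standalone Python sketch, not the tool's code. The names `is_kanji` and `kanji_frequency` are hypothetical, and "kanji" is assumed here to mean characters in the basic CJK Unified Ideographs block (U+4E00 to U+9FFF); extension blocks are ignored.

```python
from collections import Counter


def is_kanji(ch):
    # Assumption: only the basic CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequency(text):
    """Return (kanji, count) pairs, most frequent first."""
    counts = Counter(ch for ch in text if is_kanji(ch))
    # Sort by descending count; ties are broken by code point for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


for kanji, count in kanji_frequency("日本語の本を日本で読む"):
    print(f"{kanji}\t{count}")
```

Kana and punctuation are skipped by the `is_kanji` filter, so the same loop could be extended with similar range checks to produce kana or line counts of the sort MiscTxt reports.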
Libraries
Libraries in Delphi for common CJK-related tasks.
- JWBIO - fast stream reader/writer with encoding detection and many encodings supported out of the box, including JIS/Shift-JIS, GB, UTF-16/UTF-8 and other encodings common for Japanese text.
- KanjidicReader - KANJIDIC-style dictionary parser plus a basic in-memory representation ("load and use").
- EdictReader - EDICT/CCEDICT dictionary format parser (very forgiving of format deviations) plus an in-memory representation.
- EdictWriter - programmer-friendly EDICT1/EDICT2/JMDICT file generator.
- AozoraTxt parser - parses text files in Aozora Bunko format.
- UnihanReader - simple Unihan database parser.
- KanaConv - romaji-katakana-hiragana conversions; supports common and custom romaji schemes, including several at once. Not yet moved here from the Wakan project.
- YarxiReader
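
To give a feel for what parsing an EDICT-style dictionary involves, here is a minimal Python sketch of splitting one entry line into its parts. It does not use or mirror the Delphi EdictReader unit. The function name `parse_edict_line` is hypothetical, and the line layout assumed is the basic EDICT one, `WORD [READING] /gloss/gloss/`, with the reading optional for kana-only words. EDICT2 extras such as multiple `;`-separated headwords are not handled.

```python
import re

# Assumed basic EDICT layout: WORD [READING] /gloss1/gloss2/
EDICT_LINE = re.compile(
    r"^(?P<word>\S+)(?:\s+\[(?P<reading>[^\]]+)\])?\s+/(?P<glosses>.*)/\s*$"
)


def parse_edict_line(line):
    """Return a dict with word, reading and glosses, or None if the line does not match."""
    m = EDICT_LINE.match(line)
    if m is None:
        return None
    return {
        "word": m.group("word"),
        # Kana-only entries have no bracketed reading; fall back to the word itself.
        "reading": m.group("reading") or m.group("word"),
        "glosses": [g for g in m.group("glosses").split("/") if g],
    }


print(parse_edict_line("日本語 [にほんご] /(n) Japanese (language)/"))
print(parse_edict_line("ありがとう /(int) thank you/"))
```

Returning `None` for lines that do not match keeps the sketch strict; a parser meant to be forgiving of format deviations would instead try to recover whatever fields it can.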
Downloads
Latest jptools.zip (AnkiKanjiList/WordList, AozoraTxt, KanjiStats and more)
Building
May be required for some projects:
- Wakan
- SQLite3.pas
- SQLite3Dataset.pas
At runtime:
- sqlite3.dll
- EDICT2
- kanjidic
- radkfile
- ewarodai.txt
- yarxi.db