This set contains several command-line tools for people studying kanji and the Japanese language, plus some Delphi libraries/units for parsing and writing file formats commonly encountered when working with Japanese text.
These tools generate tab-separated files suitable for importing into Anki or updating some fields of your Anki deck.
AnkiKanjiList - converts a raw kanji list to a tab-separated list with ons/kuns/meanings (uses a KANJIDIC-compatible dictionary).
AnkiWordList - converts a word/expression list to a tab-separated list of words and translations (uses an EDICT/CEDICT-compatible dictionary).
AnkiExampleList - converts a word/expression list to a tab-separated list of words and example sentences (uses a Tanaka Corpus-compatible corpus).
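As a rough illustration of the kind of output these tools produce, here is a minimal Python sketch that turns a kanji list into tab-separated lines. It is not the tools' actual implementation (those are written in Delphi): the `kanji_to_tsv` function, the `SAMPLE_DICT` structure and the column order (kanji, ons, kuns, meanings) are all assumptions made for the example, and a real run would read the entries from a KANJIDIC-compatible file.

```python
import csv
import io

# Hypothetical in-memory dictionary: kanji -> (on readings, kun readings, meanings).
# A real tool would load these from a KANJIDIC-compatible file instead.
SAMPLE_DICT = {
    "日": (["ニチ", "ジツ"], ["ひ", "か"], ["day", "sun"]),
    "本": (["ホン"], ["もと"], ["book", "origin"]),
}


def kanji_to_tsv(kanji_list, dictionary):
    """Return one tab-separated line per kanji found in the dictionary.

    Assumed column order: kanji, ons, kuns, meanings.
    """
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for kanji in kanji_list:
        entry = dictionary.get(kanji)
        if entry is None:
            continue  # silently skip kanji that the dictionary does not know
        ons, kuns, meanings = entry
        writer.writerow([kanji, ", ".join(ons), ", ".join(kuns), ", ".join(meanings)])
    return out.getvalue()


if __name__ == "__main__":
    print(kanji_to_tsv(["日", "本"], SAMPLE_DICT), end="")
```

A file in this shape can be imported into Anki by choosing tab as the field separator and mapping each column to a note field.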
WarodaiConv converts Warodai (a large classical JP->RU dictionary) into EDICT2 and JMDict-style dictionaries which you can use in any program.
The conversion is still far from perfect, but it works, and the resulting dictionary is usable, with more than 100,000 entries.
How to use:
- Download Warodai in TXT format
- Download WarodaiConv
- Run WarodaiConv
Or download the converted dictionary:
EDICT2 format (recommended)
These tools can be used on their own or serve as examples of working with the underlying libraries.
KanjiStats: list kanji by frequency in a given text.
kanjistats_4Gb: kanji sorted by frequency of appearance in 21,000 books in Japanese
KanjiList: manipulate kanji lists (trim/merge/intersect/etc)
AozoraTxt: strips Aozora-Ruby from the text or gives some statistical info about it.
MiscTxt: gives some common statistical info about a text (# of kana, kanji, char and line count)
YarxiKanjiInfo: uses Yarxi database parser to extract kanji information.
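To make the idea behind AozoraTxt and KanjiStats concrete, here is a small Python sketch of stripping Aozora-style ruby and then counting kanji by frequency. It is an illustration only, not the tools' code: the function names are made up for the example, the ruby handling covers just the `base《reading》` form and the `｜` base-start marker, and "kanji" is assumed to mean characters in the CJK Unified Ideographs block (U+4E00..U+9FFF).

```python
import re
from collections import Counter

# Aozora Bunko writes ruby as base《reading》, optionally with ｜ marking
# where the base text starts.
RUBY_RE = re.compile(r"《[^》]*》")


def strip_ruby(text):
    """Remove 《...》 readings and the ｜ base-start marker."""
    return RUBY_RE.sub("", text).replace("｜", "")


def is_kanji(ch):
    # Assumption: only the CJK Unified Ideographs block counts as kanji.
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequency(text):
    """Return (kanji, count) pairs, most frequent first."""
    counts = Counter(ch for ch in text if is_kanji(ch))
    return counts.most_common()


if __name__ == "__main__":
    sample = "日本語《にほんご》を｜勉強《べんきょう》する日"
    for kanji, count in kanji_frequency(strip_ruby(sample)):
        print(f"{kanji}\t{count}")
```

Stripping ruby before counting matters only when readings contain kanji, but it also keeps the kana counts honest if the same pass is extended to other statistics.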
Libraries in Delphi for common CJK-related tasks.
- JWBIO - fast stream reader/writer with encoding detection and a number of encodings out of the box, including JIS/Shift-JIS, GB, UTF-16/UTF-8 and others commonly used for Japanese text.
- KanjidicReader: KANJIDIC style dictionary parser + basic in-memory representation ("load and use")
- EdictReader: EDICT/CCEDICT dictionary format parser (very forgiving to deviations in formats) + in-memory representation
- EdictWriter - programmer friendly EDICT1/EDICT2/JMDICT file generator.
- AozoraTxt parser: parses text files in Aozora Bunko format
- UnihanReader - simple Unihan database parser.
- KanaConv - romaji-katakana-hiragana conversions; supports common and custom romaji schemes and can use several at once (not yet moved here from the Wakan project).
Latest jptools.zip (AnkiKanjiList/WordList, AozoraTxt, KanjiStats and more)
May be required for some projects: