himselfv/jptools

This set contains several command-line tools intended to help those studying kanji and the Japanese language, along with some Delphi libraries/units for parsing and writing file formats commonly encountered when working with Japanese text.

AnkiList

These tools generate tab-separated files suitable for importing into Anki or updating some fields of your Anki deck.

  • AnkiKanjiList - converts a raw kanji list to a tab-separated list with ons/kuns/meanings (uses a KANJIDIC-compatible dictionary).

  • AnkiWordList - converts a word/expression list to a tab-separated list of words and translations (uses an EDICT/CEDICT-compatible dictionary).

  • AnkiExampleList - converts a word/expression list to a tab-separated list of words and example sentences (uses a Tanaka Corpus-compatible corpus).
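
To illustrate the kind of output these tools produce, here is a minimal Python sketch of turning a kanji list into tab-separated lines. It is not the code of AnkiKanjiList itself: the `SAMPLE_DICT` structure, the function name, and the column order are assumptions made for the example, and a real run would read a KANJIDIC-compatible file instead of a hard-coded dictionary.

```python
# Hypothetical in-memory dictionary: kanji -> readings and meanings.
# A real run would load these from a KANJIDIC-compatible file instead.
SAMPLE_DICT = {
    "日": {"on": ["ニチ", "ジツ"], "kun": ["ひ", "か"], "meanings": ["day", "sun"]},
    "本": {"on": ["ホン"], "kun": ["もと"], "meanings": ["book", "origin"]},
}


def kanji_list_to_tsv(kanji_list, dictionary):
    """Return one tab-separated line per kanji: kanji, ons, kuns, meanings."""
    lines = []
    for kanji in kanji_list:
        # Unknown kanji are kept, with empty fields, so no input is lost.
        entry = dictionary.get(kanji, {})
        fields = [
            kanji,
            ", ".join(entry.get("on", [])),
            ", ".join(entry.get("kun", [])),
            ", ".join(entry.get("meanings", [])),
        ]
        lines.append("\t".join(fields))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(kanji_list_to_tsv("日本", SAMPLE_DICT), end="")
```

Each output line maps to one Anki note, with the tab-separated columns mapped to note fields during import.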

Warodai Convertor

Converts Warodai (a large classic JP->RU dictionary) into EDICT2 and JMDict-style dictionaries, which you can use in any program that supports those formats.

The translation is still far from perfect, but it works well enough that the resulting dictionary, with more than 100 000 entries, is usable.

How to use:

Or download converted dictionary:

Miscellaneous

These tools may be useful on their own or as examples of working with their underlying libraries.

  • KanjiStats: list kanji by frequency in a given text.

  • kanjistats_4Gb: kanji sorted by frequency, as they appeared in 21000 Japanese books.

  • KanjiList: manipulate kanji lists (trim/merge/intersect/etc)

  • AozoraTxt: strips Aozora-Ruby from the text or gives some statistical info about it.

  • MiscTxt: gives some common statistical info about a text (number of kana and kanji, character and line counts)

  • YarxiKanjiInfo: uses Yarxi database parser to extract kanji information.
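
As a rough illustration of what KanjiStats and AozoraTxt do, the following Python sketch strips Aozora-style ruby from a text and then counts kanji by frequency. It is not a port of the Delphi tools. It assumes a simplified view of Aozora markup (`《...》` readings, the `｜` ruby base marker, and `［＃...］` annotations) and treats only the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) as kanji. All function names are hypothetical.

```python
import re
from collections import Counter

# Simplified Aozora Bunko markup (an assumption, not the full spec):
# 《...》 holds a ruby reading, ｜ marks where the ruby base starts,
# and ［＃...］ holds an editorial annotation.
RUBY_RE = re.compile(r"《[^》]*》")
ANNOTATION_RE = re.compile(r"［＃[^］]*］")


def strip_aozora_ruby(text):
    """Remove annotations, ruby readings, and ruby base markers."""
    text = ANNOTATION_RE.sub("", text)
    text = RUBY_RE.sub("", text)
    return text.replace("｜", "")


def is_kanji(ch):
    """True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def kanji_frequency(text):
    """Return (kanji, count) pairs, most frequent first."""
    return Counter(ch for ch in text if is_kanji(ch)).most_common()


if __name__ == "__main__":
    raw = "｜東京《とうきょう》へ行《い》く"
    clean = strip_aozora_ruby(raw)
    print(clean)
    print(kanji_frequency(clean))
```

Stripping ruby before counting matters because readings inside `《...》` would otherwise inflate the kana count and, in rare cases, the kanji count.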

Libraries

Libraries in Delphi for common CJK-related tasks.

  • JWBIO - fast stream reader/writer with encoding detection and a bunch of encodings out of the box, including JIS/Shift-JIS, GB, UTF16/8 and other common Japanese ones.
  • KanjidicReader - KANJIDIC-style dictionary parser + basic in-memory representation ("load and use").
  • EdictReader - EDICT/CCEDICT dictionary format parser (very forgiving of deviations from the formats) + in-memory representation.
  • EdictWriter - programmer-friendly EDICT1/EDICT2/JMDICT file generator.
  • AozoraTxt parser - parses text files in Aozora Bunko format.
  • UnihanReader - simple Unihan database parser.
  • KanaConv - romaji-katakana-hiragana conversions; supports common and custom romaji schemes, including several at once (not yet moved here from the Wakan project).
  • YarxiReader
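
For readers unfamiliar with the EDICT format these units handle, here is a minimal Python sketch of parsing a single EDICT1-style line of the form `HEADWORD [READING] /gloss/gloss/`. It does not reflect the EdictReader API, which is a Delphi unit not shown here. The function name and the returned dictionary layout are assumptions, and EDICT2 features such as multiple headwords per entry are deliberately ignored.

```python
import re

# HEADWORD, an optional [READING], then slash-delimited glosses.
EDICT_LINE_RE = re.compile(r"^(\S+)(?:\s+\[([^\]]+)\])?\s+/(.*)/\s*$")


def parse_edict_line(line):
    """Parse one EDICT1-style line; return None if it does not match."""
    match = EDICT_LINE_RE.match(line)
    if match is None:
        return None
    headword, reading, gloss_part = match.groups()
    # Drop empty pieces so a stray double slash does not yield "" glosses.
    glosses = [g for g in gloss_part.split("/") if g]
    return {"headword": headword, "reading": reading, "glosses": glosses}


if __name__ == "__main__":
    print(parse_edict_line("漢字 [かんじ] /(n) Chinese characters/kanji/"))
```

Kana-only entries have no bracketed reading, so `reading` is `None` for them. A forgiving parser like the one described above would need to tolerate more deviations than this pattern does.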

Downloads

Latest jptools.zip (AnkiKanjiList/WordList, AozoraTxt, KanjiStats and more)

All downloads

Building

Some projects may require the following to build:

  • Wakan
  • SQLite3.pas
  • SQLite3Dataset.pas

At runtime:

  • sqlite3.dll
  • EDICT2
  • kanjidic
  • radkfile
  • ewarodai.txt
  • yarxi.db
