A small utility to convert the cedict_ts.u8 file into a JSON or CSV file. Additional features:
- Add pinyin with accents (tone marks) based on these rules
- Add the HSK level of each character, fetched from mandarinbean. The HSK 7-9 levels are parsed from a different website by wohok
- Add zhuyin support based on these conversion rules
- Add Wade-Giles support based on these conversion rules
Clone this project and run one of the cargo commands below. If needed, I can provide the generated JSON and CSV files.
cargo run -- generate -e ../cedict_ts.u8 -o ../cedict.json -f json
cargo run -- generate -e ../cedict_ts.u8 -o ../cedict.csv -f csv
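For readers unfamiliar with the input, each non-comment line of cedict_ts.u8 follows the CC-CEDICT layout `Traditional Simplified [pin1 yin1] /gloss 1/gloss 2/`. The sketch below shows one way such a line could be split into fields. It is not the code this tool uses; the `Entry` struct and the `parse_line` function are hypothetical names chosen for illustration.

```rust
/// One parsed CC-CEDICT entry (hypothetical struct, for illustration only).
#[derive(Debug, PartialEq)]
struct Entry {
    traditional: String,
    simplified: String,
    pinyin: String,
    translations: Vec<String>,
}

/// Parse a single line such as `我 我 [wo3] /I/me/my/`.
/// Returns None for blank lines, comment lines (starting with '#'),
/// and lines that do not match the expected layout.
fn parse_line(line: &str) -> Option<Entry> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // The headwords come before the first '[', the pinyin sits inside the brackets.
    let (heads, rest) = line.split_once('[')?;
    let (pinyin, glosses) = rest.split_once(']')?;
    let mut words = heads.split_whitespace();
    let traditional = words.next()?.to_string();
    let simplified = words.next()?.to_string();
    // Glosses are separated by '/', with a leading and a trailing slash.
    let translations = glosses
        .split('/')
        .map(str::trim)
        .filter(|g| !g.is_empty())
        .map(String::from)
        .collect();
    Some(Entry {
        traditional,
        simplified,
        pinyin: pinyin.to_string(),
        translations,
    })
}
```

Calling `parse_line` on every line of the file and skipping the `None` results would give a list of entries ready to be serialized to JSON or CSV.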
A small crate is also available. It provides a set of utility methods to interact with the CEDICT and to do some pinyin conversion. Below is how you can use the crate to load the CEDICT:
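use dodo_zh::KeyVariant;
use std::path::PathBuf;

fn main() {
    // Assumption: the loader accepts a PathBuf; adjust if it expects another path type
    let path = PathBuf::from("../cedict_ts.u8");
    // The KeyVariant can be either Traditional or Simplified Chinese
    let cedict = dodo_zh::load_cedict_dictionary(path, KeyVariant::Traditional).unwrap();
    // Returns an Item struct
    let wo = cedict.items.get("我").unwrap();
    // Assumes the translations field implements Debug
    println!("{:?}", wo.translations);
}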
A set of examples shows how to do some pinyin manipulation, such as converting pinyin with tone numbers into pinyin with tone marks.
You can run the example with the following command:
cargo run --example pinyin
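To give an idea of what the tone-number to tone-mark conversion involves, here is a minimal standalone sketch. It does not use the crate and is not its implementation; `with_tone`, `mark_syllable` and `mark_pinyin` are hypothetical names. It assumes the usual placement rules (an 'a' or 'e' takes the mark, otherwise the 'o' of "ou", otherwise the last vowel), treats tone 5 as neutral, and only handles lowercase vowels.

```rust
/// Return `vowel` carrying the mark for `tone` (1 to 4); other input is returned unchanged.
fn with_tone(vowel: char, tone: usize) -> char {
    let row = match vowel {
        'a' => "āáǎà",
        'e' => "ēéěè",
        'i' => "īíǐì",
        'o' => "ōóǒò",
        'u' => "ūúǔù",
        'ü' => "ǖǘǚǜ",
        _ => return vowel,
    };
    row.chars().nth(tone - 1).unwrap_or(vowel)
}

/// Convert one numbered syllable (e.g. "wo3") to its tone-marked form.
fn mark_syllable(syllable: &str) -> String {
    // Split off a trailing tone digit; without one, return the input unchanged.
    let (body, tone) = match syllable.chars().last().and_then(|c| c.to_digit(10)) {
        Some(t) if (1..=5).contains(&t) => (&syllable[..syllable.len() - 1], t as usize),
        _ => return syllable.to_string(),
    };
    // CC-CEDICT writes ü as "u:"; "v" is another common stand-in.
    let body = body.replace("u:", "ü").replace('v', "ü");
    if tone == 5 {
        return body; // neutral tone: no mark
    }
    let chars: Vec<char> = body.chars().collect();
    // Placement: 'a' or 'e' first, then the 'o' of "ou", else the last vowel.
    let pos = chars
        .iter()
        .position(|&c| c == 'a' || c == 'e')
        .or_else(|| body.find("ou").map(|byte| body[..byte].chars().count()))
        .or_else(|| chars.iter().rposition(|&c| "aeiouü".contains(c)));
    match pos {
        Some(i) => chars
            .iter()
            .enumerate()
            .map(|(j, &c)| if j == i { with_tone(c, tone) } else { c })
            .collect(),
        None => body,
    }
}

/// Convert a space-separated pinyin string, syllable by syllable.
fn mark_pinyin(pinyin: &str) -> String {
    pinyin
        .split_whitespace()
        .map(mark_syllable)
        .collect::<Vec<_>>()
        .join(" ")
}
```

For instance, `mark_pinyin("ni3 hao3")` yields `nǐ hǎo`. Capitalized syllables starting with a vowel (such as proper names) would need extra handling that this sketch leaves out.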